Predicting Cricket
One Ball at a Time
A machine learning system trained on 4 million individual ball outcomes that forecasts T20 match results with Monte Carlo simulation.
Why Traditional Prediction Fails
Direct match prediction suffers from a fundamental data problem
Traditional Approach
Predicting match winners directly gives you limited training data: one labeled example per match, which is not enough to learn nuanced patterns.
Our Approach
By predicting each ball, we get 266x more training data. Rich patterns emerge from abundant examples.
The Key Insight
Each ball has an outcome: dot, single, double, boundary, six, or wicket. By simulating entire matches ball-by-ball using Monte Carlo methods, we capture the natural uncertainty of cricket while leveraging abundant training data.
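The idea can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual simulator: the fixed outcome probabilities stand in for the trained classifier (which would condition on match state), extras such as wides and no-balls are ignored, and the names `simulate_innings` and `win_probability` are hypothetical.

```python
import random

# The six ball outcomes named above. "boundary" is scored as four runs.
OUTCOMES = ["dot", "single", "double", "boundary", "six", "wicket"]
RUNS = {"dot": 0, "single": 1, "double": 2, "boundary": 4, "six": 6, "wicket": 0}


def simulate_innings(probs, rng, target=None, balls=120, wickets=10):
    """Simulate one T20 innings ball by ball and return the runs scored.

    probs: one weight per entry in OUTCOMES (assumed fixed here; a real
    model would produce a fresh distribution for every ball).
    """
    runs, out = 0, 0
    for _ in range(balls):
        outcome = rng.choices(OUTCOMES, weights=probs)[0]
        if outcome == "wicket":
            out += 1
            if out == wickets:
                break  # all out
        else:
            runs += RUNS[outcome]
            if target is not None and runs >= target:
                break  # chase completed
    return runs


def win_probability(probs_a, probs_b, n_sims=1000, seed=0):
    """Estimate P(team A wins) when A bats first; a tie counts as half a win."""
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(n_sims):
        score_a = simulate_innings(probs_a, rng)
        score_b = simulate_innings(probs_b, rng, target=score_a + 1)
        if score_a > score_b:
            wins += 1.0
        elif score_a == score_b:
            wins += 0.5
    return wins / n_sims
```

Calling `win_probability(probs_a, probs_b)` with two six-element weight lists returns the share of simulated matches won by the side batting first. Fixing the seed makes a run reproducible.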
How It Works
A five-step pipeline from raw data to match predictions
Data
8,341 T20 matches → 4M balls
Features
63 features per ball
Model
XGBoost classifier
Simulate
1000 Monte Carlo runs
Predict
Win probabilities
63 Features Across 6 Categories
What Drives the Predictions?
Top 10 features by XGBoost gain importance
Model Leaderboard
Four different architectures tested on T20 World Cup 2024 matches
| Rank | Model | Log Loss | Brier Score | Market Edge | Speed |
|---|---|---|---|---|---|
| 1 | XGBoost (Gradient Boosting) | 0.655 | 0.219 | 29.4% | ~346s |
| 2 | MLP (Neural Network) | 0.707 | 0.254 | 27.0% | ~75s (fastest) |
| 3 | LSTM (Recurrent Network) | 0.721 | 0.261 | 25.8% | ~420s |
| 4 | Fine-tuned LLM (Language Model) | 0.748 | 0.278 | 24.1% | ~890s |
XGBoost (Best)
Gradient boosted trees with Optuna hyperparameter tuning. 444 estimators, max depth 10.
MLP (Fast)
3-layer feedforward network with BatchNorm and dropout. Focal loss for class imbalance.
LSTM (Sequence)
2-layer LSTM with player embeddings. Captures sequential patterns from 10-ball windows.
Fine-tuned LLM (Experimental)
Transformer-based model fine-tuned on cricket commentary and match state descriptions.
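For readers unfamiliar with the focal loss mentioned for the MLP, here is a minimal per-example sketch in plain Python. The focusing parameter `gamma=2.0` is an assumed default, not a value taken from this project, and the function name is hypothetical. The loss scales cross-entropy by `(1 - p_t) ** gamma`, so confidently correct predictions (typically the common classes such as dots and singles) contribute less and rare outcomes such as wickets carry relatively more weight.

```python
import math


def focal_loss(probs, true_idx, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    probs: predicted class probabilities for one ball.
    true_idx: index of the outcome that actually occurred.
    gamma: focusing parameter (assumed value); gamma=0 gives cross-entropy.
    """
    # Clamp to avoid log(0) on a zero-probability prediction.
    p_t = min(max(probs[true_idx], 1e-12), 1.0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

With `gamma=0` the expression reduces to ordinary cross-entropy, which makes it easy to compare the two losses on the same predictions.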
Why XGBoost Wins
Despite being a simpler architecture, XGBoost outperforms neural networks on this task. The tabular nature of cricket statistics (player averages, match state) plays to XGBoost's strengths. Neural networks like LSTM are better suited for sequential patterns, but the additional complexity doesn't translate to improved match predictions.
Tested Against Betting Markets
Evaluated on 44 T20 World Cup 2024 matches with real betting odds
What This Means
Market Disagreement
The model finds significant edge opportunities on every match, identifying where betting markets may be mispricing probabilities.
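The page does not define how "edge" is computed, so the following is only one common convention, shown as a hedged sketch: convert decimal odds to implied probabilities, normalize away the bookmaker margin, and take the difference between the model's probability and the market's. The function names are hypothetical.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for all outcomes of a match into probabilities.

    Raw implied probabilities (1 / odds) usually sum to more than 1 because
    of the bookmaker margin, so they are rescaled to sum to exactly 1.
    """
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)
    return [r / total for r in raw]


def edge(model_prob, market_prob):
    """Assumed definition: model probability minus market probability."""
    return model_prob - market_prob
```

For example, if the market's normalized probability for a team is 0.50 and the model says 0.60, this definition gives an edge of 10 percentage points on that team.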
Calibration
The Brier score of 0.26 indicates reasonable probability calibration. When the model predicts 60% win probability, teams win approximately 60% of the time.
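The metrics used here can be computed with a few lines of standard-library Python. This sketch assumes binary outcomes (1 if the team won, 0 otherwise) and uses hypothetical function names; it is not the project's evaluation code. The `calibration_table` helper groups predictions into bins and compares the mean predicted probability with the observed win rate, which is the check described above.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-12):
    """Mean negative log-likelihood of the observed binary outcomes."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def calibration_table(probs, outcomes, n_bins=5):
    """Return (mean predicted, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            rows.append((mean_p, freq, len(b)))
    return rows
```

As a reference point, a forecast that always says 50% scores exactly 0.25 on the Brier score and ln 2 (about 0.693) on log loss, so lower values than those indicate the model is adding information.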
Cricket Analytics Deep Dive
Exploring patterns, strategies, and insights from cricket data
DC 2026: Fixing the Flaws
A tactical breakdown of 2025's problems and a mini-auction strategy to build a championship squad.
Moneyball in the IPL
Using cricWAR and VOMAM to identify auction steals and overpays for IPL 2025.
Benchmarking Projection Models
Comparing Marcel, IPL-Only ML, and Global ML models against 2025 actuals.
About This Project
CricML is a personal project exploring the intersection of machine learning and cricket analytics. It demonstrates production-grade ML engineering with proper temporal data handling, efficient memory management, and rigorous evaluation against real-world betting markets.