What you’ll get out of this post: Why computers usually fail to fully address the nuances of the NCAA tournament.
Estimated time: 3 minutes 15 seconds
Although I have a master’s degree in advanced analytics, I filled out my NCAA March Madness bracket in 5 minutes, just seconds before the tip-off of the first game.
Maybe you’ve heard of the model I used: it’s called “advanced gut instinct.”
Clearly, an algorithm could do better than human instinct, right? Maybe, but maybe not. Here’s why that’s a beautiful part of March Madness.
The search for the perfect bracket
With a simple Google search, you can quickly find custom scripts for March Madness written by smart analysts in the statistical programming language R.
On a basic level, most of the algorithms require that you supply the script with some ranking system as well as a measure of volatility that lets you control the frequency of upsets. More sophisticated systems rely on additional information to reveal potential outcomes of specific match-ups.
Ignoring the match-up dynamic for this discussion, let's focus on using the following two inputs to create your bracket:
- Rank ordering of teams
- A volatility measure that controls the frequency of upsets
The rank ordering of teams can be complex, but at the end of the day, I think there is more agreement than disagreement among experts when it comes to comparing the overall quality of tournament teams.
Why, then, is the elusive search for the perfect bracket so, well, elusive? You guessed it: volatility.
Volatility refers to the frequency of unexpected and unpredictable outcomes. The quality of a team, as defined in the rank-ordering system, is rather straightforward: it's usually based on performance over an entire season, plus other inputs such as measures of raw talent and win percentage in neutral or away games.
However, when you attempt to forecast results for a single-elimination event like the NCAA Tournament, things get tricky…and volatility surfaces in the form of the heralded (or maligned) "bracket busters."
When the ‘best chance for success’ doesn’t produce a winning bracket
It’s worth a brief digression to point out a few details of a common statistical method: the Monte Carlo simulation.
Adapted for this situation, a computer randomly predicts the outcomes of each game (weighted based on the inputs you provide) for a single “run” of the tournament.
This process is then repeated thousands of times to see where agreement most frequently occurs between the runs of the simulation. With this approach, winners for each game can be predicted based on the outcomes of thousands of simulation runs, giving us a bracket with the “best chance for success.”
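As a rough sketch of how this might look in code, here's a minimal Python version (standing in for the R scripts mentioned earlier) with a hypothetical four-team field and a made-up rank-based win-probability formula; the team names, the probability heuristic, and the volatility knob are all illustrative assumptions, not any real ranking system:

```python
import random
from collections import Counter

# Hypothetical 4-team mini-bracket: lower rank number = stronger team.
teams = {"Top Seed": 1, "Strong": 2, "Middling": 3, "Longshot": 4}

def win_prob(rank_a, rank_b, volatility=0.1):
    """Assumed chance that team A beats team B, based on rank order.
    Higher volatility pushes every game toward a coin flip."""
    base = rank_b / (rank_a + rank_b)  # simple rank-ratio heuristic
    return volatility * 0.5 + (1 - volatility) * base

def play(a, b):
    """Randomly decide one game, weighted by the win probability."""
    return a if random.random() < win_prob(teams[a], teams[b]) else b

def run_bracket():
    """One 'run' of the tournament: two semifinals, then a final."""
    semi1 = play("Top Seed", "Longshot")
    semi2 = play("Strong", "Middling")
    return play(semi1, semi2)

# Repeat thousands of runs and look for where they most often agree.
random.seed(42)
champions = Counter(run_bracket() for _ in range(10_000))
consensus = champions.most_common(1)[0][0]
print(consensus)  # the "best chance for success" pick
```

Unsurprisingly, the consensus across thousands of runs converges on the favorite in nearly every game, which is exactly the property the next section picks apart.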
So why doesn't that bracket usually win? Because using an algorithm helps you create a bracket with a higher probabilistic chance of overall "success," but it does a poor job of predicting the exact volatility inherent in the tournament. And correctly predicting volatility is what you ultimately need to be crowned the singular champion of your bracket pool.
Differentiation through volatility
With so much volatility, it's highly likely that if you performed 100K runs of the simulation, many individual runs would (by sheer chance, of course) turn out to be better predictors than the consensus of the full 100K-run simulation.
It's also very likely that everyone else is using similar data as inputs, resulting in similar predictions that will suffer equivalent disruptions from unpredictable "bracket busters." The name of the game could be differentiation through volatility. With that in mind, an alternative approach is to simply base your predictions on a single run of the simulation.
And since a single run of a simulation is roughly equivalent to the aforementioned “advanced gut instinct” method, the result will be brackets that do either really, really well or really, really poorly. And that’s okay, because there are no awards for second place.
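Continuing the earlier sketch (same hypothetical four-team field and assumed rank-based win probability), the single-run approach is almost trivial in code: fill out the whole bracket from one weighted run, keeping every upset the dice produce instead of averaging it away:

```python
import random

# Same illustrative assumptions as before: hypothetical teams,
# made-up rank-ratio win probability with a volatility knob.
teams = {"Top Seed": 1, "Strong": 2, "Middling": 3, "Longshot": 4}

def win_prob(rank_a, rank_b, volatility=0.1):
    base = rank_b / (rank_a + rank_b)
    return volatility * 0.5 + (1 - volatility) * base

def play(a, b):
    return a if random.random() < win_prob(teams[a], teams[b]) else b

def single_run_bracket():
    """Fill out the entire bracket from ONE run of the simulation,
    preserving whatever upsets randomly occur along the way."""
    semi1 = play("Top Seed", "Longshot")
    semi2 = play("Strong", "Middling")
    final = play(semi1, semi2)
    return {"semifinals": (semi1, semi2), "champion": final}

# No seed on purpose: each call is a fresh, high-variance bracket.
bracket = single_run_bracket()
print(bracket["champion"])
```

Run it a few times and you'll occasionally see the longshot win it all, which is precisely the high-risk, high-reward profile the "advanced gut instinct" method shares.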
The volatility of the tournament just isn’t something that can be accurately predicted. While that is painful for me to say as a data scientist, as a basketball fan it’s something that I can celebrate.
With that, GO HEELS and game on!
Reid Bryant is a data scientist at Brooks Bell. He uses advanced analytics and applied statistics to create data models, refine methodology, and generate deep insights from test results. Reid holds a Master of Science in analytics from the Institute for Advanced Analytics at North Carolina State University.
Brooks Bell helps top brands profit from A/B testing, through end-to-end testing, personalization, and optimization services. We work with clients to effectively leverage data, creating a better understanding of customer segments and leading to more relevant digital customer experiences while maximizing ROI for optimization programs. Find out more about our services.