Engineering a March Madness Bracket: Prediction vs. Strategy

Every March, millions of people fill out brackets. Most of them lose. Most of them also have the wrong mental model for why they lose.

The dominant bracket strategy is vibes-weighted narrative: this team is hot, that coach always chokes, these seeds never make it. It's not crazy — tournament basketball has genuine patterns. But the framing is wrong. Bracket filling isn't prediction. It's optimization under uncertainty, and those are different problems.

This post is about the engineering lens: how to formalize bracket prediction as a modeling problem, what the actual inputs are, and where simulation earns its keep.

Two Different Problems

Before any modeling starts, the framing matters.

Prediction asks: who will win this game? It's a binary classification problem, bounded by the matchup. You want a model that says "Duke wins with P=0.71."

Bracket strategy asks: given a scoring system and a field of 68 teams, what bracket gives you the best chance of winning your pool? This is an optimization problem. The answer depends not just on win probabilities, but on what other people are likely to pick, because in a pool, duplicating consensus picks gains you no ground on the field even when those picks are correct.

Most bracket advice conflates these. The two hardest questions are:

1. What's the win probability for any given matchup?
2. Given those probabilities, what's the bracket that beats the pool (not just maximizes correct picks)?

This post focuses on #1. The pool-beating layer is its own combinatorial problem I won't try to fit here.

The Baseline: Seed-Based Models

Seeds are public, non-controversial, and actually informative. The selection committee sets them using a lot of information. A purely seed-based model is your floor — anything more sophisticated should outperform it, or you're adding noise.

Here's what the historical seed data looks like (first-round win probability, approximate):

Seed matchup (better vs worse)   Better seed's win rate
1 vs 16                          99%  (two 16-seed wins ever: UMBC 2018, Fairleigh Dickinson 2023)
2 vs 15                          94%
3 vs 14                          85%
4 vs 13                          79%
5 vs 12                          64%  (the famous 5-12 "upset special")
6 vs 11                          62%
7 vs 10                          60%
8 vs 9                           51%  (essentially a coin flip)

This is your prior. The 5-12 game is a real phenomenon: 12-seeds win often enough that picking at least one is usually a positive-expected-value move in a large pool. 8-9 games are genuinely close. An individual 1-seed reaches the Final Four roughly 40% of the time.

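A simple model: assign each seed matchup an upset probability calibrated to the historical rates above, then propagate through the bracket. A team's chance of reaching a given round is its chance of reaching the previous round times its chance of winning there, averaged over the opponents it might face. That gives you an expected score for any bracket choice.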
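One way to make the propagation step concrete is to compute each team's chance of reaching every round exactly, summing over possible opponents instead of assuming a single path. The sketch below does this for a hypothetical four-team pod. The names `seed_win_prob` and `advance_probabilities`, the logistic curve in seed difference, and its steepness `k` are all illustrative assumptions, not calibrated values.

```python
import math

def seed_win_prob(seed_a, seed_b, k=0.17):
    """P(seed_a beats seed_b) from a logistic curve in seed difference.
    k is a made-up steepness, not fitted to historical results."""
    return 1.0 / (1.0 + math.exp(-k * (seed_b - seed_a)))

def advance_probabilities(seeds, win_prob=seed_win_prob):
    """seeds: list in bracket order, length a power of two (an assumption).
    Returns rounds[r][i] = P(team i has won r games)."""
    n = len(seeds)
    reach = [1.0] * n            # every team has "won" zero games
    rounds = [reach]
    block = 1                    # size of the sub-bracket each team has cleared
    while block < n:
        nxt = [0.0] * n
        for i in range(n):
            # Possible opponents sit in the sibling block of the same size
            start = ((i // block) ^ 1) * block
            p_win = sum(reach[j] * win_prob(seeds[i], seeds[j])
                        for j in range(start, start + block))
            nxt[i] = reach[i] * p_win
        reach = nxt
        rounds.append(reach)
        block *= 2
    return rounds

# Hypothetical pod in bracket order: 1 plays 16, 8 plays 9, winners meet
rounds = advance_probabilities([1, 16, 8, 9])
for seed, p in zip([1, 16, 8, 9], rounds[-1]):
    print(f"seed {seed:2d}: P(wins pod) = {p:.3f}")
```

The probabilities in each round sum to the number of teams still alive at that point, which is a handy sanity check on any implementation.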

The problem with pure seed models: they treat every team on a seed line as interchangeable. A 5-seed with three freshmen starters and a starting center out with injury isn't the same as a 5-seed with a veteran five-out offense. The seed doesn't know that.

Adding Signal: Team Quality Metrics

The useful inputs beyond seeds:

KenPom efficiency margins. Adjusted offensive and defensive efficiency per 100 possessions, normalized to a common schedule. These are the best single-number team quality proxy available. As a rough guide, 1-seed caliber teams usually carry an adjusted efficiency margin in the mid-20s or higher, while 8-10 seeds tend to sit in the low-to-mid teens. These margins correlate better with actual win probability than seeding, especially when committee seeding has been inconsistent.

Strength of schedule + conference adjustment. A team winning the Sun Belt at 28-3 has a very different profile than a 24-7 Big Ten team. KenPom handles this automatically, but raw win-loss records don't.

Injury flags. Tournament outcomes are highly sensitive to roster availability at the margins. A team missing its primary ball handler has asymmetric impact in late-game situations — exactly when tournament games are decided. Injury information is volatile and hard to systematize, but ignoring it is a known mistake.

Recent form. The last three to four games before the tournament are somewhat predictive — both for momentum and for health status. A team that limped through their conference tournament is a different bet than one that won it.

The upgrade from seed-only to a metrics-based model looks something like:

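def win_probability(team_a, team_b):
    """Estimate P(team_a beats team_b) from adjusted efficiency margins."""
    rating_a = team_a.kenpom_efficiency  # e.g., +22.4
    rating_b = team_b.kenpom_efficiency  # e.g., +18.1

    # Elo-style logistic curve on the margin difference (this is not the
    # log5 formula, which combines two win percentages). With a scale of 15,
    # each point of margin is worth roughly 3-4% of win probability near an
    # even matchup, and less as the gap grows.
    delta = rating_a - rating_b
    p_a = 1 / (1 + 10 ** (-delta / 15))

    # Crude multiplicative injury adjustment; 0.93 is a heuristic factor
    if team_a.injury_flag:
        p_a *= 0.93
    if team_b.injury_flag:
        p_a /= 0.93

    # Dividing by 0.93 can push a heavy favorite above 1.0, so clamp
    return min(max(p_a, 0.0), 1.0)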

This is a simplified sketch — real models have more careful calibration — but the shape is right. You get a win probability for any pairwise matchup that accounts for quality, not just seeding.
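To check whether the extra inputs actually beat the seed baseline, one common approach is to score both sets of probabilities against past results with the Brier score (the mean squared error of the forecast; lower is better). The sketch below shows the mechanics only. The five games and both probability lists are invented for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Hypothetical forecasts for five past games (1 = the better seed won)
outcomes      = [1, 1, 0, 1, 0]
seed_probs    = [0.64, 0.79, 0.64, 0.85, 0.51]   # seed-only baseline
metrics_probs = [0.70, 0.82, 0.48, 0.88, 0.45]   # metrics-based model

seed_bs = brier_score(seed_probs, outcomes)
metrics_bs = brier_score(metrics_probs, outcomes)
print(f"seed-only: {seed_bs:.3f}  metrics: {metrics_bs:.3f}")
```

A handful of games proves nothing either way; the comparison only becomes meaningful over many seasons of held-out results.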

Upset Modeling: The Variance Problem

Tournament prediction isn't just about expected outcomes — variance is the point. A 12-seed wins not because models are wrong but because single-game samples have enormous noise. A 35% underdog wins 35% of the time.
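A quick bit of arithmetic shows how much noise that is. Taking the roughly 64% first-round win rate for 5-seeds from the seed table and treating the four 5-12 games as independent (a simplifying assumption), the chance that at least one 12-seed advances is:

```python
p_five_seed_wins = 0.64   # approximate historical rate from the seed table
n_games = 4               # one 5-12 game per region

p_no_upsets = p_five_seed_wins ** n_games
p_at_least_one_upset = 1 - p_no_upsets
print(f"{p_at_least_one_upset:.3f}")  # 0.832
```

So a first round with zero 12-seed upsets is itself the unlikely outcome, even though each individual 5-seed is favored.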

The engineering framing: each game is approximately a Bernoulli trial with the win probabilities above. One game, one draw. If you simulate the tournament 10,000 times, you get a distribution of outcomes, not a single bracket.

This distribution is what most bracket pickers ignore. They pick the bracket most likely to happen (maximize per-pick expected value) rather than thinking about the full distribution of scores.

Monte Carlo simulation makes this concrete:

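import random
from collections import defaultdict

def run_single_tournament(teams, win_prob_fn, rng):
    # Assumes teams is in bracket order (adjacent entries meet) and its
    # length is a power of two
    alive = list(teams)
    while len(alive) > 1:
        alive = [a if rng.random() < win_prob_fn(a, b) else b
                 for a, b in zip(alive[::2], alive[1::2])]
    return alive[0]

def simulate_tournament(teams, win_prob_fn, n_trials=10_000, seed=None):
    rng = random.Random(seed)
    results = defaultdict(int)
    for _ in range(n_trials):
        winner = run_single_tournament(teams, win_prob_fn, rng)
        results[winner.name] += 1
    # Fraction of trials in which each team won the championship
    return {name: count / n_trials for name, count in results.items()}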

Running this gives you each team's champion probability. If you also record how far every team advances in each trial, the same loop gives you advancement probabilities for each round, not just the final, which lets you reason about bracket construction in aggregate.
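Here is a minimal sketch of a simulation that tracks every round, not only the champion. It assumes teams are plain names listed in bracket order and that the field size is a power of two. `simulate_advancement`, `toy_win_prob`, and the ratings are hypothetical stand-ins for whatever model you actually use.

```python
import random
from collections import defaultdict

def simulate_advancement(teams, win_prob_fn, n_trials=10_000, seed=0):
    """teams: names in bracket order (length a power of two, an assumption).
    Returns {name: [P(wins >= 1 game), P(wins >= 2 games), ...]}."""
    rng = random.Random(seed)
    n_rounds = len(teams).bit_length() - 1
    counts = defaultdict(lambda: [0] * n_rounds)
    for _ in range(n_trials):
        alive = list(teams)
        for rnd in range(n_rounds):
            winners = []
            for a, b in zip(alive[::2], alive[1::2]):
                # One Bernoulli draw per game
                w = a if rng.random() < win_prob_fn(a, b) else b
                counts[w][rnd] += 1
                winners.append(w)
            alive = winners
    return {t: [c / n_trials for c in counts[t]] for t in teams}

# Hypothetical ratings for a four-team pod, listed in bracket order
ratings = {"A": 25.0, "B": 5.0, "C": 15.0, "D": 14.0}

def toy_win_prob(a, b):
    return 1 / (1 + 10 ** (-(ratings[a] - ratings[b]) / 15))

adv = simulate_advancement(list(ratings), toy_win_prob)
for team, probs in adv.items():
    print(team, [round(p, 3) for p in probs])
```

The last entry in each list is that team's frequency of winning the whole pod, so the champion-only output is a special case of this table.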

The simulation also reveals a structural insight: the bracket format selects for consistency, not peak performance. A team that wins 6 games in a row needs to sustain quality across 2+ weeks. Explosive offensive teams that live and die on shooting variance are higher-ceiling but lower-floor. Defensive teams with strong rebounding and controlled pace have less variance per game — a feature, not a bug, for bracket purposes.

Expected Score vs. Correct Picks

Standard bracket pools use point systems that reward later rounds more heavily. Some go up by one point per round (1, 2, 3, and so on); others double each round (ESPN gives 10, 20, 40, 80, 160, 320). Under the doubling systems, champion pick correctness matters enormously.
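Because expectation is linear, a bracket's expected score is the sum, over every pick, of the points for that round times the probability the picked team actually wins that game. A minimal sketch follows, using invented advancement probabilities for a four-team pod and the first two values of the ESPN scale. The `expected_score` helper and both example brackets are hypothetical.

```python
def expected_score(picks, advance_prob, round_points):
    """picks[r]: teams picked to win a game in round r.
    advance_prob[team][r]: P(team wins its round-r game).
    round_points[r]: points awarded per correct pick in round r."""
    return sum(
        round_points[r] * advance_prob[team][r]
        for r, teams in enumerate(picks)
        for team in teams
    )

advance_prob = {  # illustrative numbers, not model output
    "A": [0.95, 0.78],
    "B": [0.05, 0.01],
    "C": [0.54, 0.12],
    "D": [0.46, 0.09],
}
round_points = [10, 20]

chalk = [["A", "C"], ["A"]]        # favorites everywhere
contrarian = [["A", "D"], ["D"]]   # ride the underdog out of the pod

chalk_ev = expected_score(chalk, advance_prob, round_points)
contrarian_ev = expected_score(contrarian, advance_prob, round_points)
print(f"chalk: {chalk_ev:.1f}  contrarian: {contrarian_ev:.1f}")
```

Expected score always favors chalk. It says nothing about how many other entries share those points, which is the separate pool-strategy question.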

A Monte Carlo approach gives you champion frequency for each team. Suppose 40% of your pool picks a particular 1-seed to win it all, and that 1-seed has a 25% actual win probability. Every time that team wins, a large share of your pool gets it right, so the correct pick gains you little ground. Your edge comes from finding teams in the 12-20% champion probability range that nobody is picking.

This is where bracket strategy diverges from pure prediction. The right question isn't "who is most likely to win?" but "who is undervalued relative to their actual probability given the pool's consensus picks?"
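One simple way to put numbers on "undervalued" is to divide each team's title probability by the share of the pool picking it; ratios above 1 mark teams the field is underweighting. Both inputs below are made up. In practice the probabilities would come from a simulation and the pick shares from whatever public pick data you have access to. The `leverage` helper and the team labels are hypothetical.

```python
def leverage(title_prob, pick_share):
    """Ratio of a team's title probability to the fraction of entrants picking it.
    Teams with no recorded pick share are skipped to avoid dividing by zero."""
    return {
        team: title_prob[team] / pick_share[team]
        for team in title_prob
        if pick_share.get(team, 0) > 0
    }

title_prob = {"Top seed": 0.25, "Quiet contender": 0.15, "Trendy pick": 0.08}
pick_share = {"Top seed": 0.40, "Quiet contender": 0.06, "Trendy pick": 0.12}

ratios = leverage(title_prob, pick_share)
for team, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team:16s} {ratio:.2f}")
```

The ratio is a screening heuristic only. It ignores pool size and the scoring system, both of which change how much contrarian risk is worth taking.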

You can't do this without the simulation output. And you can't trust the simulation output without reasonable win probability estimates to feed it.

What the Model Doesn't Know

Every honest engineering analysis has a humility section.

Sample size is small. 63 bracket games a year (67 counting the First Four), with roster turnover every season. That's not enough data to calibrate fine-grained features. Models trained on limited tournament data are prone to overfitting. Use them as priors, not predictions.
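The standard error of a proportion gives a feel for the size of the problem. Assuming roughly 150 historical games for a given seed matchup (an illustrative round number, not an exact count), even a well-established rate like 64% carries several points of uncertainty:

```python
import math

def proportion_standard_error(p, n):
    """Standard error of an observed win rate p over n independent games."""
    return math.sqrt(p * (1 - p) / n)

p_hat = 0.64    # approximate 5-seed win rate from the seed table
n_games = 150   # illustrative sample size, an assumption

se = proportion_standard_error(p_hat, n_games)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"SE = {se:.3f}, rough 95% interval = ({low:.2f}, {high:.2f})")
```

Any feature that slices the data further (by coach, by conference, by playing style) shrinks n and widens the interval accordingly.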

Hot shooting is real and unpredictable. A team shooting 47% from three in March is probably a bit lucky, and also might keep shooting that way for three more games. Single-game variance from shooting is enormous. The model cannot know.

Coaching matchup effects. Bill Self has won more than 50 NCAA tournament games. Some coaches are better at game-planning for elimination. This is real and hard to quantify, so most models ignore it. That's probably the right call, but it means the model is missing something genuine.

Bracket path matters. Seeding determines who you could face in subsequent rounds. A 2-seed on a difficult side of the bracket has lower expected wins than a 2-seed with an easier projected path. Good models account for this; simple ones don't.

Why This Lens Is Worth Having

I built the two bracket posts on this site — the analytical one and the chaos one — as pick exercises. They're intuition + filter application, not model output. They're honest about that.

The value of the engineering lens is different. It tells you what you're actually doing when you fill out a bracket: making probabilistic statements about 63 outcomes, under a scoring system, in competition with other humans who are also making probabilistic statements.

Once you see it as an optimization problem, the question isn't "who do I like?" It's "where does my probability estimate differ from the field's?" That's where expected value comes from. That's the edge, when there is one.

Most of the time, in a casual pool, there isn't an exploitable edge and that's fine. But having the framework means you know when an edge exists: when a genuinely good mid-major is being systematically undervalued, or when everyone is riding the same 1-seed deep into the bracket. That's where it matters.

March Madness is chaos by design. The format guarantees upsets. The engineering move is to figure out which chaos is predictable — and how much to bet on it.