Baseball Determinism

PECOTA's projected 2008 standings are out, and over at BBTF they're discussing the results.

(I'm just going to mention here that it looks like the PECOTA depth charts include some players accruing substantial playing time for the wrong teams, such as Steve Trachsel and Morgan Ensberg. That may be affecting the results. Moving on...)

The guys at BBTF are normally very good at baseball analysis, but most of them are missing the point in this thread. First, they go off on an unnecessary tangent about whether last year's Yankees would have won the AL East or the World Series under a number of hypothetical scenarios. The answer: we don't know. Maybe they would have had a 35% chance of winning the World Series if they had advanced to face the Red Sox in the ALCS, but we can't say for certain that it would have happened. If it were that easy, the pundits at ESPN--who are so sure what will happen ahead of time--would never have to work again, because the gambling winnings would constantly flow in.

But my main bone of contention is with the notion that the outcome of this year's AL East race will be a test of various projection systems. Obviously, all the projections have the Rays and Blue Jays battling it out...for third place. The Yankees and Red Sox are separated by only a few games in each projection, with most having New York on top.

Let's say your projections have New York beating Boston, 94 wins to 93, while another person says Boston will triumph by one game. If Boston actually wins the division, does this prove that the other guy did a better job than you? Certainly not. When it comes to such small margins, the results are dominated by variance. In fact, even if you KNOW that New York is a 94-win team and Boston is a 93-win team, the Yankees will only come out on top 54.4% of the time! With a sample size of one, this is practically a coin flip, and I rarely say that someone else is smarter than me because I lose a coin toss to him.
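For anyone who wants to check that figure, here is a rough sketch of one way to get a number in that neighborhood. It is my own back-of-the-envelope setup, not necessarily how the figure above was derived: it assumes each team's win total is an independent binomial over 162 games, approximates the difference in wins with a normal distribution, and ignores ties and the head-to-head games the two teams play against each other. The function name `win_prob_normal` is made up for illustration.

```python
import math

def win_prob_normal(wins_a, wins_b, games=162):
    """Approximate P(team A finishes with more wins than team B).

    Assumptions (mine, for illustration): each team's true talent is
    exactly wins/games, the two seasons are independent binomials,
    and the win difference is treated as normally distributed.
    """
    p_a = wins_a / games
    p_b = wins_b / games
    # Variance of a difference of independent totals is the sum of variances.
    var_diff = games * p_a * (1 - p_a) + games * p_b * (1 - p_b)
    z = (wins_a - wins_b) / math.sqrt(var_diff)
    # Standard normal CDF, written with the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

prob = win_prob_normal(94, 93)
print(f"P(94-win team finishes ahead of 93-win team) ~ {prob:.3f}")
```

Under these assumptions the answer lands between 54% and 55%, which is to say a one-game edge in true talent buys you very little.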

I highly recommend this piece on randomness in MLB standings. Even if you know a team is "supposed" to win 94 games, 15% of the time they will still finish with 87 or fewer wins. Another 15% of the time, they'll win 101 or more.
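Those tail numbers are easy to sanity-check. The sketch below is my own simplification, not the method used in the linked piece: it treats a "94-win team" as one that wins each of its 162 games independently with probability 94/162, then sums the exact binomial probabilities. The helper names are made up for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k wins in n independent games."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_most(k, n, p):
    """Probability of k or fewer wins."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def prob_at_least(k, n, p):
    """Probability of k or more wins."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

GAMES = 162
p = 94 / GAMES  # assumed per-game win probability of a true 94-win team

low = prob_at_most(87, GAMES, p)     # 87 or fewer wins
high = prob_at_least(101, GAMES, p)  # 101 or more wins

print(f"P(87 or fewer wins) ~ {low:.3f}")
print(f"P(101 or more wins) ~ {high:.3f}")
```

Both tails should come out in the neighborhood of 15%, and that is with nothing but luck in the model.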

Remember, this assumes you know the exact talent level of each team. In real life, you have injuries, trades, call-ups, send-downs, and uncertain projections; all of these serve to increase the level of uncertainty. I very much doubt that anyone can consistently project team win totals with errors whose standard deviation is under 8 games.

This doesn't mean you can't argue that some projections are better than others. When the White Sox hit PECOTA's 2007 projection of 72 wins on the nose--versus a Vegas over/under line of 89.5--it was extremely unlikely that this happened by chance alone. By itself, this doesn't prove PECOTA is a better judge of the standings than Vegas, but it's certainly a strong data point in favor of that notion.

