Sunday, April 08, 2007

Confirmation Bias, Sample Size, and the Nationals

The Nationals are bad.

We didn't need a 1-5 record to tell us this, or the record they set by starting the season with six straight games where they trailed 4-0 or worse. Washington has virtually no hope for the playoffs this year, and should consider 70 wins a success.

But they won't lose 110 games, as some pundits are predicting. Not unless they get extremely unlucky--much like they have been so far this year. More on that later.

Few teams are ever truly good or bad enough to win or lose 110 games in a season. Most of the time, what we really have are teams with a "true" talent level of 95-100 wins or losses that land at the far edge of normal variance and finish with an extreme record. The 2001 Mariners lead the way in the "fortunate" column, while the unlucky souls include teams like the 2003 Tigers, who lost 119 games but probably should have lost "only" 105.
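To put a rough number on how far luck alone can push a record, here is a minimal sketch. It assumes every game is an independent coin flip with the same win probability all season, which is a simplification and not anyone's actual projection system. The function names are made up for illustration.

```python
from math import comb, sqrt

GAMES = 162  # length of a full season


def win_sd(true_wins, games=GAMES):
    """Standard deviation of season wins for a team whose true talent is `true_wins`."""
    p = true_wins / games
    return sqrt(games * p * (1 - p))


def prob_losses_at_least(true_wins, losses, games=GAMES):
    """Chance that a team of the given true talent loses `losses` or more games."""
    p_loss = 1 - true_wins / games
    # Exact binomial tail: add up the probability of every loss total >= `losses`.
    return sum(
        comb(games, k) * p_loss**k * (1 - p_loss) ** (games - k)
        for k in range(losses, games + 1)
    )


if __name__ == "__main__":
    # A team with true 100-loss talent (62 wins).
    print(f"spread: about {win_sd(62):.1f} wins either way")
    print(f"chance of 110+ losses: {prob_losses_at_least(62, 110):.1%}")
```

Under these assumptions the spread for a 100-loss team is about six wins in either direction, so a true 100-loss club needs only moderately bad luck to land in historic territory, and a 110-loss record does not by itself prove 110-loss talent.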

Several studies, most notably those of Baseball Prospectus, contend that a "replacement level" team of rookies and freely available talent should win roughly 49 games in a full season, which works out to 113 losses over 162 games. Ryan Zimmerman alone is worth more than the four wins needed to get the Nationals from those 113 losses down to 109, to say nothing of the virtues of Chad Cordero, Felipe Lopez, Austin Kearns, Ryan Church, John Patterson--all good players stuck in a bad situation. You may never have heard of starting pitchers Jason Bergmann or Shawn Hill, but they have solid minor league track records and are every bit as effective as $42 million man Jeff Suppan.
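The arithmetic behind those figures is simple enough to sketch. The 49-win baseline is the figure cited above; the function name and the idea of adding "wins above replacement" as a single number are illustrative assumptions, not a real valuation model.

```python
GAMES = 162            # full season
REPLACEMENT_WINS = 49  # replacement-level baseline cited above


def losses_with_added_wins(wins_above_replacement):
    """Season losses for a replacement-level team plus the given extra wins."""
    return GAMES - (REPLACEMENT_WINS + wins_above_replacement)


if __name__ == "__main__":
    print(losses_with_added_wins(0))   # pure replacement-level roster
    print(losses_with_added_wins(4))   # add four wins of talent
    print(70 - REPLACEMENT_WINS)       # extra wins needed for a 70-win season
```

A pure replacement-level roster loses 113, four wins of talent gets it to 109, and a 70-win season takes 21 wins above the baseline spread across the whole roster.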

Does anyone remember a recent team that was completely written off because of its inexperience and lack of familiar names? I do, but it seems like no one else learned their lesson last year.

Furthermore, their early results aside, the Nationals are simply not playing all that badly. Let's play "Name That Team!"

Team A:

Team OPS: .710 (10th in NL)
Team OPS allowed: .859 (16th in NL)

Team B:

Team OPS: .681 (11th in NL)
Team OPS allowed: .805 (15th in NL)

Which team is the Nationals, and which is the 4-1 Atlanta Braves? Does it really matter? Both these teams have performed poorly; one has simply gotten every break and emerged with an .800 winning percentage, while the other is seeing every runner stranded in scoring position.

So why is everyone forecasting the Nats to emulate the '03 Tigers or 1899 Cleveland Spiders? Sample size and confirmation bias. The first term should be familiar to anyone with a basic knowledge of statistics: a small sample, such as six games out of 162, is an unreliable guide to a team's true ability. A-Rod won't hit 120 homers this year, and the Nationals won't lose 135 games.
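The 135-loss figure is just the 1-5 start stretched over a full season. As a hedged sketch of why that is a poor guide, the snippet below also asks how often a genuine 70-win team would open 1-5 or worse. It assumes independent games with a constant win probability, and the function names are hypothetical.

```python
from math import comb


def pace_losses(wins, losses, games=162):
    """Losses over a full season if the current record simply continued."""
    played = wins + losses
    return round(losses * games / played)


def prob_start_this_bad(true_wins, wins, played, games=162):
    """Chance that a team of the given true talent wins `wins` or fewer of its first `played` games."""
    p = true_wins / games
    # Binomial probability of 0, 1, ..., `wins` victories in `played` games.
    return sum(
        comb(played, k) * p**k * (1 - p) ** (played - k)
        for k in range(wins + 1)
    )


if __name__ == "__main__":
    print(pace_losses(1, 5))
    print(f"{prob_start_this_bad(70, 1, 6):.1%}")
```

Under these assumptions a true 70-win team starts 1-5 or worse a little less than one time in five, so the opening week says very little about whether the Nationals are a 92-loss team or a 110-loss team.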

The second term may be less well-known, but it's easy to grasp. When someone makes a prediction and then sees it start to come true, he labels himself a genius and assumes the trend will continue. Everyone overrates their own ability, and that of others, to see the future, be they losing gamblers, stock market analysts, readers of the daily horoscope, or baseball fans and analysts.

You can't predict with any certainty what cards you will receive on the next deal or what team will cover the spread in this weekend's game. The stock pundits you see on TV do not outperform the market as a whole. If you read the horoscope regularly, you will discover that the messages are made intentionally ambiguous to allow you to apply them to your own life, the same technique John Edward uses. The baseball writers who tell you the Nationals will lose 110 games are the same ones who said Nomar Garciaparra would be a first-ballot Hall of Famer and that Carlos Pena and Ryan Anderson would be stars in the majors.

You can write off the Nationals all you want, but they're not going down in flames this year.
