Wednesday, February 25, 2009

Forecasting

It's funny how statistical forecasting is perceived. When a baseball analyst designs an ingenious method to project how the season will play out, all people can talk about is how the system is selling their team short, or how it missed on their team three years ago, so it will again this year. When that same analyst starts blogging about an ingenious method to project election outcomes, all of a sudden he's the greatest thing since sliced bread, despite limited evidence that his work represents any substantial improvement over what we were doing before.

I hope I don't sound too flippant here; I know the vast majority of Americans (unlike me) cared more about who came out on top last November than who came out on top last October. Still, Nate Silver's image makes for an interesting case study. Why is so much praise heaped upon him for his election predictions, which were essentially the same as many other forecasters', while so much scorn meets his accurate but contrarian baseball picks?

Well, the contrarian part is a good place to start: people don't like it when computers buck popular opinion or recent trends. When computers recommended a less aggressive strategy for buying real estate and stocks, people instead looked at recent upward trends and gambled their savings away. When the BCS computers suggested a different national championship matchup than the human polls, the system was changed so that the computers would agree with the pollsters; now the BCS rankings, which were built to reflect a spectrum of opinions, instead simply reinforce the old way of doing things. If the computer says Obama will win by a larger margin than anticipated, that's basically a reinforcement of the status quo; but saying the White Sox will go from first to last is uncouth and should be ridiculed.

Furthermore, there are the results: Silver was "right" about the U.S. election, but his projected MLB standings constantly "miss"--by an average of 6.54 wins per team since 2003! (More on this later.) This might be a popular way of looking at things, but I'd counter that an election is simply much easier to forecast than a season of Major League Baseball. It's one game instead of 2430, with much more data and far fewer variables.

If you're reading this, you can probably name plenty of instances where one at-bat greatly influenced the outcome of a baseball season. Think Bill Mazeroski, Joe Carter, Bobby Thomson, etc. Outside of a bad Kevin Costner movie, can you name one major election that was similarly influenced by one voter? Compared to simulating an entire season of baseball, forecasting an election is more like picking the winner of the Super Bowl...on the day of the game.

Anyway, now that we're six paragraphs in and you haven't stopped reading, on to my thesis. How bad is it to miss by 6.54 games per team? (We're talking mean absolute error, because a typical PECOTA dissenter doesn't know what a standard deviation is.) If you're not familiar with probability and statistics, 6.54 games probably sounds like a lot, but it isn't.

Perhaps the best way of showing this is to look at an ideal league. Our team, the Average Means, is the very definition of league-average: during each plate appearance, each Means hitter winds up with a single 15.6% of the time, a walk 8.5%, etc. Every pitcher gives up 4.32 earned runs and .37 unearned runs per game in front of a league-average defense. Furthermore, the Means play such a consistently average schedule that they have exactly a 50% chance of winning every game. If we had to predict the Means' record in the upcoming season, obviously we would tab them to go 81-81, but how often would we be right?

Perhaps surprisingly, the Means would win exactly 81 games just 6.26% of the time; 90% of the time, they would win between 71 and 91 games inclusive. On average, our 81-win forecast would miss by 5.07 games. That's not very far from 6.54--and remember, that's the absolute best we can do with perfect information. In the real world, we have to deal with injuries, trades, and Andruw Jones in Dodger blue. Under those circumstances, an average miss of 6.54 wins is damn good, and the kind of thing I'll gladly take to the bank every year.
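The Means figures above follow from treating the season as 162 independent coin flips, which is a binomial distribution. Here is a minimal sketch in Python. Like the Means thought experiment, it assumes a fixed 50% win probability in every game. The helper names win_pmf, prob_between and mean_abs_error are made up for illustration.

```python
from math import comb

GAMES = 162   # full MLB regular season
P_WIN = 0.5   # assumption: the Average Means win every game with probability exactly 0.5


def win_pmf(k, n=GAMES, p=P_WIN):
    """Binomial probability of exactly k wins in n independent games."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_between(lo, hi, n=GAMES, p=P_WIN):
    """Probability that the season win total lands in [lo, hi], inclusive."""
    return sum(win_pmf(k, n, p) for k in range(lo, hi + 1))


def mean_abs_error(forecast, n=GAMES, p=P_WIN):
    """Expected |actual wins - forecast|, summed exactly over every possible record."""
    return sum(abs(k - forecast) * win_pmf(k, n, p) for k in range(n + 1))


if __name__ == "__main__":
    print(f"P(exactly 81 wins) = {win_pmf(81):.4f}")
    print(f"P(71 to 91 wins)   = {prob_between(71, 91):.4f}")
    print(f"Mean absolute miss = {mean_abs_error(81):.2f} games")
```

Run as written, it should reproduce the 6.26% and 5.07-game figures, with roughly 90% of seasons landing between 71 and 91 wins. Summing exactly over all 163 possible records, rather than simulating seasons, keeps the answer deterministic.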

Before I go, one more comment. Some White Sox backers are especially angry with PECOTA because it failed to see their 2005 World Series title coming--like every other intelligent analyst on the planet--and has consistently predicted disappointing finishes for them since. This reflects a common error in sports analysis: identifying a team by its uniform rather than its players.

Think back to the 2008 preseason: analysts were touting the Rays as the year's surprise team, but others couldn't get the images of Ryan Rupe and Esteban Yan out of their heads. Nobody was suggesting Rupe and Yan could suddenly turn things around for Tampa; they instead believed that Matt Garza, Carlos Pena and Evan Longoria were good baseball players and thus would make for a good baseball team. Similarly, PECOTA's 2009 White Sox forecast is not an attempt to take away their World Series trophy. Only six players remain from the '05 squad, and that generously counts Jose Contreras, who may not pitch at all this year. Even if Nate Silver completely whiffed on the '05 projections for Jon Garland and Dustin Hermanson, what difference does that make for this year's Pale Hose?

If you don't like the computer forecast for your favorite team, you don't have to believe the results, or even read them at all. Just don't declare that the computer is flat wrong, unless you want to put your money where your mouth is. If you do, great, I could use a new house.

Suggested reading: Randomness in Team Standings Predictions. Since 2003, PECOTA's projected win totals have had a standard deviation of 8.67 games against the actual results, versus an ideal of about 6.3 games.
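This derivation shows roughly where an "ideal" figure like that comes from. Suppose each game is an independent trial with a fixed win probability p, as with the Average Means. Then season wins W are binomial, and the spread follows directly. This is a simplified sketch: real teams also change in talent and schedule over a season.

```latex
\begin{aligned}
W &\sim \mathrm{Binomial}(n, p), \qquad n = 162 \\
\sigma_W &= \sqrt{n\,p\,(1-p)} \\
p = 0.50 &: \quad \sigma_W = \sqrt{40.5} \approx 6.36 \\
p = 0.60 &: \quad \sigma_W = \sqrt{38.88} \approx 6.24 \\
\mathbb{E}\,\lvert W - 81 \rvert &= 81 \binom{162}{81} 2^{-162} \approx 5.07 \qquad (p = 0.5)
\end{aligned}
```

A .500 team gives 6.36, and better or worse teams come in slightly lower, which plausibly squares with an ideal of about 6.3 across a real league. The last line is the exact mean absolute error behind the 5.07-game figure for the Means.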

Valuing a GM

From Rob Neyer today:

"The blog Fire Jim Bowden has done a fantastic job of rounding up information about LaCava, including this choice comment from Keith Law: "Going from Jim Bowden to Tony LaCava would be like going from Austin Kearns to Albert Pujols." I know Keith was doing an apples-and-oranges thing on purpose, but it's worth noting that the actual difference between a lousy general manager and a great general manager is significantly larger than the difference between Kearns and Pujols. In terms of wins and losses, I mean."

This is just not an accurate statement. It's easy to look back and say that Mr. X is a great GM because he ripped off Mr. Y in a trade four years ago, but it's a lot harder to predict that your team will add three wins next year because they hired Mr. A as GM instead of Mr. B. Identifying the GMs who have performed the best is easy; identifying those who will perform the best in the future is not.

Front office execs may be underpaid as a whole, but if the difference between a lousy GM and a great GM were 'significantly larger' than the seven-win gap between Pujols and Kearns, some team out there would be exploiting that by offering the top GM in the game--whoever it is--a big salary to lure him away from his current job. $5 million ought to be way more than enough, and if that's really worth more than seven wins, it would be an absolute steal in today's MLB. It says a lot that no GM is making anywhere close to that much.

For Rob Neyer's sake, I hope this was just a typo.

Friday, February 20, 2009

Slowrolled

I walked into the Wynn sportsbook today to bet some baseball futures. On the area of the big board where they normally rotate odds for the World Series and AL/NL pennants, today they had odds for the Academy Awards. Some highlights:

Best Movie
Slumdog Millionaire EV

Best Actor
Mickey Rourke 6-5

Best Actress
Kate Winslet 8-5

Best Supporting Actor
Heath Ledger 1-5

Best Supporting Actress
Penelope Cruz 5-1

Best Director
Danny Boyle 3-5

I don't watch awards shows, but I do love making money from incompetent bookies. With my mind racing over what car I should buy with my winnings, I headed to the window.

Me: "Oscars, please."
Writer: "Sorry sir, those odds are for entertainment purposes only."
Me: (stunned silence)

So, I guess this is some idiotic publicity stunt. I certainly hope it backfires on Steve Wynn; setting fake odds for a major media event and refusing to book any action is definitely one of the most douchebaggy moves I can think of.

They're On To Us

The Venetian is the first major sportsbook to post over-under numbers for MLB season wins that are actually open for betting, and I'm surprised on many levels. Firstly, I expected these to show up on BetCRIS or The Greek before any live books. What's really stunning, however, are the numbers themselves.

I copied the teams in the exact order they're listed on the betting sheet. Ordinarily, you expect some logical pattern to the teams' order: they could be sorted alphabetically, or from most wins to least, or by their 2008 records. The sheet doesn't follow any of these conventions; top-to-bottom, it resembles a set of 2009 power rankings written by an unsophisticated analyst. Notice that the win totals jump up and down almost haphazardly.

Furthermore, the win totals are inconsistent with the futures odds offered by the very same book. To wit:

Brewers
O/U wins: 84.5
Odds to win NL: 18-1
Odds to win WS: 55-1

Astros
O/U wins: 72.5
Odds to win NL: 14-1
Odds to win WS: 40-1

Now, I understand that these are two separate betting markets, and the sports betting market as a whole is certainly not efficient. Still, there's no way the same oddsmaker could come up with such inconsistent odds for two teams in the same division--unless he had a big change of heart.
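The mismatch is easier to see once the fractional odds are converted to implied probabilities. Below is a minimal sketch. It assumes the usual reading of "a-b" odds: a bettor wins a units for every b staked, so the break-even probability is b/(a+b). The function name implied_prob is hypothetical. These numbers still include the book's margin, so they overstate each team's true chances. The margin should distort both teams in roughly the same way, though, so the comparison between them still holds.

```python
from fractions import Fraction


def implied_prob(odds: str) -> Fraction:
    """Break-even probability for fractional odds 'a-b': b / (a + b).

    Hypothetical helper; the result includes the bookmaker's margin (vig).
    """
    a, b = (int(x) for x in odds.split("-"))
    return Fraction(b, a + b)


# Figures from the Venetian board quoted above.
teams = {
    "Brewers": {"ou_wins": 84.5, "nl": "18-1", "ws": "55-1"},
    "Astros": {"ou_wins": 72.5, "nl": "14-1", "ws": "40-1"},
}

for name, line in teams.items():
    nl = implied_prob(line["nl"])
    ws = implied_prob(line["ws"])
    print(f"{name:8s} O/U {line['ou_wins']:5.1f}  "
          f"NL {float(nl):6.2%}  WS {float(ws):6.2%}")
```

The Astros, whose win total is 12 games lower, come out at about 6.7% to win the pennant. The Brewers come out at roughly 5.3%. The World Series prices show the same inversion: about 2.4% versus 1.8%.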

So what happened here? The World Series and Pennant odds were released last October, and while they've moved in response to bets and free agency, for the most part they haven't changed much. The season wins lines, however, came out this week. During that four-month gap, it appears the Venetian was influenced by, ahem, an unpaid consultant.

I don't know why the Venetian is suddenly paying attention to Baseball Prospectus this year, but the evidence is too damning to ignore. I'm interested to see how other sportsbooks will handle their season wins markets: do they post similar numbers to avoid arbitrage, or shade the lines closer to where the public feels they should be? Only time will tell.

2009 MLB Season Wins Over/Under Numbers

From the Venetian:
Yankees 96.5
Cubs 95.5
Red Sox 95.5
Phillies 88.5
Mets 89.5
Dodgers 83.5
Angels 85.5
Rays 90.5
Diamondbacks 87.5
Indians 84.5
Cardinals 81.5
Twins 80.5
White Sox 76.5
Tigers 79.5
Brewers 84.5
Athletics 81.5
Braves 85.5
Rockies 78.5
Marlins 74.5
Astros 72.5
Giants 79.5
Blue Jays 81.5
Rangers 74.5
Reds 77.5
Orioles 74.5
Mariners 71.5
Padres 72.5
Nationals 73.5
Royals 74.5
Pirates 67.5

I'll have more to say on these numbers in my next post. For now, I think this is the first time I've ever been the first to leak anything to the public, so I see no reason to delay it with analysis.
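For readers who want a quick check of their own, one simple sanity test is to add up the posted totals. Every game produces exactly one win, so a full 30-team, 162-game schedule contains 2430 wins. This is only a rough gauge, not a definitive test. Over/under lines sit where a book expects balanced action rather than being strict average forecasts. Also, the occasional rainout goes unplayed. So the lines need not total exactly 2430. The constant names below are made up for illustration.

```python
# Over/under season-win totals posted by the Venetian, copied from the list above.
LINES = {
    "Yankees": 96.5, "Cubs": 95.5, "Red Sox": 95.5, "Phillies": 88.5,
    "Mets": 89.5, "Dodgers": 83.5, "Angels": 85.5, "Rays": 90.5,
    "Diamondbacks": 87.5, "Indians": 84.5, "Cardinals": 81.5, "Twins": 80.5,
    "White Sox": 76.5, "Tigers": 79.5, "Brewers": 84.5, "Athletics": 81.5,
    "Braves": 85.5, "Rockies": 78.5, "Marlins": 74.5, "Astros": 72.5,
    "Giants": 79.5, "Blue Jays": 81.5, "Rangers": 74.5, "Reds": 77.5,
    "Orioles": 74.5, "Mariners": 71.5, "Padres": 72.5, "Nationals": 73.5,
    "Royals": 74.5, "Pirates": 67.5,
}

TEAMS, GAMES = 30, 162
# Each game yields exactly one win, so a full schedule holds 30 * 162 / 2 wins.
AVAILABLE_WINS = TEAMS * GAMES // 2

assert len(LINES) == TEAMS, "expected one line per MLB team"
posted_total = sum(LINES.values())

print(f"Sum of posted totals:    {posted_total:.1f}")
print(f"Wins actually available: {AVAILABLE_WINS}")
print(f"Excess:                  {posted_total - AVAILABLE_WINS:+.1f}")
```

The posted lines add up to 2439, nine more wins than a full schedule can produce.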

Monday, February 16, 2009

Links and Rants

It's that time of year again.

If you're upset that you won't be seeing Ichiro pitch, this might give you something to root for. Well, except for the part where Boyd compares himself to Satchel Paige.

On a similar note, while I'm sick of the A-Rod saga, I can only name one player that I really hope was clean throughout his career: Julio Franco. I don't begrudge any player for juicing when it was clearly in his best interests to do so, but the Franco story--a dozen raw eggs a day, some lean meat, and a strict workout regimen keep him in the majors through age 48--still appeals to me.

After reading this and watching The Daily Show coverage of Obama's Florida visit, I can't help but wonder: does everyone in the USA really think Barack is their savior? If the expectations for him are this high, what does he need to accomplish as president to avoid being labeled a failure?

Friday, February 06, 2009

Quick Bites

- I know news is slow these days, but it's still unacceptable that Michael Phelps is getting more flak for the bong photo than he did for his DUI. I'm not a 420er, but one doesn't have to be to realize the difference between a victimless crime and a hazardous one.

No one was harmed by Phelps's DUI, you say? What a results-oriented way to look at things. We might as well fire a gun into a crowd of people and hope the bullet misses everyone.

- This thread, especially post 5, illustrates why sports betting forums are awesome for everything except advice you can actually take to the bank.

- The question everyone is asking: Who will be this year's Rays? The quick answer: No one. Turnarounds like that require a confluence of many factors: a very talented young team that was very unlucky last year, plus a smart front office that added the necessary missing pieces. Does that sound like the Pirates, Royals, Orioles or Nationals?

There are certainly teams that will improve upon last year's results; I can think of three off the top of my head that will probably add 10 or more wins each. But none of the long-suffering franchises are likely to make any noise in 2009.