Sunday, December 06, 2009

You can't predict something that is so predictable

Take it away, Sportscenter:

"History says that Brett Favre throws a lot of interceptions and that Adrian Peterson fumbles the ball. The only stat you can't predict in a football game is the turnover stat, and if there's gonna be an issue with the Minnesota Vikings--we saw glimpses of it tonight--it's gonna be they're going to turn the ball over down the stretch--maybe in the playoffs--and they may be the better team on paper, but I...'cause the turnover history, it may cost them a chance of getting in the Super Bowl."

Wow.

Friday, August 07, 2009

John Smoltz Is Not Finished

Freely available talent has a new face: a future HoF starting pitcher who has averaged nearly a strikeout per inning over the past four years with a K/BB ratio over four and reasonable G/F ratios. (This year, he's struck out 33 and walked nine over 40 IP in the AL East.)

There is nothing in John Smoltz's 2009 peripherals to suggest he can't be a solid mid-rotation starter for a contender. His 8.33 ERA is a mirage, the product of a small sample and terrible luck. Every contending team currently sports a staff with at least one starter--and several relievers--worse than Smoltz.

Will the GMs of those teams start a bidding war for Smoltz's services? Nope, they're still using ERA to evaluate pitchers. In their defense, it looks like BBTF was fooled as well.

Monday, July 27, 2009

Defining Declining Analysis

It's been a long time since I've done a full-fledged post on here, because I've been avoiding baseball-related idiocy in an attempt to reduce stress. However, sometimes I read an article that is so bad that it demands a thorough rebuttal.

That day has come, courtesy of Baseball Prospectus. Once the home of cutting-edge statistical analysis, they now misinterpret basic statistics to draw inaccurate conclusions.

Here are the article's major points:

- For pitchers with a large discrepancy between FIP and ERA in the first half of the season, the correlation coefficient (r-value) between first-half ERA and second-half ERA is .33, whereas the r-value between first-half FIP and second-half ERA is .35. Thus, ERA is "equally as likely" to indicate performance going forward.

It's true that there is little difference between .33 and .35. However, this statement alone means nothing. Let's look at a hypothetical group of pitchers:


        1H ERA   1H FIP   2H ERA
SP A      1.00     4.01     3.99
SP B      1.50     4.00     4.00
SP C      2.00     3.99     4.01

The correlation between first-half ERA and second-half ERA is a perfect 1, whereas the correlation between first-half FIP and second-half ERA is a completely imperfect -1: a lower FIP actually indicates a higher ERA going forward!

Does this mean ERA is a better predictor than FIP? Of course not. Anyone can look at the above numbers and see that 2H ERA matches up very well with 1H FIP and not at all with 1H ERA. Yes, this example was contrived, but the same effect is at work with the real numbers. The lesson: Don't believe everything an r-value tells you.
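For anyone who wants to check the contrived example, here is a minimal Python sketch. The helper names (pearson_r, mean_abs_error) are mine and purely illustrative, and the only data is the three hypothetical pitchers from the table above. It computes both r-values, then asks the more useful question: how far off was each first-half number as a straight guess at second-half ERA?

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def mean_abs_error(predicted, actual):
    """Average absolute gap between a predictor and the actual result."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical pitchers SP A, SP B, SP C from the table above.
era_1h = [1.00, 1.50, 2.00]
fip_1h = [4.01, 4.00, 3.99]
era_2h = [3.99, 4.00, 4.01]

r_era = pearson_r(era_1h, era_2h)   # essentially +1
r_fip = pearson_r(fip_1h, era_2h)   # essentially -1

mae_era = mean_abs_error(era_1h, era_2h)  # 1H ERA used as the 2H ERA guess
mae_fip = mean_abs_error(fip_1h, era_2h)  # 1H FIP used as the 2H ERA guess

print(f"r(1H ERA, 2H ERA) = {r_era:+.3f}, mean miss = {mae_era:.2f} runs")
print(f"r(1H FIP, 2H ERA) = {r_fip:+.3f}, mean miss = {mae_fip:.2f} runs")
```

The r-values come out at roughly +1 and -1, yet 1H ERA misses 2H ERA by about 2.50 runs on average while 1H FIP misses by about 0.01. Correlation only measures whether two columns move together in a straight line; it says nothing about how close the numbers actually are.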

- Pitchers with a discrepancy between their 1H FIP and ERA, as a group, had a 3.34 1H ERA, a 4.64 1H FIP, and a 4.60 2H ERA. This compares to a control group with a 4.40 1H ERA, 4.34 1H FIP, and 4.35 2H ERA.

Now, you might think that this means FIP is way, way better than ERA at predicting future performance. But wait...

- The 2H ERA sample has a higher standard deviation (1.42) than the 1H ERA (0.83) and the 1H/2H FIPs. This explains everything!

It explains nothing. I can't believe I have to point this out, but as the average ERA of a group increases, the standard deviation of ERAs within the group tends to increase with it.

Seidman reminds us that this is the "SAME group" of pitchers. So let's do it his way and make two groups of the SAME pitchers: Group A is every starter's ten best starts from 2008, and Group B is every starter's ten worst starts. Naturally there is going to be a huge discrepancy in group ERA--it might be something like 1.50 for Group A and 8.00 for Group B.

What about the standard deviations for the groups? Should we expect them to be equal, since these are the SAME pitchers? Of course not. Group A is going to contain a lot of ERAs between 1.00 and 2.00, while Group B will be spread more thinly between 6.00 and 11.00.

Similarly, we simply cannot expect a group with a 4.60 ERA to have the same standard deviation as a group with a 3.34 ERA, even if it is the SAME guys. (Okay, I'll stop with the caps now.)
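One rough way to see why spread grows with the average is a toy model, and I stress that it is my own assumption and not anything from the article: treat the runs a pitcher allows over a fixed number of innings as Poisson-distributed, so the variance of runs equals the expected number of runs. The 90-inning half-season and the function name era_noise_sd are also just illustrative. This captures only the luck component of the spread, not differences in talent within a group.

```python
from math import sqrt

def era_noise_sd(true_era, innings):
    """Standard deviation of observed ERA due to chance alone,
    under the ASSUMED toy model that runs allowed are Poisson."""
    expected_runs = true_era * innings / 9.0
    runs_sd = sqrt(expected_runs)      # Poisson: variance equals the mean
    return runs_sd * 9.0 / innings     # convert runs back to ERA units

# Group averages quoted above; 90 innings is an assumed half-season workload.
sd_low = era_noise_sd(3.34, 90)
sd_high = era_noise_sd(4.60, 90)

print(f"3.34 ERA group: chance SD of about {sd_low:.2f}")
print(f"4.60 ERA group: chance SD of about {sd_high:.2f}")
```

Even in this stripped-down sketch, the group with the higher average ERA has the wider spread over the same number of innings, so a larger standard deviation in the second half is what you would expect to see, not evidence of anything.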

What about the 2H ERA having a higher standard deviation than either FIP sample? ERA naturally has a higher standard deviation than FIP, because FIP has much of ERA's variance stripped from it. The reason 1H ERA has a similar standard deviation to the FIP samples is that the average 1H ERA is much lower than either group's FIP, reducing the standard deviation as we saw above.

Mason Malmuth once wrote that the real handicap of a bad poker book is that the reader cannot distinguish between good advice and bad, and as a result will develop bad habits without knowing it. If BP doesn't screen its content better than this, it's going to suffer from the same problem.

Thursday, June 25, 2009

Best Chat Answer Ever

Dave (PA)

Keith, You seem pretty jaded about the whole steroid issue, do you have any comments about the example it sets for young people who are heavily involved in athletics?

Keith Law

Well, maybe if the media would stop harping on the subject and implying that steroids make you a superstar, kids wouldn't get the idea that they're worth using.

Amen.

Wednesday, June 24, 2009

The Yankees Have Lost All Right To Ever Complain

Remember all the bitching we heard from the Bronx when Wang Chien-Ming injured himself running the bases last year? "Interleague play is killing our pitchers blah blah blah."

Fast-forward to June 2009. The Yankees are batting in the top of the ninth inning with a four-run lead and the bases loaded. Mariano Rivera, 39 years old and with one career at-bat under his belt, is sent up to hit. Why? Because he entered the game in a save situation™ and removing him now would deny him the save.

So...the Yankees want Major League Baseball to change its entire system of play to better suit their pitchers' health, but they're too engrossed by a meaningless stat to protect a guy who clearly should not be batting under any circumstance? Smooth.

Monday, April 27, 2009

Today's Pet Peeve

Yesterday, Alfonso Soriano was hit in the helmet by a pitch. Clearly, this didn't happen on purpose. Later in the game, Albert Pujols was hit in retaliation.

In retaliation for what? An accident that Pujols had nothing to do with?

If you see Lou Piniella in a bar, make sure you don't spill your drink on him; if you do, he might punch your friend in the face. That'll teach you!

Sunday, April 26, 2009

Today's Worst Headline

This one has to be seen to be believed.