Honest Wagner: Hitting ratios and sample size

Friday, April 16, 2004

Hitting ratios and sample size

Question: If Abraham Nunez is 7 for 12 against Glavine, should he start - all other things being equal? A lot of baseball sites are citing this article today.

At first I'm underwhelmed by twelve at-bats and think no. Twelve at-bats? How can that be meaningful? Closer study shows that it depends on what happens in those 12 at-bats. 10 for 12 or 12 for 12 is much more relevant and useful information than, say, 2 for 12 or 4 for 12.

To illustrate: imagine that Nunez could hypothetically bat against Glavine a thousand times and both players would have constant abilities, as in Strat-O-Matic. The margin of error on a sample size of 12 would be 28.1%, or plus or minus .281 in batting average. That is the pure sampling error; the margin of error describes the likely range of results from repeated samples with n=12. By "likely," the MOE means 95% likely.
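The post doesn't show where 28.1% comes from. One formula that reproduces it, along with the 12.7% and 30.8% figures used further down, is the worst-case 95% margin of error, 1.96 * sqrt(0.25 / n), multiplied by the simple finite population correction sqrt(1 - n/N) for the imagined 1,000 at-bats. That formula is an inference, not something the post spells out; here is a minimal Python sketch under that assumption:

```python
import math

def margin_of_error(n, population=1000, z=1.96):
    """Worst-case 95% margin of error for a sample of n at-bats.

    Assumption: p = 0.5 (the widest possible margin, so no knowledge of the
    hitter's true average is needed) plus a finite population correction,
    sqrt(1 - n/N), for a hypothetical population of `population` at-bats.
    """
    worst_case = z * math.sqrt(0.25 / n)
    correction = math.sqrt(1 - n / population)
    return worst_case * correction

# Sample sizes that come up in the post: Wilson (10), Nunez (12),
# Stynes (14) and Mondesi (56)
for n in (10, 12, 14, 56):
    print(f"n = {n:2d}: MOE = {margin_of_error(n):.3f}")
```

The printed margins (0.308, 0.281, 0.260, 0.127) line up with the ones quoted for 10, 12 and 56 at-bats; the n = 14 value is the one behind the Stynes example.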

In other words, if you took 100 samples of 12 at-bats from this hypothetical population, about 95 of them would land within 28.1% of Nunez's "true" batting average against Glavine. Turned around, a 7-for-12 sample (.583) puts his true average somewhere between .583 minus .281 and .583 plus .281, that is, between .302 and .864. Assuming that hitting and pitching abilities are perfectly constant, and therefore each at-bat is a sample of a much broader population of at-bats which comprise the "true" matchup, a 7-for-12 performance means we can be at least 95% confident that Abraham Nunez is at least a .302 hitter against Tom Glavine.
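The repeated-sampling idea can be checked exactly instead of by imagining 100 samples. This sketch assumes, hypothetically, that Nunez's true average against Glavine is exactly .583 (his 7-for-12) and that every at-bat is an independent trial with that hit probability, which is the Strat-O-Matic assumption. It adds up the binomial probability of every 12-at-bat result that lands within .281 of .583:

```python
from math import comb

def prob_within(true_avg, at_bats, moe):
    """Probability that an at_bats-long sample lands within +/- moe of true_avg.

    Each at-bat is treated as an independent trial with the same hit
    probability (constant abilities, as in Strat-O-Matic).
    """
    total = 0.0
    for hits in range(at_bats + 1):
        if abs(hits / at_bats - true_avg) <= moe:
            # Binomial probability of exactly `hits` hits in `at_bats` tries
            total += comb(at_bats, hits) * true_avg**hits * (1 - true_avg) ** (at_bats - hits)
    return total

# Hypothetical true average of .583 (7/12), 12 at-bats, the post's .281 margin
print(f"{prob_within(7 / 12, 12, 0.281):.3f}")
```

The result comes out a little above 95%, partly because the p = 0.5 margin is the widest one possible and partly because only whole numbers of hits can occur.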

To repeat: statistically speaking, it's not true that you throw out all samples just because they are small. Not all small sample sizes are equally meaningless. The smaller they are, the more dramatic the results must be for the sample to have value.

If Nunez were 4 for 12 against Glavine, the sampling error would be the same, and the 95% range for his "true" average in the hypothetical population of 1000 at-bats would run from .052 to .614 (.333 plus or minus .281).

Raul Mondesi is a .286 hitter against Glavine in 56 at-bats. If this were Strat-O-Matic, that sample would indicate a 95% chance that his "true" ability vs. Glavine lies within .286 plus or minus 12.7% (the MOE for a sample size of 56), or from .159 to .413. That statistic is meaningless as an indicator of above-average ability vs. Glavine, even in Strat-O-Matic, since .286 is much too close to average to make a sample of 56 meaningful.

Chris Stynes is 2 for 14 against Glavine. That is also meaningless in a Strat-O-Matic world, since it means there's a 95% chance his "true" BA vs. Glavine lies between .001 (which I would believe right now) and .402 (which strains credulity, but that's what the math says).

The moral of these examples: in a Strat-O-Matic world, low batting averages in small samples mean much less than high batting averages in small samples, since "average" ability sits on the low end of the spectrum. (2 for 14 is much closer to average than 7 for 12.)
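The examples above all apply the same recipe: batting average plus or minus the margin for that many at-bats. A minimal sketch, assuming the worst-case margin 1.96 * sqrt(0.25 / n) with a sqrt(1 - n/N) correction for N = 1,000 (an inferred formula that matches the post's figures, not one it states), and flooring the low end at .001 once a hitter has a hit, as the Stynes example does. The 4-for-12 line is the hypothetical from the text, and Mondesi's .286 in 56 at-bats is taken as 16 hits:

```python
import math

def margin_of_error(n, population=1000, z=1.96):
    # Assumed formula: worst-case p = 0.5, with a sqrt(1 - n/N) correction
    return z * math.sqrt(0.25 / n) * math.sqrt(1 - n / population)

def interval(hits, at_bats, population=1000):
    """Return (average, low, high) for a hitter's 'true' average."""
    avg = hits / at_bats
    moe = margin_of_error(at_bats, population)
    # Floor the low end at .001 if the hitter has at least one hit
    low = max(avg - moe, 0.001 if hits > 0 else 0.0)
    high = min(avg + moe, 1.0)
    return avg, low, high

examples = {
    "Nunez 7-for-12": (7, 12),
    "Nunez 4-for-12 (hypothetical)": (4, 12),
    "Mondesi 16-for-56": (16, 56),
    "Stynes 2-for-14": (2, 14),
}
for label, (h, ab) in examples.items():
    avg, low, high = interval(h, ab)
    print(f"{label:<30} {avg:.3f}  [{low:.3f}, {high:.3f}]")
```

The ranges agree with the ones quoted above to within a point of rounding; the post rounds intermediate averages (for example, .583 + .281 = .864).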

Here's the catch. I don't believe that hitting and pitching abilities are constant. I think a hitter's ability changes from minute to minute and from at-bat to at-bat. Much of baseball statistics is predicated on the Strat-O-Matic model. Just because Josh Fogg has never been good after the 46th pitch doesn't mean he never will be good after the 46th pitch. Common sense suggests otherwise, but it could happen. It could start to rain on pitch 47 and his ball could start darting like crazy. Or maybe he took a Claritin-D tablet before the game, it really kicks in at pitch 46, and, nostrils clear, he takes huge, healthy lungfuls of air and starts pitching the best game of his life. It could happen. His abilities are not etched in stone in some Platonic ideal realm of truth. Still, since we have nothing better, we have to suspend disbelief and agree that baseball is close enough to Strat-O-Matic. Otherwise, we can have no statistical analysis at all. And that would be no fun.

This is a huge reason why you have to combine scouting with statistical analysis. For all we know, Nunez got his 12 at-bats in three games where Glavine was not the constant, true, ideal Glavine but rather a poor draw from his highly variable range of abilities. Maybe all 12 at-bats took place in one extra-inning game at Coors Field when Glavine had the flu and a ten-run lead. Or, uh, maybe not. You get the idea. Before Mac trusts this sample of 12, he has to know whether there is any reason to believe it was a good, random sample of Glavine in various stages of ability (and Nunez in various stages of ability).

Maybe there is something about Nunez as a hitter that matches up unusually well with Glavine, something that Mac, Nunez, Glavine, etc., can see: some way that Glavine throws and some way that Nunez swings, something that makes this 7 for 12 expected. If so, you have to start him. If not, you disregard the number.

Is the number meaningless then? No, because it gives you the heads-up. It tells you to investigate and see whether there is any rationale for such numbers. Maybe Mac didn't realize that guys like Nunez have a real advantage against guys like Glavine, because Nunez is humble and doesn't remember that he has a lot of hits off him.

Whatever. Either way, you can't dismiss 7-for-12 just because the sample size is only 12.

... another example: Craig A. Wilson is 1 for 10 this year with RISP. The MOE for a sample of 10 drawn from a hypothetical ideal RISP ability (say, 1000 ABs) is 30.8%. That means there is a 95% chance, based on this sample of ten, that Wilson's hypothetical ideal RISP ability lies between .001 (he has a hit, so it can't be .000) and .408. Obviously the first case is terrible and the second is Hall-of-Fame caliber, so this sample of ten is pretty meaningless as an indicator of his "clutch" ability. FWIW, his three-year splits indicate he's the same hitter with RISP as with the bases empty.
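Another way to see why ten at-bats say so little: under the same constant-ability assumption, ask how often a hitter of a given true skill would go 1 for 10 or worse. The .270 figure below is a hypothetical round number, not Wilson's actual RISP average; the point is only the order of magnitude. A minimal binomial sketch:

```python
from math import comb

def prob_at_most(hits, at_bats, true_avg):
    """Chance that a hitter with a fixed true average gets `hits` or fewer hits.

    Assumes independent at-bats with a constant hit probability.
    """
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits + 1)
    )

# Hypothetical: a steady .270 hitter with runners in scoring position
chance = prob_at_most(1, 10, 0.270)
print(f"Chance of 1-for-10 or worse: {chance:.1%}")
```

For a .270 hitter this works out to roughly one chance in five, so a 1-for-10 stretch is no surprise even with constant ability.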

Honest Wagner
