Friday, March 30, 2007

In Boston

I went to Fenway Park last night. Unlike Wrigley, you can't look in. But, at least I can say that I stood in its presence.

Monday, March 26, 2007

Terror management theory and the old ballgame

Alan Schwarz talks of revamping the schedule. (Hat tip: Dan Fox)

Bill James gives an interview over at Shea Faithful. (Hat tip: Baseball Musings)

Tango Tiger has a nice little rant on American culture and the over-romanticization of baseball.

A few comments on the Tango piece from a psychological perspective. He's right that people get dewy-eyed at the very idea of baseball and are much more shocked than they should be at the thought of baseball players behaving badly. The Catch-22 is that the problem of players behaving badly may actually increase the over-romanticization of the game.

A small introduction to the concept of terror management theory. (No, I haven't become "Baseball National Security Advisor.") Terror management theory says that when someone experiences a threat (direct or indirect) to something that they hold dear, whether it is their life, their family, or their worldview, they respond by clinging more tightly to everything they hold dear and begin to try to fashion some meaning out of it. (They "manage" the terror invoked by such a threat, not by confronting it and dealing directly with it, but by becoming more involved in other things.) The theory was developed in the mid-80s, but came to a very real demonstration with the 9/11 attacks. People were suddenly very aware of their own mortality and security. What happened? Somewhat incongruously, everyone began buying American flags and flag lapel pins... as if that was going to stop Osama or bring back the dead. For months, all anyone wanted to talk about was how great America was, despite the fact that before the attacks, that sort of discussion wasn't really going on. Churches saw their attendance figures swell.

People were clinging to what they knew and places that could help them make meaning of life.

Fast forward to the baseball steroid scandal, the most recent big threat to the "integrity" of the game. I don't know whether individual players took steroids or not, but something about it sure doesn't pass the "smell test." Baseball fans see that and if the allegations are true, it does represent a challenge to the "purity" of the game.

Where did we get the idea of baseball as "pure?" If there is a game intertwined with the heart and soul of the American culture, it is baseball. The game has a special place in our cultural mythology that simply can't be touched by football or basketball. Perhaps the only proper parallel would be football/soccer in Europe and South America. We are a baseball culture. And, thanks to another wonderful psychological mechanism, the self-serving bias, we are the greatest thing ever put on the earth. (Doesn't matter whom you ask or in what setting. People almost always describe themselves...and people like them... as good and righteous.) By extension, baseball must be good, righteous, and pure.

So given these circumstances, when a threat to the "purity" of baseball comes along, psychology would predict not that people will become more disillusioned with some of the men who play it (confronting reality), but that they will instead cling more tightly to abstract ideas of "baseball as America" and celebrate the "untarnished spirit" (whatever that means...) of baseball.

Baseball, unfortunately, is not a transcendent force. It's a game played by imperfect humans. A game that has a good deal of beautiful poetry written about it and a great deal more that can be written. I love the game, but it's just a game.

Friday, March 23, 2007

Called up from AAA

Oddly enough, I've been called up to the majors. The folks over at offered me a gig doing their "Statistically Speaking" blog (along with a couple other guys, and I will take them up on the offer). I'm going to move most of my Sabermetrics content over there, although I plan to keep this open for more psychologically related topics.

There's gonna be a play at the plate

For those of you who read my post concerning the determining factors of a sacrifice fly: I had a little bit of time to look over the effect of baserunner speed on the sacrifice fly scenario. Again, I'm working with a Retrosheet data base of all flyballs caught by an outfielder with less than two outs and a runner on third from 1993-1998. I calculated the speed score using the Bill James formula (I'm not a huge fan of the measure -- here's a better one -- but James will do for now) for each runner in the particular year in question. Again, I used a binary logistic regression. Did speed predict whether or not the runner broke for home? Not really. The regression coefficient was significant, but the Nagelkerke R-squared was about 1%.

Pardon me while I yawn.

Did speed help our erstwhile runner make it home? No, although this time the R-squared value made it up to a whopping 1.1%.

I re-ran the analyses with three predictors: distance, speed, and the interaction of the two (speed * distance). In that model, the distance from home plate was the only significant predictor of whether or not the runner ran and if he did, whether he made it. So, why doesn't speed make a difference?

Consider the following: the world record in the 100m dash is currently 9.77 seconds. Given that a meter is roughly three feet, a 30m dash would likely take around three seconds. (For the record: Yes, I know that the runner has to accelerate to full speed. Stay with me, folks.) I can run 100m, presuming that there's a water break around the 70 meter mark, in around 15 seconds. At that rate, it would take me 4.5 seconds to run 90 feet. So, the difference between me and the world's greatest sprinter is all of a second and a half over 90 feet.

I'll presume that even the fastest major leaguers are slower than Mr. Powell and that even the slowest are a bit faster than me. (Tonight, on the now-defunct Baseball Network, Victor Martinez and I will run a foot race!) So, let's say that the difference between the fastest and slowest runners over 90 feet is something in the neighborhood of one second.

If I didn't have to work on my dissertation, I'd take a look at the effects of speed on the situation of a runner trying to score from 1st base on a double. Over 270 feet, that difference in speed is now a matter of several seconds.
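For anyone who wants to check my back-of-the-envelope sprint math, here is a rough sketch in Python. It assumes constant speed (no acceleration phase), uses the exact feet-to-meter conversion rather than my "a meter is roughly three feet" shortcut, and the function name is just something I made up for illustration.

```python
FEET_PER_METER = 3.28084

def time_over(distance_ft, secs_per_100m):
    """Seconds to cover distance_ft at a constant pace of secs_per_100m."""
    meters = distance_ft / FEET_PER_METER
    return secs_per_100m * meters / 100.0

# 9.77 s is the world record quoted above; 15 s is my own (generous) 100m time.
sprinter_90 = time_over(90, 9.77)
blogger_90 = time_over(90, 15.0)
gap_90 = blogger_90 - sprinter_90

# Same comparison for first-to-home on a double (270 feet).
gap_270 = time_over(270, 15.0) - time_over(270, 9.77)

print(f"90 ft:  sprinter {sprinter_90:.2f}s, me {blogger_90:.2f}s, gap {gap_90:.2f}s")
print(f"270 ft: gap {gap_270:.2f}s")
```

With the exact conversion the 90-foot gap comes out a little under the second and a half I quoted, and at constant speed the gap simply triples over 270 feet.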

While I'm here:

Ryan Dempster is training to be a ninja? Is Roger Clemens syndrome rubbing off on Allan H. "Bud" Selig? Did the Braves just get the steal of the century by signing Brian McCann?

Thursday, March 22, 2007

The Replacement Spring of 1995

I've toyed around with the idea for a while, and I'm looking a little more deeply into it: I'd like to write a book (or something) on the ill-fated Spring Training of 1995, including a "What If?" section. However, it's hard to find authentic resources from the era. Right now, I'm trying to track down either a copy of Benson's Baseball Monthly from April 1995 or of Stats Inc.'s Replacement Player Handbook from that year. I've called both companies and both recycled all remaining copies a few years ago. I'd also love it if the old archives were still up. (They had the old AP game stories and box scores for the games themselves!) I checked the Internet archive, but no dice. No one at their parent company is returning my phone calls.

Anybody got any leads on how I might get my hands on these (or anything like them)?

Wednesday, March 21, 2007

Runner tagging from third, here's the throw...

A few folks have been writing about measuring an outfielder's arm, most notably John Walsh at Hardball Times. Walsh's method can be found here, but the relevant information goes something like this:

There are many different kinds of plays that require an outfielder to use his arm, and it probably isn't possible to take all of them into account. To keep the analysis manageable, I've isolated five different outfield plays that I use to measure the prowess of outfield arms:

  1. S-1B: A single is hit to the OF with a runner on 1B and 2B unoccupied.
  2. S-2B: A single is hit to the OF with a runner on 2B.
  3. D-1B: A double is hit to the OF with a runner on 1B.
  4. F-3B: An OF fly is caught with a runner on 3B, fewer than 2 outs.
  5. F-2B: An OF fly is caught with a runner on 2B and 3B unoccupied, fewer than two outs.

For each play that falls into one of these categories, I classify the play into one of three possible outcomes:

  1. Kill: an assist was recorded by the outfielder
  2. Hold: the runner did not take the extra base
  3. Advance: the runner took the extra base
On an intuitive level, it makes sense. An outfielder's job in that situation is either to scare the runner so much that he doesn't dare try for the extra base or, if he does run, to throw him out. Walsh's system directly measures how often the runner was scared, and if he ran, how many times he was gunned down. The problem is that there's more to that job than just being able to throw well (i.e., far enough, and on-target).

Let's take a look at the situation of a double with a runner on first base, but this time from the perspective of the runner (or more likely, the third base coach). Whether or not he's even waved home will be a function of several factors: where the ball was hit (and how far away that is from home plate), how quickly the fielder gets to the ball, where on the basepaths the runner is when the fielder picks the ball up, how fast the runner is, and finally what the third base coach believes about the outfielder's arm, as well as how cautious/risk-seeking he is in deciding to send runners. The coach may also be considering who's on deck and whether it might be the wiser choice to stop the runner and leave it up to the next hitter. If the runner goes, then the result of the play will depend on how far the throw has to travel, the runner's speed and location when the throw is made, and the catcher's abilities in blocking the plate, as well as the outfielder's actual throwing abilities. I can grant that some of these are un-measurable (at least without donating my firstborn to Baseball Info Solutions), or that the errors are randomly distributed (i.e., over time, things even out).

The one that most concerns me is the distance that the throw has to travel, specifically because parks are shaped rather differently in the outfield. (Old Tiger Stadium and its 440-foot CF fence come to mind.) A short left field fence not only means that right-handed pull hitters salivate, but that left fielders have a shorter range of throws that they will have to make.

Does the distance which the throw has to travel affect whether the situation will result in a hold, kill, or advance? With so many other variables to consider, you might think it hard to dis-entangle such things. However, baseball provides us with a wonderful natural experiment. Consider the fly ball to the outfield with a runner on third and less than two outs (in other words, the would-be sacrifice fly). In every case, we can standardize how far away the runner is from home plate (90 feet) when the fielder has the ball in his hand. Thanks to Retrosheet, we have data on where the ball is (roughly). Through the use of some simple trigonometry, the Project Scoresheet hit location grid (used by Retrosheet), a little knowledge on the makeup of a baseball diamond, and the outfield dimensions of the park in question, it is possible to at least estimate how far away from home plate the central point of each zone is. It's not perfect, but it's good enough for government work. (If you'd like to know the specifics of my method, I will send them to you... or perhaps post them at a later date.)

To answer the question, I took the PBP data for 1993 to 1998, and selected out all instances in which, with a runner on third, a fly ball or line drive was hit to and caught by an outfielder with less than two out. There were 9,415 such instances. In those instances, 84% of the time (7910) the runner broke for home. 16% of the time, he stayed.

Did how deep the fly ball went have something to do with whether or not he went? I ran a binary logit regression on whether or not the runner went, with the fielder's estimated distance (in feet) from home plate as the predictor. Distance was a significant predictor (as might be expected... runners are more likely to tag on a deep fly than a shallow one), with a beta weight of .056.

Surprisingly, the Nagelkerke R-squared value was .495, suggesting that nearly half of the decision of whether or not the runner would run was based on where the ball was hit, before considerations of arm were taken into account. Also, using a cut-point of 50% probability, the model correctly predicted 90.1% of the observed cases. Clearly, runners think a lot about where the ball is before they try running home.
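For the would-be Sabermetricians who want to see the moving parts, here is a minimal sketch of a one-predictor logit regression, the Nagelkerke R-squared, and the 50% cut-point classification rate, in plain Python. To be loud about it: the data below are simulated, not my Retrosheet sample, the slope and midpoint used to simulate them are made up, and the function names are mine. I ran the real thing in a stats package, so treat this as an illustration of the technique and not as my actual method.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logit(x, y, iters=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def log_lik(x, y, b0, b1):
    total = 0.0
    for xi, yi in zip(x, y):
        p = min(max(sigmoid(b0 + b1 * xi), 1e-12), 1.0 - 1e-12)
        total += math.log(p) if yi else math.log(1.0 - p)
    return total

# Simulated fly balls (ASSUMPTION: not real data). Distance in feet,
# with deeper flies making the runner more likely to go.
random.seed(42)
dist = [random.uniform(150.0, 400.0) for _ in range(2000)]
went = [1 if random.random() < sigmoid(0.056 * (d - 230.0)) else 0 for d in dist]

mean_d = sum(dist) / len(dist)
x = [d - mean_d for d in dist]      # centering keeps the fit well-behaved
b0, b1 = fit_logit(x, went)

n = len(went)
ones = sum(went)
p_bar = ones / n
ll_model = log_lik(x, went, b0, b1)
ll_null = ones * math.log(p_bar) + (n - ones) * math.log(1.0 - p_bar)

# Nagelkerke rescales the Cox-Snell R-squared so its maximum is 1.
cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))

hits = sum(1 for xi, yi in zip(x, went)
           if (sigmoid(b0 + b1 * xi) >= 0.5) == bool(yi))
accuracy = hits / n

print(f"slope per foot: {b1:.3f}")
print(f"Nagelkerke R-squared: {nagelkerke:.3f}")
print(f"correctly classified at 50% cut-point: {accuracy:.1%}")
```

The slope here is the change in log-odds of the runner going per extra foot of distance, which is how I'd read the .056 above.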

I should take a look at runner speed scores in that regression to see what that does, but I need to get to bed (perhaps this weekend).

Now, of the 7910 cases where the runner broke for home, 13 (.16%) of them resulted in a low-probability event (run down, another runner caught at another base). Of the remaining 7897, 2.9% of them (228) resulted in an out at home plate, with the rest (97.1%) resulting in the runner from third scoring. Again, runners appear to be very careful to pick their situations and seem to do a good job of it.

I ran a similar logit regression and found that distance was again a significant predictor (Beta = .038), although the Nagelkerke R-squared was a mere .172. Only 17% of the variance in whether or not the runner scored could be linked to the distance the throw would have to traverse. This leaves open the possibility that a good arm might very well be a huge part of the difference between a sac fly and a fly-out-throw-em-out double play.

The findings bring up some interesting conclusions. First off, we have evidence that runners are extremely smart and rarely take un-needed chances on the basepaths, at least when it comes to sac-flies. (I should look up how often they are gunned down in the other situations.) So, it could actually be more important to have the reputation of a good arm than an actual good arm, so that the runner would think twice before trying to run. Second, if distance predicts whether or not a runner will go (and to a lesser extent if he will make it), then park dimensions matter. Players who play in smaller parks will have shorter throws to make. This could make their hold and kill rates look better, not because of their arm, but because of the fence behind them.

In Walsh's original article, the best throwing left fielder in baseball last year was Manny Ramirez, who just happens to play in front of the Green Monster in Fenway Park, with (you guessed it!) the shortest left field and left center dimensions in the majors.

While I'm here:

Did I mention I'll be in Boston next week?

Tuesday, March 20, 2007

Bob Uecker has a stalker?

Milwaukee Brewers announcer (and Mr. Belvedere co-star!) Bob Uecker apparently has a stalker.


Rating and Fleecing GMs

One of the most discussed posts in the baseball blogosphere yesterday was from U.S.S. Mariner on the proper evaluation of trades. Two models were put forward:

1) Each trade should be evaluated on what’s known at the time. If a trade turns out much better than expected, or much worse, that shouldn’t affect our opinion of the trade.

2) Each trade should be evaluated on the results of the trade. If a trade looks like it’s an amazing rip-off, even if at the time everyone acknowledges it as such, but the victim turns out the winner due to unforeseen circumstances, the victim’s still the victor.

This might even be a more profound post than you might believe. The actual issue raised by the question can be argued in circles for hours on end. GMs have to make their decisions based on the information at hand when the trade is made. We, the fans, have the opportunity to Tuesday morning quarterback (sorry for mixing my metaphors) the trades.

Two interesting questions:

1) Clearly, GMs should be held responsible for the information that they have available when completing the trade, but what information can they be expected to know, assuming due diligence?

2) Why do GMs make trades that are clearly a rip-off, everyone else seems to know are a rip-off, and actually turn out to be a rip-off?

On the first question, further advancements in quantitative analysis mean that models of future performance are becoming more and more accurate. But, the nature of the game is such that they'll never be perfect, and even the best systems leave a large amount of error in their estimations. Obviously, GMs can't be held responsible for players who suffer freak career-altering/ending injuries (meteorites falling from the sky and hitting Albert Pujols in the left knee, etc.). But, I suppose we know a thing or two about which players are more likely to suffer a game-related injury. We have an idea of which players will break out based on their statistics and which are likely to suffer a quick decline. The biggest problem is that even the best predictions (indeed, anything in statistics) are just a matter of probabilities. Perhaps GMs might be considered little more than glorified professional gamblers who, instead of playing with poker chips or stocks, trade players? It's an intriguing question: What can we properly critique GMs for knowing/not knowing?

Consider: the average GM gets to make how many trades of relevance over the course of a year? Or even over five years? Two or three per year? Suppose that the GM has a true ability of getting 60% of his trades "right" and 40% "wrong." Over five years, he makes twelve trades. According to the olde binomial distribution, he's got a 33% chance of getting half or more of them wrong! The problem here, like a lot of problems in analyzing baseball, has to do with a too-small sample.
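If you'd like to check the binomial arithmetic yourself, here is a small sketch in Python. The helper name is mine, and the 60/40 split and the twelve trades are just the hypothetical numbers from the paragraph above.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Twelve trades, each with a 40% chance of going "wrong":
# how often does the GM get six or more of them wrong?
p_half_wrong = prob_at_least(6, 12, 0.40)
print(f"{p_half_wrong:.3f}")
```

That comes out to roughly one chance in three, which is the figure quoted above. Change the 12 to something like 120 and watch how quickly a 60% GM separates himself from a coin flip.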

On the second question: May I introduce to you the concept of Prospect Theory by Amos Tversky and Daniel Kahneman. Consider the following scenarios:
You arrive at the site of a massive natural disaster and are charged with evacuating 100 people from a remote village. You have a plane capable of bearing the weight of about 65 of the people safely, but if you put more people into the plane, the chances of it making the flight to safety drop. You could probably physically fit another 20 people onto the plane, but it means that the chances of making it back to safety are about 75%, with a 25% chance of crashing and everyone dying. There isn't enough time to do two runs with the plane and no backup is available.

Do you take a) the 100% chance of saving 65 people or b) the 75% chance of saving 80 people, but the 25% chance of saving no one?

Made your decision?
Another disaster strikes. You and that same plane go to another village with 100 people, and find a similar set of circumstances.

Do you a) allow a 100% chance that 35 people will die or b) take a 25% chance of having everyone die and a 75% chance of having 20 people die?

An interesting finding. First off, you might have figured out that the two scenarios are mathematically equal to each other (although the two conditions in each are not.) Oddly enough, the same people, when presented with similar scenarios often pick opposite choices. Kahneman and Tversky's explanation: humans are reward-seeking and risk-averse. This isn't a revelation. What makes it interesting is that you can get the effects simply by framing the information in a different way. In the first incarnation, the material is presented as saving lives (a reward). Many people pick the riskier option to try to save more people. However, when the material is presented in terms of people dying (a risk), people suddenly become conservative.

Your favorite GM, therefore, might pull the trigger on a bad trade thinking only of the potential rewards (perhaps egged on by his GM trading partner). This one happens more than you might think. To go over to the NFL Draft, how often do you see a team draft a player who doesn't really fill a need, based on his "upside" and how they'd hate to have passed on what could be a "very special player"? (How many of you have seen friends do the same in fantasy drafts?) How many GMs have been seduced by the promise of a "young, live arm" or a "five-tool" player? How many of them would make the same decision if they were told, "He has an 85% chance of becoming a complete failure and a 15% chance of not failing"?

The other point to be made here is that the proper course of action in both disaster scenarios is the more conservative one. Consistently picking the safe bet of 65 lives saved over and over again will save more lives in the long run. (Admit it, you were tempted to take your chances and save extra lives, even at the risk of peril to everyone.) Human beings take plenty of stupid chances that rationally make no sense. If you've ever bought a lottery ticket, you are one of them. So, a GM who is thinking only of the "upside" and does not understand the basics of probability will, on average, be taken advantage of. It's as easy as exploiting simple weaknesses in human behavior.
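The "long run" claim is just an expected value calculation, and it takes only a few lines of Python to lay it out. The helper name is mine; the probabilities and head counts are the ones from the two plane scenarios.

```python
def expected(outcomes):
    """Expected value of a list of (probability, value) pairs."""
    return sum(prob * value for prob, value in outcomes)

# Framed as lives saved.
safe_saved = expected([(1.00, 65)])
risky_saved = expected([(0.75, 80), (0.25, 0)])

# Framed as lives lost (same village of 100, same plane).
safe_dead = expected([(1.00, 35)])
risky_dead = expected([(0.25, 100), (0.75, 20)])

print(f"saved: safe {safe_saved:.0f}, risky {risky_saved:.0f}")
print(f"dead:  safe {safe_dead:.0f}, risky {risky_dead:.0f}")
```

Either way you frame it, the conservative option comes out five lives ahead on average (65 saved against 60, or 35 lost against 40).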

While I'm here:

Tango Tiger's got his dollar values for roto leagues.

Monday, March 19, 2007

The Cleveland Indians and Pythagoras

In the post-mortems that followed the 2006 season, chief among the questions asked was how the Cleveland Indians, given the fact that they scored 88 more runs than they gave up, were still 6 games under .500 (78-84). According to their Pythagorean projection, such a team should have won 89 games, for a whopping discrepancy of 11 games. Were they unlucky? Poorly managed? Or is there a flaw in the Pythagorean system?

One of the most reliable predictions about humans is that they make all kinds of generalizations that they shouldn't from a small sample of observations. Eleven games seems like a big amount (especially to a town that already has a "We're cursed" mentality... and now speaking as an Indians fan, why did it have to be 11 games below what was expected?), but how big is it really?

For those unfamiliar with the Pythagorean methods of estimating, the idea started out as a derivation of the Pythagorean theorem so near and dear to the hearts of high school geometry students. The original formula was developed by Bill James (why do I always feel the need to bow whenever I write the man's name?) and looks like this:

Winning % = RS^2 / (RS^2 + RA^2)

Here, RS is runs scored and RA is runs allowed. At the time, it wasn't clear why exactly it worked, but the formula had an uncanny knack for accurately predicting a team's success. Further tinkering led to adjusting the exponent downward to 1.82 (some say 1.81).

Coming along later, Clay Davenport over at Baseball Prospectus suggested that the exponent also vary with the parameters and devised the equation 1.5 * log((RS + RA)/games) + 0.45 for the exponent, which is then placed back into the original formula. David Smyth, in a similar mode, set the equation for the exponent at ((RS + RA)/games)^0.287. Davenport, by the way, has since endorsed Smyth's formula.
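To make the four versions concrete, here is a short Python sketch of each. A few assumptions to flag: I'm reading Davenport's "log" as a base-10 logarithm, the function names are my own, and the team below (800 runs scored, 700 allowed, 162 games) is made up purely for illustration.

```python
import math

def win_pct(rs, ra, exponent):
    """Pythagorean-style winning percentage for a given exponent."""
    return rs**exponent / (rs**exponent + ra**exponent)

def davenport_exponent(rs, ra, games):
    # ASSUMPTION: "log" is taken to be log base 10.
    return 1.5 * math.log10((rs + ra) / games) + 0.45

def smyth_exponent(rs, ra, games):
    return ((rs + ra) / games) ** 0.287

# Hypothetical team, not a real season.
rs, ra, games = 800, 700, 162

estimates = {
    "Pythagorean": win_pct(rs, ra, 2.0),
    "Exp 1.82": win_pct(rs, ra, 1.82),
    "Davenport": win_pct(rs, ra, davenport_exponent(rs, ra, games)),
    "Smyth": win_pct(rs, ra, smyth_exponent(rs, ra, games)),
}

for name, pct in estimates.items():
    print(f"{name:12s} {pct:.4f}  ({pct * games:.1f} wins)")
```

For a team in a normal run environment the four estimates land within a fraction of a win of each other, since the Davenport and Smyth exponents both come out close to 1.9.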

With the right data base, it's a simple matter to put each formula to the test. I took all available team-seasons with more than 100 games played (2370 seasons) and calculated the team's actual winning percentage and their projected winning percentages by all four models (Pythagorean, Exp 1.82, Davenport, and Smyth). I then took the difference between the actual result and each of the projections to get the residuals for each and looked at the properties of each.

Mean residual values:
The first measure of a good predictor is whether it has some sort of bias in its estimation. Ideally, residuals should be centered at zero. Exp 1.82, Davenport, and Smyth all check in at around .00029 and .00028 (roughly, .06 games per 162), with Smyth the winner by a hair. The Pythagorean had a MR of .00038. All four had a slight tendency to over-estimate the actual values. Given the small values, though, these biases are negligible.

Skew in residuals:
Residuals, ideally, should be normally distributed. For each of the four formulae, skew statistics again showed excellent fit to normality. Pythagorean (+.014) and Exp 1.82 (+.068) were both slightly positively skewed, as were Davenport (+.044) and Smyth (+.045). The standard criterion for violation of normality is 3.0. Also, the standard error for the skew statistic was .050, meaning that even the most skewed (Exp 1.82) was not significantly different from zero.
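For the stats refresher crowd, here is where a standard error like that comes from. This sketch assumes the usual large-sample formula for the standard error of sample skewness that most stats packages report (that's my assumption about what the software did under the hood), and the function name is mine.

```python
import math

def skew_standard_error(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

n_seasons = 2370
se = skew_standard_error(n_seasons)

# Ratio of the largest skew (Exp 1.82) to its standard error.
# Anything under about 1.96 is not significantly different from zero.
ratio = 0.068 / se

print(f"SE of skew: {se:.3f}")
print(f"skew / SE for Exp 1.82: {ratio:.2f}")
```

With 2370 team-seasons the standard error works out to about .050, and even the most skewed set of residuals sits well short of two standard errors from zero.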

Standard deviation in residuals:
I often tell my students, "If mean, then standard deviation." Clearly, none of these formulae are perfect in their estimations, but is one more given to error than the other?

The results:
Pythagorean: .026866 (4.35 games per 162)
Exp 1.82: .026440
Davenport: .026095
Smyth: .026080 (4.22 games per 162)

No really clear winner here either, although again, the Smyth formula comes out ahead by a bit. It looks like the best of the formulas, although the differences among the four are small.

Now, the question of whether Mark Shapiro and Cleveland are snakebit: The Indians were predicted by the Exp 1.82 formula to have a winning percentage of .55311, while their actual winning percentage was .48148. The difference is .07163 (11.6 games), or 2.709 standard deviations away. According to the z-distribution, a difference of that magnitude (in either direction, either 11 games above what they are predicted or 11 below) would be expected about .67% of the time (roughly, 1 in 150 cases), and a difference of that magnitude in that direction about .34% of the time (roughly, 1 in 300 cases).

Given 30 MLB teams per year, we would expect that such a discrepancy would occur once every 5 years, and that it would happen in the Indians' direction (winning less than expected) once every 10 years. To put it another way, over a ten year period, one team gets as un-lucky as the Indians and one team gets as lucky as the Indians were un-lucky. It's tempting to think that karma would allow the Indians to follow up last year's bad luck with a run of good luck, but karma has no basis in statistical theory.
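Here is the z-score arithmetic behind those numbers as a quick Python sketch. It uses only the figures quoted above and assumes, as I did, that the residuals are normally distributed; the variable names are mine.

```python
import math

predicted = 0.55311    # projected winning percentage quoted above
actual = 0.48148       # 78-84
sd_resid = 0.026440    # standard deviation of the Exp 1.82 residuals

z = (predicted - actual) / sd_resid

# Tail areas of the standard normal via the complementary error function.
two_tailed = math.erfc(abs(z) / math.sqrt(2))
one_tailed = two_tailed / 2

teams_per_year = 30
years_between_either = 1 / (teams_per_year * two_tailed)
years_between_unlucky = 1 / (teams_per_year * one_tailed)

print(f"z = {z:.3f}")
print(f"two-tailed p = {two_tailed:.4f}, one-tailed p = {one_tailed:.4f}")
print(f"one such team every {years_between_either:.1f} years (either direction)")
print(f"one such un-lucky team every {years_between_unlucky:.1f} years")
```

The tail probabilities round to the 1-in-150 and 1-in-300 figures, and multiplying by 30 teams gives the once-in-5-years and once-in-10-years estimates.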

Are the Indians (and the rest of Cleveland) cursed? If a curse is a series of low-probability events happening in sequence, then yes, the ghost of Rocky Colavito is still haunting Jacobs Field.

While I'm here:

The week in quotes from Baseball Prospectus.

Sunday, March 18, 2007

First Entry

Because what I really need in my life is yet another blog to write...

Welcome to my foray into the world of baseball blogging. After a few months of running my "other" blog, the Foreign Intelligence Files (and if there are any FIF fans clicking over, it will continue), I realized that there was no real outlet for my baseball obsession. So, against my better judgment, I decided to start a blog for that part of my life. It will alternate between a few themes:
  1. I am a practicing Sabermetrician, and I'll be posting a few of my findings here and there. The questions that most interest me are ones that have a psychological bent to them. Why do players (and managers and GMs and owners and fans) make the decisions that they do?
  2. I am a graduate student of clinical psychology and I've toyed around with the idea of writing a book on explaining psychology through baseball. (Although it looks like someone beat me to it.)
  3. I also teach statistics and research methodology in the psychology department (and do some freelance statistical consulting to a few psychological research projects). There are surely a few would-be Sabermetricians out there who need a bit of a refresher on some statistical concepts.
  4. I am a die-hard fan of the Cleveland Indians. I went to my first baseball game on June 7th, 1986. I am an unapologetic Indians fan. I still count October 26th, 1997 as one of the saddest days of my life. For some reason, I'm on an exile in Wrigleyville. (I live on the North Side of Chicago.)
  5. I try to keep an irreverent streak about the whole thing.
In any case, please do leave a comment and let me know that you stopped by.