I've been relatively quiet around here since the season ended, but I've been working on something in my spare time that is purely the product of my own curiosity. I like quantifying things so that I can make comparative assessments - and I think most baseball fans do too.
So when I hear player X called "consistent" I want to have a way to measure that. I want to be able to say "player X is more consistent than player Y" and actually be able to back that up.
So I decided to try to create a stat that does this. I've done so, and I've (manually, uugh) calculated it for every Giants hitter with more than 100 PA's in 2008. I think it inherently makes sense, but there are some places where I made little leaps of faith, and I would like feedback from the more statistically inclined readership as to whether my approach is valid, along with any suggestions on how to make it more deterministic and more cross-comparable.
So here is what I did:
Take Player X who played in 150 games last season. I've taken his game log and (manually, uugh) parsed it out to calculate wOBA for each game. I used wOBA because it is my favorite offensive all-in-one right now, and I think it serves the purpose well.
The equation I used for wOBA is from the original source @ http://www.insidethebook.com/woba.shtml
wOBA = (0.72xNIBB + 0.75xHBP + 0.90x1B + 0.92xRBOE + 1.24x2B + 1.56x3B + 1.95xHR) / PA
Using this, my wOBA's come out fairly close to those @ Statcorner. I'd like to use a version that includes SB/CS, but I haven't gotten around to figuring that out yet.
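For anyone who wants to follow along, the formula above is trivial to sketch in code. This is a minimal version I put together; the function name and the example game line are mine, not from any stat site:

```python
def woba(nibb, hbp, singles, rboe, doubles, triples, hr, pa):
    # Linear weights from the original wOBA formula at insidethebook.com
    # (no SB/CS terms in this version).
    return (0.72 * nibb + 0.75 * hbp + 0.90 * singles + 0.92 * rboe
            + 1.24 * doubles + 1.56 * triples + 1.95 * hr) / pa

# e.g. a hypothetical 4-PA game with a walk, a single, and a double:
print(round(woba(nibb=1, hbp=0, singles=1, rboe=0,
                 doubles=1, triples=0, hr=0, pa=4), 3))  # -> 0.715
```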
Having wOBA for each individual game performance, I then took the standard deviation of this group for the entire season. This tells me, essentially, how clustered the day to day performance of the player is around the season end result. Theoretically, consistent players should be more clustered around their season wOBA while inconsistent players should show more deviation.
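The clustering step is just a standard deviation taken over the per-game values. A sketch with made-up game lines (I'm using the population std. deviation here, since the season is the whole population of games; the sample version would differ slightly):

```python
import statistics

# Hypothetical per-game wOBA values parsed from a game log.
game_wobas = [0.000, 0.715, 0.450, 0.225, 0.310]

# Population std dev: how clustered the games are around the mean.
season_sd = statistics.pstdev(game_wobas)
print(round(season_sd, 3))
```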
That's the thought, anyway.
So this gives me a number that looks like other baseball numbers we are comfortable with, .xxx generally in the .200-.400 range.
The problem with this number alone is that it is dependent on overall wOBA. A player with a std. deviation of .300 and a wOBA of .300 is much less consistent than one with a std. deviation of .300 and a wOBA of .400. If I want to be able to compare players, (even against themselves in other seasons) I need to fix this.
So this is where I took a leap of faith. I tried to establish an arbitrary modifier that lets me convert that .xxx figure into a 0-1 number, 0 being entirely inconsistent and 1 being perfectly consistent (every game wOBA = season wOBA), no longer dependent on season wOBA. Essentially, express it as a percentage.
The arbitrary part was deciding to define maximum inconsistency as the standard deviation of a two-game set: season wOBA * 2 in one game and .000 in the other.
The thought being: in that simple set of two hypothetical games - an excellent performance in one (season wOBA * 2) and a miserable performance in the other (a wOBA of .000) - the mean of the two games still equals the season wOBA. This is as bi-polar as a player can be.
Using that to establish the hypothetical "Maximum Inconsistency," I can then look at the player's actual inconsistency as a percentage of it. To make higher numbers better, I flipped it around and took 1 - (Actual Inconsistency / Maximum Inconsistency).
A score of 0 means that Actual Inconsistency = Maximum Inconsistency, whereas a score of 1 means that Actual Inconsistency = .000.
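Putting the pieces together, the whole rating can be sketched as below. One thing worth noting: the population std. deviation of the two-game set {2 * wOBA, .000} works out to exactly the season wOBA, so the formula collapses to 1 - (actual std dev / season wOBA). The function name is mine, and I'm using the plain mean of game wOBAs as the season wOBA here (the PA-weighted version mentioned in the edit below would differ):

```python
import statistics

def consistency(game_wobas):
    # Season wOBA approximated as the simple mean of per-game values.
    season = statistics.mean(game_wobas)
    actual = statistics.pstdev(game_wobas)
    # Max inconsistency: std dev of the two-game set {2 * season, .000},
    # which for a population std dev equals the season wOBA itself.
    maximum = statistics.pstdev([2 * season, 0.0])
    return 1 - actual / maximum

print(consistency([0.330, 0.330, 0.330]))  # perfectly consistent -> 1.0
print(consistency([0.660, 0.000]))         # maximally bi-polar   -> 0.0
```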
The part where this gets a little questionable, however, is that a player can actually be more inconsistent than that. Using this approach, one of our beloved Giants came up negative! (Turns out Ivan Ochoa is not a good baseball player.)
But there is no actual limit to how inconsistent a player can be. While 2 * wOBA in one game and .000 in another is very inconsistent, 3 * wOBA in one game and .000 in two others is even less consistent.
I guess the hypothetical limit in a given season is 162 * wOBA in one game and .000 in the other 161. The problem with that is the std. deviations get huge, to the point of dominating the data.
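To put a number on "huge": a quick check of that 162-game extreme case (the .330 season wOBA is just a placeholder) shows the std. deviation comes out at sqrt(161), roughly 12.7 times the season wOBA, which would swamp everything else:

```python
import statistics

w = 0.330                          # placeholder season wOBA
extreme = [162 * w] + [0.0] * 161  # one monster game, 161 blanks; mean is still w

# Ratio of std dev to season wOBA: sqrt(161) ~ 12.7
print(round(statistics.pstdev(extreme) / w, 1))
```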
So that is my biggest question: is my approach for arbitrarily defining "Maximum Inconsistency" valid? It should not depend on the players or on their wOBA (relative to each other), since the simple two-game set is constant.
Anyway, that got kinda complicated and wordy. Using the approach defined above, here is how our Giants did last season:
Player : Consistency Rating
Winn: .52
Rowand: .48
Lewis: .44
Ishikawa: .44
Sandoval: .41
Durham: .40
Molina: .39
Castillo: .36
Burriss: .35
Aurilia: .32
Velez: .32
Bowker: .25
Roberts: .24
Vizquel: .22
Ochoa: -.08
EDIT: New list weighted by PA's. Not 100% confident in the weighting scheme yet but it does make **some** sense.
Keep in mind that "consistent" can just as easily mean consistently BAD as consistently good. I find this list interesting because it arguably reads, in order, as who are perceived to be the "best" players on the team.
Part II of this will consist of a similar approach + suggested improvements for pitchers, using tRA.
I personally think that will be more interesting. The natural deviation should be a lot lower (pitchers have a higher success rate).
Oh yeah - and: this really says nothing about whether or not consistency is actually important.
My ultimate goal would be to try to use this (if it does turn out to be in any way valid) to try and look for trends in larger populations. After all, baseball statistics are primarily about predicting the future ;)
Lastly - I need to automate this. Can anyone point me @ gamelogs in .csv form? I'd love to be able to DL it for all players and then be able to look at some sorta league averages, etc.
I can use matlab for that - but I'm not going to manually copy the '08 game logs off the web for every player in MLB ;)
And then there is '07, etc...