matrices/Markov chains and statistics

I'm working on a project for school about how matrices and Markov chains are used in baseball statistics. Most of what I've found has been very vague (and doesn't show any specific calculations). I just thought I'd see if any of the more stat-savvy posters knew of any helpful articles on the subject.

by raisingcain about 1 year ago


Comments


I’ve looked into Markovs a bit but haven’t sat down to study the mathematics behind them. Mark Pankin has done a lot of work with them and John Beamer (of THT) had an article in the 2008 THT Annual, I believe, that included an extensive Markov (22 MB Excel spreadsheet). I use that a lot for theoretical lineup optimizations and stuff like that.

Tango has a basic online Markov that should be a little helpful. If you go to “view page source,” Tango has left the code there for all to see.

Triples Alley: Analysis of the San Francisco Giants, Baseball, and Sabermetrics.

by JT Jordan on Mar 16, 2011 10:17 PM PDT

I'll take a crack at this

To the best of my understanding, a Markov chain is a stochastic (meaning non-deterministic) process that moves between a discrete set of states, where the odds of the next state depend only on the current state and not on how you got there. A common way to work with one is a Monte Carlo simulation, which uses (pseudo) random numbers to play the chain out over and over.

Baseball example:
A given hitter has a 50% chance of putting the ball in play, a 20% chance of taking a walk, and a 30% chance of striking out. (3 discrete outcomes)

Randomly generate a number, n, between 0 and 1: if n < .5, in play; if .5 ≤ n < .7, walk; if n ≥ .7, K.
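Here is a rough Python sketch of that random-number step, using the 50/20/30 split from the example. The function names (outcome_from_draw, plate_appearance) are made up for illustration, and the boundary values are assigned to the next outcome up.

```python
import random

# Cumulative cutoffs for the example hitter: 50% in play, 20% walk, 30% K.
CUTOFFS = [(0.5, "in_play"), (0.7, "walk"), (1.0, "strikeout")]

def outcome_from_draw(n):
    """Map a number n in [0, 1) to one of the three outcomes."""
    for cutoff, name in CUTOFFS:
        if n < cutoff:
            return name
    return CUTOFFS[-1][1]  # only reached if n >= 1.0

def plate_appearance(rng=random):
    """Draw one random number and turn it into an outcome."""
    return outcome_from_draw(rng.random())

if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed so a rerun gives the same draws
    print([plate_appearance(rng) for _ in range(5)])
```

Passing in a seeded random.Random makes a run repeatable, which helps when you want to check your work.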

For the sake of this example, our hitter puts the ball in play. For the super simple case we’ll just use his BABIP and apply the same random-number method as above. You could also simulate which part of the field the ball is most likely to go to, but because the places a ball can land are continuous/non-discrete, I’m not sure you’d want to do that.

The result is either a base runner or an out.
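If you only care about base runner vs. out, the two draws collapse into simple arithmetic. A quick sketch, where the .300 BABIP is an assumed number purely for illustration (it is not part of the example) and the function name is made up:

```python
# Example hitter: 50% in play, 20% walk, 30% strikeout.
P_IN_PLAY, P_WALK, P_STRIKEOUT = 0.5, 0.2, 0.3
BABIP = 0.300  # assumed for illustration only

def reach_and_out_probabilities(babip=BABIP):
    """Combine the in-play draw and the BABIP draw into two probabilities."""
    p_reach = P_IN_PLAY * babip + P_WALK             # hit on a ball in play, or a walk
    p_out = P_IN_PLAY * (1 - babip) + P_STRIKEOUT    # out on a ball in play, or a K
    return p_reach, p_out

if __name__ == "__main__":
    p_reach, p_out = reach_and_out_probabilities()
    print(f"reach base: {p_reach:.3f}, out: {p_out:.3f}")
```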

Once the outcome of this hitter’s at bat is decided, you can store the current state in a matrix and simulate what happens with the next batter. You now have to account for both the batter and the base runner (pickoffs, DPs, steals, etc.). The matrix is just your bookkeeping.
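A minimal sketch of that bookkeeping, under some loud assumptions: the state is just (outs, first, second, third), a ball in play is either a single or an out, every runner moves up exactly one base on a single, and pickoffs, double plays, steals and extra-base hits are ignored. The .300 BABIP and all the function names are hypothetical.

```python
import random

P_IN_PLAY, P_WALK = 0.5, 0.2   # the remaining 0.3 is a strikeout
BABIP = 0.300                  # assumed for illustration only

def draw_event(rng):
    """One plate appearance, reduced to 'single', 'walk', or 'out'."""
    n = rng.random()
    if n < P_IN_PLAY:
        return "single" if rng.random() < BABIP else "out"
    if n < P_IN_PLAY + P_WALK:
        return "walk"
    return "out"

def next_state(state, event):
    """state = (outs, first, second, third); returns (new_state, runs_scored)."""
    outs, first, second, third = state
    if event == "out":
        return (outs + 1, first, second, third), 0
    if event == "single":
        # Assumption: every runner advances exactly one base.
        return (outs, True, first, second), int(third)
    # Walk: only forced runners advance.
    if first and second and third:
        return (outs, True, True, True), 1
    if first and second:
        return (outs, True, True, True), 0
    if first:
        return (outs, True, True, third), 0
    return (outs, True, second, third), 0

def half_inning(rng):
    """Play batters until three outs; return the runs scored."""
    state, runs = (0, False, False, False), 0
    while state[0] < 3:
        state, scored = next_state(state, draw_event(rng))
        runs += scored
    return runs

if __name__ == "__main__":
    print(half_inning(random.Random(42)))
```

Notice that next_state only looks at the current state and the event, never at how the inning got there. That is the Markov part.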

You would repeat this process for thousands or hundreds of thousands of games and see what the most likely outcomes are. You’ll want to store your final results in a separate matrix and then plot them as a histogram to see how often certain events occur.
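A rough sketch of the repeat-and-tally step, simulating half-innings rather than full games to keep it short. The same simplifying assumptions apply (singles, walks and outs only, runners move up one base on a single, an assumed .300 BABIP), the function names are hypothetical, and a text histogram stands in for a real plot.

```python
import random
from collections import Counter

def half_inning_runs(rng, p_in_play=0.5, p_walk=0.2, babip=0.300):
    """Runs in one simplified half-inning (singles, walks, and outs only)."""
    outs, runs = 0, 0
    bases = [False, False, False]  # first, second, third
    while outs < 3:
        n = rng.random()
        if n < p_in_play and rng.random() < babip:
            # Single: assume every runner moves up exactly one base.
            runs += int(bases[2])
            bases = [True, bases[0], bases[1]]
        elif p_in_play <= n < p_in_play + p_walk:
            # Walk: only forced runners advance.
            if all(bases):
                runs += 1
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        else:
            # Strikeout, or a ball in play that was not a hit.
            outs += 1
    return runs

def simulate(n_trials, seed=0):
    """Tally how often each run total comes up over n_trials half-innings."""
    rng = random.Random(seed)
    return Counter(half_inning_runs(rng) for _ in range(n_trials))

if __name__ == "__main__":
    trials = 100_000
    counts = simulate(trials)
    for runs in sorted(counts):
        share = counts[runs] / trials
        print(f"{runs:2d} runs {'#' * int(60 * share)} {share:.3f}")
```

The Counter plays the role of the separate results matrix: one tally per run total, ready to be plotted.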

And boom goes the dynamite.

by TwoBagger on Mar 17, 2011 8:33 AM PDT
