Well, sort of. At the very least, this was inspired by some of the discussion about Matt Cain that I read over the offseason, including several interesting posts about possible explanations for Cain's exceptional BABIP and HR/FB. Many comments in the great Matt Cain FIP debate argued that, after more than 1000 big-league innings, Cain's ERA ought to be closer to his FIP if FIP really described his true ability.
I've never seen anyone test that argument quantitatively (though I haven't read exhaustively, so I won't be too surprised if the first response to this is a link to somewhere it was done better), so I set out to do that. My method was fairly simple. I went to FanGraphs, downloaded a table with stats for all pitchers since 1920 (I chose to look only at the live-ball era, perhaps arbitrarily), and from that table selected all pitchers with at least 1000 career IP, 877 pitchers in total. Then I took the difference ERA - FIP for each of those pitchers and computed the empirical cumulative distribution function of that difference.
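The filtering and CDF steps can be sketched in Python as follows. This is a minimal illustration, not the code behind this post: it assumes the export is a CSV with columns named "Name", "IP", "ERA" and "FIP" (a real FanGraphs export may label them differently), and the helper names are hypothetical. IP values like 999.2 are read as plain decimals, which is harmless for a 1000-inning cutoff.

```python
import csv

MIN_IP = 1000.0  # career innings cutoff used in the post


def load_rows(path):
    """Read a CSV export (assumed column names) into dicts with numeric IP, ERA, FIP."""
    with open(path, newline="") as f:
        return [
            {"Name": r["Name"], "IP": float(r["IP"]),
             "ERA": float(r["ERA"]), "FIP": float(r["FIP"])}
            for r in csv.DictReader(f)
        ]


def era_minus_fip(rows, min_ip=MIN_IP):
    """Career ERA - FIP for every pitcher with at least min_ip innings."""
    return [r["ERA"] - r["FIP"] for r in rows if r["IP"] >= min_ip]


def empirical_cdf(values):
    """Sorted values paired with the step heights (i + 1) / n of the empirical CDF."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]
```

Plotting `xs` against the returned fractions as a step plot gives the cumulative distribution described above.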
I started with an eyeball test. I plotted the cumulative distribution and then plotted a fit to it, assuming it was normal. The fit looks pretty good to me, but you can judge for yourself. The fit parameters I found were a mean ERA - FIP of -0.04 and a standard deviation of 0.24. About as many pitchers have a career ERA - FIP more than one standard deviation below or above the mean as would be expected for a normal distribution (~1/6 in each tail).
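Here is a rough sketch of the tail check, using only the Python standard library. It fits the normal by the sample mean and standard deviation (moment matching); that is an assumption, since the post does not say how its fit was done, and a least-squares fit to the plotted CDF could give slightly different parameters. The function names are hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def normal_fit(diffs):
    """Moment-matched normal fit: sample mean and sample standard deviation."""
    return fmean(diffs), stdev(diffs)


def tail_fractions(diffs):
    """Observed fractions of pitchers more than one SD below / above the mean."""
    mu, sigma = normal_fit(diffs)
    n = len(diffs)
    below = sum(d < mu - sigma for d in diffs) / n
    above = sum(d > mu + sigma for d in diffs) / n
    return below, above


# Fraction of a normal distribution beyond one SD on either side (about 0.159, i.e. ~1/6).
EXPECTED_TAIL = NormalDist().cdf(-1.0)
```

If both observed fractions land near `EXPECTED_TAIL`, the tails look the way a normal distribution's would.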
To test the null hypothesis of normality more formally, I used the Anderson-Darling test. Like the Kolmogorov-Smirnov test, it compares the empirical distribution function with the hypothesized one, but it gives more weight to the tails. The test produces a statistic that is compared to a table of critical values: if the statistic is larger than the critical value for a given significance level, you can reject the null hypothesis of normality at that level. I found that a normal distribution of the difference between ERA and FIP cannot be ruled out. (For those interested in the numbers: the test statistic was 0.39, while the critical value at the 15% significance level was 0.57. If the differences really were drawn from a normal distribution, a statistic of 0.57 or larger would turn up 15% of the time by chance; since the observed statistic is smaller than that, the p-value is above 0.15, and normality cannot be rejected even at that lenient level.)
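For anyone who wants to reproduce the test, here is a standard-library sketch of the Anderson-Darling statistic for normality with the mean and standard deviation estimated from the data. The critical values (0.576, 0.656, 0.787, 0.918 and 1.092 at 15%, 10%, 5%, 2.5% and 1%) and the small-sample adjustment follow what SciPy's `scipy.stats.anderson` does for `dist='norm'`; I'm assuming, not claiming, that this matches the table used above. The function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

SIGNIFICANCE = [15.0, 10.0, 5.0, 2.5, 1.0]  # percent
_BASE_CRITICAL = [0.576, 0.656, 0.787, 0.918, 1.092]  # normal case, parameters estimated


def anderson_darling_normal(values):
    """Return (A^2, critical values matching SIGNIFICANCE) for a normality test."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    cdf = [fitted.cdf(x) for x in xs]
    total = 0.0
    for i in range(n):
        # Pair the i-th smallest value's CDF with the i-th largest value's survival
        # probability. (Values more than ~8 SD out would make a log argument 0.)
        total += (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
    a2 = -n - total / n
    # Small-sample adjustment applied to the critical values, as SciPy does.
    scale = 1.0 + 4.0 / n - 25.0 / n ** 2
    critical = [round(c / scale, 3) for c in _BASE_CRITICAL]
    return a2, critical
```

With n = 877 the adjusted 15% critical value comes out to 0.573, consistent with the 0.57 quoted above. Normality is rejected at a given level only if `a2` exceeds the matching entry of `critical`. With SciPy installed, `scipy.stats.anderson(diffs, dist='norm')` returns the statistic, critical values and significance levels directly.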
It is very important to note that this says nothing meaningful about whether there is a skill underlying differences between FIP and ERA. A distribution that is consistent with a normal distribution is not necessarily produced by random chance. It does mean, though, that even after more than 1000 innings, it is entirely possible that Matt Cain's career ERA is 0.39 below his FIP due to chance.
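To put Cain's gap in context, here is a quick worked calculation using the fit parameters quoted in this post (mean -0.04, standard deviation 0.24) and Cain's ERA - FIP of -0.39. It only shows where Cain sits within the fitted distribution of 1000-IP pitchers. It does not separate skill from chance, for the reason given above.

```python
from statistics import NormalDist

# Normal fit to career ERA - FIP for pitchers with 1000+ IP, as quoted in the post.
fitted = NormalDist(mu=-0.04, sigma=0.24)

cain_gap = -0.39  # Cain's career ERA minus FIP, as quoted in the post
z = (cain_gap - fitted.mean) / fitted.stdev           # about -1.46 SD
share_at_least_as_low = fitted.cdf(cain_gap)          # fraction with a gap this low or lower
expected_count = 877 * share_at_least_as_low          # scaled to the 877-pitcher sample

print(f"z = {z:.2f}, share = {share_at_least_as_low:.3f}, about {expected_count:.0f} of 877")
```

Under the fitted normal, roughly 7% of 1000-IP pitchers would be expected to have an ERA at least 0.39 below their FIP. That works out to several dozen pitchers, so a gap like Cain's is unusual but hardly unheard of.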
One possible improvement to this analysis would be to account for differences in innings pitched: the standard deviation of the difference between FIP and ERA does decrease as the IP threshold increases. Another would be a more appropriate statistical test for this set of data. I chose this one because I had used it before, but there may be better choices.
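A first step toward accounting for innings pitched is simply to tabulate how the spread changes with the IP cutoff. This sketch assumes each pitcher is a dict with hypothetical keys "IP", "ERA" and "FIP". The function name is also hypothetical, and it computes a plain sample standard deviation per threshold rather than any innings-weighted model.

```python
from statistics import stdev


def sd_by_threshold(rows, thresholds):
    """Sample SD of career ERA - FIP among pitchers at or above each IP threshold."""
    result = {}
    for t in thresholds:
        diffs = [r["ERA"] - r["FIP"] for r in rows if r["IP"] >= t]
        # stdev() needs at least two values; thresholds that leave fewer are skipped.
        if len(diffs) >= 2:
            result[t] = stdev(diffs)
    return result
```

Calling it with thresholds such as 500, 1000, 1500 and 2000 on the full pitcher table would show how quickly the spread shrinks as the sample of careers gets longer.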
Random notes: Among pitchers with at least 1000 IP, Ryan Franklin owns the largest outperformance of FIP, with an ERA of 4.03 and a FIP of 4.78. Len Barker had the biggest difference in the other direction, with an ERA of 4.34 and a FIP of 3.40.