"Torture numbers, and they'll confess to anything."
A perfect quote from the great Gregg Easterbrook, and one that is relevant now more than ever.
NFL teams have entire departments dedicated to analytics. ESPN Stats & Info is constantly delivering new, innovative data. Pro Football Focus is charting every aspect of every player on every play. The NFL is using chip technology to track each player's every move.
It's a new, exciting age for statistics, but also one that is quite convoluted. We're constantly delivering and absorbing data, but often we're unable to correctly process what is and isn't relevant to a player's prospects. We have snaps, pass routes, yards per carry, completion percentage, missed games because of injury, off-target rates, average depth of target, etc. The list goes on and on.
The challenge for NFL teams and analysts is to determine which statistics are important and which are just noise.
One of these statistics is yards after contact (YAC). It's a stat I refer to often to help paint a picture of a running back's ability. Of course, the most popular stat used to evaluate backs is yards per carry (YPC), which is very hard to predict (more on this later). As we'll also learn, a player's historical YAC is more predictable, and there is plenty to be learned about those who fare very well (and poorly) in these categories.
How predictive are YPC and YAC?
In a nutshell, not very. At least from a quantitative, statistical, projection standpoint.
My sample here is the last seven seasons, which works out to just under 86,000 carries by tailbacks. In order to determine how well YPC and YAC project over a reasonable amount of time, I split each back's carries into groups of 200. This way, we can determine how well a sample of 200 carries projects the next 200. I came up with 199 instances that fit the bill during our window.
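For readers who want to try this at home, here is a minimal sketch of the 200-carry split. It assumes each back's carries are available as a chronological list of yardage values, that leftover carries beyond the last full block are dropped, and that every full block is paired with the one right after it. The article doesn't spell out those details, so treat the function name and those choices as hypothetical.

```python
from typing import List, Tuple


def consecutive_block_pairs(
    carry_yards: List[float], block_size: int = 200
) -> List[Tuple[float, float]]:
    """Pair each full block's yards per carry with the next block's.

    carry_yards is one back's rushing yardage per carry, oldest first.
    Carries left over after the last full block are ignored (an assumption).
    """
    n_blocks = len(carry_yards) // block_size
    block_ypc = [
        sum(carry_yards[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]
    # zip stops at the shorter input, so a back with fewer than two
    # full blocks contributes no pairs.
    return list(zip(block_ypc, block_ypc[1:]))
```

Running this over every back and pooling the results would produce the list of "previous 200, next 200" instances. The same function works for YAC if the input list holds yards after contact per carry instead.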
Analysis of the data shows an r-squared of 0.10, which means that roughly 10 percent of the variation in a back's YPC can be explained by his YPC during his previous 200 carries. That's obviously not very useful and shows the random nature of per-rush production. There is a lot more that goes into YPC, including role, blocking help, scheme, defenders in the box, game situation, etc.
If we run the same test on YAC, we learn that 19 percent of the variation in a player's YAC can be explained by his mark in the category during his previous 200 attempts. Although this still isn't much to get excited about, YAC is nearly twice as predictable as YPC.
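As a rough illustration of where a number like 0.10 or 0.19 comes from, the sketch below computes r-squared as the squared Pearson correlation between the "previous 200" values and the "next 200" values. That matches r-squared for a simple one-variable linear fit, which is an assumption about the method used here. The function name is hypothetical.

```python
from typing import Sequence


def r_squared(prev_vals: Sequence[float], next_vals: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length samples."""
    if len(prev_vals) != len(next_vals) or len(prev_vals) < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    n = len(prev_vals)
    mean_prev = sum(prev_vals) / n
    mean_next = sum(next_vals) / n
    # Sums of squared deviations and cross-deviations from the means.
    s_xy = sum((x - mean_prev) * (y - mean_next)
               for x, y in zip(prev_vals, next_vals))
    s_xx = sum((x - mean_prev) ** 2 for x in prev_vals)
    s_yy = sum((y - mean_next) ** 2 for y in next_vals)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("a sample with no variation has no defined r-squared")
    return (s_xy * s_xy) / (s_xx * s_yy)
```

A value of 1.0 would mean the previous sample lines up perfectly with the next one, while a value near 0 means the previous sample tells us almost nothing.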
Activity at the Extremes
The main thing that jumped out at me during my research on this topic was how players at the extremes of these categories happened to, for better or worse, sustain that level of efficiency.
Our earlier YPC analysis included 199 instances. If I remove the middle 80 percent of the sample, I'm left with the 20 highest and 20 lowest YPC marks during a 200-carry stretch. Analysis of this sample shows a 0.31 r-squared. Of the top 20 backs, 13 produced an above-average YPC over their next 200 attempts. Six more were within two-tenths of a yard of league average, leaving only one that could be categorized as poor. Of the 20 lowest marks, a whopping 15 came in below average, most of them very poor. Only four of the players went on to eclipse 4.4 YPC.
Naturally, I ran this same test on YAC and was rewarded with even better results. Using our sample of 40 (20 best, 20 worst), I was left with an r-squared of 0.45. Of the 20 best marks, 15 were above average during the following sample of 200 carries. Of the 20 lowest YAC marks, 17 were below average and only two could be considered "good." I also tried cutting it down further by removing the middle 90 percent. That left me with 11 on each end and an r-squared of 0.67. Not one player in the top 11 posted a below-average mark over his next 200 carries. Only one of the bottom 11 went on to post an above-average mark.
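The "remove the middle" step can be sketched as follows, assuming the instances are stored as (previous-200 mark, next-200 mark) pairs. The tail size k is passed in directly (20 or 11 in the tests above) rather than derived from a percentage, and the function names and the league-average argument are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (previous-200 mark, next-200 mark)


def extreme_tails(pairs: List[Pair], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k lowest and k highest instances by previous-200 mark."""
    if k <= 0 or 2 * k > len(pairs):
        raise ValueError("k must be positive and at most half the sample")
    ordered = sorted(pairs, key=lambda p: p[0])
    return ordered[:k], ordered[-k:]


def count_above(tail: List[Pair], league_avg: float) -> int:
    """Count instances whose next-200 mark beat the league average."""
    return sum(1 for _, next_mark in tail if next_mark > league_avg)
```

Feeding the two tails back into an r-squared calculation, and counting how many instances in each tail landed above or below average, mirrors the checks described above.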
I should add here that I did tweak this study to compare samples of 500 carries, but was left with a sample of only 32 instances to work with. Given that small sample, I won't dive into the data. I will say, however, that the results were a bit more predictive than our 200-carry samples and a look at the players at the polar extremes reflected the same results we saw earlier.
What can we learn from a player's rookie season?
Although this article idea was on my to-do list for a while, I'll admit that it became a priority as a result of my work (and the constructive responses) on Jeremy Langford.
Langford, of course, was in a league of his own in terms of rookie-season struggles. His YPC and YAC production were poor. He failed to elude would-be tacklers. His catch and drop rate numbers were awful, and he struggled as a blocker. At the very minimum, Langford makes for a compelling case study.
Although the aforementioned data shows that we can't yet accurately project a player's efficiency stats, the narrative certainly changed when we took a more subjective look at players who fared very well or poorly in the categories.
Shown here are the 30 running backs since 2009 who handled at least 100 carries as a rookie and at least 75 carries after their first year, including their YAC per attempt as a rookie, and then for their career thereafter. Homing in on YAC, the r-squared here is only 0.20, but the chart shows there's still something to be learned.
Of the first 16 backs on the list, only two posted a poor YAC after their rookie season. Additionally, the three best post-rookie-season YAC numbers belong to three of the top 13 backs. Of the bottom 14 backs, 10 showed up as below average in the category on post-rookie carries.
If we split the chart down the middle, the top 15 backs averaged a 2.1 YAC as rookies and 1.9 on future carries. The bottom 15 averaged a 1.6 YAC as rookies and 1.6 after. Considering that league-wide YAC sits at 1.8, we see some regression to the mean for strong producers, but consistent poor production from those low on the list.
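The split-down-the-middle comparison is simple arithmetic, sketched below. It assumes each row holds a back's name, rookie YAC and post-rookie YAC, and that the list is ranked by rookie YAC. The function name and row layout are hypothetical, and no real player data is included.

```python
from typing import List, Tuple

Row = Tuple[str, float, float]  # (name, rookie YAC, post-rookie YAC)


def half_averages(
    rows: List[Row],
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Average rookie and post-rookie YAC for the top and bottom halves.

    Rows are ranked by rookie YAC, best first. With an odd number of rows,
    the extra row lands in the bottom half.
    """
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    mid = len(ranked) // 2

    def means(group: List[Row]) -> Tuple[float, float]:
        n = len(group)
        return (sum(r[1] for r in group) / n, sum(r[2] for r in group) / n)

    return means(ranked[:mid]), means(ranked[mid:])
```

If the top half's post-rookie average drifts toward the league mark while the bottom half's stays put, that is the pattern described above: regression for the strong producers and persistence for the weak ones.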
The likes of Bernard Pierce, Mark Ingram and Le'Veon Bell are clear outliers, but there are clear signs that, at the very minimum, we can draw some conclusions from rookie-season YAC.
Applying the data
This all brings us to the intriguing 2015 rookie class.
If we were to add to our earlier chart each rookie who carried the ball at least 100 times last season, four would land on the polar ends of our spectrum. Langford (1.1 YAC) would be dead last, and Matt Jones (1.4) would be fifth from the basement. On the other end, Thomas Rawls (2.9 YAC) and David Johnson (2.2) would rank second and fifth, respectively.
Obvious sample-size concerns aside, this is pretty damning evidence against Langford and, to a slightly lesser extent, Jones. Ronnie Hillman put together a top-20 fantasy campaign last season, but can be considered fortunate to have generated so much volume. Jahvid Best's career was cut short because of injury, but his rookie-season efficiency struggles caught up to him before he moved on from the league. The jury remains out on Isaiah Crowell, but he, Alfred Blue, Tre Mason, Trent Richardson and Daniel Thomas have certainly failed to emerge as productive NFL backs.
On the other hand, this paints a promising picture for Rawls and Johnson. LeGarrette Blount's fantasy upside has been limited by a near-complete lack of work as a receiver, but it's hard to argue with his rushing efficiency. DeMarco Murray, Chris Ivory and Doug Martin each posted an RB1 campaign over the past two years. Shonn Greene and Jeremy Hill are similarly built backs who saw a dip in production, but still maintained fantasy value while managing near-or-above average YAC production. And it's fair to say the jury remains out on Hill as he enters his third year. I'd be remiss if I didn't mention that Jerick McKinnon, who just missed our carry cutoff, posted a 2.2 rookie-season YAC. Keep him stashed in dynasty.
As for other 2015 rookies, Todd Gurley's 2.1 YAC ranked ninth out of our sample of 40 backs. T.J. Yeldon, Buck Allen, Ameer Abdullah and Melvin Gordon all came in at 1.7, which is just below average. Duke Johnson ranked 30th (1.6).
The Findings
Although projection nerds like me can't use this data on its own to accurately project future per-carry rushing production across the league, there are indications that rookie-season data does, in fact, correlate with a back's long-term prospects.
Additionally, players who post outstanding (or dreadful) YPC and/or YAC rates tend to sustain that production over the long term. This study should make you wary of Langford and Jones and more confident in Johnson and Rawls. Of course, this group will make for a nice case study, as we watch and learn from their sophomore campaigns.