
Fantasy baseball projections: How the sausage gets made

Figuring out which players are destined to outrace the field in 2021 was a struggle. Read on to find out how ESPN's projections get made.

Projecting player performance for 2021 offers many unique challenges beyond those already associated with the process. Everyone is aware of the small sample variability generating several outlying seasons in both directions. Each year, there are surprise players on the various leaderboards after two months of play who often then end up being closer to the bottom of the pack than the top. Every season, some individuals struggle early before finding their footing and producing as usual.

The same was true in 2020, except the 60-game schedule didn't allow for the water to find its level. The problem is, all we have are those two months' worth of data to help baseline expectations for the impending campaign.

The length of the season isn't the only issue. Rule changes put in place to facilitate play under pandemic conditions also influenced the numbers and need to be accounted for when comparing 2020 performance to other seasons.

What follows is a description of some of the nuances related to the 2020 season and the way they are being handled in ESPN's player projections.

How should 2020 be weighted?

Most formulaic projection systems start with a weighted average of past performance, with more recent seasons exerting a greater influence over the final results. The obvious problem for this season's work is that the 2020 schedule was just 60 games, a little over one-third of a standard campaign.

Keeping everything "status quo" would lessen the impact of 2020's performances -- but perhaps that might not be a bad thing. With so many extraneous factors influencing player performance, perhaps it would be best to soften last season's impact on 2021's expectations. This would have the effect of granting a "mulligan" of sorts to those whose performance was negatively affected by last year's unusual circumstances, but to do that would also undercut any improvement or decline of baseball skills.

One way to handle this problem is to weight skills differently. This could be a slippery slope, as the conventional means of applying stabilization factors is flawed. Still, enough might be gleaned from having certain targeted skill changes pull more or less weight than others. For instance, a change in strikeout rate, for both hitters and pitchers, is more likely to be genuine than a change in other measurable areas, so the 2020 level is weighted more heavily for strikeout rate than for other metrics.

Back-testing to help determine the optimal weighting isn't an option, since this is the first instance of a season truncated in this manner, with all the ancillary factors. Yes, there have been strike-shortened seasons in the past, but the conditions in those cases weren't the same. Besides, the number of such seasons is very limited.

In the end, we opted to go essentially with an empirical weighting so that the full 60-game schedule last year counted the same as the full 2019 campaign, with 2018 and 2017 contributing at a diminishing level. The caveat is that the weighting was manually adjusted when deemed appropriate, including the alteration of certain particular skills on a global basis.
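For readers who like to see the arithmetic, here is a minimal sketch of a season-weighted average in which 2020 counts the same as 2019. The weights, the `weighted_rate` helper and the strikeout rates are all made up for illustration; they are not ESPN's actual values.

```python
from typing import Dict

# Hypothetical season weights: 2020 and 2019 count equally, older seasons less.
# Illustrative assumptions only, not the weights used in the real projections.
SEASON_WEIGHTS: Dict[int, float] = {2020: 1.0, 2019: 1.0, 2018: 0.6, 2017: 0.3}


def weighted_rate(rates: Dict[int, float], weights: Dict[int, float]) -> float:
    """Weighted average of a per-season rate stat, skipping seasons with no data."""
    used = {year: w for year, w in weights.items() if year in rates}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no seasons with both a rate and a weight")
    return sum(rates[year] * w for year, w in used.items()) / total_weight


# Made-up strikeout rates for an imaginary hitter who cut his whiffs in 2020.
k_rates = {2020: 0.20, 2019: 0.24, 2018: 0.26, 2017: 0.28}
baseline = weighted_rate(k_rates, SEASON_WEIGHTS)

# A skill-specific tweak: trust the 2020 strikeout rate a bit more (assumed 1.25).
k_weights = {**SEASON_WEIGHTS, 2020: 1.25}
boosted = weighted_rate(k_rates, k_weights)
```

Because the imaginary hitter's 2020 strikeout rate is his best, bumping the 2020 weight pulls the blended rate down, which is the effect of trusting a recent skill change more.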

Prospects and young players with limited MLB experience

It may be flawed, but at least there is some data from 2020 to analyze for young players who appeared in the majors. Unfortunately there is nothing tangible for prospects. Normally, performance at Double-A and Triple-A is translated to a major league equivalency (MLE) and evaluated as if the player put up those numbers in MLB. It's not perfect, but it usually helps baseline younger players since the factors encompassing the MLE translations are based on the historical production of minor leaguers who end up playing in the majors.

With nothing new to evaluate, the MLEs from 2017-2019 were simply carried over. This was completely subjective, but if there were reputable reports on players showing growth at the alternate training sites, we also factored those into the mix. That may not seem fair, since the publicly available information was mostly positive and it wasn't reported to the same extent across all 30 camps, but with so little overall information available, it doesn't make sense to ignore something actually on the record.

The geographical schedule

One of the underlying assumptions inherent to most projection systems is that the quality of opponents faced by each player is close enough so that adjusting for any differences doesn't significantly increase the accuracy of the results. For example, ignoring park factors, a batter hitting 32 homers for a team in the AL East counts the same as a hitter with the same output out of the NL West. Similarly, fanning 250 batters while toiling in the AL Central is just as dominant as doing it in any of the other divisions. That said, the proliferation of some of the newfangled metrics does require some quantification of the differences.

The reason for assuming equal quality of opponent is that over a 162-game season, everyone will face a wide assortment of competition, with the superior and inferior opponents generally balancing out by season's end. Sure, it doesn't always work out that way, so some analysts do normalize stat lines before incorporating them into the weighted average.

The system producing our ESPN projections typically accounts for these minor adjustments, but with 2020's geographical schedule, the impact of schedule strength was much deeper than ever before. With play concentrated within the East, Central and West zones, there were essentially three different leagues. As such, the differences in quality of competition were amplified, not lessened.

The catch is that simply lumping the stats from all 30 teams together and then ranking them isn't a proper reflection of quality. This is an extreme example, but if the best pitching was centralized in one zone with the poorest in another, a team posting a .330 weighted on base average (wOBA) in the region with the superior pitching would be better than a club registering the same mark against the inferior hurlers. However, if the wOBA from all 30 teams were ranked together, these two clubs would appear to be equal.

While this makes quantifying the strength of each of these three "leagues" difficult, it was observably apparent that the quality of both hitters and pitchers in the Central trailed that of both the East and West. A simpler way to say it is that if any player from the Central had faced more balanced opposition, his performance would have suffered. This is similar to the theory of an MLE. To best consider the performance of players in the Central, their numbers need to be normalized as if they came against league average competition. The process was tedious and involved a bit of speculation, but stat lines from the Central were massaged before being included in the overall weighted averages.
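To make the idea concrete, here is a toy sketch of normalizing a wOBA for quality of opposition with a simple ratio. The `normalize_woba` function and every number in it are hypothetical, and the ratio method is a stand-in rather than the process actually used. It also sidesteps the real difficulty: a zone's wOBA allowed is itself shaped by the hitters in that zone.

```python
def normalize_woba(raw_woba: float, zone_woba_allowed: float,
                   league_woba_allowed: float) -> float:
    """Scale a wOBA to what it might be against league-average pitching.

    Assumes a plain ratio adjustment: pitching that allows a lower wOBA than
    the league is tougher, so a hitter's raw mark against it is scaled up.
    """
    if zone_woba_allowed <= 0:
        raise ValueError("zone_woba_allowed must be positive")
    return raw_woba * (league_woba_allowed / zone_woba_allowed)


LEAGUE_WOBA_ALLOWED = 0.320  # made-up league-wide figure

# Two imaginary clubs with the same raw .330 wOBA in different zones.
vs_strong_pitching = normalize_woba(0.330, 0.310, LEAGUE_WOBA_ALLOWED)
vs_weak_pitching = normalize_woba(0.330, 0.330, LEAGUE_WOBA_ALLOWED)
```

Ranked on raw wOBA the two clubs tie, but after the adjustment the one that faced the tougher pitching comes out ahead.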

Ramifications of the Universal DH

The current projections assume that there will be no designated hitters in the National League for 2021. If the league reverses course and ends up announcing a rule change to allow for a Universal DH, rest assured that the appropriate adjustments will be made.

Either way, the numbers for National League pitchers must be adjusted as a result of what happened in 2020. Applying the same thought process as with geographical zones, last year's pitching numbers need to be adjusted to what they might have been had those pitchers routinely faced lineups with pitchers batting. Fortunately, there is a bevy of data from which to make these adjustments. Since National League pitchers hit before 2020, those earlier seasons are good to go.

If it turns out that a Universal DH will indeed be deployed, using numbers directly from the 2020 season would be fine, as is. However, in that case, the stat lines from 2019 and before would then require attention. Those years would need to be "translated" as if NL pitchers had faced a daily designated hitter. In fact, this very alteration was done for 2020 projections, with many American League pitchers leapfrogging their National League counterparts.

National League batters are impacted too, since each team will lose around 300 plate appearances. Prior to 2020, pitchers comprised around 55% of the plate appearances for their spot in the order, usually coming from the No. 9 spot. This left 45% for pinch hitters and designated hitters who participated during those occasional road interleague tilts. This is admittedly speculation, but with starting pitchers throwing fewer innings and many clubs using openers and opting for bullpen games, there could be fewer total pitcher at-bats in 2021 than in the past. It's not likely to be a huge difference, but for the purpose of bookkeeping and projecting a logical level of playing time per team, 50% of the plate appearances for the nine-hole were assigned to position players.

Even so, batters such as Kyle Schwarber, Dominic Smith, Jesus Aguilar, Wilmer Flores and Colin Moran lost a decent chunk of projected playing time. Several catchers earmarked for some time at designated hitter also were affected, such as Austin Nola, J.T. Realmuto, Willson Contreras, Will Smith and Daulton Varsho.
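The nine-hole bookkeeping described above is simple arithmetic. In this rough sketch, the 600-plate-appearance season total for the No. 9 spot is an assumption chosen to line up with the "around 300" figure, and `position_player_pa` is a hypothetical helper.

```python
NINE_HOLE_PA = 600  # assumed season total for the No. 9 spot, not a reported figure


def position_player_pa(total_pa: int, position_player_share: float) -> int:
    """Plate appearances in a lineup spot that go to non-pitchers."""
    if not 0.0 <= position_player_share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return round(total_pa * position_player_share)


with_dh_2020 = position_player_pa(NINE_HOLE_PA, 1.00)    # a DH takes every PA
pre_2020 = position_player_pa(NINE_HOLE_PA, 0.45)        # pitchers took about 55%
projected_2021 = position_player_pa(NINE_HOLE_PA, 0.50)  # split used for 2021

lost_vs_2020 = with_dh_2020 - projected_2021
```

Under that assumption, position players get 300 plate appearances from the nine-hole in 2021, which is 300 fewer per team than with a full-time DH.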

Park factors

An in-depth look at the efficacy of two-month park factors is available as part of the 2021 ESPN Fantasy Baseball Draft Kit. To summarize, two months is not nearly long enough to trust 2020's park factors. As was discussed above with respect to quality of opposition, one of the principles of park factors is that the quality of home and away competition balances out. Even 162 games is not sufficient to eliminate all the bias, so it was decided to ignore 2020 and use the park indices from 2017-2019. The only exceptions are subjective determinations for Globe Life Field (first year in existence), Marlins Park and Oracle Park (renovations), along with a trio of venues installing a humidor (Fenway Park, Citi Field and T-Mobile Park).

Pitcher workload

While some teams have shared plans with respect to how they'll handle innings, projecting starters is more art than science. Normally, innings-per-start is estimated based on history, then the number of starts is projected based on the current makeup of each team's roster. Multiply the two numbers together and, voila, you have an innings projection.
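As a quick illustration of that traditional computation, here is a minimal sketch. The `project_innings` name and the sample inputs are hypothetical, and innings are treated as plain decimals rather than baseball's thirds notation.

```python
def project_innings(ip_per_start: float, starts: int) -> float:
    """Traditional workload estimate: innings per start times projected starts."""
    if ip_per_start < 0 or starts < 0:
        raise ValueError("inputs must be non-negative")
    return ip_per_start * starts


# Made-up inputs for an imaginary workhorse in a normal season.
normal_season_ace = project_innings(6.75, 32)
```

With those inputs the estimate is 216 innings; the hard part in 2021 is that neither input can be pinned down with much confidence.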

With the limited 2020 workload and the changing nature of just how staffs are handled, both elements of the traditional computation are currently unclear. The length of a typical start and the number of expected outings are both in jeopardy.

Some teams, like the Brewers and Mariners, have publicly discussed capping workloads at 100 innings more than last season's totals, with other clubs expected to follow suit. While that sounds like a tenable framework, will playoff teams count postseason innings in those decisions? Will some teams add extra frames to account for the work in summer camp? Plus, if there is a hard cap of, say, 160 innings, what is the pathway to get there? Will teams utilize more total starts with earlier hooks or fewer starts with longer efforts? This isn't as important for ratios and strikeouts as it is for wins, or even quality starts for leagues which use that category.
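The two pathways to a capped total can be sketched with a little division. The helper names, the 60-inning 2020 workload and the start counts below are assumptions for illustration, not any team's stated plan.

```python
def capped_innings(last_season_ip: float, bump: float = 100.0) -> float:
    """Workload cap expressed as last season's innings plus a fixed bump."""
    return last_season_ip + bump


def ip_per_start_needed(cap_ip: float, starts: int) -> float:
    """Average innings per start required to land exactly on the cap."""
    if starts <= 0:
        raise ValueError("starts must be positive")
    return cap_ip / starts


cap = capped_innings(60.0)  # imaginary starter who threw 60 innings in 2020

many_short_starts = ip_per_start_needed(cap, 32)  # more starts, earlier hooks
fewer_long_starts = ip_per_start_needed(cap, 26)  # fewer starts, longer efforts
```

Both routes reach the same 160 innings, but the 32-start path averages only five innings per outing, which is where wins and quality starts take the hit.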

Other teams have either announced plans to use a six-man rotation or made it obvious, based on their depth chart, that they will do so. The catch here is that it is easy to plan on a six-man rotation in the spring, but an injury here and a slow start there and these squads may quickly be pressed into a return to a five-man rotation.

With logic and common sense as a guide, innings were allotted, with far fewer hurlers projected to eclipse certain milestones (180 innings, 200 innings, etc.). It's hard to envision the likes of Trevor Bauer, Gerrit Cole, Jacob deGrom or Shane Bieber missing too many turns. They make up the very small group of hurlers expected to (barely) eclipse 200 IP. Even so, their current projections still fall a bit short of the 215-220 frames we'd expect under normal circumstances.

Saves

This is more about the manner in which teams are handling their bullpens than it is the repercussions of the 2020 season. Still, it's worth mentioning since understanding "the big picture" could influence your draft day planning. It may be just cyclical, but a lower percentage of wins are being saved than just a few years ago. Furthermore, saves are being distributed among more relievers.

Granted, part of this is economic, as non-competing teams have learned it's not worth paying for a closer, allowing the competitive clubs to pay up for an "uber-bullpen." In any event, the reasons don't matter. Fewer total saves will be counted in most rotisserie league pools. In points leagues, points produced by closers are dropping to the extent that it may be worth carrying fewer closers and using more dominant set-up men, racking up more total innings and strikeouts while being penalized less for allowing walks, hits and runs.

Summary

Projections for the 2021 season will, essentially, look the same as always. What can't be seen is the wider range of plausible outcomes as a result of all the variance discussed above. In short, picks will be all over the place this season. Everyone will end up reflecting last season's numbers differently on their personal cheatsheets. Strict adherence to an ADP list is always shunned, but this season, the repercussions are even more damning. If you feel a player is worth the draft spot, don't play chicken with the room. Get your guy!

Good luck in what will no doubt be a very interesting fantasy baseball season!