
How MLB umpire grades really work, and what it means for the future of balls and strikes

AP Photo/Matt Slocum

ON THE MORNING after his called third strike that sent Philadelphia Phillies outfielder Kyle Schwarber into a fit of pique, Angel Hernandez, the umpire whose strike-zone judgment has inspired a million tweets, received an email. In it was a breakdown of his performance -- the same evaluation every umpire receives the day after he works the plate -- and on the pitch in question, the report was clear: Hernandez's call, in the eyes of Major League Baseball, was acceptable.

For anyone who saw last week's Sunday Night Baseball game on ESPN, or the clip of Schwarber slamming his bat and helmet before Hernandez ejected him, that assessment might be surprising. The on-screen circle that denotes whether a pitch at any point touched the strike zone was hollow, indicating it was a ball. How, then, could the pitch be an appropriate call in the eyes of Hernandez's bosses?

The answer is simple: The zone you see on TV is not the same as the one the league utilizes to grade home-plate umpire performance.

The criticism of umpires is a story as old as baseball, though never has it been as easily verifiable. First seen on ESPN broadcasts in 2001, visual representations of the strike zone are ubiquitous today -- on every major league broadcast, in the league's app and on its website -- and tracked by hobbyists and enthusiasts whose social media posts pour gasoline on the fire of every fan incensed by an apparent blown call.

Now, with MLB considering using its automated ball-strike system (ABS) in some form or fashion at the big league level, moments such as the April 24 one that caused Schwarber to erupt receive warranted scrutiny. Schwarber's argument makes sense: Balls and strikes are supposed to be binary. If a pitch is in the zone, it's a strike; if it isn't, it's a ball. The league, in its appraisal of umpires, takes a different approach: It bakes in a margin of error on the edges of the zone when rating each game.

It's a system that, depending on one's perspective, affirms just how good umpires are -- or manipulates the data too much to hold accountable those who clearly are worse than their peers.

"I just want to reiterate how hard it is what our umpires do and how unbelievably good they are at what they do. I can't overstate that," said Michael Hill, the former Miami Marlins president of baseball operations who is now a senior vice president at MLB in charge of umpiring. "Coming where I came from, to be in the trenches with these guys and see what they do, I find myself marveling."


IN HIS DAY job, Dylan Yep combs through reams of data to do policy analysis for the district attorney's office in San Francisco. At night, he assumes an alter ego: @UmpireAuditor, a Twitter hero who seeks truth and justice in an entirely different realm -- the calls by men who adjudicate balls and strikes in baseball games.

Yep is not alone. There are other public watchdogs, such as UmpScorecards and UmpScores, and proprietary systems, such as TruMedia, that endeavor to do the same thing: scrape the publicly available data from MLB's pitch-tracking system and interpret it to explain whether our eyes -- or the rectangular box on screens meant to represent the strike zone -- are deceiving us.

"It speaks to the inflection point we're at," Yep said. "Umpires are receiving an enormous amount of critique. Every call -- and I'm part of this -- is getting amplified on social media. It's not fair to the umpires to be in this position where, one, they're being graded on this seemingly opaque system and two, being the subject of constant scrutiny when they're trying to do this impossible task."

Yep played baseball. He acknowledges the near-impossibility of home-plate perfection, the expected standard, when balls are traveling at incredible speeds with mind-bending spin through three-dimensional space. He feels for umpires -- even while he recognizes that his posts only serve to heighten the scrutiny in a case such as Hernandez's, in which the 98 mph fastball to Schwarber was one of 18 missed ball-strike calls, according to TruMedia.

In 2014, when Yep was teaching himself to code, he started tracking and publicizing the worst call of the day on a website. Over time, he learned that as compelling as that raw data is, seeing the missed call -- feeling it -- engaged people far more. The evolution of @UmpireAuditor from nerdy number feed to online arsonist was only natural.

Yep's account, combined with the public's growing adherence to the televised live zone, serves as proof of what has long been assumed by fans: Umpires consistently blow ball-strike calls. ESPN's K-Zone, a three-dimensional rendering of the strike zone that uses MLB's in-game data and sets the height of its zone based on an average of the audits of the previous five games, has evolved into a standard on all televised games. Any pitch outside of the zone that is called a strike is regarded as an abomination. Any pitch inside of it called a ball must be fraudulent.

Never mind that umpires today -- particularly younger ones reared on umpire-evaluation systems, which began with QuesTec in 2001 -- are better at judging balls and strikes than ever. Last week, Chris Segal called 123 of 124 pitches taken correctly, according to Yep's system. In another game, Quinn Wolcott went 120 for 121, with his one miss borderline. Excellence exists aplenty.

Games such as Hernandez's are far more the exception than the rule, and yet they are the ones that are remembered. Yep's system had him at 113 for 129, with 16 blown calls -- 11 being strikes that were outside the supposed rulebook zone. The zone, it's important to note, is ever-shifting, with six different definitions historically, the current one being: "the area over home plate from the midpoint between a batter's shoulders and the top of the uniform pants -- when the batter is in his stance and prepared to swing at a pitched ball -- and a point just below the kneecap."

That differs from what's called in reality, with a zone that tops out around the beltline and bottoms at the hollow of the knee. Yep has continued to wonder how his interpretation of the strike zone -- his is 19.92 inches wide, with 17 inches from the plate and 1.46 inches on either side, representing the approximate size of half a baseball, with the center of the ball used as the spot to measure where a pitch lands -- differs from MLB's.
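The width arithmetic behind Yep's zone can be checked in a few lines. What follows is a minimal sketch, not @UmpireAuditor's actual code: the function names are hypothetical, the coordinate (inches from the middle of the plate to the center of the ball) is an assumption, and only the horizontal dimension is modeled.

```python
PLATE_WIDTH_IN = 17.0   # width of home plate, per the article
HALF_BALL_IN = 1.46     # approximate half of a baseball, per the article


def public_zone_width() -> float:
    """Plate width plus half a baseball on either side."""
    return PLATE_WIDTH_IN + 2 * HALF_BALL_IN


def in_public_zone_horizontally(center_x_in: float) -> bool:
    """True if the ball's center is within the widened zone.

    center_x_in is an assumed coordinate: inches from the middle of the
    plate to the center of the ball, negative on one side, positive on
    the other.
    """
    return abs(center_x_in) <= public_zone_width() / 2


print(round(public_zone_width(), 2))  # 19.92
```

Widening the zone by half a ball on each side and measuring from the ball's center is equivalent to asking whether any part of the ball crossed over the 17-inch plate.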

These different realities inform the confusion between what is seen publicly and what MLB judges with its private Zone Enforcement system. The public zone is stringent. It operates in black and white. The plate is 17 inches wide. If a pitch doesn't hit the zone, it's a miss, plain and simple. And that binary nature has spoiled fans, who have grown so accustomed to it that games without it feel like they're missing something.

"I would definitely consider it an entertainment product," Yep said. "It's pretty good. But it's not rigorous analysis. It plays on people's emotions in the way that certain members of the public rely too heavily when looking at it for analysis."

The private zone is something different altogether.


WHEN ANGEL HERNANDEZ opened the email to see his score from the previous night, it was not the 86% that TruMedia graded him or the 88% of UmpireAuditor. His adjusted score was 96.12%.

"The whole story was blown out of proportion because some media source said he scored 85," said Joe West, the umpire who retired last year after a record 5,460 games. "He didn't score 85. He scored 96. If they're not using the same grading system that the umpires are being graded by, they can't question what he calls."

Here's how the league's system works, according to sources: MLB employs a team of auditors to assist in its review of each game. The auditors set a unique strike zone for each player based on his setup in the batter's box. The top of the strike zone is his beltline and the bottom the hollow of his back knee, both determined when he's loading and preparing to swing. The margin of error is implemented off the corners -- 2 inches on each side of the plate.

The rationale for the margin of error, which was collectively bargained between the league and the umpires' union -- the MLB Umpires Association declined comment -- was ostensibly the limitations of previous tracking technology, but the margin also buys umpires leeway in their grading. MLB currently employs a camera-based system to track its games, and it provides a wide array of data now seen as standard across baseball. Pitch velocity and movement, batted-ball exit velocity and launch angle -- each is measured by the 12-camera Hawk-Eye system installed in all 30 major league stadiums. While Hawk-Eye's margin of error is measured to be .16 of an inch, previous systems' margins were greater, and the umpires negotiated a so-called buffer zone of 2 inches on either side. Even with Hawk-Eye, that remains in place.

Furthermore, the umpires' union created a Zone Enforcement committee to double-check incorrect calls and file appeals to have the scoring of pitches reviewed. In 2021, about 30% of the pitches appealed, including those in which a pitcher misses his spot but still lands a ball in the strike zone to a scrambling catcher, were overturned.

With those parameters in place, the league breaks down pitches into three categories: "correct" calls, "acceptable" calls within the so-called buffer zone and "incorrect" calls. By MLB's calculations, the league-wide average for umpires on correct and acceptable calls -- belt to knees, 21 inches across (the 17-inch-wide plate plus 2 inches either way) -- was 97.4% in the 2021 season. The highest-ranked umpire, according to MLB, graded out at 98.5%. The lowest: 96%.
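The three categories can be sketched in code, with heavy caveats. MLB's Zone Enforcement system is private, so everything below is an illustration under stated assumptions rather than the league's method: the function names are hypothetical, only the horizontal dimension is modeled (the pitch is assumed to be between belt and knee), the location is an assumed distance in inches from the middle of the plate, and a ball call on a pitch in the buffer is assumed to count as correct.

```python
from typing import Sequence

PLATE_HALF_WIDTH_IN = 17.0 / 2   # half of the 17-inch plate
BUFFER_IN = 2.0                  # bargained buffer on each side, per the article


def grade_call(x_in: float, called_strike: bool) -> str:
    """Grade one call as 'correct', 'acceptable' or 'incorrect'.

    Assumption: x_in is the pitch's horizontal distance, in inches,
    from the middle of the plate; height is ignored.
    """
    offset = abs(x_in)
    if offset <= PLATE_HALF_WIDTH_IN:
        # Over the plate: a strike call is right, a ball call is wrong.
        return "correct" if called_strike else "incorrect"
    if offset <= PLATE_HALF_WIDTH_IN + BUFFER_IN:
        # In the buffer: a strike call is tolerated (assumed rule).
        return "acceptable" if called_strike else "correct"
    # Beyond the buffer: a strike call is a miss.
    return "incorrect" if called_strike else "correct"


def graded_percentage(grades: Sequence[str]) -> float:
    """Share of calls graded correct or acceptable, as a percentage."""
    passing = sum(g in ("correct", "acceptable") for g in grades)
    return 100 * passing / len(grades)
```

Under these assumptions, a called strike an inch off the plate grades as acceptable and counts toward the umpire's score, while the same call in a binary public zone counts as a miss. That gap is why the two kinds of scores diverge.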

That's the only measurement system that matters to umps, one current MLB umpire told ESPN. "We are told by MLB to ignore the third party strike zone evaluations and only be concerned with our ZE scores provided by the league," he said.

But those third-party evaluations often tell a very different story. UmpScores said the best home-plate umpire in baseball last year, Tripp Gibson, graded out around 93.6% accurate. Four umpires last season, according to UmpScores, missed on more than 10% of pitches they called for balls and strikes.

TruMedia, an analytics company that provides data to ESPN, has metrics that measure correct-call percentage and adjusted correct-call percentage. Leaguewide this season, according to TruMedia, MLB umpires have called 92% of pitches correctly. With its adjusted metric, which penalizes particularly egregious calls in a similar fashion to MLB's system, TruMedia bumps that number to 96.24% -- roughly in line with the league's internal measurement.

And there is another reality about the difficulty of umpires' jobs beyond what any system can measure -- that it's more than the mechanics of calling balls and strikes. It's the grind that comes with it.

"People take for granted that these umpires are traveling every three days to a new city, go maybe 35, 40 days without an off-day," New York Yankees first baseman Anthony Rizzo said. "Our first homestand of the year -- our umpire crew went from Sunday Night Baseball to a day game in Baltimore. They had to drive. I think Chad Fairchild had the plate. And I, as a hitter, expect him to be 100 percent right every time."


WHEN LOOKING AT the Josh Hader fastball that Schwarber took for strike three, it's easy to understand how Hernandez missed it. Yes: It was outside. But barely -- about an inch off the rulebook zone and well within the buffer zone.

"Borderline," Hill said. "Best way to describe that pitch."

Still, it's not a particularly satisfactory outcome for Schwarber, not when TruMedia's predicted called-strike percentage, based on a database with ball-strike outcomes for hundreds of thousands of other pitches, pegged it at 40.6%. That was better than the strike calls on the previous two pitches Schwarber had seen, which were 35.8% and 28.5%, and much better than the egregious miss Hernandez had in the fifth inning, when he called a strike on an Eric Lauer slider to Jean Segura that was 6 inches inside. The predicted called-strike percentage on that pitch: 0.0%.

"Close pitches are close pitches," Rizzo said. "As a hitter, you're gonna get some. You're probably gonna get a lot. The blatant misses are where we get really frustrated. And I think umpires, when they run the pitch back, are mad at themselves. They don't want to miss calls."

Said Gerrit Cole, Rizzo's teammate: "Sometimes we could check ourselves with a little more compassion or respect. But in the moment, we're all pretty much on edge. For the most part, honestly, I think they do a good job."

In Hernandez's four games calling balls and strikes this season, according to TruMedia, his correct-call rate is 91.3%, 65th of the 84 umpires who have worked the plate. His adjusted correct-call percentage is only slightly higher -- 94.50%, 83rd of 84. Such numbers have only fed Hernandez's reputation as a poor ball-strike umpire, which, according to the data, does have some merit: Over the previous five seasons, his adjusted correct-call percentage is 96.21%, 80th of 115 home plate umpires.

Adjudicating balls and strikes in the moment is something players long have believed they can do better than umpires, though until this year, that has been pure supposition. Now, for the first time, minor league players are getting the opportunity to put their money where their mouth is. And the results are shockingly bad.

In Florida State League games this season, MLB is allowing players to challenge umpires' ball-strike calls by using ABS. Each team receives three challenges per game, and a hitter, catcher or pitcher can call for one. When a ball or strike is contested, the umpire turns toward the press box, where the ABS call is relayed. In an early sample, according to MLB, hitters challenged 82 calls and saw only 34 overturns -- a 41% success rate. Pitchers and catchers challenging haven't fared much better, with just 44% of calls changed (32 of 72).
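The success rates follow directly from the counts MLB provided. A quick check, with a hypothetical helper name:

```python
def success_rate(overturned: int, challenged: int) -> float:
    """Percentage of challenged calls that were overturned."""
    return 100 * overturned / challenged


hitters = success_rate(34, 82)            # hitters' challenges
pitchers_catchers = success_rate(32, 72)  # pitchers' and catchers' challenges

print(f"{hitters:.0f}% {pitchers_catchers:.0f}%")  # 41% 44%
```

In both groups, the call on the field survived more often than it was overturned.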

"Everyone acts like it's easy to tell the difference between balls and strikes," said Morgan Sword, MLB's executive vice president of baseball operations. "It is not."

Were the three-challenge rule to make its way to MLB -- and the league is likelier to start there than to have ABS replace home-plate umpires outright -- that still could lead to thousands of overturned balls and strikes each season. Of course, the same rigor MLB uses in assessing umpires currently could not apply in real time. The ABS zone, like the one seen on screens, would almost certainly come pre-set with no buffer zone.

Even then, barring a philosophical change by MLB on how to present the zone publicly, balls and strikes at the big league level will continue to be consumed with one zone for the fans and another for the umpires. There will be blowups by players, and there will be disillusioned fans for whom the rectangle on their screen remains sacrosanct, and there will be umpires who bear the dual brunt despite being graded on different criteria, and the long history of baseball being a literal game of inches will live on.

"It's important we make it clear that we're not trying to get rid of umpires," Yep said. "They very much have a role. It's just that their role is changing as the game changes and technology changes. I see umpires calling balls and strikes as being bad for baseball. It's bad for competitive integrity, it's bad for viewers, it's bad for players."

ESPN's Jesse Rogers contributed to this story.