Image credit: © Brett Davis-USA TODAY Sports
How much is a half of a mile per hour drop in velocity worth? I asked myself that question this season as I watched Spencer Strider go from the most dominant pitcher in baseball with the bases empty to roughly average once guys made it on. Normally I wouldn’t worry much about it, as those types of splits are incredibly noisy year to year, but for Strider it seemed to have a clearly defined cause. When runners get on base and Strider goes from the windup to the stretch, his velocity drops roughly a half mile per hour and hitters perform significantly better. Though it’s a small difference, changes in velocity don’t happen in isolation: fatigue, mechanics, and the state of the game itself prior to the pitch all play a role. Answering our initial question is going to require understanding not just the relationship between velocity and results but also the relationships among these other factors, and at the end we’ll have a better understanding of not only velocity but also exertion, fatigue, and the times through the order penalty.
Let’s begin by visualizing all of the relationships between velocity and results using a Directed Acyclic Graph, or DAG for short. A DAG is a tool data scientists use to sketch out relationships between variables, with arrows indicating the direction of the relationship. For a basic model of the impact of pitch velocity on pitch results, our DAG would look like this, with the arrow pointing toward Pitch Outcome from Pitch Velocity indicating a direct causal relationship between velocity and outcomes:
This path between velocity and outcomes should be intuitive to any baseball fan: The faster a pitch is thrown, the less time a batter has to react, and thus the worse their results on the pitch will be. Additionally, if a specific pitcher has lost some velocity on one of their specific pitches, there may be some expectation-based effect. Imagine a hitter had prepared for a pitcher who he believed would be throwing 97 mph, and all of a sudden the pitcher grooved a fastball at 95. If normally the batter would have struggled to be on time to 97, 95 could become an easy barrel. Alternatively, that little bit of extra time could allow them to better identify an offspeed pitch or even the expected location of the pitch.
So not only does velocity matter, but a change in velocity within a game may matter as well. Let’s explore these two relationships by making a table of the results of plate appearances that ended on a fastball. We’re going to use Statcast data for starting pitchers from 2020 through 2023, and we’ll distribute our bucket of fastballs into bins based on the pitcher’s average fastball velocity that day and the difference in velocity between the pitch and that pitcher’s average fastball velocity that day (from here on out we’ll refer to this difference in velocity as Delta Velocity). For each bucket we’ll calculate the average wOBA for plate appearances that ended on a pitch in that bucket. For those unfamiliar with wOBA, it’s a measure of the total offensive performance of a hitter at the plate with appropriate weighting for walks, singles, doubles, triples, and home runs. In our sample the average wOBA is 0.349, with values greater than that indicating better performance for the batter and less than that indicating better performance for the pitcher. The table below shows the results of our bucketing, with the rows being different splits of average fastball velocity and the columns being different splits of Delta Velocity.[1]
That’s a lot to take in, so let’s simplify it by taking the weighted average of the results for each Delta Velocity bucket and plotting it as a single line chart.
We see some huge differences here. To take one example, if we look at the 0 Delta Velocity column for the 97-mph row we see that against 97-mph fastballs batters average a 0.307 wOBA, which is equivalent to hitting like Salvador Pérez. Looking next at the -2 Delta Velocity column for the 97-mph row we see that against 95-mph fastballs from pitchers who typically sit 97, batters average a 0.387 wOBA, which is equivalent to hitting like Freddie Freeman.
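For readers who want to reproduce the bucketing, here is a minimal Python sketch of the two steps described above: averaging wOBA within each (average velocity, Delta Velocity) bucket, then collapsing the rows into one sample-weighted value per Delta Velocity bucket. The six pitches and their wOBA values are made up for illustration, and the function names and 1-mph rounding are my assumptions, not the exact procedure behind the table.

```python
from collections import defaultdict

# Hypothetical PA-ending fastballs (illustrative values only):
# (pitcher's average fastball velo that day, velo of this pitch, wOBA value of the result)
pitches = [
    (97.0, 97.2, 0.0),
    (97.0, 95.1, 0.9),
    (97.0, 94.9, 0.0),
    (93.0, 93.1, 0.7),
    (93.0, 91.0, 0.0),
    (93.0, 93.0, 0.0),
]

def bucket_woba(rows):
    """Return {(avg velo bucket, Delta Velocity bucket): (mean wOBA, sample size)}."""
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [sum of wOBA, count]
    for avg_velo, velo, woba in rows:
        # Delta Velocity = this pitch's velo minus the pitcher's average that day
        key = (round(avg_velo), round(velo - avg_velo))
        totals[key][0] += woba
        totals[key][1] += 1
    return {k: (s / n, n) for k, (s, n) in totals.items()}

def collapse_by_delta(table):
    """Sample-weighted average wOBA for each Delta Velocity bucket."""
    agg = defaultdict(lambda: [0.0, 0])  # delta -> [weighted sum, total count]
    for (_, delta), (mean, n) in table.items():
        agg[delta][0] += mean * n
        agg[delta][1] += n
    return {d: s / n for d, (s, n) in agg.items()}

table = bucket_woba(pitches)     # the two-way table
line = collapse_by_delta(table)  # the single line chart
```

Weighting by sample size keeps sparsely populated buckets, such as very high or very low average velocities, from dominating the collapsed line.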
Our temptation at this point would be to say we answered our question: a half a mile per hour is worth roughly 0.050 points of wOBA, and this should explain Strider’s disparate outcomes with men on vs bases empty. Unfortunately, it’s not that simple. There’s more to the outcome of a pitch than velocity, and some factors that impact the outcome may impact velocity itself as well. If we want to isolate the direct impact of a change in velocity on pitch results, we’re going to need to control for those factors. Let’s go back to our DAG and add some detail. Below I’ve added boxes for pitch count in the game, pitch location, pitch shape, ball/strike count, the number of times the pitcher has faced the batter in that game (commonly referred to as Times Through the Order Penalty or TTOP), and batter quality.
To find the isolated effect of Delta Velocity on pitch outcomes we’ll go through these factors one by one, explain their relationships to both outcomes and Delta Velocity, and then control for these effects. To simplify things for now, we’ll ignore batter quality by assuming it is similar enough in each sample to not affect the results. Our final analysis will control for this, but during the exploratory phase this assumption should be appropriate.
The first boxes we’ll investigate are pitch location and pitch shape. For these I won’t walk through their direct impacts on pitch results, but I do want to note that all three factors of shape, location, and velocity can be thought of as acting together to create some composite quality of a pitch which has a combined impact on a pitch’s outcome. This composite quality of a pitch can be modeled using historical data of similar pitches and their results to give a Modeled Pitch Quality. You may be familiar with Stuff+ or PitchingBot, both of which can be found at FanGraphs. My personal Pitch Quality model works similarly to theirs, with a couple of personal tweaks and some unique tuning, and that’s what we’ll be using for this analysis (see appendix). If the difference in pitch outcomes by Delta Velocity is driven by the impact of Delta Velocity on pitch quality, then we should see a similar relationship between Delta Velocity and Modeled Pitch Quality as we do between Delta Velocity and outcomes. Instead, the results are significantly muted.
Modeled Pitch Quality is impacted by a difference in velocity, but only to the tune of roughly 0.004 wOBA per one mile per hour of Delta Velocity. This is a far cry from the 0.050 wOBA difference we see in the actual pitch outcomes, so it is not what’s driving the difference.[2] Whether that’s because the model isn’t appropriately sensitive to these changes in velocity within a game or because our results are being driven by one of the other factors remains to be seen, so let’s continue our investigation.
Next we’ll look at both pitch count and the related times-through-the-order penalty. Their relationships to both pitch outcomes and velocity are well established in the baseball analytics literature: The longer a pitcher stays in a game the more his velocity drops (until the end of the outing, when he appears to empty the tank), and the more a batter sees a pitcher the better their results against that pitcher become. Since these two effects (velocity tied to pitch count and batter familiarity tied to times through the order) appear to be intermingled, and because pitch count naturally increases as times through the order increases, we can simplify things and explore both of their impacts simultaneously by filtering pitches based on TTO. Let’s return to our original chart and add a line for pitches thrown exclusively during the first time through the order:
And again for those thrown exclusively during the third time through:
These look almost identical to our original results, indicating that the relationship between Delta Velocity and outcomes is consistent regardless of how many pitches have been thrown or how many times the pitcher has faced the lineup that day. That doesn’t mean TTOP and pitch count don’t have an impact on a pitcher’s performance, just that those impacts appear independent of the impact due to a change in velocity.
This brings us to the final item in our DAG: the ball/strike count. Since we’ve been looking only at pitches that were the final pitch of an at-bat, the count’s primary impact on outcomes is straightforward and obvious: A batter can neither walk nor strike out in a 0-0 count, so the count in which a pitch is thrown limits the possible outcomes of that pitch. The count also has an impact on velocity, in that pitchers tend to crank up their velocity with two strikes and dial it down with three balls as they focus on striking out the batter or avoiding a walk, respectively. This effect of the count on velocity is significant, with pitchers on average gaining around a half of a mile per hour when going from a no-strike to a two-strike count.[3]
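As footnote 3 suggests, one way to measure the count’s effect on velocity is to compare each pitch to the previous pitch rather than to the day’s average. The Python sketch below shows that idea on two made-up plate appearances. The data layout and function name are my assumptions, and a real version would also need to handle pitch types and three-ball counts.

```python
from collections import defaultdict

# Hypothetical fastball sequences, one list per plate appearance, in pitch order.
# Each pitch is (strikes before the pitch, velocity in mph).
plate_appearances = [
    [(0, 95.0), (1, 95.2), (2, 95.6)],
    [(0, 94.8), (0, 94.9), (1, 95.0), (2, 95.5)],
]

def mean_velo_change_by_strikes(pas):
    """Average change in velocity from the previous pitch, keyed by strike count."""
    sums = defaultdict(lambda: [0.0, 0])  # strikes -> [sum of changes, count]
    for pa in pas:
        # Pair each pitch with the one before it in the same plate appearance
        for (_, prev_velo), (strikes, velo) in zip(pa, pa[1:]):
            sums[strikes][0] += velo - prev_velo
            sums[strikes][1] += 1
    return {s: total / n for s, (total, n) in sums.items()}

changes = mean_velo_change_by_strikes(plate_appearances)
```

In this toy data the two-strike pitches show the largest average gain over the pitch before them, which is the pattern the paragraph above describes.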
To control for the count while investigating the path between Delta Velocity and pitch outcomes, we have two options. The first option is to do what we did for TTOP and rebuild our chart while filtering for specific counts. Since the largest velocity differences happen in zero-strike, two-strike, and three-ball counts, we’ll filter those counts out. Here are the results when looking only at pitches thrown with exactly one strike and fewer than three balls:
In this subsample, the Delta Velocity versus pitch outcomes relationship isn’t so clear. However, filtering this way cuts our sample of pitches significantly, which brings us to our second option for removing the impact of the count. As mentioned, up to now we’ve only been using pitches that were the final pitch of the at bat and looking at the raw outcomes of those pitches. This time we’ll look at all pitches, and instead of using raw outcomes of each pitch we’ll calculate the outcome relative to the expected outcome of all pitches in the given count.[4] Doing so yields the following chart:
These results are even clearer than those in our filtered data above: Almost all of the observed relationship between Delta Velocity and pitch outcomes appears due to the count in which the pitch was thrown. Controlling for that suggests it’s not necessarily that pitchers are getting better results when they turn up the heat; it’s that they turn up the heat in situations where they’re already in a position to succeed, specifically with two strikes against the hitter.
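The count adjustment described in footnote 4 is simple enough to sketch in a few lines of Python: average the value of every pitch thrown in a given count, then subtract that average from each pitch in that count. The pitch values below are invented for illustration, and the function name is mine.

```python
from collections import defaultdict

# Hypothetical pitches: (balls, strikes, value of the pitch result on the wOBA scale)
pitches = [
    (0, 0, 0.05), (0, 0, -0.03),
    (0, 2, -0.10), (0, 2, -0.06),
    (3, 0, 0.12), (3, 0, 0.08),
]

def count_adjusted(rows):
    """Subtract each count's average value from every pitch thrown in that count."""
    sums = defaultdict(lambda: [0.0, 0])  # (balls, strikes) -> [sum of values, count]
    for balls, strikes, value in rows:
        sums[(balls, strikes)][0] += value
        sums[(balls, strikes)][1] += 1
    expected = {c: s / n for c, (s, n) in sums.items()}
    return [(b, s, v - expected[(b, s)]) for b, s, v in rows]

adjusted = count_adjusted(pitches)
```

After the adjustment, the values within each count average to zero, so a two-strike fastball is compared only with other two-strike pitches rather than getting credit for the count itself.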
We’ve traveled a decent way on our journey to answering our question, so before we take our final step let’s pause for a recap of where we’ve been. We found:
- A change in velocity relative to a pitch’s average velocity that day (Delta Velocity) is strongly correlated with the outcome of the pitch
- The differences in outcomes across values of Delta Velocity are greater than what we would expect based on the Modeled Pitch Quality of those pitches, which captures a pitch’s shape, release point, movement, and location.
- Delta Velocity is related to pitch count and thus to TTOP, which is also correlated with results. However, the relationship between Delta Velocity and pitch outcomes remains if we control for TTOP.
- Delta Velocity is related to the ball / strike count in which the pitch was thrown, which is also correlated with results. The relationship between Delta Velocity and pitch outcomes goes away when controlling for the ball / strike count.
So far we’ve just been filtering and binning our data in simple ways, trying to tease out each of the individual relationships by cutting and slicing. The problem with this is that it’s difficult to do so in a way that controls for each of the interconnected relationships, rather than just one at a time. We could create tiny slivers of data for each combination of ball/strike count, TTO, average velocity, and delta velocity, but at that point the sample sizes in each slice are too small to reach any reasonable conclusions. We need a way to traverse the varied landscape of the interactions in a structured way that maps the path formed by each connection independent of its connection to the other points. To do that, we’ll turn to a mixed-effects model.
You’re likely familiar with a traditional linear model, in which one evaluates how much, on average, one variable changes in response to a change in the other variable. These relationships are called “fixed effects” because the effect of one on the other follows a fixed and discernible pattern, and they are all that are included in a linear model. A mixed-effects model goes one step further by allowing you to define “random effects,” which describe sources of error (differences between the actual and predicted response) in the fixed-effects model. Fancy math is applied to determine how much each random effect is contributing to the error in the model, and it quantifies both the sizes of these contributions and the uncertainty of its estimate of them. For the question of what contributes to pitch results, we can think of Modeled Pitch Quality and Delta Velocity as fixed effects. Modeled Pitch Quality should closely follow Pitch Results, as that is what it was trained to do. Delta Velocity, based on our understanding of its relationship to Pitch Results, should also follow a steady relationship of increasing effect on result with increasing value of Delta Velocity.
For our random effects we’ll include the times through the order for the at bat in which the pitch was thrown, and the skill of the batter at the plate. Before, we had assumed batter quality was consistent enough across samples to not affect the results. However, this assumption won’t hold when looking at multiple times through the order, as the final time through a lineup a pitcher may only face the top few hitters. Additionally, because mixed-effects models make it so easy to control for this as a noise factor, there’s no reason not to include it here for robustness.
Looking at our DAG, all that’s left to account for are Pitch Count and Count. Because the path of Pitch Count passes through Modeled Pitch Quality and TTOP, both of which are included in the model, we don’t need to include it here. To control for the impact of the count on our results, we’ll again use the outcome of the pitch relative to the expected outcome of all pitches in the given count.[5]
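For readers who prefer notation, one way to write down the model described above is shown below. The symbols are mine, and the intercept and the normal distributions on the random effects are standard assumptions for this kind of model rather than details confirmed in the text.

```latex
% i indexes pitches; c(i), b(i), t(i) are the count, batter-quality tier,
% and time through the order for pitch i (notation assumed for illustration)
\begin{aligned}
y_i - \bar{y}_{c(i)} &= \beta_0
  + \beta_1\,\mathrm{MPQ}_i
  + \beta_2\,\Delta V_i
  + u_{b(i)} + v_{t(i)} + \varepsilon_i \\
u_b &\sim \mathcal{N}(0, \sigma_b^2), \qquad
v_t \sim \mathcal{N}(0, \sigma_t^2), \qquad
\varepsilon_i \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

Here the left side is the pitch outcome minus the average outcome in that count, MPQ is Modeled Pitch Quality, and Delta V is Delta Velocity. The two slopes are the fixed effects, while u and v are the random effects for batter quality and times through the order.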
Enough background; let’s get back to results. As mentioned previously, mixed-effect models not only provide an estimate of the effects, but also of the uncertainty of the estimate. We’ll start with our random effects to see how the model expects Batter Quality and TTOP to impact Pitch Results. To view the effect estimates we’ll plot them together, with the most likely value of the effect shown as a point and the uncertainty range of the estimate shown as lines extending from each point. Positive point values indicate an impact on Pitch Results that is good for the batter and thus bad for the pitcher. Longer lines indicate more uncertainty in the estimate.
Starting on the left, the random effect estimates for batter quality were in line with our expectations: Pitch outcomes against a bottom tier batter are, on average, around 0.030 points of wOBA worse than outcomes against a top tier batter.[6] For TTOP, we see the model found a 0.002 wOBA effect in favor of the pitcher the first time through the order, no significant effect the second time, a 0.0015 wOBA effect in favor of the batter the third time through, and no significant effect the fourth or fifth time. That’s similar to the difference in facing J.P. Crawford rather than Austin Riley the first time through vs. the fourth time through, which is interestingly enough a happy medium between the impact of TTOP found in the studies linked above.[7]
Finally, we’ll take a look at our fixed effects. For these plots the location of each point indicates the slope of the line between that fixed effect and the value of the Pitch Result. A slope of 1 would indicate that a 1-point increase in the fixed effect results in a 1-point increase in the pitch outcome in wOBA. A slope of 2 indicates that the same 1-point increase in the fixed effect results in a 2-point increase in the pitch outcome. We’ll again add vertical lines extending from each point to indicate the uncertainty in our estimates of the slopes (although in this case the ranges were so small you can hardly see the vertical lines).
Our model found that Modeled Pitch Quality had a strong and highly confident relationship with Pitch Results, estimating a roughly 1-to-1 relationship between Modeled Pitch Quality and Pitch Outcome.[8] This is what we hoped and expected, and it sets the baseline to determine the impacts of the other variables. Delta Velocity, on the other hand, demonstrated an extremely small relationship to results, with an estimated effect size of roughly a half point of wOBA for every 1-mph difference in velocity. The model was very confident in this relationship as well, indicating that after accounting for Pitch Quality, TTOP, count, and batter quality there is likely very little direct relationship between the result of a pitch and the difference between that pitch’s velocity and the pitch’s average velocity in that game. The impact of Delta Velocity on Pitch Results is instead primarily driven by the relationship between Delta Velocity and Pitch Quality, which we established to be roughly 0.004 points of wOBA for every 1-mph difference in velocity on average. Losing a tick won’t turn Dansby Swanson into Freddie Freeman, but it might make him hit like Ha-Seong Kim.
And so we are finally able to return to our original question: how much is a half of a mile per hour drop in velocity worth? The short answer is that it’s complicated, but after systematically walking through all of its relationships that link to pitch outcomes, I’ve reached the following tentatively held conclusions:
- The largest contributing factor to pitch outcomes is the quality of the pitch based on its velocity, shape, and location (Pitch Quality).
- A change in pitch velocity of 1 mph is associated with a roughly 0.004 points of wOBA difference in Pitch Quality on average, though this will vary depending on the pitcher. Note that I say “associated with” here rather than “causes,” as a change in velocity is often accompanied by a change in pitch shape and location quality, and we did not attempt to tease out how much each of these things is contributing to the change in overall Pitch Quality.
- In addition to the effect on Pitch Quality, a change in pitch velocity of 1 mph relative to that pitch’s average velocity that day is associated with a roughly 0.0006 point wOBA difference in pitch outcomes.
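To put the two slopes above in terms of the original question, here is a short Python sketch that applies them to a half-mile-per-hour drop. It assumes the two effects are linear and additive, which is my simplification for illustration rather than something the model guarantees.

```python
# Slopes quoted in the conclusions above, in points of wOBA per 1 mph
QUALITY_SLOPE = 0.004   # association with Modeled Pitch Quality
DIRECT_SLOPE = 0.0006   # additional Delta Velocity association with outcomes

def expected_woba_change(velo_drop_mph):
    """Assumed linear, additive estimate of wOBA gained by hitters for a velocity drop."""
    return velo_drop_mph * (QUALITY_SLOPE + DIRECT_SLOPE)

half_mph = expected_woba_change(0.5)  # about 0.0023 points of wOBA
```

On these estimates, a half mile per hour is worth only a couple of thousandths of wOBA on average, far smaller than the gaps in the raw table.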
Ultimately, if you want to know how a drop in velocity impacts a pitcher, look at the Modeled Pitch Quality of that pitch. As for Strider and his half a mile per hour difference in velocity specifically? His drop in velocity with men on base hasn’t been accompanied by a definite drop in Modeled Pitch Quality, nor a change in pitch usage. I can’t promise I’ll stop searching for some explanation of his varied results with men on base, but for now I’m willing to chalk the bulk of it up to our old friend Random Variation. But who knows, maybe we’ll revisit this in 2024 with another full season of data under our belts.
[1] Note that while we’re using just fastballs here, a check for other pitch types revealed similar relationships. Because of that we’ll stick to just fastballs to make things easier to digest.
[2] The table above is Overall Pitch Quality, but the pattern holds when looking specifically at Stuff Quality and Location Quality.
[3] It’s best to calculate the difference in velocity relative to the previous pitch rather than to the average velocity of that pitch that day, as the arrow between count and velocity runs both ways: if a pitcher is down in velocity that day then they may find themselves in worse counts and vice versa.
[4] These expected outcomes can be determined by observing the outcome of all pitches thrown in a given count and averaging the value of these outcomes together. Calculating the outcome relative to this then becomes as simple as subtracting that average value for that particular count from the value of the result of that particular pitch.
[5] While we could instead choose to include count as an effect in the model and look only at pitches that were the final pitch of an at-bat, this method should better isolate the effects of the count and will allow us to use all pitches thrown during the seasons in question, as mentioned above.
[6] One could likely quibble with using this as a random effect. However, using it as a fixed effect yielded similar results, as did using more granular demarcations of batter quality. In a future study I’ll spend more time getting a precise estimate for this particular effect.
[7] Overall, these results suggest some synergy between the TTOP studies linked above. A key piece may be our use of Modeled Pitch Quality here. Some of the previous TTOP studies did not have such a direct way to capture the impact of Pitch Count, and through it Pitch Quality, on the results of a pitch. Because the impact of Pitch Count expressed through Pitch Quality is so much larger than the residual impact due to TTOP alone, that may have led to the conclusion that the TTOP was no more than a byproduct of pitcher exertion changing throughout the game. However, one should note that while historically this residual effect has been attributed to batter familiarity, I’d be cautious in holding too strongly to that given our results. Namely, if familiarity were the primary factor one would expect to see a similar effect of going from the third time to the fourth time through the order as we see when going from the first to the second or second to third. Again, the uncertainty here is large, but because the estimated effect reverses itself the fourth time through it could be more evidence of pitchers emptying the tank, and thus there’s some amount of Pitch Quality still being missed by the models. Ultimately, a deep dive into these results is beyond the scope of this article and may require its own future study.
[8] Note that these are both on the wOBA scale, so an increase in Modeled Pitch Quality is good for the batter. I apologize if this sign convention causes any confusion.
APPENDIX
Modeled Pitch Quality validation:
Additional Modeled Pitch Quality details:
- CatBoost is used for all the modeling (XGBoost was used previously, but it was found to overfit)
- Pitches are divided into three groups (Fast, Offspeed, Bendy) and each combination of group and handedness gets its own set of models
- Run Value is not modeled directly, but built from the ground up:
  - One model for the probability a batter will swing at the pitch
  - One for the probability of a whiff, foul, or ball in play if swung at
  - One for the probability of a strike, ball, or HBP if taken
  - One for xwOBACON if put into play
- Thus Value if swung at:
  - V(Strike) * P(Whiff) + V(Foul) * P(Foul) + V(In Play) * P(In Play)
- Value if taken:
  - V(Strike) * P(Called Strike) + V(Ball) * P(Called Ball) + V(HBP) * P(HBP)
- Overall Value:
  - V(Swing) * P(Swing) + V(Take) * [1 - P(Swing)]
- Model inputs include the following:
  - For all models:
    - Pitch location
    - Velocity
    - Horizontal and Vertical Movement
    - Release position and extension
    - Axis difference and spin efficiency (SSW)
    - Pitcher height
    - Batter handedness
    - Count
  - For non-fastballs:
    - Also include velocity and movement difference relative to their fastball
- All fastballs that weren’t a pitcher’s primary fastball that day get classified and modeled as changeups
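The value build-up listed above can be sketched in Python as follows. The function names are mine, and every probability and run value in the example is a made-up placeholder. In the actual model these would come from the CatBoost sub-models (with V(In Play) supplied by the xwOBACON model) and from count-specific values.

```python
def swing_value(p_whiff, p_foul, p_in_play, v_strike, v_foul, v_in_play):
    """Value if swung at: V(Strike)*P(Whiff) + V(Foul)*P(Foul) + V(In Play)*P(In Play)."""
    return v_strike * p_whiff + v_foul * p_foul + v_in_play * p_in_play

def take_value(p_called_strike, p_ball, p_hbp, v_strike, v_ball, v_hbp):
    """Value if taken: V(Strike)*P(Called Strike) + V(Ball)*P(Called Ball) + V(HBP)*P(HBP)."""
    return v_strike * p_called_strike + v_ball * p_ball + v_hbp * p_hbp

def pitch_value(p_swing, v_swing, v_take):
    """Overall value: V(Swing)*P(Swing) + V(Take)*[1 - P(Swing)]."""
    return v_swing * p_swing + v_take * (1 - p_swing)

# Hypothetical inputs for a single pitch (illustrative values only)
v_swing = swing_value(p_whiff=0.25, p_foul=0.40, p_in_play=0.35,
                      v_strike=-0.05, v_foul=-0.03, v_in_play=0.10)
v_take = take_value(p_called_strike=0.60, p_ball=0.39, p_hbp=0.01,
                    v_strike=-0.05, v_ball=0.04, v_hbp=0.30)
overall = pitch_value(p_swing=0.5, v_swing=v_swing, v_take=v_take)
```

Building the value from separate swing, contact, and take models lets each piece be validated on its own before the pieces are combined into a single number per pitch.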