The interest in this research was fueled by Micah Blake McCurdy's presentation at SEAHAC 2019, where he described evaluating the quality of goaltenders as evaluating the quality of the shots they saved.
The author of this article is not a mathematician or a statistician, so the following may well be utter absurdity or heresy; however, it passed a few sanity checks the author was able to devise, and therefore he likes it.
1. Why Elo?
A shot (or shot attempt; like Micah, I will use the term shot for all three location-tracked events: GOAL, SHOT and MISS) is essentially a zero-sum outcome: either the goal is scored (1) or it is not (0). That always invites the question: can we use Elo ratings to evaluate the expectation of a shot being stopped, based on the difficulty of the shot, the shooter and the goaltender?
Thus, given the probability $P_s$ of the shot going in and its outcome $X \in \{0, 1\}$, we can adjust the ratings of the shooter and the goalie:
$R_s = R_{s,0} + (X - P_s)$ (1)
$R_g = R_{g,0} + (P_s - X)$ (2)
For example, if a shot had a probability of 0.06 of going in and the goal was nonetheless scored, the rating of the shooter increases by 0.94 and the rating of the goalie decreases by 0.94.
But, you may ask, how do the ratings in the formulas above affect the probability of the shot going in?
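Rules (1) and (2) amount to a zero-sum update with an implicit K-factor of 1, as the 0.06/0.94 example shows. A minimal sketch; the function name is illustrative, not part of the original model:

```python
from typing import Tuple


def update_ratings(r_shooter: float, r_goalie: float,
                   p_goal: float, scored: bool) -> Tuple[float, float]:
    """Per-shot update following (1) and (2), with an implicit K-factor of 1."""
    x = 1.0 if scored else 0.0
    new_shooter = r_shooter + (x - p_goal)  # (1): shooter gains what he beat the odds by
    new_goalie = r_goalie + (p_goal - x)    # (2): goalie moves by the same amount, opposite sign
    return new_shooter, new_goalie


# The worked example from the text: a 0.06 shot that goes in.
shooter, goalie = update_ratings(2029.0, 2500.0, 0.06, scored=True)
print(round(shooter, 2), round(goalie, 2))  # 2029.94 2499.06
```

Because the two changes always cancel, the sum of shooter and goalie ratings is unchanged by every shot.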
2. The factors of the shot.
If we take the shots for which coordinates are defined, i.e. from the 2010/11 season through 2018/19, we can observe that the average scoring rate is about 0.06, i.e. 6% (remember, misses are included).
However, as Micah also described, not all shots are created equal. Therefore we dissect them by the following factors:
- Location on the rink
- Season (due to equipment changes)
- Stage (due to playing style changes)
- Shooter (this is where $R_s$ comes in)
- Shooting team (team attacking style)
- Goalie (this is where $R_g$ comes in)
- Goalie team (team defending style)
- Side (generalized code, see below)
- Shot type
- On-ice strength
- Score differential
- Rebound length
- Is it a rush?
- Is it a giveaway in the defensive zone?
- Is it a takeaway in the offensive zone?
Location:
For the sake of simplicity, we divide the rink into the following segments:
- All of the rink outside the offensive zone
- All of the rink below the goal line of the offensive zone
- The remainder of the offensive zone, split into squares (quadrants) with sides of three to eight feet. Some tests indicated that the optimal side length of such a quadrant is five feet.
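One possible way to map shot coordinates onto these segments is sketched below. It assumes (none of this is stated in the text) coordinates in feet with center ice at the origin, every shot normalized to attack the same end, the offensive blue line at x = 25 and the goal line at x = 89; the bucket labels are hypothetical.

```python
import math

# Assumed coordinate frame: feet, center ice at (0, 0), shots normalized
# so the attacking team always shoots at the goal line x = +89.
BLUE_LINE_X = 25.0
GOAL_LINE_X = 89.0
HALF_WIDTH_Y = 42.5
CELL_FT = 5.0  # the five-foot quadrants the tests favored


def rink_segment(x: float, y: float, cell: float = CELL_FT) -> str:
    """Map a shot location to one of the location buckets described above."""
    if x < BLUE_LINE_X:
        return "outside_oz"        # anywhere outside the offensive zone
    if x > GOAL_LINE_X:
        return "below_goal_line"   # behind the goal line of the offensive zone
    # Remaining offensive-zone area: a grid of cell x cell squares.
    col = math.floor((x - BLUE_LINE_X) / cell)
    row = math.floor((y + HALF_WIDTH_Y) / cell)
    return f"oz_{col}_{row}"


print(rink_segment(80.0, 0.0))  # oz_11_8
```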
Season:
Annual changes in NHL rules mean that a given season may have a different average shot quality than others.
Stage:
The well-known saying that playoffs are won through defense implies that the chances to score are lower in the postseason.
Shooter:
We use the shooter's current $R_s$ from (1).
Shooting team:
A team's style of play may rely more on the quantity or on the quality of its shots, which affects the overall chances of a given shot going in.
Goalie:
We use the goalie's current $R_g$ from (2).
Goalie team:
The same reasoning applies to the defending style and capability of the goalie's team as to the attacking style of the shooting team.
Side:
We try to estimate the effect of the skater's shooting side and the goalie's catching side together with the relative location of the shot:
- Shot location:
  - C: center, in front of the goal
  - R: to the right of the goal, from the attacker's point of view
  - L: to the left of the goal, from the attacker's point of view
- Shooter's side: R/L
- Goalie's side: R/L
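The text does not spell out how the "generalized code" is composed. One plausible reading, assumed in this sketch, concatenates the three letters into a single category (3 × 2 × 2 = 12 codes). The sign convention for the lateral offset and the 3-ft center band (half the 6-ft width of the goal mouth) are hypothetical choices.

```python
def side_code(lateral_offset_ft: float, shooter_shoots: str, goalie_catches: str,
              center_half_width_ft: float = 3.0) -> str:
    """Combine shot location (C/R/L), shooter's side and goalie's side into one code.

    Assumptions: lateral_offset_ft > 0 means the attacker's right, and the
    center band spans the goal mouth (+/- 3 ft).
    """
    for hand in (shooter_shoots, goalie_catches):
        if hand not in ("R", "L"):
            raise ValueError(f"expected 'R' or 'L', got {hand!r}")
    if abs(lateral_offset_ft) <= center_half_width_ft:
        loc = "C"
    elif lateral_offset_ft > 0:
        loc = "R"
    else:
        loc = "L"
    return loc + shooter_shoots + goalie_catches


print(side_code(10.0, "R", "L"))  # RRL
```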
Shot type:
We use all available shot types:
- Slap
- Snap
- Wrist
- Backhand
- Wraparound
- Tip-in
- Deflection
Strength:
On-ice strength, from 3-3 to 5-5. Penalty shots are discarded. A future development may introduce special cases for pulled goalies, or incorporate those situations into different strengths, including the new 6-3 to 6-5 ones.
Shots without an on-ice player count are discarded as well.
Score differential:
We treat score situations differently for all the well-known reasons. Score differences from -2 (shooting team two goals behind) to +2 are each treated separately, and larger differences all fall under -3 and +3.
Rebound:
A shot is classified as a rebound when another shot attempt (including a blocked shot) occurred shortly before it. We not only check for such an event within the 3 seconds before the shot, but also catalog rebounds separately by the time elapsed between the previous attempt and the rebound (0, 1, 2 or 3 seconds).
Rush:
Similarly to others, we define a rush as a shot attempt from the offensive zone that is preceded, within five seconds, by a non-faceoff event outside the offensive zone.
Give:
We consider a defensive-zone giveaway to seriously tilt the chances of a score, because the goaltender may be caught somewhat unaware. A qualifying giveaway must have occurred in the defensive zone within 6 seconds before the shot.
Take:
We consider an offensive-zone takeaway to seriously tilt the chances of a score, because the goaltender may be caught somewhat unaware. A qualifying takeaway must have occurred in the offensive zone within 6 seconds before the shot.
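The event-based factors (score bucket, rebound, rush, giveaway, takeaway) can be derived from the events preceding a shot. In this hypothetical sketch, the event kinds and zone codes are illustrative, every zone is assumed to be recorded from the shooting team's perspective, and `prior` is assumed to hold only events before the shot; a defensive-zone giveaway by the defending team therefore shows up in zone "O".

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

ATTEMPTS = {"SHOT", "MISS", "BLOCK"}  # blocked shots count as rebound sources


@dataclass
class Event:
    t: int                  # game time in whole seconds
    kind: str               # e.g. "SHOT", "MISS", "BLOCK", "FACEOFF", "GIVEAWAY", "TAKEAWAY", "HIT"
    zone: str               # "O", "N" or "D", seen from the shooting team (assumption)
    by_shooting_team: bool  # which side the event is credited to


def score_bucket(diff: int) -> int:
    """Score differential from the shooting team's view: -2..+2 as is, beyond that -3/+3."""
    return max(-3, min(3, diff))


def context_factors(shot_t: int, shot_zone: str, prior: Sequence[Event]) -> Dict[str, object]:
    """Derive the rebound bucket and rush/give/take flags from the events before a shot."""
    rebound: Optional[int] = None
    rush = give = take = False
    for ev in prior:
        dt = shot_t - ev.t
        if dt < 0:
            continue  # defensive: ignore events stamped after the shot
        if ev.kind in ATTEMPTS and dt <= 3:
            # keep the most recent attempt: 0, 1, 2 or 3 seconds
            rebound = dt if rebound is None else min(rebound, dt)
        if shot_zone == "O" and ev.kind != "FACEOFF" and ev.zone != "O" and dt <= 5:
            rush = True
        # the defending team's own-zone giveaway lies in the shooter's offensive zone
        if ev.kind == "GIVEAWAY" and not ev.by_shooting_team and ev.zone == "O" and dt <= 6:
            give = True
        if ev.kind == "TAKEAWAY" and ev.by_shooting_team and ev.zone == "O" and dt <= 6:
            take = True
    return {"rebound": rebound, "rush": rush, "give": give, "take": take}
```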
Schedule aspects, such as back-to-back games (or prolonged breaks) and home/away, are possible factors that were not considered. Maybe next summer.
3. The application of the factors
We begin by setting some starting (default) values, which we obtain from the first two seasons of our research period, i.e. 2010/11 and 2011/12. Later we incorporate the remainder of the data.
As said earlier, let's assume that the overall probability of a shot going in is about 0.0622 (from these two seasons). That corresponds to an Elo rating difference of about 471 points in favor of the goaltender. Therefore we can say that an average shooter from an average location against an average goaltender is like a match between players rated 2029 and 2500.
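The 471-point figure follows from inverting the Elo formula. A minimal sketch (function names are illustrative):

```python
import math


def elo_prob(diff: float) -> float:
    """Success probability of the side trailing by `diff` rating points."""
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))


def elo_diff(p: float) -> float:
    """Inverse of elo_prob: the rating gap that yields success probability p."""
    return 400.0 * math.log10(1.0 / p - 1.0)


print(round(elo_diff(0.0622), 1))        # 471.3
print(round(elo_prob(2500 - 2029), 4))   # 0.0623
```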
Now the factors from the previous chapter come in handy. For each of these factors (excluding the personal ratings for now) we compute the probability of success for a shot with each value of the factor. For example, for a binary factor like takeaway we get the following table:

| Takeaway | p |
|---|---|
| 0 | 0.0618 |
| 1 | 0.1014 |

For a non-binary factor, e.g. strength (penalty shots excluded), where Δelo is the rating shift that moves the base probability 0.0622 to p:
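
| Strength | p | Δelo |
|---|---|---|
| 3-3 | 0.0500 | -40.0 |
| 3-4 | 0.0968 | 83.5 |
| 3-5 | 0.1053 | 99.7 |
| 4-3 | 0.1099 | 108.1 |
| 4-4 | 0.0645 | 6.9 |
| 4-5 | 0.0656 | 10.1 |
| 5-3 | 0.1683 | 193.9 |
| 5-4 | 0.0852 | 59.1 |
| 5-5 | 0.0563 | -18.3 |
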
Note that these numbers predate 3-on-3 overtime. Also, it's tougher to score at full strength (5-5) than on average.
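The Δelo column can be reproduced from p alone: it is the difference between the base rating gap (about 471 points) and the gap implied by p. The sketch below (helper names hypothetical) agrees with the table to within a few tenths of a point, presumably because the p values shown are rounded.

```python
import math

BASE_P = 0.0622  # base scoring probability from 2010/11-2011/12


def elo_diff(p: float) -> float:
    """Rating gap (in the goalie's favor) that yields scoring probability p."""
    return 400.0 * math.log10(1.0 / p - 1.0)


def delta_elo(p: float, base_p: float = BASE_P) -> float:
    """Rating shift that moves base_p to p (positive = easier than average)."""
    return elo_diff(base_p) - elo_diff(p)


strength_p = {"3-3": 0.0500, "4-4": 0.0645, "5-3": 0.1683, "5-5": 0.0563}
for strength, p in strength_p.items():
    print(strength, round(delta_elo(p), 1))
```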
Then, to get the overall shot rating, we add these differences, scaled by a factor of 3/2 (due to the behavior of the Elo sigmoid at low probabilities), to the base value $R_{base} = 2029$; we also add the difference between $R_s$ and 2029 (also assigned as the initial shooter rating), and subtract the difference between $R_g$ and 2500 (also assigned as the initial goalie rating).
So we have
$$R_{shot} = R_{base} + \frac{3}{2}\sum_f \Delta_f + (R_s - R_{base}) + (2500 - R_g)$$
$R_{shot} = R_s + \frac{3}{2}\sum_f \Delta_f + (2500 - R_g)$ (3)
Then we can estimate the probability of the shot going in with the Elo formula:
$$P_s = \frac{1}{1 + 10^{(2500 - R_{shot})/400}}$$
or, effectively
$P_s = \frac{1}{1 + 10^{(R_g - R_s - \frac{3}{2}\sum_f \Delta_f)/400}}$ (4)
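Equation (3) followed by the Elo step gives (4). A direct transcription, with the factor shifts $\Delta_f$ passed in as a list (how they are looked up is not shown, and the names are illustrative):

```python
from typing import Iterable

GOALIE_BASE = 2500.0


def shot_rating(r_s: float, r_g: float, deltas: Iterable[float]) -> float:
    """Equation (3): R_shot = R_s + 3/2 * sum(delta_f) + (2500 - R_g)."""
    return r_s + 1.5 * sum(deltas) + (GOALIE_BASE - r_g)


def goal_prob(r_s: float, r_g: float, deltas: Iterable[float]) -> float:
    """Equation (4): the Elo formula applied to R_shot against 2500."""
    deltas = list(deltas)  # allow generators, which can only be consumed once
    r_shot = shot_rating(r_s, r_g, deltas)
    return 1.0 / (1.0 + 10.0 ** ((GOALIE_BASE - r_shot) / 400.0))


# An average shooter against an average goalie on a 5-3 power play (delta 193.9).
print(round(goal_prob(2029.0, 2500.0, [193.9]), 4))
```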
Note that we never explicitly match the shooter against the goalie. We could do that instead of including $R_s$ and $R_g$ in the formula, but that approach proved more complicated and gave less consistent results.
Looks straightforward? Unfortunately, it isn't.
4. Confounders
If the factors were completely independent, our job would be done. Alas, they are not: they implicitly confound each other, e.g. there might be more deflections resulting in goals on the power play, or more rebounds in quadrants close to the goal, and so on. Therefore we try to mitigate these dependencies in the following way:
I. As a baseline, we calculate the probability of success for shots in each quadrant we defined.
II. Then we compute the probability of success for shots with each separate factor value in the given quadrant.
III. We calculate the ratio of the freshly computed probability to the general probability of success in this quadrant.
IV. The resulting confounding coefficient is then calculated with the following formula:
$C_f = \frac{1}{\log(\text{ratio}) + 1}$
V. We multiply each factor's $\Delta_f$ by the corresponding $C_f$, so formulas (3) and (4) become:
$R_{shot} = R_s + \frac{3}{2}\sum_f \Delta_f C_f + (2500 - R_g)$ (5)
and
$P_s = \frac{1}{1 + 10^{(R_g - R_s - \frac{3}{2}\sum_f \Delta_f C_f)/400}}$ (6)
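Steps III to V and equation (6) translate directly into code. This sketch assumes the natural logarithm in $C_f$, since the base is not stated, and it does not guard against ratio = 1/e, where the formula is undefined; names are illustrative.

```python
import math
from typing import Sequence, Tuple


def confounding_coeff(p_value_in_quadrant: float, p_quadrant: float) -> float:
    """Steps III-IV: C_f = 1 / (log(ratio) + 1), natural log assumed."""
    ratio = p_value_in_quadrant / p_quadrant
    return 1.0 / (math.log(ratio) + 1.0)


def goal_prob_confounded(r_s: float, r_g: float,
                         factors: Sequence[Tuple[float, float]]) -> float:
    """Equation (6); `factors` holds one (delta_f, C_f) pair per factor of the shot."""
    weighted = sum(delta * c for delta, c in factors)
    return 1.0 / (1.0 + 10.0 ** ((r_g - r_s - 1.5 * weighted) / 400.0))


# A factor value that doubles the quadrant's scoring rate gets damped (C_f < 1).
c = confounding_coeff(0.2, 0.1)
print(round(c, 4), round(goal_prob_confounded(2029.0, 2500.0, [(100.0, c)]), 4))
```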
To test the validity of the math above, we measured the log loss of the predicted probability of each shot going in. Using just the base probability of 0.0622 for every shot, the log loss was about 0.240; using the probabilities computed through (5) and (6), it dropped to 0.210, with each factor contributing to the reduction.
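A minimal implementation of this check, assuming the usual mean binary cross-entropy with natural logarithms (the text does not state the base):

```python
import math
from typing import Sequence


def log_loss(probs: Sequence[float], outcomes: Sequence[int], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy of goal probabilities against outcomes (1 = goal)."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    total = 0.0
    for p, x in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(x * math.log(p) + (1 - x) * math.log(1.0 - p))
    return total / len(probs)


# The baseline: the same 0.0622 forecast for every shot in a toy sample.
outcomes = [1] + [0] * 15
print(round(log_loss([0.0622] * len(outcomes), outcomes), 3))
```

Comparing this baseline with the per-shot probabilities from (6) on the same shots is the comparison the text reports.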
5. The eXpected goal value and the save above expectation
Now, using (5) and (6), we can calculate the number of expected goals against each goalie in a game:
$xG = \sum_{\text{shots on the goalie}} P_s$ for a given game.
Since we know how many goals $G$ were scored against the goalie, we can easily apply (2) summed over the game, i.e. $R_g \leftarrow R_g + (xG - G)$. The goalie's new rating is used in the calculations for the next game he plays. The same applies to the shooters, except that the sum runs over the shots taken by the shooter. Empty-net shots are not counted.
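Summing (1) and (2) over a game gives the per-game update. A sketch with a hypothetical data layout (one tuple per non-empty-net shot, with probabilities assumed to be precomputed from pre-game ratings), using the initial ratings 2029 and 2500 for newcomers:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

SHOOTER_START, GOALIE_START = 2029.0, 2500.0  # initial ratings from the text

# One record per non-empty-net shot: (shooter_id, goalie_id, p_goal, scored)
Shot = Tuple[str, str, float, bool]


def apply_game(shots: Iterable[Shot], shooter_r: Dict[str, float],
               goalie_r: Dict[str, float]) -> None:
    """Update ratings in place after one game: (1) and (2) summed over its shots.

    The p_goal values are assumed to use pre-game ratings, so the updated
    ratings only take effect from the next game.
    """
    xg_goalie: Dict[str, float] = defaultdict(float)
    g_goalie: Dict[str, int] = defaultdict(int)
    xg_shooter: Dict[str, float] = defaultdict(float)
    g_shooter: Dict[str, int] = defaultdict(int)
    for shooter, goalie, p, scored in shots:
        xg_goalie[goalie] += p
        g_goalie[goalie] += int(scored)
        xg_shooter[shooter] += p
        g_shooter[shooter] += int(scored)
    for g, xg in xg_goalie.items():
        goalie_r[g] = goalie_r.get(g, GOALIE_START) + (xg - g_goalie[g])      # R_g += xG - G
    for s, xg in xg_shooter.items():
        shooter_r[s] = shooter_r.get(s, SHOOTER_START) + (g_shooter[s] - xg)  # R_s += G - xG
```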
We calculate $xG$ and $G$ for the games on each date, starting with the 2012/13 season. After all games on a given date have been processed, we feed them back into the factor probabilities to keep them current, giving data from the current season double weight compared to past data, while data from the earliest available date is tossed out.
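The rolling, season-weighted refresh of the factor probabilities can be sketched as follows. The data layout and names are hypothetical; only the double weight for the current season and the dropping of the earliest date follow the text.

```python
from collections import deque
from typing import Deque, Dict, Tuple

Counts = Dict[str, Tuple[int, int]]  # factor value -> (shots, goals) on one date


class RollingFactorRates:
    """Goal rate per factor value over a fixed-length window of game dates."""

    def __init__(self) -> None:
        self.window: Deque[Tuple[int, Counts]] = deque()  # (season, counts), oldest first

    def seed(self, season: int, counts: Counts) -> None:
        """Load an initial date (e.g. from 2010/11-2011/12) without dropping anything."""
        self.window.append((season, counts))

    def advance(self, season: int, counts: Counts) -> None:
        """Add a freshly processed date and toss out the earliest one."""
        self.window.append((season, counts))
        if len(self.window) > 1:
            self.window.popleft()

    def rate(self, value: str, current_season: int) -> float:
        """Weighted goal rate; current-season dates count twice."""
        shots_w = goals_w = 0.0
        for season, counts in self.window:
            shots, goals = counts.get(value, (0, 0))
            weight = 2.0 if season == current_season else 1.0
            shots_w += weight * shots
            goals_w += weight * goals
        return goals_w / shots_w if shots_w else float("nan")
```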
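Here is a sample of the best and worst performances, single game and season (playoffs excluded), measured as $xG - G$ for goalies and as $G - xG$ for skaters, so that positive values are always good. Seasons are labeled by their starting year (e.g. 2016 = 2016/17):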
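Goalies, best single games:

| Player | Date | Delta |
|---|---|---|
| ALEXANDAR GEORGIEV | 2019-02-10 | 6.034 |
| EVGENI NABOKOV | 2014-03-23 | 5.787 |
| LAURENT BROSSOIT | 2015-04-09 | 5.245 |
| MIKE CONDON | 2017-01-19 | 5.178 |
| RYAN MILLER | 2016-01-17 | 5.153 |
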
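Goalies, worst single games:

| Player | Date | Delta |
|---|---|---|
| AL MONTOYA | 2016-11-04 | -6.152 |
| SERGEI BOBROVSKY | 2018-12-04 | -5.706 |
| ROBIN LEHNER | 2014-02-27 | -5.313 |
| SERGEI BOBROVSKY | 2018-10-13 | -5.279 |
| JOEY MACDONALD | 2013-04-03 | -5.251 |
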
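Skaters, best single games:

| Player | Date | Delta |
|---|---|---|
| PATRIK LAINE | 2018-11-24 | 4.436 |
| CHRIS KUNITZ | 2013-02-03 | 3.646 |
| ALEX OVECHKIN | 2013-12-10 | 3.582 |
| AUSTON MATTHEWS | 2016-10-12 | 3.531 |
| BRAD RICHARDSON | 2019-02-28 | 3.525 |
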
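Skaters, worst single games:

| Player | Date | Delta |
|---|---|---|
| NAZEM KADRI | 2019-02-10 | -2.355 |
| BROCK NELSON | 2014-12-13 | -1.873 |
| GABRIEL LANDESKOG | 2019-01-09 | -1.868 |
| LOGAN COUTURE | 2015-02-17 | -1.827 |
| RYAN O'REILLY | 2019-03-29 | -1.693 |
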
Goalies, best seasons:

| Player | Season | Delta |
|---|---|---|
| SERGEI BOBROVSKY | 2016 | 35.787 |
| CAREY PRICE | 2013 | 32.607 |
| JOHN GIBSON | 2016 | 26.296 |
| CAREY PRICE | 2014 | 24.424 |
| THOMAS GREISS | 2015 | 24.153 |

Goalies, worst seasons:

| Player | Season | Delta |
|---|---|---|
| JONATHAN QUICK | 2018 | -42.428 |
| CAREY PRICE | 2017 | -33.573 |
| CRAIG ANDERSON | 2017 | -28.724 |
| THOMAS GREISS | 2017 | -25.168 |
| SCOTT DARLING | 2017 | -24.161 |

Skaters, best seasons:

| Player | Season | Delta |
|---|---|---|
| LEON DRAISAITL | 2018 | 23.047 |
| PATRIK LAINE | 2017 | 22.6 |
| ALEX DEBRINCAT | 2018 | 20.249 |
| STEVEN STAMKOS | 2018 | 19.833 |
| ALEX OVECHKIN | 2013 | 19.052 |

Skaters, worst seasons:

| Player | Season | Delta |
|---|---|---|
| ALEX CHIASSON | 2013 | -12.174 |
| MIKE RICHARDS | 2013 | -11.016 |
| ERIC STAAL | 2015 | -10.476 |
| TYLER TOFFOLI | 2018 | -10.319 |
| BRAYDEN SCHENN | 2014 | -9.894 |

6. Predictive aspects
Shooter
If a shooter's rating $R_s$ is above the base shot rating $R_{base}$, he increases the probability of a goal (and vice versa). The difference should be computed for each case separately, but on average, since the Elo curve is nearly linear over small rating changes at these low probabilities, each extra 10 points accounts for a difference of about 0.0035 in the probability of the shot. We can do a more precise job by surveying which factors dominate the player's shots and what their combined probability is without the shooter, and then computing the difference from that.
Goalie
If a goalie's rating $R_g$ is above 2500 (the base goaltender rating), he decreases the probability of a goal (and vice versa), in exactly the reverse way to the shooter. However, goaltenders face shots with all possible factor values, so we adjust the probability starting from the base probability (e.g. 0.0622). The only factor that possibly should be taken into account is the goalie's team.
Team
We can approach a team's $xG$ (or rather $pG$, projected goals) in two ways: by iterating over the projected or published roster, or by a blanket weighted average over the shots the team takes per game and their probabilities. The first way is more complex, but presumably more precise.
Season
We do not see any particular implications of a season-wide projection at any level: team, skater or goaltender. For the first two we simply multiply a single-game projection by the number of games in a season. For goaltenders, an estimate of the number of games played would also be necessary.
Playoffs
In the playoffs we can hone our predictions to the specific shooter and the opposing goalie's team. Perhaps that is also where home/away factors will become more prominent.
Here are the current (end of the 2018/19 season) top 5 and bottom 5 ratings for goalies and skaters:
Goalies, top 5:

| Player | Rating |
|---|---|
| ANTTI RAANTA | 2535.5 |
| BEN BISHOP | 2534.8 |
| JOHN GIBSON | 2533.9 |
| ROBIN LEHNER | 2524.7 |
| JUUSE SAROS | 2523.1 |

Goalies, bottom 5:

| Player | Rating |
|---|---|
| KEITH KINKAID | 2480.6 |
| MAXIME LAGACE | 2482.7 |
| GARRET SPARKS | 2485.4 |
| CRAIG ANDERSON | 2485.9 |
| CHAD JOHNSON | 2487.7 |

Skaters, top 5:

| Player | Rating |
|---|---|
| ALEX OVECHKIN | 2104.4 |
| STEVEN STAMKOS | 2097.2 |
| NIKITA KUCHEROV | 2079.8 |
| PATRICK KANE | 2077.3 |
| PATRIK LAINE | 2076.0 |

Skaters, bottom 5:

| Player | Rating |
|---|---|
| JORDAN STAAL | 2002.8 |
| MATT MOULSON | 2002.9 |
| JUSTIN ABDELKADER | 2004.8 |
| PATRIC HORNQVIST | 2004.9 |
| KYLE CLIFFORD | 2005.1 |

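The average-case reasoning of the predictive aspects can be sketched as follows: a player's rating difference from the base rating is applied as a shift along the Elo curve at the base probability 0.0622, and a team's projected goals are a blanket weighted average over its shots per game, multiplied by the number of games for a season. This is a hypothetical sketch with illustrative names; it ignores the player-specific factor survey suggested for shooters. For +10 shooter points it gives roughly 0.0034, close to the 0.0035 quoted, and a goalie rated 2535.5 brings an otherwise average shot down to about 0.051.

```python
import math
from typing import Sequence, Tuple

BASE_P = 0.0622        # base scoring probability from the text
SHOOTER_BASE = 2029.0  # base/initial shooter rating
GOALIE_BASE = 2500.0   # base/initial goalie rating


def elo_diff(p: float) -> float:
    """Rating gap (in the goalie's favor) that yields scoring probability p."""
    return 400.0 * math.log10(1.0 / p - 1.0)


def shifted_prob(base_p: float, shift: float) -> float:
    """Move base_p along the Elo curve by `shift` points (positive = more goals)."""
    return 1.0 / (1.0 + 10.0 ** ((elo_diff(base_p) - shift) / 400.0))


def shooter_prob(r_s: float, base_p: float = BASE_P) -> float:
    return shifted_prob(base_p, r_s - SHOOTER_BASE)


def goalie_prob(r_g: float, base_p: float = BASE_P) -> float:
    return shifted_prob(base_p, GOALIE_BASE - r_g)


def projected_goals(shot_profile: Sequence[Tuple[float, float]], games: int = 1) -> float:
    """Blanket weighted average: sum of (shots per game * P_s), times the number of games."""
    return games * sum(shots * p for shots, p in shot_profile)


print(round(shooter_prob(2039.0) - BASE_P, 4))
print(round(goalie_prob(2535.5), 4))
print(projected_goals([(50, 0.06), (5, 0.2)], games=82))
```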
In conclusion, the author wants to stress once again that he realizes the theoretical background for the task undertaken is insufficient, and that many of the assumptions made smell of an ad hoc approach. However, we hope that the model finds its use among hockey fans, and that this research attracts better-qualified people interested in polishing and improving it.