Monday, March 13, 2017

On Buchholz and Sonneborn-Berger coefficients - Part II

Part I

2. The Sonneborn-Berger coefficient.
This stranger beast is a metric used extensively for tie-breaks in chess round-robins, and as an auxiliary tie-break to the Buchholz coefficient in non-round-robin events. Let's start with the definition.

$$SB = \sum_{n=1}^{N} f(R_n, P_n)$$

where Rn is the result against the n-th opponent and Pn is that opponent's points score.
The function f(Rn, Pn) is defined as:

f(Win, Pn)  = Pn
f(Tie, Pn)  = Pn/2
f(Loss, Pn) = 0
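A minimal Python sketch of the definition above. It assumes each game is recorded as a (result, opponent's score) pair from our participant's point of view. The function and dictionary names are illustrative only.

```python
# Weight applied to the opponent's score for each result: f(R_n, P_n) = weight * P_n.
SB_WEIGHTS = {"Win": 1.0, "Tie": 0.5, "Loss": 0.0}

def sonneborn_berger(games):
    """games: list of (result, opponent_points) tuples."""
    return sum(SB_WEIGHTS[result] * opp_points for result, opp_points in games)

# Win vs a 6-point opponent, tie vs a 4-point one, loss vs an 8-point one.
print(sonneborn_berger([("Win", 6), ("Tie", 4), ("Loss", 8)]))  # 8.0
```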

The resulting value shows whether the participant performed better against stronger or against weaker opposition. Actually, I do have a problem with this criterion as a tie-breaker: in my opinion ALL points are created equal, and it doesn't matter whether they came from a contender or a bottom feeder. However, this metric does address notorious statements like "This team only shows up for big games" and "This team is only good against garbage opposition."

So, first of all, for the NHL application, we will modify the function f(Rn, Pn) to:

f(W, Pn)  = Pn
f(OW, Pn) = 2*Pn/3
f(OL, Pn) = Pn/3
f(L, Pn)  = 0

to account for the overtime point.
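The same sum with the NHL weights, as a Python sketch. The result codes W/OW/OL/L and the sample opponents' scores are made up for illustration.

```python
# Weights of the NHL-modified f(R_n, P_n):
# regulation win, overtime win, overtime loss, regulation loss.
NHL_SB_WEIGHTS = {"W": 1.0, "OW": 2 / 3, "OL": 1 / 3, "L": 0.0}

def nhl_sonneborn_berger(games):
    """games: list of (result, opponent_score) with result in NHL_SB_WEIGHTS."""
    return sum(NHL_SB_WEIGHTS[result] * score for result, score in games)

sb = nhl_sonneborn_berger([("W", 1.2), ("OW", 0.9), ("OL", 1.5), ("L", 1.1)])
print(round(sb, 2))  # 2.3
```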

Then we can calculate the minimal possible value, SBmin, for a team with the given schedule so far this season. We assign the Wins to the weakest teams played and the OW/OL to the weakest of the remainder, until the W, OW and OL points add up to the number of points the team currently has.

Similarly, we calculate the maximal possible value, SBmax, by assigning the Wins to the strongest teams played and the OW/OL to the strongest of the remainder, assuming OT wins are about 1/4 of the whole.
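A minimal Python sketch of both bounds. It assumes the numbers of regulation wins, OT wins and OT losses are already fixed, for instance from the team's record or from the roughly 1/4 OT-win split mentioned above. Pairing the largest weights with the weakest opponents minimizes the sum, and pairing them with the strongest opponents maximizes it (the rearrangement inequality), so for fixed counts this greedy assignment is exact. All names and sample numbers are hypothetical.

```python
# NHL-modified weights, highest first: regulation win, OT win, OT loss.
RESULT_WEIGHTS = [("W", 1.0), ("OW", 2 / 3), ("OL", 1 / 3)]

def sb_bound(opponent_scores, counts, strongest_first):
    """Greedy SB bound for a fixed number of W, OW and OL results.

    The largest weights go to the weakest opponents (minimum) or to the
    strongest ones (maximum); every remaining game is a loss worth 0.
    """
    ordered = sorted(opponent_scores, reverse=strongest_first)
    if sum(counts.values()) > len(ordered):
        raise ValueError("more results than games played")
    total, i = 0.0, 0
    for result, weight in RESULT_WEIGHTS:
        for _ in range(counts.get(result, 0)):
            total += weight * ordered[i]
            i += 1
    return total

def sb_min_max(opponent_scores, counts):
    return (sb_bound(opponent_scores, counts, strongest_first=False),
            sb_bound(opponent_scores, counts, strongest_first=True))

opponents = [0.8, 1.0, 1.2, 1.4]     # hypothetical opponents' PPG
counts = {"W": 1, "OW": 1, "OL": 1}  # the fourth game is a loss
print([round(x, 3) for x in sb_min_max(opponents, counts)])  # [1.867, 2.533]
```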

Then, depending on whether the actual SB is closer to SBmin or to SBmax, we may be able to say whether the team earns its points mostly against the bottom feeders, mostly against the top guns, or across the whole available spectrum.

Here is the table describing how this season's teams have their SB positioned between SBmin and SBmax.

+-----------------------+------+-------+-------+-------+-------+
| Team Name             | PPG  | SBmin | SBopt | SB    | SBmax |
+-----------------------+------+-------+-------+-------+-------+
| Pittsburgh Penguins   | 1.40 | 44.28 | 46.48 | 46.24 | 53.06 |
| Washington Capitals   | 1.40 | 44.70 | 46.74 | 47.77 | 52.89 |
| Minnesota Wild        | 1.37 | 42.25 | 44.36 | 46.63 | 50.66 |
| Columbus Blue Jackets | 1.37 | 43.10 | 45.36 | 46.44 | 52.15 |
| Chicago Blackhawks    | 1.34 | 41.61 | 43.90 | 43.79 | 50.80 |
| San Jose Sharks       | 1.31 | 40.68 | 42.97 | 44.16 | 49.84 |
| New York Rangers      | 1.30 | 41.25 | 43.67 | 45.55 | 50.92 |
| Ottawa Senators       | 1.25 | 37.84 | 40.07 | 41.79 | 46.78 |
| Montreal Canadiens    | 1.25 | 39.37 | 41.74 | 41.05 | 48.87 |
| Anaheim Ducks         | 1.19 | 36.86 | 39.43 | 40.12 | 47.15 |
| Calgary Flames        | 1.18 | 35.97 | 38.49 | 38.20 | 46.05 |
| Edmonton Oilers       | 1.16 | 35.86 | 38.32 | 37.43 | 45.70 |
| Boston Bruins         | 1.15 | 34.73 | 37.23 | 37.74 | 44.72 |
| Nashville Predators   | 1.13 | 33.28 | 36.14 | 38.04 | 44.72 |
| Toronto Maple Leafs   | 1.13 | 34.64 | 36.99 | 35.66 | 44.02 |
| St. Louis Blues       | 1.12 | 34.69 | 37.14 | 38.52 | 44.50 |
| New York Islanders    | 1.12 | 34.36 | 36.94 | 37.94 | 44.71 |
| Tampa Bay Lightning   | 1.09 | 32.62 | 34.98 | 35.41 | 42.06 |
| Los Angeles Kings     | 1.07 | 32.10 | 34.66 | 33.56 | 42.34 |
| Philadelphia Flyers   | 1.04 | 31.26 | 33.56 | 32.01 | 40.48 |
| Florida Panthers      | 1.03 | 30.89 | 33.12 | 30.95 | 39.82 |
| Carolina Hurricanes   | 1.00 | 29.43 | 31.78 | 32.41 | 38.85 |
| Buffalo Sabres        | 0.99 | 30.09 | 32.49 | 33.43 | 39.68 |
| Winnipeg Jets         | 0.96 | 27.55 | 30.35 | 31.48 | 38.75 |
| Vancouver Canucks     | 0.96 | 28.48 | 30.91 | 29.02 | 38.21 |
| Dallas Stars          | 0.94 | 28.05 | 30.62 | 31.16 | 38.34 |
| Detroit Red Wings     | 0.94 | 29.12 | 31.12 | 30.02 | 37.13 |
| New Jersey Devils     | 0.91 | 27.78 | 30.15 | 28.63 | 37.27 |
| Arizona Coyotes       | 0.84 | 25.13 | 27.24 | 25.86 | 33.56 |
| Colorado Avalanche    | 0.61 | 17.90 | 19.74 | 19.98 | 25.25 |
+-----------------------+------+-------+-------+-------+-------+

Once again, we use points-per-game values, because teams and their opponents have played different numbers of games at most moments of the season.

We would dare to take one more step and claim that a team performing closer to SBmax seems to have a coach problem (notable differences highlighted in green in the table above): the roster is there to compete against the best, but the points aren't trickling in at a good enough pace against the fodder. Similarly, a team whose SB value is closer to SBmin is more likely to have a GM problem (notable differences highlighted in blue in the table above): its roster is not good enough to compete, but the coach is able to squeeze close to the maximum out of it. However, it is natural to win more games against weaker teams, so we set the balance point at SBopt = (SBmax + 3*SBmin) / 4.
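A minimal Python sketch of the balance point and of which side of it a team falls on. It only reports the side. The threshold behind the "notable differences" highlighting isn't stated, so none is assumed here. The "coach" and "GM" labels follow the paragraph above, and the sample values are taken from the table.

```python
def sb_opt(sb_min, sb_max):
    """Balance point SBopt = (SBmax + 3 * SBmin) / 4."""
    return (sb_max + 3 * sb_min) / 4

def sb_lean(sb, sb_min, sb_max):
    """Which side of SBopt a team's actual SB falls on.

    Above SBopt (toward SBmax): the 'coach problem' side.
    Below SBopt (toward SBmin): the 'GM problem' side.
    """
    opt = sb_opt(sb_min, sb_max)
    if sb > opt:
        return "coach"
    if sb < opt:
        return "GM"
    return "balanced"

print(sb_lean(46.63, 42.25, 50.66))  # coach  (Minnesota Wild row)
print(sb_lean(46.24, 44.28, 53.06))  # GM     (Pittsburgh Penguins row)
```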

Wrapping up the discussion of the Buchholz and Sonneborn-Berger coefficients, we would like to state that these values are almost entirely descriptive and have no predictive capability. The one small exception is the Buchholz-based remaining schedule strength metric, and even that is a sort of 'descriptive prediction'.

Please see more Buchholz and Sonneborn-Berger data on the website!

Sunday, March 12, 2017

On Buchholz and Sonneborn-Berger coefficients.


The practice of chess tournaments provides two traditional metrics that are used to rank participants beyond their mere scores: the Buchholz coefficient and the Sonneborn-Berger coefficient (often called just Berger). They are frequently used as tie-breakers in chess events. However, I arrived at a completely different application for them: National Hockey League seasons.

1. The Buchholz coefficient

The Buchholz coefficient is simply the sum of the points of your opponents.

$$B = \sum_{n=1}^{N} P_n$$

So, if you played five games and your opponents currently have 5, 3, 8, 6 and 6 points, your Buchholz value will be 28. Please note that the current number of points is always used, not the number of points at the moment of the meeting. The outcome of the game does not matter (for that one, see the Sonneborn-Berger).
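A minimal Python sketch of the sum above. It assumes the opponents are listed by (hypothetical) name and looked up in today's standings, so the current score is used rather than the score at the time of the game.

```python
def buchholz(opponents, current_points):
    """Sum of the opponents' current points.

    opponents: names of the opponents met so far (repeats allowed).
    current_points: today's standings, not the score at the time of the game.
    """
    return sum(current_points[name] for name in opponents)

standings = {"A": 5, "B": 3, "C": 8, "D": 6, "E": 6}
print(buchholz(["A", "B", "C", "D", "E"], standings))  # 28
```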

At first, the usefulness of such a criterion may raise an eyebrow. Indeed, it's not used as a final tie-break in round-robin (all-play-all) tournaments because, naturally, the coefficient would be the same for all tied parties. It's used in a special format of chess events called the Swiss tournament, which is not very popular outside the realm of board games for purely logistical reasons. But consider, for example, an NFL season. The lists of opponents the teams play over the 16-game season may be quite different, and whoever ends up with a larger Buchholz coefficient clearly had stronger opposition along the way.

Now let's go back to hockey. First of all, at the end of the season, although everyone has played everyone, they did so a different number of times. Thus, the sum of opponents' points at the end of the season could differ between teams, even within the same division, if they had different schedules. So this could still be a perfectly valid tie-break. Secondly, the season is long (82 games, while a chess Swiss is rarely longer than 11 rounds), which gives us a lot of midway points in time when the all-play-all has not been completed yet! Here the Buchholz coefficient can clearly show who has had the stronger opposition up to a certain moment.

Then, if we look at the remainder of each team's schedule and add up the opponents' points for every game, we get an excellent remaining schedule strength estimator.

Wait... there's a caveat.

In a chess tournament, every round is played by everyone at the same time and, barring very rare circumstances, every participant has played the same number of games at any point. An NHL season is different: there can be a significant difference in the number of games played by different teams, so simply summing up the opponents will not work very well. And these opponents have also played different numbers of games, so their point totals are not a very good indicator either.

Fortunately, it's not a big deal. Instead of totals, let's operate with per-game numbers. So the NHL Buchholz Coefficient for a team after N games becomes:

$$B = \frac{1}{N} \sum_{n=1}^{N} \mathrm{PPG}_n$$

The same applies to the remaining schedule strength, where the per-game numbers of the remaining opposition are summed and averaged.

So, if the team played three games against opponents who currently have:
A) 6 points in 4 games, B) 3 points in 3 games, C) 2 points in 5 games, then the team's Buchholz value would be (6/4 + 3/3 + 2/5) / 3 = 2.9/3 ≈ 0.967 pts.
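A minimal Python sketch of the per-game version, reproducing the example above. Records are assumed to be (points, games played) pairs keyed by a hypothetical team name. Applying the same average to the list of games still to be played gives the remaining schedule strength.

```python
def nhl_buchholz(opponents, records):
    """Average of the opponents' current points per game.

    opponents: one entry per game (repeats allowed).
    records: team -> (points, games_played), as of today.
    """
    ppgs = [records[team][0] / records[team][1] for team in opponents]
    return sum(ppgs) / len(ppgs)

records = {"A": (6, 4), "B": (3, 3), "C": (2, 5)}
print(round(nhl_buchholz(["A", "B", "C"], records), 3))  # 0.967
# Remaining schedule strength: the same average over the games left to play.
print(round(nhl_buchholz(["C", "C", "A"], records), 3))  # 0.767
```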

Here are the current (Mar 12th 2017) Buchholz coefficients (Buch) and remaining schedule strengths (RStr) for all 30 teams. Note how the Blues stand out, with plenty of matchups vs Colorado and Arizona remaining.

+-----------------------+-----------+-------+-------+
| Team Name             | PPG       | Buch  | RStr  |
+-----------------------+-----------+-------+-------+
| Washington Capitals   | 1.4179105 | 1.119 | 1.133 |
| Pittsburgh Penguins   | 1.4029851 | 1.117 | 1.127 |
| Minnesota Wild        | 1.3939394 | 1.090 | 1.070 |
| Columbus Blue Jackets | 1.3731343 | 1.125 | 1.132 |
| Chicago Blackhawks    | 1.3283582 | 1.088 | 1.096 |
| San Jose Sharks       | 1.2985075 | 1.106 | 1.106 |
| New York Rangers      | 1.2941176 | 1.120 | 1.184 |
| Ottawa Senators       | 1.2537313 | 1.105 | 1.169 |
| Montreal Canadiens    | 1.2352941 | 1.122 | 1.097 |
| Edmonton Oilers       | 1.1791044 | 1.121 | 1.040 |
| Anaheim Ducks         | 1.1764706 | 1.102 | 1.150 |
| Calgary Flames        | 1.1764706 | 1.099 | 1.140 |
| Boston Bruins         | 1.1470588 | 1.115 | 1.151 |
| Toronto Maple Leafs   | 1.1343284 | 1.114 | 1.150 |
| Nashville Predators   | 1.1323529 | 1.105 | 1.116 |
| St. Louis Blues       | 1.1194030 | 1.144 | 0.943 |
| New York Islanders    | 1.1194030 | 1.142 | 1.103 |
| Tampa Bay Lightning   | 1.0895522 | 1.121 | 1.134 |
| Los Angeles Kings     | 1.0746269 | 1.118 | 1.104 |
| Philadelphia Flyers   | 1.0447761 | 1.122 | 1.179 |
| Florida Panthers      | 1.0298507 | 1.118 | 1.175 |
| Carolina Hurricanes   | 1.0000000 | 1.138 | 1.136 |
| Buffalo Sabres        | 0.9855072 | 1.127 | 1.158 |
| Winnipeg Jets         | 0.9565217 | 1.110 | 1.143 |
| Vancouver Canucks     | 0.9558824 | 1.115 | 1.152 |
| Dallas Stars          | 0.9552239 | 1.119 | 1.100 |
| Detroit Red Wings     | 0.9545455 | 1.151 | 1.059 |
| New Jersey Devils     | 0.9117647 | 1.148 | 1.132 |
| Arizona Coyotes       | 0.8358209 | 1.133 | 1.098 |
| Colorado Avalanche    | 0.6119403 | 1.128 | 1.164 |
+-----------------------+-----------+-------+-------+

In the next installment we're going to talk about applying the Sonneborn-Berger coefficient to the NHL regular season.


Thursday, March 2, 2017

On schedule - played and remaining

Here I would like to present a visualization of the teams' schedules, played and remaining. This is actually a graphic representation of the Buchholz/Sonneborn and team Elo tables I present on the website.

First, let's start with the played games and points.


Naturally, most of the squares above the X-diagonal indicate more points than the ones below. However, we can see interesting anomalies, such as BUF-OTT, TOR-BOS, ARI-SJS, WPG-CHI and, probably the most intriguing, NYR-WSH (an expected 1st round meeting).

Another unusual thing is that the Sharks play Colorado only twice this season, rather than the usual 3-4 intra-conference games.

Now let's take a look at the remaining games and the expected points.


We can see that STL may expect a big boost from playing Colorado four(!) more times this season, as well as Arizona three times. Ottawa has its two biggest season series, against MTL and BOS, mostly unresolved. The expected points are calculated from the teams' Elo ratings:

$$xPts = \frac{N_{games}}{1 + 10^{(Elo_{opp} - Elo_{team})/400}}$$

However, for the sake of precision, this number should have been scaled by 2, since the formula produces an outcome between 0 and 1 (0.5 for a "tie"). It should also have been scaled by the OT factor of around 1.125, which accounts for the chance of a team getting an OT point. For visualization purposes, though, this does not matter.
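The formula above is the standard Elo expected score multiplied by the number of games. Here is a minimal Python sketch with hypothetical ratings. The optional scale parameter is an illustrative addition that applies the two corrections from the paragraph above (x2 for the points of a win and the ~1.125 OT factor).

```python
def expected_points(n_games, elo_team, elo_opp, scale=1.0):
    """xPts = n_games / (1 + 10 ** ((elo_opp - elo_team) / 400)), times scale.

    scale=1.0 gives the raw expected score used for the visualization;
    scale=2 * 1.125 would convert it to NHL points with the OT factor.
    """
    expected_score = 1.0 / (1.0 + 10 ** ((elo_opp - elo_team) / 400))
    return n_games * expected_score * scale

# Hypothetical ratings: an evenly matched pair over a four-game season series.
print(expected_points(4, 1500, 1500))             # 2.0
print(expected_points(4, 1500, 1500, 2 * 1.125))  # 4.5
```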

There are also nice patterns indicating road trips through California and Western Canada.

Sunday, February 19, 2017

On schedule breaks - some crosstables

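I took a look at how teams performed against one another depending on the length of their break before the game. All breaks longer than five days were truncated to five, and back-to-backs are designated as 0-length. Each cell is the W-L-OTL record of teams with the row's break length against teams with the column's break length.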
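A minimal Python sketch of how such a crosstable could be built. It assumes each game has already been reduced to the two break lengths, which side won, and whether the game went past regulation. It also assumes, as the tables' numbers suggest, that the cells are W-L-OTL records from the row team's point of view. For example, 84-90-28 at row 0, column 1 mirrors 118-65-19 at row 1, column 0. Equal breaks are skipped, matching the X cells. All names are illustrative.

```python
from collections import defaultdict
from datetime import date

MAX_REST = 5  # breaks longer than five days are truncated to five

def rest_days(previous_game: date, game_day: date) -> int:
    """Days off before a game: back-to-back = 0, capped at MAX_REST."""
    return min((game_day - previous_game).days - 1, MAX_REST)

def crosstable(games):
    """Build {(row_rest, col_rest): [W, L, OTL]} from the row team's view.

    games: iterable of (rest_a, rest_b, a_won, past_regulation) tuples.
    Each game fills two mirrored cells: the winner gets a W, the loser
    an L (regulation) or an OTL (overtime/shootout).
    """
    table = defaultdict(lambda: [0, 0, 0])
    for rest_a, rest_b, a_won, past_regulation in games:
        if rest_a == rest_b:
            continue  # equal breaks: the diagonal cells marked X
        win_cell = (rest_a, rest_b) if a_won else (rest_b, rest_a)
        lose_cell = (win_cell[1], win_cell[0])
        table[win_cell][0] += 1
        table[lose_cell][2 if past_regulation else 1] += 1
    return dict(table)

print(rest_days(date(2017, 2, 1), date(2017, 2, 2)))  # 0 (back-to-back)
print(crosstable([(0, 1, True, False), (0, 1, False, True)]))
# {(0, 1): [1, 0, 1], (1, 0): [1, 1, 0]}
```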

Here's the crosstable for the 2015 season:
#days  0          1          2          3          4          5
0      X          84-90-28   35-38-8    7-6-1      3-7-0      4-2-0
1      118-65-19  X          118-82-27  28-18-9    6-4-1      7-5-0
2      46-29-6    109-95-23  X          15-11-3    5-1-0      2-0-0
3      7-3-4      27-23-5    14-11-4    X          5-4-0      0-1-0
4      7-3-0      5-3-3      1-4-1      4-4-1      X          2-0-0
5      2-2-2      5-4-3      0-2-0      1-0-0      0-2-0      X

And here's the one for the ongoing 2016 season, in the midst of the bye weeks:
#days  0          1          2          3          4          5
0      X          59-48-23   21-23-7    7-3-3      3-0-0      6-3-2
1      71-48-11   X          61-64-21   16-7-3     3-4-0      11-4-0
2      30-13-8    85-44-17   X          5-3-5      0-0-0      1-1-0
3      6-5-2      10-12-4    8-4-1      X          3-1-0      0-0-0
4      0-2-1      4-1-2      0-0-0      1-1-2      X          2-0-0
5      5-5-1      4-9-2      1-0-1      0-0-0      0-2-0      X

It's obvious that the bye weeks are no good for the teams, and the NHL should convince the NHLPA to rescind them for 2017/18.

Just for fun, here's the aggregate since 2005, when ties were abolished:
#days  0              1              2              3              4              5
0      X              1013-1030-285  352-380-104    117-104-26     37-34-10       45-34-11
1      1315-760-253   X              1128-917-284   305-231-67     119-86-24      61-42-7
2      484-267-85     1201-852-276   X              149-103-34     44-24-8        15-10-2
3      130-87-30      298-232-73     137-114-35     X              20-19-1        3-3-1
4      44-32-5        110-87-32      32-34-10       20-9-11        X              6-4-1
5      45-30-15       49-44-17       12-13-2        4-2-1          5-6-0          X

Soon to become part of the website!

Thursday, February 16, 2017

Another rule change suggestion

Better less, but better
V.I. Lenin

I've got another rule change suggestion, this one even simpler:

Allow teams to decline penalty shot awards in favor of a regular power-play.

I think it adds more tactical variety to the game and discourages penalties on breakaways by players who are weaker at penalty shots.

As a side matter, I think that a player who commits the offense for which a penalty shot is awarded should still be charged with a minor penalty (2 minutes) in the statistics.

Friday, February 10, 2017

On Leads Changes and Swings

Wild thing, you make my heart sing
You make everything groovy, wild thing

Also inspired by Twitter, and because I can, I decided to gather statistics on games with
  • most lead changes*
  • most lead swings**
Here, for the 2016/17 season:
By most lead swings:
AWAY    HOME   Date        Sco LC LS
CHI  vs DAL  on 2017/02/04: 5-3 7 3
CBJ  vs OTT  on 2017/01/22: 7-6 11 3
PHI  vs STL  on 2016/12/28: 3-6 7 3
MTL  vs PIT  on 2016/12/31: 3-4 7 3
CHI  vs NYI  on 2016/12/15: 5-4 7 3
ARI  vs PHI  on 2016/10/27: 5-4 9 3

followed by 60 games with 2 lead swings. Dallas leads the way with 8 games with at least two swings, and Carolina, Chicago, the NY Islanders and Winnipeg follow with 7 each.

By most lead changes:


AWAY    HOME   Date        Sco LC LS
CBJ  vs OTT  on 2017/01/22: 7-6 11 3
TOR  vs WSH  on 2017/01/03: 5-6  9 2
TOR  vs NYI  on 2017/02/06: 5-6  9 2
NYI  vs DET  on 2017/02/03: 4-5  9 1
CHI  vs COL  on 2017/01/17: 6-4  9 2
CAR  vs NYI  on 2017/02/04: 5-4  9 2
CHI  vs STL  on 2016/12/17: 6-4  9 1
BUF  vs OTT  on 2016/11/29: 5-4  9 1
ARI  vs PHI  on 2016/10/27: 5-4  9 3

with 31 games in total having at least 7 lead changes. Carolina, Chicago and the NY Islanders lead here, with at least 6 games each featuring 7 or more lead changes.

And what do we get historically?

The wildest games, regular season, by lead swings:
AWAY    HOME   Date         Sco LC LS
PHI  vs BOS  on 2011/01/13: 5-7  11 5
COL  vs CGY  on 1991/02/23: 8-10 11 5
ARI  vs CGY  on 1991/01/15: 5-7  11 5
PHI  vs COL  on 1988/11/19: 5-6  11 5

followed by 30 games with 4 lead swings.

The wildest games, regular season, by lead changes:
AWAY    HOME   Date        Sco LC LS
DET  vs SJS  on 2005/11/26: 7-6 13 4
MTL  vs COL  on 2002/12/06: 6-7 13 2
COL  vs SJS  on 1997/04/04: 6-7 13 2
ARI  vs PHI  on 1990/01/25: 6-8 13 1
TOR  vs PIT  on 1989/10/25: 8-6 13 3
COL  vs WSH  on 1997/11/18: 6-6 12 3
PIT  vs NJD  on 1993/04/14: 6-6 12 1
BUF  vs CAR  on 1991/12/07: 6-6 12 4
CAR  vs TOR  on 1990/02/14: 6-6 12 2
VAN  vs TOR  on 1988/01/04: 7-7 12 3

followed by 65 games with 11 lead changes (even numbers can only occur in the era of ties).

The wildest games, playoffs, by lead swings:
AWAY    HOME   Date        Sco LC LS
STL  vs DAL  on 1999/05/08: 4-5 9 4
MTL  vs COL  on 1993/04/26: 5-4 9 4
EDM  vs LAK  on 1992/04/20: 5-8 9 4

followed by 33 games with 3 lead swings.

The wildest games, playoffs, by lead changes:
AWAY    HOME   Date        Sco LC LS
BUF  vs OTT  on 2006/05/05: 7-6 13 2
PHI  vs CHI  on 2010/05/29: 5-6 11 3
COL  vs SJS  on 2010/04/16: 5-6 11 1
PHI  vs WSH  on 1989/04/11: 8-5 11 3

followed by 42 games with 9 lead changes (only odd numbers can occur in the playoffs).

The data covers games since 1987, the earliest boxscores available on NHL.com.
This one is going to make it onto the website; I just haven't decided in which form.

*  Lead change is defined as when a team loses the lead, even if only temporarily to a tied score.
** Lead swing is defined as when a team takes the lead after the other team had it.
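A minimal Python sketch of how the two counts could be taken from a game's goal sequence. It follows the footnote definitions literally: a lead swing is counted when a team takes the lead and the last team to lead was the opponent. The input format (one "A" or "B" per goal, in order) is an assumption, and shootout goals are left out.

```python
def lead_changes_and_swings(goals):
    """Count lead changes and lead swings for one game.

    goals: sequence of "A" / "B", one per goal in scoring order
    (hypothetical format; shootout goals excluded).
    """
    score = {"A": 0, "B": 0}
    leader = None        # team leading right now, None when tied
    last_leader = None   # last team that held the lead at any point
    changes = swings = 0
    for scorer in goals:
        score[scorer] += 1
        if score["A"] > score["B"]:
            new_leader = "A"
        elif score["B"] > score["A"]:
            new_leader = "B"
        else:
            new_leader = None
        if leader is not None and new_leader != leader:
            changes += 1  # the leading team lost its lead (one goal always means a tie)
        if new_leader is not None and new_leader != leader:
            if last_leader not in (None, new_leader):
                swings += 1  # lead taken after the other team had it
            last_leader = new_leader
        leader = new_leader
    return changes, swings

# A scores, B scores twice, A scores twice: 1-0, 1-1, 1-2, 2-2, 3-2.
print(lead_changes_and_swings("ABBAA"))  # (2, 2)
```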

Thursday, February 9, 2017

On goalposts statistics

Why does the cat lick his balls?
Because it can.

Recently I saw a request for statistics on goal posts / crossbars hit per game. While I do have that statistic per player, I didn't have one for games, so, since I can, why shouldn't I produce one?

About half an hour of Perl-ing created the following summary:

Irons altogether, top (P = posts, C = crossbars, T = total):
AWAY    HOME                P C T
OTT  vs BUF  on 2011/12/31: 8 0 8
VAN  vs FLA  on 2010/02/11: 7 0 7
WPG  vs FLA  on 2009/12/05: 6 1 7
TOR  vs BUF  on 2007/10/15: 6 1 7
TBL  vs FLA  on 2006/04/01: 6 1 7
PHI  vs PIT  on 2006/03/12: 7 0 7
COL  vs NYI  on 2005/12/17: 7 0 7
NSH  vs DAL  on 2016/03/29: 4 2 6
PIT  vs NSH  on 2014/03/04: 5 1 6
NYI  vs TBL  on 2014/01/16: 3 3 6
DAL  vs VAN  on 2013/02/15: 5 1 6
STL  vs CAR  on 2012/03/15: 5 1 6
WPG  vs MTL  on 2011/01/02: 6 0 6
OTT  vs VAN  on 2011/02/07: 6 0 6
MTL  vs CAR  on 2011/11/23: 6 0 6
LAK  vs DAL  on 2010/03/12: 4 2 6
NJD  vs TBL  on 2009/10/08: 6 0 6
LAK  vs DAL  on 2009/10/19: 5 1 6
DAL  vs CBJ  on 2009/01/31: 5 1 6
COL  vs CHI  on 2009/11/11: 6 0 6
PIT  vs WPG  on 2008/01/30: 5 1 6
NYR  vs NJD  on 2008/04/09: 4 2 6
STL  vs ARI  on 2007/01/15: 5 1 6

followed by 109 games with 5 irons hit.

Crossbars, top:
AWAY    HOME                P C T
CGY  vs CBJ  on 2008/11/08: 1 4 5
NYR  vs FLA  on 2007/11/23: 0 4 4
PHI  vs FLA  on 2006/12/27: 1 4 5
BUF  vs DAL  on 2017/01/26: 1 3 4
EDM  vs DAL  on 2016/01/21: 2 3 5
TOR  vs STL  on 2015/01/17: 1 3 4
CHI  vs ANA  on 2015/05/19: 1 3 4
BOS  vs VAN  on 2015/02/13: 1 3 4
NYI  vs TBL  on 2014/01/16: 3 3 6
CHI  vs ANA  on 2008/01/04: 2 3 5
CAR  vs FLA  on 2007/11/12: 1 3 4

followed by 50 games with 2 crossbars hit.

The data is extracted from the PBP files of NHL.com, from the year 2005 on.

However, I consider this a one-time effort and will not add it to the website itself.
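The original Perl script isn't shown. Here is a minimal Python sketch of the same per-game tally. It assumes the play-by-play data has already been parsed into (game, event kind) pairs, and the "POST"/"CROSSBAR" labels are hypothetical.

```python
from collections import Counter

def irons_per_game(events):
    """Tally posts (P), crossbars (C) and total irons (T) per game.

    events: (game_id, kind) pairs with kind "POST" or "CROSSBAR"
    (hypothetical labels for whatever the play-by-play file records).
    """
    events = list(events)  # iterated twice below
    posts = Counter(game for game, kind in events if kind == "POST")
    bars = Counter(game for game, kind in events if kind == "CROSSBAR")
    return {game: (posts[game], bars[game], posts[game] + bars[game])
            for game in posts.keys() | bars.keys()}

def top_games(summary, column=2, n=10):
    """Games sorted by one column (0 = P, 1 = C, 2 = T), highest first."""
    return sorted(summary.items(), key=lambda item: item[1][column], reverse=True)[:n]

events = [("g1", "POST"), ("g1", "CROSSBAR"), ("g2", "POST"), ("g2", "POST"), ("g2", "POST")]
summary = irons_per_game(events)
print(top_games(summary, column=2, n=1))  # [('g2', (3, 0, 3))]
```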