Wednesday, August 16, 2017

A revision of HTML reports from the NHL website.

So, while revising and revisiting the data files, I did an extra scan of the HTML reports available on NHL.com.

The HTML reports are extracted through the following URL pattern: 

sprintf('http://www.nhl.com/scores/htmlreports/%d%d/%s%02d%04d.HTM', $season, $season+1, $type, $stage, $game_id) 

where $season is the start year of the season, $type is the type of the report, $stage is 2 for regular, 3 for playoff, and $game_id is the NHL game id for the season.
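For readers who do not use Perl, the same pattern can be sketched in Python. This is only an illustration: the function name `report_url` and its parameter names are mine, and the report codes are assumed to be the two-letter ones used as column headers in the table below (ES, GS, PL, RO).

```python
def report_url(season: int, report_type: str, stage: int, game_id: int) -> str:
    """Build the URL of one NHL.com HTML report.

    season      -- start year of the season, e.g. 2016 for 2016/17
    report_type -- report code, assumed to be "ES", "GS", "PL" or "RO"
    stage       -- 2 for the regular season, 3 for the playoffs
    game_id     -- NHL game id within the season
    """
    return (
        "http://www.nhl.com/scores/htmlreports/"
        f"{season}{season + 1}/{report_type}{stage:02d}{game_id:04d}.HTM"
    )


# The first problematic game in the table: 1999 regular season, game 0029.
print(report_url(1999, "ES", 2, 29))
```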

Below are the results for the problematic games, with the reports that are beyond salvage marked accordingly:

Season Stg GameId ES GS PL RO
1999 Reg 0029 M M N N
1999 Reg 0045 I I N N
1999 Reg 0050 I I N N
1999 Reg 0058 R R N N
1999 Reg 0071 I I N N
1999 Reg 0072 I I N N
1999 Reg 0081 I I N N
1999 Reg 0109 I I N N
1999 Reg 0130 I I N N
1999 Reg 0323 R R N N
1999 Reg 0619 I I N N
1999 Reg 0689 I I N N
1999 Reg 0690 B I N N
1999 Reg 0836 I I N N
1999 Reg 1034 I I N N
1999 P/O 0325 I I N N
2000 Reg 0029 R R N N
2000 Reg 0038 R R N N
2000 Reg 0039 I I N N
2000 Reg 0041 R R N N
2000 Reg 0042 R R N N
2000 Reg 0043 R R N N
2000 Reg 0044 R R N N
2000 Reg 0045 I I N N
2000 Reg 0049 R R N N
2000 Reg 0067 R R N N
2000 Reg 0072 I I N N
2000 Reg 0073 I I N N
2000 Reg 0077 I I N N
2000 Reg 0080 R R N N
2000 Reg 0083 R R N N
2000 Reg 0085 I I N N
2000 Reg 0095 R R N N
2000 Reg 0102 I I N N
2000 Reg 0112 R R N N
2000 Reg 0186 I I N N
2000 Reg 0187 I I N N
2000 Reg 0189 I I N N
2000 Reg 0920 R R N N
2000 Reg 0921 I I N N
2000 Reg 0924 R R N N
2000 Reg 0925 R R N N
2000 Reg 0926 R R N N
2000 Reg 0928 I I N N
2000 Reg 0983 B V N N
2000 Reg 1166 I I N N
2003 Reg 0191 V I V N
2003 Reg 1205 I I I N
2003 P/O 0134 V V B N
2005 Reg 0298 R V V N
2005 Reg 0458 B V V N
2005 Reg 0677 V V V B
2005 Reg 0679 V V V B
2005 Reg 0681 V V V B
2007 Reg 1178 I I B I
2008 Reg 0259 I I B I
2008 Reg 0409 I I B I
2008 Reg 1077 I I B I
2009 Reg 0081 I I B I
2009 Reg 0827 V I V V
2009 Reg 0836 V I V V
2009 Reg 0857 I I V V
2009 Reg 0863 I I V V
2009 Reg 0874 I I V I
2009 Reg 0885 I V V I
2010 Reg 0429 I I V V
2011 Reg 0259 I I V V
Legend:
I - incomplete (not through the end of the game)
B - broken (doesn't pass HTML parser)
M - misplaced (belongs to a different game that doesn't have a file associated with it)
R - replica (copy of a file from another game)
N - not available (file not available on NHL.com)
V - file is good.

Friday, June 30, 2017

I'm not gone

For those few people who may be reading this -

This blog ain't dead. It's just that I've been so busy with my regular job, with my website rework and with moving into the new house that I haven't had the time to consider a proper post.

Stay tuned, though...

Sunday, May 28, 2017

Last features - and a summer freeze

During the month of May I added two new features for the website, naturally dedicated to the upcoming draft:

Draft Pick Stats
Draft Pick Success

Now it's time to step back, look around and improve the infrastructure behind the website, both in the matter of the collection and the publication of the data. Thanks to all the visitors both of this blog and of the website.

I will continue to publish my predictions for the remainder of the playoff games on Twitter - follow me @morehockeystats! You are also most welcome to send me your ideas for new unusual statistics.

So the projected plan for the summer months is:
June - improve and finally release the Perl scraper
July - improve the publishing mechanism for the website and speed it up
August - improve the Elo-based models behind the site's projections
September - add some new features

This blog is not on pause! I will continue to publish entries about my hockey-related thoughts as they come. Stay tuned!

Monday, May 1, 2017

On carrying Momentum...

You can frequently hear talk about the importance of carrying momentum over an intermission. I thought it might be possible to measure this harmony with algebra, so I tried to do that. I chose to analyze a very specific question:

If the regulation of a game ends in a tie, other than 0-0, how frequently does the team that tied the game with the last regulation goal win in overtime?

We define the team that tied the game as the one having the momentum, and the other team as the one trying to show resilience. To answer the question, we analyzed the outcomes of games from seasons 2007/08-2016/17 (including the ongoing playoffs). We discard the games that end in a shootout, because their outcome depends more on the skill of the shooters and goaltenders than on whatever momentum might have been accrued.

The results of the analysis are displayed in the table below, per season, per the time frame during which the last tying goal was scored: in the last two, five, or ten minutes, in the last period, or in one of the first two. The numbers show the percentage of wins by the team having the momentum and the number of games falling into that specific segment. Also we display a separate column and a separate row for playoff games, although a finer granularity is not really possible because of the sample size.

Season   2         5         10        20        1st/2nd   total      totalPO
2007     54.2/24   57.9/19   52.9/34   53.8/13   52.6/38   53.9/128   31.2/16
2008     43.5/23   48.1/27   45.2/31   53.8/13   40.0/40   44.8/134   25.0/16
2009     42.9/28   56.5/23   72.7/22   64.7/17   53.7/41   56.5/131   58.8/17
2010     48.6/37   54.2/24   47.1/34   40.7/27   56.8/44   50.0/166   59.1/22
2011     50.0/24   45.8/24   43.5/23   72.0/25   47.7/44   51.4/140   37.5/24
2012     62.5/16   33.3/15   50.0/22   50.0/14   57.9/19   51.2/86    53.8/26
2013     58.1/43   43.5/23   34.6/26   45.5/22   44.1/34   46.6/148   70.8/24
2014     51.7/29   65.2/23   55.3/38   46.7/15   60.5/43   56.8/148   57.9/19
2015     60.0/40   46.7/30   44.4/36   45.8/24   39.6/48   47.2/178   52.6/19
2016     43.6/39   50.0/28   60.5/38   48.1/27   61.8/68   54.5/200   63.2/19
totalPO  61.4/44   46.7/30   55.8/43   68.4/19   40.9/66   52.0/202   52.0/202
total    51.5/303  50.4/236  50.7/304  51.8/197  51.8/419  51.3/1459  52.0/202
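The column assignment used in the table can be sketched as follows. The game counts in each row add up to the row total, so the time frames are treated as mutually exclusive here; which bucket a goal scored exactly on a boundary (say, with 2:00 left) falls into is my assumption, and the function names are hypothetical.

```python
def tying_goal_bucket(seconds_elapsed: int) -> str:
    """Map the time of the last tying goal to a table column.

    seconds_elapsed counts from the opening faceoff (0..3600 in regulation).
    Boundary goals are assigned to the later bucket (an assumption).
    """
    remaining = 3600 - seconds_elapsed
    if remaining <= 120:
        return "2"
    if remaining <= 300:
        return "5"
    if remaining <= 600:
        return "10"
    if remaining <= 1200:
        return "20"
    return "1st/2nd"


def momentum_share(wins: int, games: int) -> float:
    """Percentage of overtime wins by the team that scored the tying goal."""
    return round(100.0 * wins / games, 1)


print(tying_goal_bucket(59 * 60 + 10))  # tying goal with 50 seconds left
print(momentum_share(27, 44))           # playoff games, last-two-minutes column
```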

We see that there is no specific "momentum" or "resilience" capability overall; there is practically no indication of how the OT will end based on which team scored the last game-tying goal (GTG). The only two moderate exceptions with decent sample sizes are the second and the sixth columns of the penultimate row. The GTG-scoring team is 27-17 (61.4%) when it scored the tying goal in the last two minutes. However, if the GTG was scored before the last period, as happened in 66 games, the momentum would obviously not carry over two or more intermissions, and the tying team is 27-39 (40.9%) in these games.

Here is how it looks on a graph:
We can see all lines wobbling slightly above the 50 mark. Insufficiently above. Even if we observe the extra 1.3% chance overall (2.0% in playoffs) - wouldn't it be more related to the home/away advantage? I haven't looked at this aspect yet. Maybe another time.

The Real Life[TM] took a bit of a toll on this blog... But we resume, with resilience and hoping to generate momentum!

Thursday, March 30, 2017

On the NHL Scoring System - Part III

Part I
Part II


Once again, I am driven by the idea that if you want to encourage goal scoring, you need to reward it in the standings directly, not indirectly through winning. Then, based on the idea of a fellow hockey fan and blogger, a new suggestion was born in my mind.

Not so long ago I was involved in another discussion on the subject on Twitter, where an interesting alternative, 2-1-0-0 was described. The idea is that you still get two points for a win in regulation, just one point for a win in OT, but nothing if you lose, and, the key, both teams get nothing if the game is tied at the end of regulation (shootouts are abolished). This is a very sharp idea, but for me something felt very wrong, and then it crystallized:

It's not fair to reward a hard fought 5-5 tie with zero points, just like a lazy-skated 1-1. We still want to encourage goal scoring, and the simple 2-1-0-0 just unbalances the game. And so it dawned on me. We should reward goals with extra standings points!

The formula that first came to mind, and which seemed fair: give each goal 0.1 points in the standings, while the win-scoring system shall be 2-1-0-0. If you or your database have an aversion to decimals, assign 20 points for a regulation win, 10 points for an OT win, and 1 extra point for each goal scored. This will encourage goal scoring in any situation, and for both sides, including the games that go into garbage time pretty quickly. So, a 7-2 win will give the winner 2.7 points, and the loser 0.2 points. A 2-0 win will give the winner 2.2 points, the loser 0. A 4-3 OT win will give the winner 1.4 points, the loser 0.3 points. A 5-5 OT tie will give each side 0.5 points.
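A minimal sketch of this formula, using the decimal-free variant (points in tenths, so 27 stands for 2.7). The function name and signature are mine, and the sketch ignores any special cases beyond what is described above.

```python
def standings_points(goals_for: int, goals_against: int, overtime: bool) -> int:
    """Standings points for one team in one game, in tenths of a point.

    2-1-0-0 base: 20 for a regulation win, 10 for an overtime win,
    nothing for a loss or a tie, plus 1 for every goal scored.
    """
    if goals_for > goals_against:
        base = 10 if overtime else 20
    else:
        base = 0
    return base + goals_for


# The examples from the text, as (winner, loser) pairs in tenths of a point.
print(standings_points(7, 2, False), standings_points(2, 7, False))
print(standings_points(4, 3, True), standings_points(3, 4, True))
print(standings_points(5, 5, True))
```

Keeping everything in integer tenths sidesteps floating-point rounding when many games are summed over a season.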

Wait, there's a caveat.

Imagine a situation where a team needs just 0.1 point to pass another one in the standings for the playoff spot. They are playing an opponent whose number of points in the standings does not have any effect on them. In such a situation, the team would play without a goaltender at all, because they don't care how much they lose, they just need that goal. Now, this is not really hockey, so to prevent this kind of play a restriction needs to be introduced:

Any goal scored without a goaltender on the ice, when not on a delayed penalty, and when trailing by more than two goals shall not yield any standings points.

Here is an example of what today's standings would look like under the suggested system:

Team                           W  OW T  L  GF  GA  P
Boston Bruins                  34 04 04 34 216 201 93.6
Montreal Canadiens             31 09 05 31 205 186 91.5
Ottawa Senators                32 04 08 31 191 191 87.1
--------------------------------------------------------
Washington Capitals            41 08 07 20 246 165 114.6
Columbus Blue Jackets          38 09 04 24 233 170 108.3
Pittsburgh Penguins            37 06 08 25 256 211 105.6
--------------------------------------------------------
New York Rangers               38 05 06 28 242 203 105.2
Toronto Maple Leafs            29 06 09 31 229 213 86.9
--------------------------------------------------------
New York Islanders             28 05 06 36 217 224 82.7
Tampa Bay Lightning            27 06 07 35 206 207 80.6
Carolina Hurricanes            28 04 07 36 198 208 79.8
Buffalo Sabres                 24 06 08 39 191 215 73.1
Philadelphia Flyers            22 07 11 36 193 218 70.3
Florida Panthers               21 07 11 37 192 210 68.2
New Jersey Devils              18 06 06 46 171 221 59.1
Detroit Red Wings              16 07 08 45 181 224 57.1
--------------------------------------------------------
Chicago Blackhawks             36 09 05 27 230 197 104.0
Minnesota Wild                 37 04 05 30 241 193 102.1
St. Louis Blues                35 06 02 33 213 200 97.3
--------------------------------------------------------
San Jose Sharks                35 06 03 32 204 185 96.4
Anaheim Ducks                  37 02 06 31 200 183 96.0
Edmonton Oilers                33 05 09 29 221 191 93.1
--------------------------------------------------------
Nashville Predators            33 04 06 33 224 206 92.4
Calgary Flames                 30 09 06 32 208 206 89.8
--------------------------------------------------------
Winnipeg Jets                  29 03 04 41 226 243 83.6
Dallas Stars                   27 04 02 43 207 240 78.7
Los Angeles Kings              23 11 06 36 183 185 75.3
Vancouver Canucks              19 07 06 44 169 221 61.9
Arizona Coyotes                17 04 08 48 176 245 55.6
Colorado Avalanche             14 06 01 55 150 257 49.0

Naturally, they would not be the same standings if the system were indeed implemented, but why not take a look? And once again, try it in the AHL first, it won't hurt anyone.

Monday, March 13, 2017

On Buchholz and Sonneborn-Berger coefficients - Part II

Part I

2. The Sonneborn-Berger coefficient.
This stranger beast is a metric extensively used for tie-breaks in chess round-robins and as an auxiliary tie-break tool to the Buchholz coefficient in non-round-robin events. Let's start with the definition.

$$SB = \sum_{n=1}^{N} f(R_n, P_n)$$

where Rn is the result against the n-th opponent, and Pn is the opponent's points score.
The function  f(Rn, Pn) is defined as:

f(Win, Pn)  = Pn
f(Tie, Pn)  = Pn/2
f(Loss, Pn) = 0

The resulting value evaluates whether the participant performed better against stronger or weaker opposition. Actually, I do have a problem with this criterion as a tie-breaker: in my opinion ALL points are created equal, and it doesn't matter if they came from a contender or a bottom feeder. However, this metric does answer the notorious statements like "This team only shows up for big games" and "This team is only good against garbage opposition."

So, first of all, for the NHL application, we will modify the function f(Rn, Pn) to:

f(Win, Pn) = Pn
f(OW, Pn)  = 2*Pn/3
f(OL, Pn)  = Pn/3
f(L, Pn)   = 0

to account for the overtime point.
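As a rough sketch, the modified function and the resulting sum could look like this in Python. The result codes ("W", "OW", "OL", "L") and the function name are my own labels, and exact fractions are used only to avoid rounding noise with the thirds.

```python
from fractions import Fraction

# Share of the opponent's score credited for each result, per the table above.
WEIGHTS = {
    "W": Fraction(1),      # regulation win
    "OW": Fraction(2, 3),  # overtime win
    "OL": Fraction(1, 3),  # overtime loss
    "L": Fraction(0),      # regulation loss
}


def sonneborn_berger(games):
    """games: iterable of (result_code, opponent_score) pairs, one per game."""
    return sum(WEIGHTS[result] * score for result, score in games)


# Four games against opponents currently holding 90, 60, 30 and 100 points.
games = [("W", 90), ("OW", 60), ("OL", 30), ("L", 100)]
print(sonneborn_berger(games))  # 90 + 40 + 10 + 0
```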

Then, we can calculate the minimal possible SBmin value for a team with the given schedule so far this season, by assigning Wins to be against the weakest teams played, and the OW/OL against the weakest remainder until the sum of W, OW and OL points add up to the number of points the team currently has.

Similarly we shall calculate the maximal possible SBmax value by assigning Wins to be against the strongest teams played, and the OW/OL against the strongest of the remainder, assuming OT wins are about 1/4 of the whole.

Then, depending on whether the actual SB is closer to SBmin or to SBmax, we may be able to say whether the team is more successful against the bottom feeders or against the top guns, or whether it collects its points from the whole spectrum available.

Here is the table describing how this season's teams have their SB positioned between SBmin and SBmax.

Team                   PPG   SBmin  SBopt  SB     SBmax
Pittsburgh Penguins    1.40  44.28  46.48  46.24  53.06
Washington Capitals    1.40  44.70  46.74  47.77  52.89
Minnesota Wild         1.37  42.25  44.36  46.63  50.66
Columbus Blue Jackets  1.37  43.10  45.36  46.44  52.15
Chicago Blackhawks     1.34  41.61  43.90  43.79  50.80
San Jose Sharks        1.31  40.68  42.97  44.16  49.84
New York Rangers       1.30  41.25  43.67  45.55  50.92
Ottawa Senators        1.25  37.84  40.07  41.79  46.78
Montreal Canadiens     1.25  39.37  41.74  41.05  48.87
Anaheim Ducks          1.19  36.86  39.43  40.12  47.15
Calgary Flames         1.18  35.97  38.49  38.20  46.05
Edmonton Oilers        1.16  35.86  38.32  37.43  45.70
Boston Bruins          1.15  34.73  37.23  37.74  44.72
Nashville Predators    1.13  33.28  36.14  38.04  44.72
Toronto Maple Leafs    1.13  34.64  36.99  35.66  44.02
St. Louis Blues        1.12  34.69  37.14  38.52  44.50
New York Islanders     1.12  34.36  36.94  37.94  44.71
Tampa Bay Lightning    1.09  32.62  34.98  35.41  42.06
Los Angeles Kings      1.07  32.10  34.66  33.56  42.34
Philadelphia Flyers    1.04  31.26  33.56  32.01  40.48
Florida Panthers       1.03  30.89  33.12  30.95  39.82
Carolina Hurricanes    1.00  29.43  31.78  32.41  38.85
Buffalo Sabres         0.99  30.09  32.49  33.43  39.68
Winnipeg Jets          0.96  27.55  30.35  31.48  38.75
Vancouver Canucks      0.96  28.48  30.91  29.02  38.21
Dallas Stars           0.94  28.05  30.62  31.16  38.34
Detroit Red Wings      0.94  29.12  31.12  30.02  37.13
New Jersey Devils      0.91  27.78  30.15  28.63  37.27
Arizona Coyotes        0.84  25.13  27.24  25.86  33.56
Colorado Avalanche     0.61  17.90  19.74  19.98  25.25

Once again, we use points-per-game values because the teams and their opponents have played a different number of games at most moments within a season.

We would dare to make one more step forward and claim that a team that performs closer to SBmax seems to have a coach problem (notable differences highlighted in green in the table above). The roster is there to compete against the best, but the points aren't trickling in at a good enough pace against the fodder. Similarly, a team whose SB value is closer to SBmin is more likely to have a GM problem (notable differences highlighted in blue in the table above): its roster is not good enough to compete, but the coach is able to squeeze close to the maximum out of it. However, it is natural to win more games against the weaker teams, so we set the balance point at SBopt = (SBmax + 3*SBmin) / 4.
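The balance point and a team's distance from it are simple to compute. A small sketch, with hypothetical function names, checked against two rows of the table:

```python
def sb_opt(sb_min: float, sb_max: float) -> float:
    """Balance point: one quarter of the way from SBmin to SBmax."""
    return (sb_max + 3 * sb_min) / 4


def sb_offset(sb: float, sb_min: float, sb_max: float) -> float:
    """Positive: SB leans toward SBmax; negative: it leans toward SBmin."""
    return sb - sb_opt(sb_min, sb_max)


# Pittsburgh: SBmin 44.28, SBmax 53.06 (table shows SBopt 46.48 after rounding).
print(round(sb_opt(44.28, 53.06), 2))
# Nashville: SB 38.04, SBmin 33.28, SBmax 44.72.
print(round(sb_offset(38.04, 33.28, 44.72), 2))
```

How large an offset counts as "notable" is not stated in the text, so no threshold is applied here.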

Wrapping up the talk about the Buchholz and the Sonneborn-Berger coefficients we would like to state that these values have an almost entirely descriptive value and without any predictive capability, with a small exception of the Buchholz-based remaining schedule strength metric. And even then, it's sort of a 'descriptive prediction'.

Please see more Buchholz and Berger-Sonneborn data on the website!

Sunday, March 12, 2017

On Buchholz and Sonneborn-Berger coefficients.


The practice of chess tournaments provides two traditional metrics that are used to rank participants beyond their mere scoring. Their names are the Buchholz coefficient and the Sonneborn-Berger coefficient (often called just Berger). They are frequently used as tie-breakers in chess events; however, I arrived at a completely different application for them: the National Hockey League seasons.

1. The Buchholz coefficient

The Buchholz coefficient is simply the sum of the points of your opponents.

$$B = \sum_{n=1}^{N} P_n$$

So, if you played five games, and your opponents currently have 5, 3, 8, 6 and 6 points, your Buchholz value will be 28. Please note, that the current number of points is always used, not the number of points at the moment of meeting. The outcome of the game does not matter (for that one see the Sonneborn-Berger).

At first, the usefulness of such a criterion might raise an eyebrow. Indeed, it's not used in round-robin all-play-all tournaments as a final tie-break, because, naturally, the coefficient would be the same for all tied parties. It's used in a special format of chess events called the Swiss tournament, not very popular outside the realm of board games for purely logistical reasons. But then consider, first, an NFL season. The list of opponents every team plays there over the 16-game season may be quite different. And whoever ends up with a larger Buchholz coefficient clearly has had stronger opposition on the way.

Now let's go back to hockey. First of all, at the end of the season, although everyone has played everyone, they did so a different number of times. Thus, the sum of opponents' points at the end of the season could be different between teams - including within the same division, if they had a different schedule. So, this could still be a very valid tiebreak. Secondly, the season is so long (82 games, unlike a chess Swiss, which is rarely longer than 11 rounds) that it gives us a lot of midway points in time when the all-play-all has not been completed yet! Here the Buchholz coefficient can clearly show who has had the stronger opposition up until a certain moment.

Then, if we look at the remainder of the schedule for each team, and for every game we add the opponent's points we get an excellent remaining schedule strength estimator.

Wait... there's a caveat.

Unlike in a chess tournament, where every round occurs for everyone at the same time and, barring very rare circumstances, every participant has played an equal number of games at any point of the tournament, there may be a significant difference in the number of games played by different NHL teams, so summing the opponents up will not work very well. And these opponents have also played a different number of games, so their total number of points is not a very good indicator.

Fortunately, it's not a big deal. Instead of totals, let's operate with per-game numbers. So the NHL Buchholz Coefficient for a team after N games becomes:

$$B = \frac{1}{N} \sum_{n=1}^{N} PPG_n$$

The same applies to the remaining schedule strength, where the per-game numbers of the remaining opposition are summed and averaged.

So, if the team played three games against opponents who currently are:
A) 6 points in 4 games, B) 3 points in 3 games, C) 2 points in 5 games, then the team's Buchholz value would be (6/4 + 3/3 + 2/5) / 3 = 2.9/3 ~ 0.967pts.
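The per-game version is easy to check in a few lines of Python. This is only a sketch with a hypothetical function name; exact fractions are used so the worked example reproduces without rounding noise.

```python
from fractions import Fraction


def nhl_buchholz(opponents):
    """Average points-per-game of the opponents faced.

    opponents: list of (current_points, games_played) pairs, one entry
    per game played, so an opponent met twice appears twice.
    """
    ppg = [Fraction(points, games) for points, games in opponents]
    return sum(ppg) / len(ppg)


# The example above: opponents at 6 pts in 4 GP, 3 in 3 and 2 in 5.
value = nhl_buchholz([(6, 4), (3, 3), (2, 5)])
print(value, round(float(value), 3))
```

The same function applied to the list of remaining opponents gives the remaining schedule strength.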

Here are the current (Mar 12th 2017) Buchholz coefficients and remaining schedule strengths for all 30 teams (and note how the Blues stand out with plenty of matchups vs Colorado and Arizona remaining).

+-----------------------+-----------+-------+-------+
| Team Name             | PPG       | Buch  | RStr  |
+-----------------------+-----------+-------+-------+
| Washington Capitals   | 1.4179105 | 1.119 | 1.133 |
| Pittsburgh Penguins   | 1.4029851 | 1.117 | 1.127 |
| Minnesota Wild        | 1.3939394 | 1.090 | 1.070 |
| Columbus Blue Jackets | 1.3731343 | 1.125 | 1.132 |
| Chicago Blackhawks    | 1.3283582 | 1.088 | 1.096 |
| San Jose Sharks       | 1.2985075 | 1.106 | 1.106 |
| New York Rangers      | 1.2941176 | 1.120 | 1.184 |
| Ottawa Senators       | 1.2537313 | 1.105 | 1.169 |
| Montreal Canadiens    | 1.2352941 | 1.122 | 1.097 |
| Edmonton Oilers       | 1.1791044 | 1.121 | 1.040 |
| Anaheim Ducks         | 1.1764706 | 1.102 | 1.150 |
| Calgary Flames        | 1.1764706 | 1.099 | 1.140 |
| Boston Bruins         | 1.1470588 | 1.115 | 1.151 |
| Toronto Maple Leafs   | 1.1343284 | 1.114 | 1.150 |
| Nashville Predators   | 1.1323529 | 1.105 | 1.116 |
| St. Louis Blues       | 1.1194030 | 1.144 | 0.943 |
| New York Islanders    | 1.1194030 | 1.142 | 1.103 |
| Tampa Bay Lightning   | 1.0895522 | 1.121 | 1.134 |
| Los Angeles Kings     | 1.0746269 | 1.118 | 1.104 |
| Philadelphia Flyers   | 1.0447761 | 1.122 | 1.179 |
| Florida Panthers      | 1.0298507 | 1.118 | 1.175 |
| Carolina Hurricanes   | 1.0000000 | 1.138 | 1.136 |
| Buffalo Sabres        | 0.9855072 | 1.127 | 1.158 |
| Winnipeg Jets         | 0.9565217 | 1.110 | 1.143 |
| Vancouver Canucks     | 0.9558824 | 1.115 | 1.152 |
| Dallas Stars          | 0.9552239 | 1.119 | 1.100 |
| Detroit Red Wings     | 0.9545455 | 1.151 | 1.059 |
| New Jersey Devils     | 0.9117647 | 1.148 | 1.132 |
| Arizona Coyotes       | 0.8358209 | 1.133 | 1.098 |
| Colorado Avalanche    | 0.6119403 | 1.128 | 1.164 |
+-----------------------+-----------+-------+-------+

In the next installment we're going to talk about the application of the Sonneborn-Berger coefficient to the NHL regular season.


Thursday, March 2, 2017

On schedule - played and remaining

Here I would like to present a visualization of the teams' schedules, played and remaining. This is actually a graphic representation of the Buchholz/Sonneborn and team Elo tables I present on the website.

First, let's start with the played games and points.


Naturally, most of the squares above the X-diagonal indicate more points than the ones below; however, we can see interesting anomalies, such as BUF-OTT, TOR-BOS, ARI-SJS, WPG-CHI and probably the most intriguing: NYR-WSH (expected 1st round meeting).

Another unusual thing is that the Sharks are only playing Colorado twice this season rather than the regular 3-4 intraconference games.

Now let's take a look at the remaining games and the expected points.


We can see that STL may expect a big boost from having to play Colorado four(!) more times this season as well as Arizona three times, and that Ottawa has its two biggest season series mostly unresolved - against MTL and BOS. The expected points are calculated based on the teams' Elo ratings:

$$xPts = \frac{N_{games}}{1 + 10^{(Elo_{opp} - Elo_{team})/400}}$$

however, for the sake of precision this number should've been scaled by 2 (since it produces an outcome between 0 and 1, with 0.5 for a "tie") and also by the OT factor, i.e. the probability of a team getting an OT point, around 1.125. But for visualization purposes this does not matter.
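A sketch of this calculation in Python, assuming the usual Elo expectation with base 10 and a 400-point scale. The function name and the optional `scale` argument are mine; `scale=1.0` reproduces the unscaled number used for the visualization.

```python
def expected_points(n_games: int, elo_team: float, elo_opp: float,
                    scale: float = 1.0) -> float:
    """Expected result over n_games against one opponent.

    With scale=1.0 each game contributes a value between 0 and 1.
    Passing scale=2 * 1.125 applies both corrections mentioned above.
    """
    return scale * n_games / (1 + 10 ** ((elo_opp - elo_team) / 400))


# Four remaining games against an equally rated opponent.
print(expected_points(4, 1500, 1500))
# The same series with the points and OT scaling applied.
print(expected_points(4, 1500, 1500, scale=2 * 1.125))
```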

There are also nice patterns indicating travels through California and Western Canada.

Sunday, February 19, 2017

On schedule breaks - some crosstables

I took a look at how the teams performed one against another depending on the break length. All lengths longer than five were truncated to five, and the back-to-backs are designated as 0-length.

Here's the crosstable for the 2015 season:
#days 0 1 2 3 4 5
0 X 84-90-28 35-38-8 7-6-1 3-7-0 4-2-0
1 118-65-19 X 118-82-27 28-18-9 6-4-1 7-5-0
2 46-29-6 109-95-23 X 15-11-3 5-1-0 2-0-0
3 7-3-4 27-23-5 14-11-4 X 5-4-0 0-1-0
4 7-3-0 5-3-3 1-4-1 4-4-1 X 2-0-0
5 2-2-2 5-4-3 0-2-0 1-0-0 0-2-0 X

And here's the one for the ongoing, 2016 season, in the midst of bye weeks:
#days 0 1 2 3 4 5
0 X 59-48-23 21-23-7 7-3-3 3-0-0 6-3-2
1 71-48-11 X 61-64-21 16-7-3 3-4-0 11-4-0
2 30-13-8 85-44-17 X 5-3-5 0-0-0 1-1-0
3 6-5-2 10-12-4 8-4-1 X 3-1-0 0-0-0
4 0-2-1 4-1-2 0-0-0 1-1-2 X 2-0-0
5 5-5-1 4-9-2 1-0-1 0-0-0 0-2-0 X

It's obvious that the bye weeks are no good for the teams, and the NHL should convince the NHLPA to rescind them for 2017/18.

Just for fun, here's the aggregate since 2005, when the ties were abolished:
#days 0 1 2 3 4 5
0 X 1013-1030-285 352-380-104 117-104-26 37-34-10 45-34-11
1 1315-760-253 X 1128-917-284 305-231-67 119-86-24 61-42-7
2 484-267-85 1201-852-276 X 149-103-34 44-24-8 15-10-2
3 130-87-30 298-232-73 137-114-35 X 20-19-1 3-3-1
4 44-32-5 110-87-32 32-34-10 20-9-11 X 6-4-1
5 45-30-15 49-44-17 12-13-2 4-2-1 5-6-0 X

Soon to become part of the website!

Thursday, February 16, 2017

Another rule change suggestion

Better less, but better
V.I. Lenin

I've got another rule change suggestion, this one even simpler:

Allow teams to decline penalty shot awards in favor of a regular power-play.

I think it adds more tactical variety to the game and discourages fouls on breakaways against teams that are worse at penalty shots.

As a side matter, I think a player who is charged with the offense after which the penalty shot is awarded should still be credited with a minor penalty (2 minutes) in the statistics.

Friday, February 10, 2017

On Leads Changes and Swings

Wild thing, you make my heart sing
You make everything groovy, wild thing

Also inspired by Twitter, and because I can, I decided to gather statistics on games with
  • most lead changes*
  • most lead swings**
Here, for the 2016/17 season:
By most lead swings:
AWAY    HOME   Date        Sco LC LS
CHI  vs DAL  on 2017/02/04: 5-3 7 3
CBJ  vs OTT  on 2017/01/22: 7-6 11 3
PHI  vs STL  on 2016/12/28: 3-6 7 3
MTL  vs PIT  on 2016/12/31: 3-4 7 3
CHI  vs NYI  on 2016/12/15: 5-4 7 3
ARI  vs PHI  on 2016/10/27: 5-4 9 3

with 60 games at 2 lead swings. Dallas leads the way with 8 games with at least two swings, and Carolina, Chicago, NY Islanders and Winnipeg follow with 7 each.

By most lead changes:


AWAY    HOME   Date        Sco LC LS
CBJ  vs OTT  on 2017/01/22: 7-6 11 3
TOR  vs WSH  on 2017/01/03: 5-6  9 2
TOR  vs NYI  on 2017/02/06: 5-6  9 2
NYI  vs DET  on 2017/02/03: 4-5  9 1
CHI  vs COL  on 2017/01/17: 6-4  9 2
CAR  vs NYI  on 2017/02/04: 5-4  9 2
CHI  vs STL  on 2016/12/17: 6-4  9 1
BUF  vs OTT  on 2016/11/29: 5-4  9 1
ARI  vs PHI  on 2016/10/27: 5-4  9 3

with 31 games with at least 7 lead changes. Here we've got Carolina, Chicago and NY Islanders in the lead with at least 6 games with 7 or more lead changes.

And what do we get historically?

The wildest games, regular season, by lead swings:
AWAY    HOME   Date         Sco LC LS
PHI  vs BOS  on 2011/01/13: 5-7  11 5
COL  vs CGY  on 1991/02/23: 8-10 11 5
ARI  vs CGY  on 1991/01/15: 5-7  11 5
PHI  vs COL  on 1988/11/19: 5-6  11 5

with 30 games at 4 lead swings.

The wildest games, regular season, by lead changes:
AWAY    HOME   Date        Sco LC LS
DET  vs SJS  on 2005/11/26: 7-6 13 4
MTL  vs COL  on 2002/12/06: 6-7 13 2
COL  vs SJS  on 1997/04/04: 6-7 13 2
ARI  vs PHI  on 1990/01/25: 6-8 13 1
TOR  vs PIT  on 1989/10/25: 8-6 13 3
COL  vs WSH  on 1997/11/18: 6-6 12 3
PIT  vs NJD  on 1993/04/14: 6-6 12 1
BUF  vs CAR  on 1991/12/07: 6-6 12 4
CAR  vs TOR  on 1990/02/14: 6-6 12 2
VAN  vs TOR  on 1988/01/04: 7-7 12 3

with 65 games at 11 lead changes (even numbers can only occur in the ties era).

The wildest games, playoffs, by lead swings:
AWAY    HOME   Date        Sco LC LS
STL  vs DAL  on 1999/05/08: 4-5 9 4
MTL  vs COL  on 1993/04/26: 5-4 9 4
EDM  vs LAK  on 1992/04/20: 5-8 9 4

with 33 games at 3 lead swings.

The wildest games, playoffs, by lead changes:
AWAY    HOME   Date        Sco LC LS
BUF  vs OTT  on 2006/05/05: 7-6 13 2
PHI  vs CHI  on 2010/05/29: 5-6 11 3
COL  vs SJS  on 2010/04/16: 5-6 11 1
PHI  vs WSH  on 1989/04/11: 8-5 11 3

with 42 games at 9 lead changes (only odd numbers can occur).

The data is presented since the year 1987 - the earliest boxscores available on NHL.com.
Now this one is going to make it into the website, I just haven't decided in which form.

*  Lead change is defined as when a team loses the lead, even if only temporarily to a tied score.
** Lead swing is defined as when a team takes the lead after the other team had it.
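Both counters can be sketched from the scoring order alone. One caveat: the counts in the tables are always odd for decided games and even for ties, which suggests that every transition between a tied score and a lead (in either direction) is counted as a lead change. That reading is my assumption, as are the function name and the "H"/"A" encoding of goals.

```python
def lead_stats(goals):
    """Return (lead_changes, lead_swings) for a sequence of goals.

    goals: iterable of "H" (home goal) or "A" (away goal) in scoring order.
    A change is any move between a tied score and a lead (assumed reading);
    a swing is a team taking the lead after the other team last held it.
    """
    diff = 0            # home goals minus away goals
    last_leader = None  # +1 home, -1 away, None before the first lead
    changes = swings = 0
    for team in goals:
        before = (diff > 0) - (diff < 0)
        diff += 1 if team == "H" else -1
        after = (diff > 0) - (diff < 0)
        if after != before:
            changes += 1
            if after != 0:
                if last_leader is not None and last_leader != after:
                    swings += 1
                last_leader = after
    return changes, swings


# Home scores first, away ties it and then goes ahead 2-1.
print(lead_stats("HAA"))
```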

Thursday, February 9, 2017

On goalposts statistics

Why does the cat lick his balls?
Because it can.

Recently I saw a request for stats on goalposts and crossbars hit per game. While I do have that statistic per player, I don't have one for games, so - since I can - why shouldn't I produce one?

About half an hour of Perl-ing created the following summary:

Irons altogether, top:
AWAY    HOME                P C T
OTT  vs BUF  on 2011/12/31: 8 0 8
VAN  vs FLA  on 2010/02/11: 7 0 7
WPG  vs FLA  on 2009/12/05: 6 1 7
TOR  vs BUF  on 2007/10/15: 6 1 7
TBL  vs FLA  on 2006/04/01: 6 1 7
PHI  vs PIT  on 2006/03/12: 7 0 7
COL  vs NYI  on 2005/12/17: 7 0 7
NSH  vs DAL  on 2016/03/29: 4 2 6
PIT  vs NSH  on 2014/03/04: 5 1 6
NYI  vs TBL  on 2014/01/16: 3 3 6
DAL  vs VAN  on 2013/02/15: 5 1 6
STL  vs CAR  on 2012/03/15: 5 1 6
WPG  vs MTL  on 2011/01/02: 6 0 6
OTT  vs VAN  on 2011/02/07: 6 0 6
MTL  vs CAR  on 2011/11/23: 6 0 6
LAK  vs DAL  on 2010/03/12: 4 2 6
NJD  vs TBL  on 2009/10/08: 6 0 6
LAK  vs DAL  on 2009/10/19: 5 1 6
DAL  vs CBJ  on 2009/01/31: 5 1 6
COL  vs CHI  on 2009/11/11: 6 0 6
PIT  vs WPG  on 2008/01/30: 5 1 6
NYR  vs NJD  on 2008/04/09: 4 2 6
STL  vs ARI  on 2007/01/15: 5 1 6

followed by 109 games with 5 irons hit.

Crossbars, top:
AWAY    HOME                P C T
CGY  vs CBJ  on 2008/11/08: 1 4 5
NYR  vs FLA  on 2007/11/23: 0 4 4
PHI  vs FLA  on 2006/12/27: 1 4 5
BUF  vs DAL  on 2017/01/26: 1 3 4
EDM  vs DAL  on 2016/01/21: 2 3 5
TOR  vs STL  on 2015/01/17: 1 3 4
CHI  vs ANA  on 2015/05/19: 1 3 4
BOS  vs VAN  on 2015/02/13: 1 3 4
NYI  vs TBL  on 2014/01/16: 3 3 6
CHI  vs ANA  on 2008/01/04: 2 3 5
CAR  vs FLA  on 2007/11/12: 1 3 4

followed by 50 games with 2 crossbars hit.

The data is extracted from the PBP files of NHL.com, from the year 2005 on.

However, I consider this a one-time effort and will not add this to the website itself.

Wednesday, February 1, 2017

On Streaks and Breaks

3 articles in two days... What's gotten into me?

So after remembering Botwinnik's quote, and after publishing the stats on how the teams actually play after different breaks, a new idea came to me: check whether the teams on streaks are affected positively or negatively by breaks.

For the sake of the analysis, I assumed the following:
  • A break is a period of three days at least between games.
  • A streak is a sequence of at least three wins in a row, or at least seven points in four games.
So we check for the last thirty years (as far as NHL.com would let us in) if the streaking team was able to keep the streak alive, or whether the streak was broken:

SEASON ALIVE BROKEN
1987/1988 5 11
1988/1989 12 7
1989/1990 8 14
1990/1991 13 11
1991/1992 17 13
1992/1993 20 16
1993/1994 19 20
1994/1995 2 7
1995/1996 15 11
1996/1997 15 11
1997/1998 12 20
1998/1999 12 9
1999/2000 18 12
2000/2001 21 11
2001/2002 17 6
2002/2003 13 10
2003/2004 12 14
2005/2006 31 15
2006/2007 16 16
2007/2008 23 24
2008/2009 15 20
2009/2010 14 17
2010/2011 19 11
2011/2012 22 11
2012/2013 6 3
2013/2014 15 15
2014/2015 16 16
2015/2016 16 14
2016/2017 8 11
TOTAL 432 376

Actually, it looks like the streaks weren't affected by the break either way: 53.5% of the time the streak continued, and 46.5% of the time it went dead. There is a very large discrepancy between the seasons, although I'd attribute it to lesser parity between the teams overall in these years. For the last 5 years, the probability for the streak to stay alive has been 50.8% (61 cases of extended streaks out of 120).
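The two definitions used for this analysis can be sketched as follows. Several details are my assumptions rather than the method actually used: the result codes, the points map (ties and overtime losses both worth one point), and counting a break in idle days, so that back-to-back games have a break of 0 as in the crosstables post.

```python
from datetime import date

# Standings points per result code (codes are hypothetical labels).
POINTS = {"W": 2, "T": 1, "OTL": 1, "L": 0}


def on_streak(results):
    """results: chronological list of result codes, most recent last."""
    last3, last4 = results[-3:], results[-4:]
    three_wins = len(last3) == 3 and all(r == "W" for r in last3)
    seven_in_four = len(last4) == 4 and sum(POINTS[r] for r in last4) >= 7
    return three_wins or seven_in_four


def idle_days(previous_game: date, next_game: date) -> int:
    """Days without a game between two game dates (back-to-back = 0)."""
    return (next_game - previous_game).days - 1


def is_break(previous_game: date, next_game: date, min_days: int = 3) -> bool:
    return idle_days(previous_game, next_game) >= min_days


print(on_streak(["W", "OTL", "W", "W"]))              # 7 points in 4 games
print(is_break(date(2017, 1, 1), date(2017, 1, 5)))   # three idle days
```

Passing `min_days=4` gives the longer break definition.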

Now, what changes if we define a break to be a single day longer, i.e. at least four days between games:

SEASON ALIVE BROKEN
1987/1988 2 2
1988/1989 4 1
1989/1990 3 1
1990/1991 4 4
1991/1992 7 8
1992/1993 8 2
1993/1994 7 7
1994/1995 1 1
1995/1996 6 7
1996/1997 6 2
1997/1998 5 5
1998/1999 6 2
1999/2000 9 3
2000/2001 7 4
2001/2002 6 4
2002/2003 5 1
2003/2004 3 6
2005/2006 16 4
2006/2007 8 5
2007/2008 10 6
2008/2009 8 8
2009/2010 6 6
2010/2011 9 3
2011/2012 8 4
2012/2013 2 1
2013/2014 3 9
2014/2015 5 9
2015/2016 7 6
2016/2017 3 8
TOTAL 174 129

The changes are rather interesting. Now, overall, the chance of a streak continuing is up to 57.4%, and in only 42.6% of the cases did it come to a stop. But in the last five years - since the last lockout - and with the schedule changed so that every pair of teams meets at least twice (increasing travel), the ratio drops from 50.8% to a humble 37.7% (20 out of 53!)
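The percentages quoted for the longer break can be re-derived from the tables with a few lines of arithmetic. This is only a check of the numbers, with a hypothetical helper named `share`. The last-five-seasons figures are the 2012/2013 to 2016/2017 rows of the two tables added up.

```python
def share(alive: int, broken: int) -> float:
    """Percentage of streaks that stayed alive, rounded to one decimal."""
    return round(100 * alive / (alive + broken), 1)


# Totals of the four-day table.
four_day_all = share(174, 129)

# Last five seasons (2012/2013 to 2016/2017), three-day breaks.
three_day_recent = share(6 + 15 + 16 + 16 + 8, 3 + 15 + 16 + 14 + 11)

# Last five seasons, four-day breaks.
four_day_recent = share(2 + 3 + 5 + 7 + 3, 1 + 9 + 9 + 6 + 8)

print(four_day_all, three_day_recent, four_day_recent)  # 57.4 50.8 37.7
```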

Extending the breaks to five days provides too little data to draw any conclusions.

So I am inclined to agree with Dr. Botwinnik that extended breaks of more than three days throw teams off their pace and should be kept to a minimum. Three days are borderline all right.

A rule change suggestion

There are no irreplaceable people.
I.V. Stalin

Rushing this one out, because this idea came to my mind before, but I forgot about it. Age is taking its toll.

Anyways. Everyone is talking these days about rule changes. I've already expressed a few thoughts on the scoring systems, but I am not original there. Now, however, I want to make a suggestion I haven't seen mentioned yet.

Allow soccer-like (and baseball-like) substitutions in hockey. Allow the coaches to replace players in the original lineup at the start of the game with one of the "healthy scratches", as submitted in the roster sheet, like the one Peter DeBoer recently messed up in the game against Edmonton.

The substitution goes ONE-WAY. That means that the player that was substituted cannot return to the game. The substitutions may occur:

  • During the intermissions
  • During the commercial breaks
  • During a time-out
First and foremost this will allow teams to handle early injuries much better. Your D-man got injured at the 7:04 mark of the 1st period? Around 10:00 there will be a commercial break, you can substitute him with one of the scratches!

Second, it may allow coaches to send stronger messages to players they deem slacking. Rather than shorten the roster by benching that guy, you can send an eager healthy scratch in. Of course, then the "slacking" player is benched for the whole remainder of the game.

Third (oh, I did military service, so I have a natural obsession with providing three reasons for each thing), it may give the coaches some extra flexibility if a designated roster player gets slightly injured in the warm-ups. Then a scratch takes his place as usual, but if the original player is fixed by the 1st intermission, he can be substituted for the scratch who started.

The substitutes will have to come from the "scratch" list with the exception of the emergency goaltending contracts.

Oh, and I am sure the NHL website will make a mess out of it in their game reports.

Tuesday, January 31, 2017

On A Hockey Website

Yesterday our town was honoring signor Trombetti Giovancarlo, 
who after thirty years of work, alone, with no help, 
recorded the opera "Aida" by Giuseppe Verdi.

Gianni Rodari, "A Musical Story".

Just to not let the month of January slip away without another post, I got sentimental and decided to tell a small story about how my website came to life.

There was a void. A lot of the time people on hockey boards would wonder whether specific statistics of players and teams were available, and they weren't, although the raw data seemed to be there. Then, there was the fantasy hockey world, with its pizzazz, asking for a predictive tool - and again, the raw data seemed to be there.

Now, I am a sysadmin by trade, with occasional forays into software development, and since I've been doing Perl for all of my career, I got some exposure to web development and to databases. I've got a college degree in Engineering, so that gave me some idea about statistics.

So I took a look at the publicly available NHL reports, but was unsure of how to use them. I tried a standard database approach, but it wasn't working.

The turning point came when I attended a lecture on MongoDB. It turned out to be a perfect fit for the loosely structured NHL stats documents: just spill them into the Mongo database, then extract the data from them and summarize it into tables, and store the tables in an SQL database for quick serving on the website. And along came more luck - a lecture on the Mojolicious Perl web framework, which equipped me with an easy way to run a website.
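The flow described above (loose documents in, summary tables out) can be illustrated with a small sketch. It is not the site's code, which is in Perl on top of MongoDB: here a plain list of dictionaries stands in for the document store, SQLite from the Python standard library stands in for the SQL database, and the document layout, the `team_goals` table and the function names are made up for the example.

```python
import sqlite3
from collections import Counter

# Hypothetical raw documents: loosely structured, fields may differ between games.
raw_reports = [
    {"game_id": 1, "events": [{"type": "GOAL", "team": "CHI"},
                              {"type": "SHOT", "team": "ANA"}]},
    {"game_id": 2, "events": [{"type": "GOAL", "team": "CHI"},
                              {"type": "GOAL", "team": "BOS"}],
     "note": "a field the other document does not have"},
]


def summarize_goals(reports):
    """Extract step: walk the documents and count goals per team."""
    goals = Counter()
    for report in reports:
        for event in report.get("events", []):
            if event.get("type") == "GOAL":
                goals[event["team"]] += 1
    return goals


def store_summary(conn, goals):
    """Store step: write the summary into a flat SQL table for quick serving."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS team_goals (team TEXT PRIMARY KEY, goals INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO team_goals VALUES (?, ?)", goals.items())
    conn.commit()


conn = sqlite3.connect(":memory:")
store_summary(conn, summarize_goals(raw_reports))
rows = conn.execute(
    "SELECT team, goals FROM team_goals ORDER BY goals DESC, team"
).fetchall()
print(rows)  # [('CHI', 2), ('BOS', 1)]
```

The point of the split is that the messy documents never have to fit a fixed schema. Only the summaries do.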

Thus, I was able to actually implement what I had in mind. First came the spider part, to crawl and collect the data available on NHL.com. Fortunately, I was able to scrape everything before the website's design changed drastically, and the box scores prior to 2002 stopped being available. I got everything from the 1987/88 season on.
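For the HTML reports, the spider mostly has to generate addresses from the URL pattern quoted in the post on HTML reports. A minimal sketch of that step follows, in Python rather than the Perl the site uses. The function name is hypothetical, nothing is downloaded, and whether a given address still resolves on NHL.com is not something this sketch checks.

```python
def report_url(season: int, report_type: str, stage: int, game_id: int) -> str:
    """Build an HTML report address from the pattern
    http://www.nhl.com/scores/htmlreports/<season><season+1>/<type><stage><game id>.HTM
    """
    return (
        "http://www.nhl.com/scores/htmlreports/"
        f"{season}{season + 1}/{report_type}{stage:02d}{game_id:04d}.HTM"
    )


# Stage 2 is the regular season, 3 is the playoffs; PL is one of the report types.
print(report_url(2005, "PL", 2, 298))
# http://www.nhl.com/scores/htmlreports/20052006/PL020298.HTM
```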

Then, I started writing the parsers... and had to take a step back. There were quite a lot of inconsistent and missing reports. Therefore I had to a) add thorough testing of every report I scraped to ensure it was complete, b) look for complementary sources for whatever data was missing. So before I got done with the parsers, I had a large testing framework, and had also visited all corners of the hockey-related websites to get the missing or conflicting data resolved, even the online archives of newspapers such as USA Today. Some of the downloaded reports had to be edited manually. Then, NHL.com landed another blow, dropping draft pick information from their player info pages. Luckily, the German version of the website still had it, so I began to scrape the German NHL website too.

I was able to produce the unusual statistics tables relatively quickly and easily. However, I decided that the website would not open without the prediction models I had in mind. Being a retired chess arbiter and a big chess enthusiast, I decided to try to apply the chess Elo rating model to the performances of hockey teams and players. Whether it really works or not, I don't know yet. I guess by the end of the season I can make a judgement on that.
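For readers who don't know the chess side: the standard Elo model turns a rating difference into an expected score and moves both ratings by the gap between the actual and the expected result. Below is a minimal sketch of that standard formula only. The K-factor of 32, the 1500 starting ratings and the scoring of a win as 1 and a loss as 0 are assumptions for illustration. The adaptation to hockey used on the website is not described here and may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of side A against side B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new ratings of A and B; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two equally rated teams, the first one wins.
print(update(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

A win over a much higher rated opponent moves the ratings more than a win over an equal one, which is what makes the model usable for prediction.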

In October 2016 I opened my website using a free design I found somewhere online. Unfortunately, I quickly realized it was not a good fit for the content the site was serving, so I sighed, took a deep breath, opened w3schools.com in my browser, and created my own design. And a CMS too. At least I am happy with the way the site looks now, and even happier that when someone asks a question - on Twitter, Reddit or hockey forums - whether it's possible to measure a specific metric, I am able to answer, 'Already done! Welcome to my website!'

In the end I'm a software developer, a web designer, a DBA, a sysadmin, a statistician and an SEO amateur. Oh, and a journalist too, since I'm writing a blog.

Monday, January 23, 2017

On Intangibles (a small addendum)

One "intangible" being tossed around is the "motivation" of the players. Which brings back memories of an episode I witnessed.

In 2003/04, in the Israeli Top Tier Chess League (which is indeed no slouch) our club managed to assemble an outstanding team, featuring, among others, a former Champion of Russia and a former Champion of Europe. I was part of the management team, and orchestrated bringing the first of the two, who also happened to be my childhood friend back in Leningrad, Soviet Union.

And so, in round III we were to face our main rival for the title, and the club's GM (also a pedestrian chess player) gathered the team and delivered an emphatic motivational speech about how we had to beat the team we were facing, and so on, and so on.

We lost 1½-4½ without winning a single game and lost whatever chance for the championship we had.

Sunday, January 22, 2017

On Intangibles. Carpe Jugulum.


Often the general managers, the coaches and the players talk about "intangible values". Sometimes it's about the "locker room contributions". Sometimes it's about "passion". In my opinion, these two are actually negligible and in certain cases even harmful. I remember such references, especially the latter one, made about Israeli soccer players, and that usually meant that the player didn't have a lot of talent to go along with it, but contributed a lot of passion to the game. While passionate play can indeed ignite the game and carry the team along, more often it indicated dumb, physical, low-talent execution that actually harmed the team.

However, there is one intangible that I take my hat off to. It's the one that I always admired, and did not have enough of myself in my chess career. It's the ability to go for the throat of the opposition at even a momentary display of weakness, or as Terry Pratchett put it in one of his books, 'Carpe Jugulum'¹.

So what is it, in my understanding? It is the situation when your opponent puts itself into an inferior position in a volatile situation (for example, in a close score), such as by a penalty, or by an icing at the end of a long shift, or by allowing an odd-man rush, and you are able to capitalize on it, yanking whatever remains of the carpet of security from under the feet of the opposition. And then, you continue to hammer the blows on the opposition until it collapses completely. Some also call it the 'killer instinct'. This blog (and this article too) sins with an abundance of examples from chess, so let me plant one from tennis... Before the match between Lleyton Hewitt and Taylor Dent at the New York Open, 2005, the latter complained: 'He displays poor sportsmanship: taking joy in the opponent's double faults as well as in unforced errors.' 'I don't care what Dent thinks about it', parried Hewitt, 'I always go for a win, and on the way to it many things are allowed.'

Machiavelli advised the rulers and the politicians, 'Don't be kind'. Winston Churchill also knew something about achieving goals when he recommended: 'If you want to get to your goal, don't be delicate or kind. Be rough. Hit the target immediately. Come back and hit again. Then hit again with the strongest swing you can...'

All the chess champions had it, the extremes going to Alexander Alekhine, Robert J. Fischer and Garry Kasparov. Many wonderful players that never got the title complained that they couldn't commit themselves to going for the throat of the opponent time after time.

These qualities were elevated to perfection by the two best teams of the first half of the 2010s, the Los Angeles Kings and the Chicago Blackhawks, who split between themselves five Cups out of six from 2010 to 2015. Even when both teams seemed to be struggling and wobbling, they seemed to be able to instill some kind of uncertainty into their opponents - and certainty into the spectators that these teams were going to be able to make a fist out of themselves that would hammer their opponents once they displayed any kind of weakness, however minimal. That capability was championed by their leaders, Anze Kopitar, Drew Doughty and Jeff Carter for the Kings, and Jonathan Toews, Patrick Kane and Duncan Keith for the Hawks. When a playoff series between the Blackhawks and their opponents was tied 3-3, Chicago was always the favorite to win game 7 because of their Carpe Jugulum reputation. The Kings gained even more notoriety, first by burying their sword to the hilt into each and every opponent in 2012 en route from the #8 seed to their first Stanley Cup, and then from the reverse sweep they managed against the Sharks that started their 2014 Cup run - which included two more comebacks, from 2-3 and from 1-3. And even in 2016, with the Kings down 1-3 to the Sharks in the first round of the playoffs, somehow fans around the league were not ready to commit to the Sharks as the favorites to win the series, because the Kings were a hair away from the Sharks' throat in game 4, going from 0-3 to 2-3 in the 3rd period, and then in game 5 they indeed were able to turn the 0-3 deficit into a 3-3 tie.

Well, that tie didn't hold, the Sharks broke the stranglehold and got a boost that carried them all the way to their own first ever Stanley Cup Finals, and that outcome damaged the Kings' reputation as the Carpe Jugulum team to a degree. So was the Blackhawks' one, as they lost their game 7 to a team that - along with the Sharks and, for instance, the Washington Capitals - had a reputation of being somewhat nonplussed - the St. Louis Blues.

It would be entertaining to see whether the Carpe Jugulum landscape changes this year in the league, and whether the teams who were able to overcome their "benign" reputation will be able to go all the way to the Cup Finals - through their opponents' throats.

Chess Grandmaster Gennady Sosonko wrote, 'A real professional, having thought about the situation on the board, acts most decisively. He knows that during the game there should be no place for either doubt or compassion, because a thought which is not converted into action isn't worth much, and an action that does not come from a thought isn't worth anything at all.'

And it's important to remember, Carpe Jugulum is a necessary key to success in a competitive environment only. Albert Einstein used to say that chess is "foreign to me due to its suppression of intellect and the spirit of rivalry."

¹ Carpe Jugulum (Lat.) - seize the throat.