Sunday, October 21, 2018

NHLErrata launch

Ever since I started collecting the NHL data, validating the collected information has been very important to me. So I created a set of checks that eventually formed a whole library for testing both the NHL boxscore feeds and the HTML reports.

I managed to fish out quite a few errors and inconsistencies. Some were systemic and could be fixed in software; others required manual intervention, sometimes by editing the source file, sometimes by providing the correct value to override what the parser would read. These interventions, classified into a variety of types, formed another library.

But I also wanted to share the errors I found, and the fixes I figured out, with the analytics community. So I decided to create a website dedicated to them. It took a while, but a couple of days ago I was finally able to launch it. There you can find:

  • An overview of data sources.
  • Information on missing players and events
  • Information on broken reports, players and events
  • Systemic problems encountered with the reports

Both libraries mentioned above are part of my scrape-to-database package on CPAN.

Monday, October 15, 2018

New features for the 2018/19 season.

After a period of deep dormancy, we welcome the 2018/19 season on our website with three major changes:

1. MoreHockeyStats is now a network of sites:

  • - the unusual statistics from the NHL for the league, the teams, the players, the coaches and the draft.
  • - the Elo ratings for teams, coaches and players, in general and for particular stats and situations.
  • - the errors we discovered while crawling and testing the NHL data.
2. All our data is now available through a sort of API. When a table is displayed, a set of links lets you access the data:

  • JSON - receive the data displayed in the table as JSON via a direct link.
  • CSV - receive the data in a CSV table.
  • Table - display the data as a pure HTML table with a direct link.
  • Link - direct link to the results displayed.
These links have a systematic structure and thus can be crawled by a bot.

3. Most of our tables now feature links to personal cards for teams, players, coaches and even rinks. These cards visualize how the stat shown in the table has changed for the particular team, player etc. We welcome ideas for cards on the pages that do not have them yet. Here's a sample player card:

Welcome, and have a great 2018/19 hockey season.

Tuesday, June 12, 2018

Success randomness in chess

This is a surprising entry. It's not about hockey at all. But maybe it's good to write surprising entries after a long silence like this one. This post was inspired by the research on randomness in playoff results in the four major sports that was discussed on Twitter between @StatsByLopez, @StatsInTheWild and @BaumerBen.

Chess is regarded as the least random sport, i.e. most frequently the top seeds succeed and the bottom seeds trail in the standings. Naturally, there is some randomness due to humans making errors, as well as the error margins in seeding. But I decided to put this to the test and analyze the closest analogs of playoff competitions in chess: the Candidates competitions between 1950 and 1983 and the FIDE World Cup/KO championships between 1998 and 2017. From 1985 to 1997, due first to the prolonged rivalry between Karpov and Kasparov and later to Kasparov's breakaway from FIDE, the playoff competition systems were unstable, so I decided to leave them out of this research. Maybe I'll take a deeper look and include them later.

I. Candidates Tournaments (1950-1983)
These were the events where most of the top chess players competed to determine who would challenge the World Champion. There were no literal seedings in these events, but we can assign seeds by the historical Elo ratings of the players at the start of each event, as provided by the wonderful chess statistics site ChessMetrics. Naturally, the World Champion, most often ranked #1 in the world, did not take part in the competition.

Twelve tournaments were examined, and here are the outcomes for the top 4 placements at the end:

Seed placement in the top 4 finishers
Year # Pl System 1st 2nd 3rd 4th
1950 10 DRR 2 8 1 9
1953 15 DRR 2 6 7 1
1956 10 DRR 1 2 3 9
1959 8 QRR 1 4 3 2
1962 8 QRR 1 6 7 4
1965 8 P/O 2 1 7 5
1968 8 P/O 2 1 4 3
1971 8 P/O 1 4 3 2
1974 8 P/O 1 2 5 6
1977 8 P/O 1 8 3 7
1980 8 P/O 1 6 4 2
1983 8 P/O 1 5 4 6
Systems: DRR - Double Round-Robin. QRR - Quadruple Round-Robin. P/O - head-to-head matches of 8 or more games.

In the knock-out playoff format, the pairings of opponents were random rather than rating-based!

Now if we find the average value of seeds, there are quite big surprises both ways!

Avg. seed 1.33 4.41 4.25 4.25

The value for the "1st" column, i.e. for the winners, is outrageously, uber-surprisingly good. An average seed of 1.33 means that 2/3 of the time the top seed won the competition and the remainder of the time the 2nd seed won it. By the way, the top seed never finished outside the top 4 (8 wins, twice runner-up, and once each in 3rd and 4th place). That indicates minimal randomness in the outcome. But then the average seed of the runner-up is baffling: 4.41, almost completely random, and bigger than the average seeds for both 3rd and 4th places. I don't have a good explanation for this that would stem from game theory, so I'll leave this exercise to my readers.
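Spelled out with the winners' column of the table (the top seed won eight of the twelve events, the 2nd seed the other four):

$$\bar{s}_{1st} = \frac{8 \cdot 1 + 4 \cdot 2}{12} = \frac{16}{12} \approx 1.33$$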

II. World Knock-out tournaments (1998-2017)
At the end of the 1990s the World Chess Federation (FIDE) tried a completely new format: 128 players (100 in the first three editions), qualifying through various criteria, played a massive 7-round knock-out tournament. Most rounds consisted of just two games (plus rapid and blitz tiebreaks if necessary), and only the final stages were played as longer matches, and even these matches became shorter as the tournament evolved. The tournament served as the World Championship between 1998 and 2004, and as the World Cup (a World Championship qualifier) since 2005.

Although quite frequently some of the world's top players (sometimes more than half of the actual top 10) did not take part in the event, the seedings were stable because all 100 or 128 participants were ranked by their Elo ratings prior to the event.

The summary, similar to the table posted above, turns out like this:

Seed placement in the top 4 finishers
Year # pl 1st 2nd 3rd 4th
1998 100 2 9 18 8
1999 100 36 31 46 5
2000 100 1 4 3 46
2002 128 19 4 15 1
2004 128 28 3 1 18
2005 128 3 9 2 4
2007 128 11 5 17 10
2009 128 1 7 22 12
2011 128 9 6 2 4
2013 128 3 21 23 32
2015 128 11 16 26 4
2017 128 5 11 8 2
We immediately see the high element of randomness in the top placements. In both 1999 and 2000, the 46th seed out of 100 made it into the semifinals! The average seed summary comes out as follows:

Avg. seed 10.75 10.5 15.25 12.17
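These averages can be recomputed from the knock-out table above with a short Python script. The seeds are copied verbatim from that table; the dictionary layout and the `average_seeds` helper are only illustrative:

```python
# Seeds of the 1st-4th place finishers in the FIDE knock-out events,
# copied from the table above: year -> (1st, 2nd, 3rd, 4th)
KO_TOP4 = {
    1998: (2, 9, 18, 8),
    1999: (36, 31, 46, 5),
    2000: (1, 4, 3, 46),
    2002: (19, 4, 15, 1),
    2004: (28, 3, 1, 18),
    2005: (3, 9, 2, 4),
    2007: (11, 5, 17, 10),
    2009: (1, 7, 22, 12),
    2011: (9, 6, 2, 4),
    2013: (3, 21, 23, 32),
    2015: (11, 16, 26, 4),
    2017: (5, 11, 8, 2),
}


def average_seeds(results):
    """Average seed for each finishing place across all events."""
    n = len(results)
    places = zip(*results.values())  # regroup the rows into per-place columns
    return [sum(column) / n for column in places]


print([round(x, 2) for x in average_seeds(KO_TOP4)])  # [10.75, 10.5, 15.25, 12.17]
```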

Interestingly, both the consistency and the inconsistency of Part I (an extremely predictable winner and a rather random runner-up) disappear. A top-20 player is expected to either win or finish second, and the semifinalists are expected to be not much lower seeded. This once again confirms that the length of the series between two competitors is key to eliminating randomness.

 However, the average seed as runner-up in the 1950-1983 Candidates remains a mystery to me.

Tuesday, February 27, 2018

Website - A Page A Day, Part VII - Buchholz and Sonneborn-Berger

Part I
Part II
Part III
Part IV
Part V
Part VI (I need to make a table of contents entry)

This might be the best possible timing for revisiting this page.

We are now past the first section of the website, 'League', and onto the second and probably the richest one - 'Teams'. It opens with Buchholz and Sonneborn-Berger coefficients for the teams.

And why is the timing the best? Because I am going to give a lightning talk about this at VANHAC 2018!

I have actually blogged about these two before, so only a brief recap with a couple of examples follows:

1. The Buchholz coefficient

The Buchholz coefficient is simply the sum of the points of your opponents.

$$B = \sum_{n=1}^{N} P_n$$

So, if you played five games, and your opponents currently have 5, 3, 8, 6 and 6 points, your Buchholz value will be 28. Please note that the current number of points is always used, not the number of points at the moment of the meeting. The outcome of the game does not matter (for that, see the Sonneborn-Berger coefficient).

The Buchholz coefficient can clearly show who has had the stronger opposition up to a certain moment.

Then, if we look at the remainder of each team's schedule and add up the opponent's points for every game, we get an excellent estimator of remaining schedule strength.

Wait... there's a caveat.

Unlike in a chess tournament, where every round is played by everyone at the same time and, barring very rare circumstances, every participant has played the same number of games at any point of the tournament, NHL teams may differ significantly in the number of games played, so summing up the opponents' points will not work very well. These opponents have also played different numbers of games, so their point totals are not a very good indicator either.

Fortunately, it's not a big deal. Instead of totals, let's operate with per-game numbers. So the NHL Buchholz Coefficient for a team after N games becomes:

$$B = \frac{1}{N} \sum_{n=1}^{N} PPG_n$$

The same applies to the remaining schedule strength, where the per-game numbers of the remaining opponents are summed and averaged.

So, if the team played three games against opponents who currently have:
A) 6 points in 4 games, B) 3 points in 3 games, C) 2 points in 5 games, then the team's Buchholz value would be (6/4 + 3/3 + 2/5) / 3 = 2.9/3 ~ 0.967 points per game.
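Both versions of the coefficient fit in a few lines of Python. In this sketch, representing each opponent as a (points, games played) pair, one pair per game, is an assumed data layout; the two printed values reproduce the worked examples above:

```python
def buchholz_total(opponent_points):
    """Chess-style Buchholz: sum of the opponents' current point totals."""
    return sum(opponent_points)


def buchholz_per_game(opponents):
    """NHL variant: average of the opponents' current points per game.

    `opponents` holds one (points, games_played) pair per game played,
    so an opponent met twice appears twice.
    """
    return sum(points / games for points, games in opponents) / len(opponents)


# The same averaging over the games still to be played gives the
# remaining schedule strength.
remaining_schedule_strength = buchholz_per_game

print(buchholz_total([5, 3, 8, 6, 6]))                        # 28
print(round(buchholz_per_game([(6, 4), (3, 3), (2, 5)]), 3))  # 0.967
```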

Here are the Buchholz coefficients (Buch) and remaining schedule strengths (RStr) for all 30 teams as of March 12th, 2017 (note how the Blues stand out, with plenty of matchups vs. Colorado and Arizona remaining).

| Team Name             | PPG       | Buch  | RStr  |
| Washington Capitals   | 1.4179105 | 1.119 | 1.133 |
| Pittsburgh Penguins   | 1.4029851 | 1.117 | 1.127 |
| Minnesota Wild        | 1.3939394 | 1.090 | 1.070 |
| Columbus Blue Jackets | 1.3731343 | 1.125 | 1.132 |
| Chicago Blackhawks    | 1.3283582 | 1.088 | 1.096 |
| San Jose Sharks       | 1.2985075 | 1.106 | 1.106 |
| New York Rangers      | 1.2941176 | 1.120 | 1.184 |
| Ottawa Senators       | 1.2537313 | 1.105 | 1.169 |
| Montreal Canadiens    | 1.2352941 | 1.122 | 1.097 |
| Edmonton Oilers       | 1.1791044 | 1.121 | 1.040 |
| Anaheim Ducks         | 1.1764706 | 1.102 | 1.150 |
| Calgary Flames        | 1.1764706 | 1.099 | 1.140 |
| Boston Bruins         | 1.1470588 | 1.115 | 1.151 |
| Toronto Maple Leafs   | 1.1343284 | 1.114 | 1.150 |
| Nashville Predators   | 1.1323529 | 1.105 | 1.116 |
| St. Louis Blues       | 1.1194030 | 1.144 | 0.943 |
| New York Islanders    | 1.1194030 | 1.142 | 1.103 |
| Tampa Bay Lightning   | 1.0895522 | 1.121 | 1.134 |
| Los Angeles Kings     | 1.0746269 | 1.118 | 1.104 |
| Philadelphia Flyers   | 1.0447761 | 1.122 | 1.179 |
| Florida Panthers      | 1.0298507 | 1.118 | 1.175 |
| Carolina Hurricanes   | 1.0000000 | 1.138 | 1.136 |
| Buffalo Sabres        | 0.9855072 | 1.127 | 1.158 |
| Winnipeg Jets         | 0.9565217 | 1.110 | 1.143 |
| Vancouver Canucks     | 0.9558824 | 1.115 | 1.152 |
| Dallas Stars          | 0.9552239 | 1.119 | 1.100 |
| Detroit Red Wings     | 0.9545455 | 1.151 | 1.059 |
| New Jersey Devils     | 0.9117647 | 1.148 | 1.132 |
| Arizona Coyotes       | 0.8358209 | 1.133 | 1.098 |
| Colorado Avalanche    | 0.6119403 | 1.128 | 1.164 |

2. The Sonneborn-Berger coefficient.
This stranger beast is a metric widely used for tie-breaks in chess round-robins, and as an auxiliary tie-break to the Buchholz coefficient in non-round-robin events. Let's start with the definition.

$$SB = \sum_{n=1}^{N} f(R_n, P_n)$$

where Rn is the result against the n-th opponent, and Pn is that opponent's current points total.
The function f(Rn, Pn) in the NHL is defined as:

f(Win, Pn) = Pn/Nn
f(OW, Pn)  = 2*Pn/(3*Nn)
f(OL, Pn)  = Pn/(3*Nn)
f(L, Pn)   = 0

where Nn is the number of games played by the n-th opponent, and the partial credit for OW/OL (overtime wins and losses) accounts for the overtime point.
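A direct Python transcription of this definition, as a sketch: the result labels follow the definition above, while the results against opponents A, B and C (reused from the Buchholz example) are made up for illustration:

```python
# Weight of each result, as a fraction of the opponent's points per game
SB_WEIGHTS = {"Win": 1.0, "OW": 2 / 3, "OL": 1 / 3, "L": 0.0}


def sb_term(result, opp_points, opp_games):
    """f(Rn, Pn): the opponent's points per game scaled by the result."""
    return SB_WEIGHTS[result] * opp_points / opp_games


def sonneborn_berger(games):
    """games: list of (result, opponent points, opponent games played)."""
    return sum(sb_term(result, points, played) for result, points, played in games)


# A regulation win over A, an overtime loss to B and a loss to C
print(round(sonneborn_berger([("Win", 6, 4), ("OL", 3, 3), ("L", 2, 5)]), 3))  # 1.833
```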

Then, we can calculate the minimal possible value SBmin for a team with its schedule so far this season, by assigning the Wins to games against the weakest teams played, and the OW/OL to games against the weakest of the remainder, until the W, OW and OL points add up to the number of points the team currently has.

Similarly, we calculate the maximal possible value SBmax by assigning the Wins to games against the strongest teams played, and the OW/OL to games against the strongest of the remainder, assuming OT wins make up about 1/4 of the whole.

Then, depending on whether the actual SB is closer to SBmin or to SBmax, we may be able to say whether the team earns its points mostly against the bottom feeders, mostly against the top guns, or across the whole available spectrum.
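Computing SBmin and SBmax is an instance of the rearrangement inequality: pairing the largest result weights with the weakest opponents minimizes the sum, and pairing them with the strongest maximizes it. This Python sketch simplifies the procedure by taking the team's actual numbers of regulation wins, OT wins and OT losses instead of the 1/4 OT-win assumption; the `sb_bounds` name and its inputs are illustrative:

```python
def sb_bounds(opp_ppg, wins, ot_wins, ot_losses):
    """Return (SBmin, SBmax) for the games played so far.

    opp_ppg: current points per game of the opponent in each game played.
    The remaining games are treated as regulation losses.
    """
    losses = len(opp_ppg) - wins - ot_wins - ot_losses
    if losses < 0:
        raise ValueError("record has more results than games played")
    # Result weights from strongest to weakest: Win, OW, OL, L
    weights = [1.0] * wins + [2 / 3] * ot_wins + [1 / 3] * ot_losses + [0.0] * losses
    weakest_first = sorted(opp_ppg)
    # Best results against the weakest opponents give the minimum ...
    sb_min = sum(w * p for w, p in zip(weights, weakest_first))
    # ... and against the strongest opponents give the maximum
    sb_max = sum(w * p for w, p in zip(weights, reversed(weakest_first)))
    return sb_min, sb_max


print([round(x, 3) for x in sb_bounds([1.5, 1.0, 0.4], wins=1, ot_wins=0, ot_losses=1)])
```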

Here is the table showing where each team's SB was positioned between SBmin and SBmax on 03/12/2017, multiplied back by the number of games of each team for better visibility:

Pittsburgh Penguins 1.40 44.28 46.48 46.24 53.06
Washington Capitals 1.40 44.70 46.74 47.77 52.89
Minnesota Wild 1.37 42.25 44.36 46.63 50.66
Columbus Blue Jackets 1.37 43.10 45.36 46.44 52.15
Chicago Blackhawks 1.34 41.61 43.90 43.79 50.80
San Jose Sharks 1.31 40.68 42.97 44.16 49.84
New York Rangers 1.30 41.25 43.67 45.55 50.92
Ottawa Senators 1.25 37.84 40.07 41.79 46.78
Montreal Canadiens 1.25 39.37 41.74 41.05 48.87
Anaheim Ducks 1.19 36.86 39.43 40.12 47.15
Calgary Flames 1.18 35.97 38.49 38.20 46.05
Edmonton Oilers 1.16 35.86 38.32 37.43 45.70
Boston Bruins 1.15 34.73 37.23 37.74 44.72
Nashville Predators 1.13 33.28 36.14 38.04 44.72
Toronto Maple Leafs 1.13 34.64 36.99 35.66 44.02
St. Louis Blues 1.12 34.69 37.14 38.52 44.50
New York Islanders 1.12 34.36 36.94 37.94 44.71
Tampa Bay Lightning 1.09 32.62 34.98 35.41 42.06
Los Angeles Kings 1.07 32.10 34.66 33.56 42.34
Philadelphia Flyers 1.04 31.26 33.56 32.01 40.48
Florida Panthers 1.03 30.89 33.12 30.95 39.82
Carolina Hurricanes 1.00 29.43 31.78 32.41 38.85
Buffalo Sabres 0.99 30.09 32.49 33.43 39.68
Winnipeg Jets 0.96 27.55 30.35 31.48 38.75
Vancouver Canucks 0.96 28.48 30.91 29.02 38.21
Dallas Stars 0.94 28.05 30.62 31.16 38.34
Detroit Red Wings 0.94 29.12 31.12 30.02 37.13
New Jersey Devils 0.91 27.78 30.15 28.63 37.27
Arizona Coyotes 0.84 25.13 27.24 25.86 33.56
Colorado Avalanche 0.61 17.90 19.74 19.98 25.25

Once again, we use points-per-game values because the teams and their opponents have played different numbers of games at most moments within a season.

We would dare to take one more step forward and claim that teams performing closer to SBmax seem to have a coach problem (notable differences highlighted in green in the table above). The roster is there to compete against the best, but the points aren't trickling in at a pace goo

Sunday, February 25, 2018

Website - A Page A Day, Part VI - Warm Welcome

Part I
Part II
Part III
Part IV
Part V

With nothing else to do for about half an hour, why not resume the series?

Have you wondered whether it is your team, or your goalie, that always has the first career goal scored against them? Now you can check whether your feeling is right. On the 'Warm Welcome' page of the League section of the website, we present a look at the first career goals scored against different teams and goalies.

With the two leftmost items in the top menu you can select the span over which you want to see the statistics. Want one season only? Select the same season for the start and for the end. Then you can toggle the view between the team and the goalie the first goal was scored against.

For example, for the 2016/17 season, team view:

# Team Goals Allowed
1 Dallas Stars 7
2 Carolina Hurricanes 7
3 New Jersey Devils 7
4 Tampa Bay Lightning 6
5 Ottawa Senators 6
6 Detroit Red Wings 5
7 Pittsburgh Penguins 5
8 New York Rangers 5
9 San Jose Sharks 5
10 Vancouver Canucks 5
11 Colorado Avalanche 5
12 Nashville Predators 4
13 Boston Bruins 4
14 Buffalo Sabres 4
15 Winnipeg Jets 4
16 Arizona Coyotes 3
17 Columbus Blue Jackets 3
18 Edmonton Oilers 3
19 Montreal Canadiens 3
20 Anaheim Ducks 3
21 St. Louis Blues 3
22 Los Angeles Kings 2
23 Washington Capitals 2
24 Calgary Flames 2
25 Minnesota Wild 2
26 New York Islanders 2
27 Toronto Maple Leafs 2
28 Chicago Blackhawks 2
29 Florida Panthers 2
30 Philadelphia Flyers 0
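A first-career-goal tally like the one above can be sketched in Python. The goal record layout (season, ISO date, scorer, team and goalie scored against) is a hypothetical assumption, and the goal list must cover each scorer's whole career, otherwise a player who debuted before the data starts would be credited with a later "first" goal:

```python
from collections import Counter


def first_goal_tally(goals, by="team", first_season=None, last_season=None):
    """Count first career goals allowed, per team or per goalie.

    goals: one dict per goal with the hypothetical keys 'season', 'date'
    (ISO 'YYYY-MM-DD', so string order is date order), 'scorer', 'team'
    (team scored against) and 'goalie' (None for an empty-net goal).
    """
    first = {}
    # sorted() is stable, so goals on the same date keep their input order
    for goal in sorted(goals, key=lambda g: g["date"]):
        first.setdefault(goal["scorer"], goal)
    tally = Counter()
    for goal in first.values():
        # the span filter is applied only after each scorer's first goal is known
        if first_season is not None and goal["season"] < first_season:
            continue
        if last_season is not None and goal["season"] > last_season:
            continue
        if goal[by] is not None:  # skip empty-net goals in the goalie view
            tally[goal[by]] += 1
    return tally


# Made-up sample goals
goals = [
    {"season": 2015, "date": "2016-01-02", "scorer": "A", "team": "DAL", "goalie": "G1"},
    {"season": 2016, "date": "2016-10-15", "scorer": "A", "team": "CAR", "goalie": "G2"},
    {"season": 2016, "date": "2016-10-20", "scorer": "B", "team": "DAL", "goalie": "G3"},
    {"season": 2016, "date": "2016-11-01", "scorer": "C", "team": "NJD", "goalie": None},
]
print(first_goal_tally(goals, "team", 2016, 2016))
```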

Take a look. You might be surprised. Or not.

Tuesday, February 20, 2018

A Lemma Research - Penalty Box and TOI - Part II

Part I

Compared with the first part of the lemma research (the penalty box), the second part was shorter and easier, but just as useful. I decided to find out the share of time teams spend, on average, at even strength, on the power play/shorthanded, and with an empty net. Then, given these shares and the number of goals scored in each situation, I was able to calculate the frequency of EVG/PPG/SHG/ENG, or its inverse, which I called the difficulty of such a goal.

I scanned the database of all games from the 1999/00 season until today, and all the goals extracted from these games. Penalty shot goals were ignored, regardless of whether they were scored during the game itself or in a post-game shootout. The EN time was calculated as the total game time minus the goaltender TOI. The PP/SH time was derived from the recorded PP TOI of the players. The EV time then naturally becomes the total game time minus the EN time minus the PP time of both teams.

Then I calculated the difficulty of scoring a goal in each of these situations through the following formula:


where the difficulty of the EV goal is taken as "1". Here are the combined results of the difficulties in a table:
Season EV PP SH EN
1999 1.000 0.502 3.506 0.162
2000 1.000 0.473 3.387 0.146
2001 1.000 0.492 3.635 0.153
2002 1.000 0.468 3.585 0.167
2003 1.000 0.445 3.127 0.221
2005 1.000 0.535 4.247 0.272
2006 1.000 0.506 4.000 0.228
2007 1.000 0.458 3.597 0.187
2008 1.000 0.438 3.745 0.183
2009 1.000 0.456 4.044 0.192
2010 1.000 0.450 3.517 0.177
2011 1.000 0.460 3.568 0.169
2012 1.000 0.430 3.890 0.169
2013 1.000 0.453 3.209 0.198
2014 1.000 0.427 3.564 0.171
2015 1.000 0.415 3.284 0.158
2016 1.000 0.419 3.252 0.178
2017 1.000 0.427 3.052 0.168

If you divide 1 by these values you can get the relative frequency of goals scored in each situation.
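One difficulty measure consistent with the EV = 1 normalization, and with inverting the values to get relative frequencies, is the time per goal in a situation divided by the time per goal at even strength. A minimal Python sketch under that assumption; the `goal_difficulty` helper is hypothetical and the minutes and goals are made-up illustrative numbers, not league data:

```python
def goal_difficulty(minutes, goals, baseline="EV"):
    """Minutes per goal in each situation, relative to even strength.

    minutes, goals: dicts keyed by situation ('EV', 'PP', 'SH', 'EN').
    EV time is assumed to be total game time minus EN time minus the PP
    time of both teams; league-wide SH time equals league-wide PP time.
    """
    baseline_rate = minutes[baseline] / goals[baseline]
    return {s: (minutes[s] / goals[s]) / baseline_rate for s in minutes}


# Illustrative numbers only, not league data
difficulty = goal_difficulty(
    minutes={"EV": 1000.0, "PP": 100.0, "SH": 100.0, "EN": 10.0},
    goals={"EV": 40, "PP": 8, "SH": 1, "EN": 2},
)
frequency = {s: 1 / d for s, d in difficulty.items()}  # relative frequency per minute
print(difficulty)
print(frequency)
```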

The dataset containing this data is available on the website, on the Request Analysis page.

So why did I need these two lemmas? That blog post won't be ready any time soon, and I'd better resume the 'Page A Day' series.

Monday, February 19, 2018

A Lemma Research - Penalty Box and TOI

Nothing derails regular blogging like a research project you've been eager to start but kept putting off to finish simpler and more practical stuff, until you couldn't hold off any longer.

But then I realized that in order to do that research, I needed to perform a lemma research first. Just like its mathematical namesake, a lemma research is done for the sake of a bigger one, yet produces a useful result by itself.

It turned out that I needed penalty box data. Who was in the penalty box during a power play goal? More specifically, who was responsible for that goal, i.e. who left the penalty box as a result (or was serving a non-matched 5-minute major penalty during it)? There used to be a feed with penalty box data, but it only covered 2010 onwards or so, and it seems to have been discontinued. Therefore I took the game data I already had, went through the power play goals from 1987/88 until now, and tried to match them with penalties, while weeding out all the cancelling and all the irrelevant (e.g. misconduct) ones.
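The matching can be sketched in Python. This is a simplified illustration, not the actual procedure: the `Penalty` record, game time in elapsed seconds and the penalty-type labels are hypothetical, and the sketch ignores minors already ended by an earlier power-play goal, double minors and delayed penalties:

```python
from collections import defaultdict
from dataclasses import dataclass

# Penalty types that do not change the on-ice strength (hypothetical labels)
STRENGTH_NEUTRAL = {"misconduct", "game misconduct"}


@dataclass(frozen=True)
class Penalty:
    team: str
    player: str
    start: int    # game time in seconds when the penalty was called
    minutes: int  # 2 for a minor, 5 for a major, ...
    kind: str     # "minor", "major", "misconduct", ...

    @property
    def end(self) -> int:
        return self.start + 60 * self.minutes


def effective_penalties(penalties):
    """Drop misconducts and cancel coincidental penalties one-for-one."""
    groups = defaultdict(list)
    for p in penalties:
        if p.kind not in STRENGTH_NEUTRAL:
            groups[(p.start, p.minutes)].append(p)
    kept = []
    for group in groups.values():
        by_team = defaultdict(list)
        for p in group:
            by_team[p.team].append(p)
        sides = list(by_team.values())
        if len(sides) == 2:
            n = min(len(sides[0]), len(sides[1]))  # offsetting pairs
            kept.extend(sides[0][n:] + sides[1][n:])
        else:
            kept.extend(group)
    return kept


def skaters(active):
    # A team never drops below three skaters
    return 5 - min(len(active), 2)


def match_ppg(scoring_team, goal_time, penalties):
    """Return (status, penalty) for a goal recorded as a power-play goal."""
    active = [p for p in effective_penalties(penalties) if p.start <= goal_time < p.end]
    own = [p for p in active if p.team == scoring_team]
    opp = [p for p in active if p.team != scoring_team]
    if skaters(opp) >= skaters(own):
        return "not_ppg", None  # no man advantage: no matching penalty
    minors = sorted((p for p in opp if p.kind == "minor"), key=lambda p: p.end)
    if len(minors) > 1 and minors[0].end == minors[1].end:
        return "ambiguous", None  # needs a manual decision
    if minors:
        return "ok", minors[0]  # this player leaves the box
    return "ok", opp[0]  # only majors: the player stays in the box


pens = [
    Penalty("BOS", "X", 100, 2, "minor"),
    Penalty("PHI", "Y", 100, 2, "minor"),        # offsets X
    Penalty("PHI", "Z", 300, 2, "minor"),
    Penalty("PHI", "Z", 300, 10, "misconduct"),  # ignored
]
print(match_ppg("BOS", 350, pens))
```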

I am glad to report that I was able to create a dataset that is consistent, at least at first look. But before that I had to correct the penalty box entries for about 160 goals manually. For about 35 of them I simply had to enter the penalty data by hand, because any algorithm assigning the player in the box to the goal would be ambiguous. However, I also discovered about 125 goals (120 before 1999/00, when the extra reports were introduced, and only five since) that should not even be marked as power play goals: there was no matching penalty. Of course, the mistake may lie in the penalty data in the NHL report: the time of the penalty may be reported wrongly. But until further notice, these goals should not be considered PPG:

GameID Period Time Scorer Team
198720125 3 9:19 BOB SWEENEY Boston Bruins
198720134 3 18:58 RICK TOCCHET Philadelphia Flyers
198720239 1 1:23 STEPHANE RICHER Montreal Canadiens
198720367 1 4:51 DALE HAWERCHUK Winnipeg Jets
198720388 1 17:46 TROY MURRAY Chicago Blackhawks
198720389 3 3:55 PAUL MACLEAN Winnipeg Jets
198720431 2 6:56 PETER TAGLIANETTI Winnipeg Jets
198720449 2 12:34 MIKKO MAKELA New York Islanders
198720471 3 10:51 ANTON STASTNY Quebec Nordiques
198720484 2 13:38 MARIO LEMIEUX Pittsburgh Penguins
198720528 1 6:32 CHARLIE SIMMER Pittsburgh Penguins
198720610 3 3:08 LAURIE BOSCHMAN Winnipeg Jets
198720735 3 8:10 MIKE FOLIGNO Buffalo Sabres
198720755 1 3:06 GARRY GALLEY Washington Capitals
198720755 2 0:29 GERALD DIDUCK New York Islanders
198720787 1 17:02 AARON BROTEN New Jersey Devils
198720799 2 0:49 JIMMY CARSON Los Angeles Kings
198720802 2 3:24 MIKE FOLIGNO Buffalo Sabres
198720804 1 18:03 PAT VERBEEK New Jersey Devils
198730134 1 7:52 BRUCE DRIVER New Jersey Devils
198730223 3 9:47 MARK JOHNSON New Jersey Devils
198730314 2 12:31 CAM NEELY Boston Bruins
198820088 2 6:17 RANDY MOLLER Quebec Nordiques
198820088 2 7:11 ANTON STASTNY Quebec Nordiques
198820147 1 15:58 JOHN CULLEN Pittsburgh Penguins
198820150 2 2:29 KEVIN DINEEN Hartford Whalers
198820203 3 12:45 JOE MULLEN Calgary Flames
198820241 2 18:53 DAN QUINN Pittsburgh Penguins
198820307 3 10:12 MARIO LEMIEUX Pittsburgh Penguins
198820318 2 5:24 GAETAN DUCHESNE Quebec Nordiques
198820331 3 7:36 PAUL GAGNE Toronto Maple Leafs
198820489 1 16:20 DALE HUNTER Washington Capitals
198820542 1 18:13 DOUG EVANS St. Louis Blues
198820727 3 7:49 PAUL MACLEAN Detroit Red Wings
198820803 2 6:58 DOUG SMITH Vancouver Canucks
198820821 3 17:43 BRENT FEDYK Detroit Red Wings
198920146 1 5:14 KEVIN DINEEN Hartford Whalers
198920158 1 5:26 TROY MURRAY Chicago Blackhawks
198920220 2 15:53 NEAL BROTEN Minnesota North Stars
198920238 1 2:39 GREG PASLAWSKI Winnipeg Jets
198920303 2 4:55 PAT ELYNUIK Winnipeg Jets
198920475 3 8:25 BRIAN MULLEN New York Rangers
198920543 3 18:46 AL MACINNIS Calgary Flames
198920667 1 3:04 JEREMY ROENICK Chicago Blackhawks
198920749 1 8:18 CRAIG JANNEY Boston Bruins
198920818 2 1:25 JOHN OGRODNICK New York Rangers
198930231 2 13:43 TRENT YAWNEY Chicago Blackhawks
198930322 2 18:01 JARI KURRI Edmonton Oilers
199020005 3 19:54 RAY SHEPPARD New York Rangers
199020056 1 15:44 DAVE TAYLOR Los Angeles Kings
199020262 2 11:20 BOBBY HOLIK Hartford Whalers
199020612 3 7:18 JOHN CHABOT Detroit Red Wings
199020636 1 15:02 BRIAN LEETCH New York Rangers
199020647 2 9:16 KENNETH JR HODGE Boston Bruins
199020704 2 12:47 KELLY KISIO New York Rangers
199020716 3 0:34 JOE SAKIC Quebec Nordiques
199020762 2 5:16 MARK RECCHI Pittsburgh Penguins
199020815 2 19:28 KEVIN STEVENS Pittsburgh Penguins
199030222 3 2:17 DINO CICCARELLI Washington Capitals
199030235 3 14:04 BRIAN PROPP Minnesota North Stars
199120059 3 2:20 DOUG GILMOUR Calgary Flames
199120070 2 10:26 DAVE GAGNER Minnesota North Stars
199120093 3 18:48 DARREN TURCOTTE New York Rangers
199120238 3 0:18 PAUL RANHEIM Calgary Flames
199120313 3 11:52 JEREMY ROENICK Chicago Blackhawks
199120407 1 10:39 BOBBY CARPENTER Boston Bruins
199120504 2 18:42 JIMMY CARSON Detroit Red Wings
199120617 2 16:43 MARTY MCSORLEY Los Angeles Kings
199120625 3 18:17 TODD ELIK Minnesota North Stars
199120638 1 13:15 GREG ADAMS Vancouver Canucks
199120755 3 10:48 DOUG GILMOUR Toronto Maple Leafs
199130143 3 13:57 AL IAFRATE Washington Capitals
199220002 2 5:17 SCOTT STEVENS New Jersey Devils
199220013 1 12:08 RICK TOCCHET Pittsburgh Penguins
199220023 2 17:32 MARIO LEMIEUX Pittsburgh Penguins
199220068 2 17:22 MIKE GARTNER New York Rangers
199220091 3 19:34 JOE JUNEAU Boston Bruins
199220121 1 6:14 MIKE GARTNER New York Rangers
199220149 2 18:44 CHRIS KONTOS Tampa Bay Lightning
199220181 3 6:21 ULF DAHLEN Minnesota North Stars
199220287 1 16:56 CHRIS KONTOS Tampa Bay Lightning
199220337 2 5:42 ALEXANDER MOGILNY Buffalo Sabres
199220388 1 15:45 GREG HAWGOOD Edmonton Oilers
199220401 2 12:48 JEFF NORTON New York Islanders
199220562 3 9:19 CHRIS KONTOS Tampa Bay Lightning
199220563 1 9:25 TEPPO NUMMINEN Winnipeg Jets
199220589 3 9:18 ROD BRIND'AMOUR Philadelphia Flyers
199220595 1 7:20 CRAIG JANNEY St. Louis Blues
199220908 2 16:28 VALERI KAMENSKY Quebec Nordiques
199220986 1 15:44 JIRI SLEGR Vancouver Canucks
199320065 1 15:37 DENIS SAVARD Tampa Bay Lightning
199320092 3 11:22 SERGEI FEDOROV Detroit Red Wings
199320636 1 16:40 TIM SWEENEY Mighty Ducks Of Anaheim
199320643 3 16:50 MARTIN LAPOINTE Detroit Red Wings
199320740 2 7:53 KEITH TKACHUK Winnipeg Jets
199320840 1 17:17 SERGEI ZUBOV New York Rangers
199320905 3 18:16 KEVIN STEVENS Pittsburgh Penguins
199420025 2 1:32 KELLY BUCHBERGER Edmonton Oilers
199420072 2 7:27 STEVE THOMAS New York Islanders
199420087 1 9:52 KEITH TKACHUK Winnipeg Jets
199420332 2 2:35 GRANT LEDYARD Dallas Stars
199420332 2 7:11 RAY SHEPPARD Detroit Red Wings
199520189 2 15:41 LUC ROBITAILLE New York Rangers
199520226 3 17:34 CHRIS GRATTON Tampa Bay Lightning
199520490 2 9:41 TODD BERTUZZI New York Islanders
199520539 3 13:25 MARK MESSIER New York Rangers
199520560 1 9:03 VYACHESLAV KOZLOV Detroit Red Wings
199520694 3 15:43 RON FRANCIS Pittsburgh Penguins
199520744 2 6:41 SCOTT MELLANBY Florida Panthers
199520750 2 9:05 NICKLAS LIDSTROM Detroit Red Wings
199520766 1 16:00 BENOIT HOGUE Dallas Stars
199520776 2 5:53 MARIO LEMIEUX Pittsburgh Penguins
199520809 1 12:47 ADAM GRAVES New York Rangers
199520847 2 16:40 KEITH PRIMEAU Detroit Red Wings
199620127 3 12:03 ALEXEI ZHAMNOV Chicago Blackhawks
199620211 2 13:08 GREG ADAMS Dallas Stars
199620466 3 14:50 JOZEF STUMPEL Boston Bruins
199620792 1 11:50 BRENDAN SHANAHAN Detroit Red Wings
199621058 2 1:41 MARTIN GELINAS Vancouver Canucks
199720028 2 18:01 TREVOR LINDEN Vancouver Canucks
199720067 3 16:29 TERRY YAKE St. Louis Blues
199920756 1 6:46 JAMIE LANGENBRUNNER Dallas Stars
200520486 1 17:28 ROB COLLINS New York Islanders
200620150 1 7:59 NATHAN HORTON Florida Panthers
200620627 1 5:00 RYANE CLOWE San Jose Sharks
200821010 1 16:44 MIKAEL SAMUELSSON Detroit Red Wings
201620276 2 12:03 JOHAN LARSSON Buffalo Sabres

The resulting penalty box data is available on our website, in the Request Analysis section.
Hopefully, tomorrow, I'll blog about another useful dataset the lemma research has produced.

Monday, February 5, 2018

The Website - A Page A Day, Part V - Rink Repairs

Part I
Part II
Part III
Part IV

Well, the pace didn't last long, but we're back with Part V - Rink Repair Statistics.

This is probably the least hockey-related page of the website, but hey, we can do it, and it's actually pretty easy, so why not?

The STOP events, as listed in the Play-By-Play HTML summaries available since the 2002/03 season, contain the reason(s) for the stop. While most of them are game-based, such as ICING or OFFSIDE, one of the rarer ones caught our eye: RINK REPAIR. We decided to collect these stops and rank NHL home teams, as well as NHL arenas, by the number of rink repairs that happened. We filter out the occasional venues, such as outdoor stadiums for NHL special events or the foreign arenas for the European showdown games.

Since we already parse practically all events from the Play-By-Play summaries (or PL) and classify them by type, we already have a collection of STOP events. We also detect and catalog the reasons for the stoppages, so extracting all rink repairs is an easy Mongo query. We then aggregate them by location and team, and create SQL tables for quick display of the data.
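As an in-memory stand-in for the Mongo query and the aggregation, the counting can be sketched in plain Python. The event fields (season, arena, home_team, and an already-parsed list of reasons) and the sample events are hypothetical:

```python
from collections import Counter


def rink_repair_counts(stops, by="arena", season=None, excluded_arenas=frozenset()):
    """Count STOP events that list RINK REPAIR among their reasons.

    stops: dicts with hypothetical keys 'season', 'arena', 'home_team'
    and 'reasons' (the parsed list of stoppage reasons).
    """
    counts = Counter()
    for stop in stops:
        if season is not None and stop["season"] != season:
            continue
        if stop["arena"] in excluded_arenas:  # e.g. outdoor or overseas venues
            continue
        if "RINK REPAIR" in stop["reasons"]:
            counts[stop[by]] += 1
    return counts


# Made-up sample events
stops = [
    {"season": 2016, "arena": "Arena A", "home_team": "T1", "reasons": ["RINK REPAIR"]},
    {"season": 2016, "arena": "Arena A", "home_team": "T1", "reasons": ["ICING"]},
    {"season": 2016, "arena": "Arena B", "home_team": "T2", "reasons": ["RINK REPAIR"]},
    {"season": 2016, "arena": "Outdoor Stadium", "home_team": "T3", "reasons": ["RINK REPAIR"]},
]
print(rink_repair_counts(stops, season=2016, excluded_arenas={"Outdoor Stadium"}).most_common())
```

Switching `by` to `"home_team"` gives the 'By Team' ranking from the same events.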

And so we were able to assemble a nice set of tables displaying the level of maintenance in the rinks around the league. Here's a look at the 2016/17 season:

Arena Repairs
So the claims that the Islanders' current home has the worst ice have substance. In fact, for the last three years Barclays Center has ranked among the highest. Naturally, old arenas like Canadian Tire Centre or Rexall Place also seemed to get a lot of rink repairs during games.
You can switch the year of the display or the mode between 'By Team' and 'By Arena'.
As with all the displayed tables on our site, there are links below it to see it in a pristine form, the direct link to the page for the selected season, and buttons to share it on Twitter and Facebook.
Once again, it doesn't seem like a highly meaningful piece of work, but we consider it fun, and it took about a quarter of an hour to implement altogether.

Possible future additions:
  • Aggregate by a period rather than a single season
  • Include arena ages (requires manual input of construction years)

Friday, January 26, 2018

The Website - A Page A Day, Part IV - West vs. East

Part I
Part II
Part III

And now we're putting two in a row with West vs. East.


This page shows the results of games between Eastern and Western Conference teams. The data, taken from the Boxscore files, is available from 1993, when the Western and Eastern Conferences were formed.

Shown from left to right:
  • Regular West wins
  • OT/SO West wins
  • Ties (up to 2005)
  • OT/SO East wins
  • Regular East wins
  • Stanley Cup Winning Conference
A total tally is available as well. As of the date of this post the standings are:
WEST 2802 Reg. W - 595 OTW - 444 T - 541 OTW - 2570 Reg. W EAST
West holds a formidable lead, which eroded a bit in the 2015/16 and 2016/17 seasons. Before that, the last time East won the count was back in 1998/99! The 2009/10 season was the most lopsided, with a final score of 155-115 in West's favor.
West won the season count 14 times, East 6 times. Once, in 2011/12, there was a tie with 134 wins apiece.

The Stanley Cup winners are divided more evenly, however, with West holding the edge 13-10.

There is no data for the 1994/95, 2004/05 and 2012/13 seasons because of full or partial season lockouts. Also, wins and losses from the Finals are not included.
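The five columns can be produced by classifying each game. A Python sketch, with a hypothetical game record and a season-aware conference lookup (teams have switched conferences over the years); Stanley Cup Final games are assumed to be filtered out beforehand:

```python
from collections import Counter


def interconference_bucket(game, conference):
    """Classify one game into a column of the West vs. East view.

    game: dict with hypothetical keys 'season', 'home', 'away',
    'home_goals', 'away_goals' and 'decision' ('REG', 'OT' or 'SO').
    conference: maps (season, team) -> 'West' or 'East'.
    """
    home_conf = conference[(game["season"], game["home"])]
    away_conf = conference[(game["season"], game["away"])]
    if home_conf == away_conf:
        return None  # intra-conference game, not counted
    if game["home_goals"] == game["away_goals"]:
        return "Tie"  # only possible before 2005/06
    home_won = game["home_goals"] > game["away_goals"]
    winner_conf = home_conf if home_won else away_conf
    kind = "Reg. W" if game["decision"] == "REG" else "OTW"
    return f"{winner_conf} {kind}"


def west_vs_east(games, conference):
    """Tally the buckets over all interconference games."""
    buckets = (interconference_bucket(g, conference) for g in games)
    return Counter(b for b in buckets if b is not None)


# Made-up sample data
conference = {(2016, "DAL"): "West", (2016, "NJD"): "East", (2016, "CAR"): "East"}
games = [
    {"season": 2016, "home": "DAL", "away": "NJD", "home_goals": 3, "away_goals": 2, "decision": "REG"},
    {"season": 2016, "home": "NJD", "away": "CAR", "home_goals": 4, "away_goals": 1, "decision": "REG"},
    {"season": 2016, "home": "CAR", "away": "DAL", "home_goals": 2, "away_goals": 1, "decision": "OT"},
]
print(west_vs_east(games, conference))
```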

If you are an experienced webmaster, especially with JavaScript and CSS, we would greatly appreciate any tidy-up and enrichment you could provide for these graphs. Your work will be credited for future reference! Write to us, or get in touch with us on Twitter. Thanks!

Possible future additions:
  • Season aggregation
  • More compact design
  • Your ideas are welcome!

Thursday, January 25, 2018

The Website - A Page A Day, Part III - NHL's Yearly Tendencies

Part I
Part II

Despite all the delays, we move on to the League's Yearly Tendencies.


This view shows stacked bars of various event counts per season in the NHL. All the stats are presented per game, since the number of games played varied from season to season.

For the first time we encounter the stat selection menu, on top of the graphs. On this page it features the following options:

  • Category of the stats
  • Stage of the games (Regular or Playoff)

Available stats are:
  • Shots
    • Goals (available since 1987, from Boxscore reports)
    • Shots on Goal (available since 1987, from Boxscore reports)
    • Misses (available since 2005, from PBP reports)
    • Blocks (available since 2005, from PBP reports)
  • Icings (available since 2002, from PBP reports)
  • Margin of victory (available since 1987, from Boxscore reports)
  • Penalties
    • Minor
    • Major
    • Fighting
    • Misconduct
    • Match Penalty
  • Goals per Game
The stats are available for regular season, playoffs and both stages combined.

The PBP reports from the NHL dating before 2005 are wildly inconsistent, thus we didn't use them for MISSES and BLOCKS.

We are using the wonderful d3js library for producing these graphs.

If you are an experienced webmaster, especially with JavaScript and CSS, we would greatly appreciate any tidy-up and enrichment you could provide for these graphs. Your work will be credited for future reference! Write to us, or get in touch with us on Twitter. Thanks!

Possible future additions:
  • Power play success
  • Minor penalty breakdown
  • Aggregations and overlays
  • Mouse graph manipulations
  • Your ideas are welcome!

Tuesday, January 16, 2018

The Website - A Page A Day, Part II - The League section and the Daily Summary

Part I

It looks like it's time to resume our Website - A Page A Day series.


The first section available on our website is the League section. It contains pages that either describe statistics applicable to the entire league rather than to a specific team, player or coach (for example, the West vs East account), or present statistics that cut across different categories (for instance, the Warm Welcome page, which applies to the teams as well as to the goaltenders).

We'll start with the first page of this first section - the Daily Summary.

This page looks like most of the data pages of this website:

  • A menu at the top (the game ticker is removed to lighten the pages)
  • A section navigation on the left
  • Main data pane on the right, featuring graphs and tables
  • A utility menu at the bottom

This page features summaries relevant to today's games. It is planned as an ongoing work in progress, but currently only two items are featured:

The historic tally of today's coaches against each other

Our database features games from the 1987/88 season through the present, so our statistics cover the last 30+ years. This is usually sufficient to cover all encounters between active coaches. However, we plan to integrate the remaining seventy years, from 1917 to 1986, as they become available from the NHL.

The predictions for today's games by our model

We display the day's matches, for which we calculate the expected number of goals each team will score against its opponent, and derive the probability of a win from these expected scores.
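To illustrate how a win probability can be derived from two expected scores, here is a minimal Python sketch. It assumes, purely for illustration, that each team's goals follow an independent Poisson distribution with the expected score as its mean, and that a regulation tie is split evenly between the teams. The function names are hypothetical, and this is one common approach rather than the formula the site actually uses.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def outcome_probabilities(lam_home: float, lam_away: float, max_goals: int = 15):
    """(P(home ahead), P(tied), P(away ahead)) under independent Poisson scoring.

    Scores above max_goals are ignored; their probability is negligible for
    typical NHL expected scores.
    """
    home = [poisson_pmf(k, lam_home) for k in range(max_goals + 1)]
    away = [poisson_pmf(k, lam_away) for k in range(max_goals + 1)]
    p_home = p_tie = p_away = 0.0
    for h, ph in enumerate(home):
        for a, pa in enumerate(away):
            p = ph * pa
            if h > a:
                p_home += p
            elif h == a:
                p_tie += p
            else:
                p_away += p
    return p_home, p_tie, p_away


def win_probability(lam_home: float, lam_away: float, max_goals: int = 15) -> float:
    """Home win probability, assuming (hypothetically) that ties are a coin flip."""
    p_home, p_tie, p_away = outcome_probabilities(lam_home, lam_away, max_goals)
    total = p_home + p_tie + p_away  # renormalize for the truncated tail
    return (p_home + p_tie / 2) / total
```

With equal expected scores the sketch returns 0.5, and a team with the higher expected score always gets a probability above 0.5; a different tie-breaking assumption would only change how the tied mass is divided.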

When the rosters for the upcoming games become available, we adjust our predictions to these rosters; otherwise, the best-scoring rosters are used. This may change in the future. Predictions based on the published rosters are marked with asterisks.

Beneath the table there is a set of useful links:

  • 'Table' link that produces a pristine, unstyled table in a separate frame for easy re-use
  • 'Link' link that points at the table which is currently being displayed (this one is more useful for paged and/or filtered listings)
  • Tweet button that allows you to share this page on Twitter
  • Like button that allows you to like this page on Facebook

Possible future additions:
  • Historical browsing of previous predictions
  • Elaboration of the scoring components
  • Change in the win probability formula
  • Your ideas are welcome!

Monday, January 15, 2018

A suggestion for the All-Star Game

While the series "Website - A Page A Day" is being delayed by all kinds of things, here comes a short post on a different topic.

Last year's accuracy shooting competition, which included shooting the puck from the goal line into a small hole, was in my opinion a total failure. Mike Smith's spectacular goal from across the rink did it a disservice by creating the false impression that this skills contest was any good. Otherwise, the competition was unexciting, to say the least.

Therefore, here's a suggestion to replace it: reverse shootouts.

Let the goaltenders shed their equipment for once, and let the skaters don it instead. Let's have a competition where the goaltenders skate and attempt to score in a shootout, while the skaters try to stop them. I am sure that, somewhere in the back of their minds, both parties have this little dream!

Wednesday, January 10, 2018

The Website - A Page A Day, Part I - Home Page

The main page of the website shows the summary of all its features.

The site menu features the main statistical sections:

  • League - league-wide statistics
  • Teams - team-based statistics
  • Players - personal statistics
  • Coaches - statistics for the NHL coaches
  • Drafts - statistics for currently drafted players and the historical performance by draft
  • Fantasy - tools to help the fantasy player

At the top we have a ticker of the scores as predicted by the model. The ticker is always scrolled to the current date; however, you can navigate it back and forth using the two arrows at its edges. The away team is on top, and the home team is on the bottom. The predicted score is in the Prdct column. The actual score is in the Act column. The projected winner is displayed in bold. If the prediction failed, the teams are painted red. We never adjust our predictions retroactively. Only the scores for the current season are featured.

Then, at the top right, we have three very important links that will help you understand the pages:

  • Learn More - about the methodology of the Website
  • Glossary - about the terms used in the pages
  • Blog - a link to this blog, which, as you can see, also takes time to elaborate on the site.

We also display our latest addition to the website and the latest blog entry.

Then we have three random snippets in the columns. The snippets are excerpts from tables published elsewhere on the site and may change after each publication, which happens overnight. The snippets currently (hopefully they will become more diverse) are:

Only current-season data is displayed in the snippets.
Below the random snippets we feature a permanent snippet that shows the best projected picks for the daily fantasy competitions. We provide a model-based evaluation of the expected score for players on the three most popular daily fantasy websites: Yahoo, FanDuel and DraftKings.

Below each snippet there is a link to the page with the full data.

At the bottom we have a collection of information links. Make sure you visit the Glossary and the About pages. The Links page has an ever-growing collection of hockey-related links. The Forum link leads to, the site we're partnering with - this blog is broadcast there as well. The Data section shows the software and the data sources this website is built with.

Hopefully this small introduction proves helpful and entices you to pay another visit to MoreHockeyStats, and to share this site with your friends. Follow us on Twitter, through the site or through this blog!

Tuesday, January 9, 2018

The Website - A Page A Day - Prologue

Hello, hockey world, from Mr. Van Winkle...

This prolonged silence was caused by a lot of factors, led, naturally, by REAL LIFE™. But I also made a fundamental mistake in the infrastructure setup, so I had to roll out a hastily patched setup when the season was starting; I then went back and worked on a proper infrastructure fix, work that continued over three months and is now complete! Unfortunately, none of this work will be visible to visitors, with the possible exception of slightly faster page loading: all the changes are in the back end.

But now, once again, within REAL LIFE™ limitations, I am able to work on improving the website's look, speed and features. I am somewhat torn, since I also need to hone the models I'm using, but I decided the models can wait, with half of the hockey season already gone. In addition, I'll put in extra effort to promote the site and make it more visible on the Web.

One of the things I was not able to complete is releasing my code as open source. This has also been put in a bottom drawer. However, I still very much welcome cooperation and will gladly share the code with people who want to contribute to the project. The areas where I could really use some help are:
  • JavaScript/HTML/CSS
  • MongoDB query and database optimization
  • SEO
So if you feel like helping, drop me an email.

Also, I will begin a series, "Website - A Page A Day", in which I plan to describe each and every statistical page on the site, because I have a feeling I haven't explained them clearly enough until now; moreover, it will help me discover errors and inconsistencies that have probably crept in.

So, stay tuned, here in this blog, on the website, and on Twitter!