Wednesday, August 16, 2017

A revision of HTML reports from the NHL website.

So, while revisiting the data files, I ran an extra scan of the HTML reports available on NHL.com.

The HTML reports can be fetched using the following URL pattern:

sprintf('http://www.nhl.com/scores/htmlreports/%d%d/%s%02d%04d.HTM', $season, $season+1, $type, $stage, $game_id) 

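where $season is the start year of the season (e.g. 2009 for 2009-10), $type is the two-letter report code (ES, GS, PL and RO in the table below), $stage is 2 for the regular season and 3 for the playoffs, and $game_id is the NHL game number within the season, zero-padded to four digits.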
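The same pattern is easy to reproduce in other languages. Below is a minimal Python sketch, not the code used for this scan. The names `report_url`, `REPORT_TYPES`, `REGULAR` and `PLAYOFF` are illustrative, and `REPORT_TYPES` lists only the four report types covered in the table below.

```python
# Hypothetical helper mirroring the sprintf URL pattern above.
BASE = "http://www.nhl.com/scores/htmlreports"

# Report types that appear in the table below (illustrative, not exhaustive).
REPORT_TYPES = ("ES", "GS", "PL", "RO")

# Stage codes: 2 = regular season, 3 = playoffs.
REGULAR, PLAYOFF = 2, 3


def report_url(season: int, report_type: str, stage: int, game_id: int) -> str:
    """Build the URL of one HTML report.

    season:      start year of the season, e.g. 2009 for 2009-10
    report_type: two-letter report code, e.g. "ES"
    stage:       2 for regular season, 3 for playoffs
    game_id:     NHL game number within the season (zero-padded to 4 digits)
    """
    if stage not in (REGULAR, PLAYOFF):
        raise ValueError(f"unknown stage: {stage}")
    # %d%d -> "20092010", %02d -> "02", %04d -> "0081"
    return f"{BASE}/{season}{season + 1}/{report_type}{stage:02d}{game_id:04d}.HTM"


if __name__ == "__main__":
    # Print the URLs of all four reports for 2009-10 regular-season game 0081.
    for rtype in REPORT_TYPES:
        print(report_url(2009, rtype, REGULAR, 81))
```

Raising on an unknown stage is a design choice of this sketch: a typo in the stage would otherwise silently produce a URL that NHL.com does not serve.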

Below are the results for the problematic games, with the reports that are beyond salvage marked using the codes in the legend after the table:

Season Stg GameId ES GS PL RO
1999 Reg 0029 M M N N
1999 Reg 0045 I I N N
1999 Reg 0050 I I N N
1999 Reg 0058 R R N N
1999 Reg 0071 I I N N
1999 Reg 0072 I I N N
1999 Reg 0081 I I N N
1999 Reg 0109 I I N N
1999 Reg 0130 I I N N
1999 Reg 0323 R R N N
1999 Reg 0619 I I N N
1999 Reg 0689 I I N N
1999 Reg 0690 B I N N
1999 Reg 0836 I I N N
1999 Reg 1034 I I N N
1999 P/O 0325 I I N N
2000 Reg 0029 R R N N
2000 Reg 0038 R R N N
2000 Reg 0039 I I N N
2000 Reg 0041 R R N N
2000 Reg 0042 R R N N
2000 Reg 0043 R R N N
2000 Reg 0044 R R N N
2000 Reg 0045 I I N N
2000 Reg 0049 R R N N
2000 Reg 0067 R R N N
2000 Reg 0072 I I N N
2000 Reg 0073 I I N N
2000 Reg 0077 I I N N
2000 Reg 0080 R R N N
2000 Reg 0083 R R N N
2000 Reg 0085 I I N N
2000 Reg 0095 R R N N
2000 Reg 0102 I I N N
2000 Reg 0112 R R N N
2000 Reg 0186 I I N N
2000 Reg 0187 I I N N
2000 Reg 0189 I I N N
2000 Reg 0920 R R N N
2000 Reg 0921 I I N N
2000 Reg 0924 R R N N
2000 Reg 0925 R R N N
2000 Reg 0926 R R N N
2000 Reg 0928 I I N N
2000 Reg 0983 B V N N
2000 Reg 1166 I I N N
2003 Reg 0191 V I V N
2003 Reg 1205 I I I N
2003 P/O 0134 V V B N
2005 Reg 0298 R V V N
2005 Reg 0458 B V V N
2005 Reg 0677 V V V B
2005 Reg 0679 V V V B
2005 Reg 0681 V V V B
2007 Reg 1178 I I B I
2008 Reg 0259 I I B I
2008 Reg 0409 I I B I
2008 Reg 1077 I I B I
2009 Reg 0081 I I B I
2009 Reg 0827 V I V V
2009 Reg 0836 V I V V
2009 Reg 0857 I I V V
2009 Reg 0863 I I V V
2009 Reg 0874 I I V I
2009 Reg 0885 I V V I
2010 Reg 0429 I I V V
2011 Reg 0259 I I V V
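
Legend:
Columns: Season - start year of the season; Stg - stage (Reg - regular season, P/O - playoffs); GameId - NHL game number; ES - Event Summary; GS - Game Summary; PL - Play-by-Play; RO - Roster Report
I - incomplete (doesn't run through the end of the game)
B - broken (doesn't pass the HTML parser)
M - misplaced (belongs to a different game that doesn't have a file associated with it)
R - replica (copy of a file from another game)
N - not available (file not available on NHL.com)
V - file is good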
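Of the codes above, N and R are the easiest to detect mechanically. The sketch below assumes that the reports for one season, stage and report type have already been downloaded into a dict mapping game ID to raw bytes (None when NHL.com had no file), and that a replica is a byte-for-byte copy. Both are assumptions, and near-copies would need a fuzzier comparison. The function name `find_missing_and_replicas` is hypothetical. It flags missing files as N and groups game IDs whose files share a SHA-256 digest as replica candidates; deciding which file in a group is the genuine one still requires reading the game header inside the report.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def find_missing_and_replicas(
    reports: Dict[int, Optional[bytes]],
) -> Tuple[List[int], List[List[int]]]:
    """Return (missing game IDs, groups of game IDs with identical files).

    reports: hypothetical mapping of game ID -> raw report bytes,
             or None when the file was not available (code N).
    """
    # Code N: no file was served for this game.
    missing = sorted(gid for gid, data in reports.items() if data is None)

    # Bucket the available files by content digest.
    by_digest: Dict[str, List[int]] = defaultdict(list)
    for gid, data in reports.items():
        if data is not None:
            by_digest[hashlib.sha256(data).hexdigest()].append(gid)

    # Any digest shared by two or more games marks replica candidates (code R).
    groups = sorted(sorted(gids) for gids in by_digest.values() if len(gids) > 1)
    return missing, groups


if __name__ == "__main__":
    # Made-up sample contents, only to show the call shape.
    sample = {1: b"<html>game A</html>", 2: b"<html>game A</html>",
              3: b"<html>game B</html>", 4: None}
    print(find_missing_and_replicas(sample))
```

Hashing keeps the comparison linear in the number of files instead of comparing every pair of reports directly.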