Sunday, October 21, 2018

NHLErrata launch

Ever since I started collecting NHL data, it has been very important to me to validate what I collect. So I wrote a set of checks that eventually grew into a whole library for testing both the NHL boxscore feeds and the HTML reports.

I managed to fish out quite a few errors and inconsistencies. Some were systemic and could be fixed in software. Others required manual intervention, sometimes by editing the source file, sometimes by providing a correct value that overrides what the parser would read. These interventions, classified into a variety of types, formed another library.
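To make the check-then-override idea concrete, here is a minimal sketch in Python. It is an illustration only, not the code of either library (both are written in Perl). The names `MANUAL_FIXES`, `apply_fixes` and `check_event`, and all ids and values, are hypothetical.

```python
# Hypothetical manual overrides, keyed by (game id, event id, field).
# All ids and values here are made up for illustration.
MANUAL_FIXES = {
    (2018020001, 42, "player"): 8471234,
}


def check_event(event):
    """Run simple sanity checks on a parsed event; return a list of problems."""
    problems = []
    if event.get("player") is None:
        problems.append("missing player")
    if event.get("period", 0) < 1:
        problems.append("invalid period")
    return problems


def apply_fixes(game_id, event, fixes=MANUAL_FIXES):
    """Return a copy of the event with any matching manual overrides applied."""
    fixed = dict(event)  # leave the parsed event itself untouched
    for (fix_game, fix_event, field), value in fixes.items():
        if fix_game == game_id and fix_event == event["id"]:
            fixed[field] = value  # the override wins over the parsed value
    return fixed


# An event as a parser might have read it, with the player missing.
parsed = {"id": 42, "period": 1, "player": None}
fixed = apply_fixes(2018020001, parsed)

print(check_event(parsed))
print(check_event(fixed))
```

Keeping the overrides in a table separate from the parser means the source files stay as downloaded and every manual correction remains visible in one place.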

But I also wanted to share the errors I found, and the fixes I worked out, with the analytics community. So I decided to create a website dedicated to them. It took a while, but a couple of days ago I was finally able to open NHLErrata.com. There you can find:

  • An overview of data sources
  • Information on missing players and events
  • Information on broken reports, players and events
  • Systemic problems encountered with the reports

Both libraries mentioned above, Test.pm and Errors.pm, are part of my scrape-to-database package on CPAN.

1 comment:

  1. just wanted to say thanks for all the great data on hockeyelorankings.com :)

    i used them to create an animated graph video about NHL team's ELO history: https://www.youtube.com/watch?v=FldemIx6mCc
