I managed to fish out quite a few errors and inconsistencies. Some were systemic and could be fixed in software; others required manual intervention, either by editing the source file or by providing the correct value to override what the parser would read. These interventions, classified into a variety of types, formed another library.
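To illustrate the "override what the parser would read" idea, here is a minimal sketch in Python. It is not the code from the package (which is written in Perl); the table layout, the `apply_overrides` function, and the game and event identifiers are all made up for the example.

```python
# Hypothetical override table: (game_id, event_id, field) -> corrected value.
# In practice each entry would come from a manual review of a broken report.
OVERRIDES = {
    ("G001", 12, "player_number"): 19,
}


def apply_overrides(game_id, events, overrides=OVERRIDES):
    """Return copies of the parsed events with manual corrections applied."""
    fixed = []
    for event in events:
        patched = dict(event)  # copy, so the raw parser output stays untouched
        for (g, e, field), value in overrides.items():
            if g == game_id and e == event["id"]:
                patched[field] = value
        fixed.append(patched)
    return fixed


# Made-up parser output for one game.
parsed = [
    {"id": 11, "type": "SHOT", "player_number": 8},
    {"id": 12, "type": "GOAL", "player_number": 91},
]
corrected = apply_overrides("G001", parsed)
```

Keeping the corrections in a separate table, rather than patching the parser, means the raw parse stays reproducible and every manual fix is documented in one place.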
But I also wanted to share the errors I found, and the fixes I figured out, with the analytics community, so I decided to create a website dedicated to them. It took a while, but a couple of days ago I was finally able to open NHLErrata.com. There you can find:
- An overview of data sources.
- Information on missing players and events.
- Information on broken reports, players and events.
- Systemic problems encountered with the reports.
Both of the libraries mentioned, Test.pm and Errors.pm, are part of my scrape-to-database package on CPAN.
just wanted to say thanks for all the great data on hockeyelorankings.com :)
i used them to create an animated graph video about NHL teams' ELO history: https://www.youtube.com/watch?v=FldemIx6mCc