I managed to fish out quite a few errors and inconsistencies. Some were systemic and could be fixed in software; others required manual intervention, either by editing the source file or by supplying a correct value that overrides what the parser would read. These interventions, classified into a variety of types, formed another library.
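To give a rough idea of what "overriding what the parser would read" can mean, here is a minimal sketch in Python. It is not the code of the actual library (which is a Perl module); the table layout, the `apply_overrides` function, the game ids and the field names are all assumptions made up for illustration.

```python
# Hypothetical override table: (game_id, event_number) -> corrected fields.
# Every id and value here is invented; none of it comes from real reports.
OVERRIDES = {
    ("game-A", 1): {"player": "SMITH"},
}


def apply_overrides(game_id, events, overrides=OVERRIDES):
    """Return a copy of the parsed events with manual corrections applied.

    Events are numbered from 1 in the order the parser produced them.
    Fields named in an override replace the parsed values; all other
    fields, and all events without an override, are left as parsed.
    """
    fixed = []
    for number, event in enumerate(events, start=1):
        patch = overrides.get((game_id, number), {})
        fixed.append({**event, **patch})
    return fixed


# What the parser read from a (made-up) broken report.
parsed = [{"player": "SMTIH", "type": "GOAL"}]
print(apply_overrides("game-A", parsed))
```

Keeping the corrections in a table separate from the parser means the source reports stay untouched and every manual fix remains visible in one place.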
But I also wanted to share the errors I found, and the fixes I figured out, with the analytics community. So I decided to create a website dedicated to them. It took a while, but a couple of days ago I was finally able to open NHLErrata.com. There you can find:
- An overview of data sources
- Information on missing players and events
- Information on broken reports, players and events
- Systemic problems encountered with the reports
Both libraries mentioned, Test.pm and Errors.pm, are part of my scrape-to-database package on CPAN.