The Ontario government released its annual Sunshine List on March 24, detailing public sector employees earning more than $100,000 per year. We created a table that lets readers explore the list in more detail by searching, sorting and filtering by name, salary and more.
This list is published each year on the government’s website in a way that’s hard to search, impossible to sort and difficult to navigate. The Globe wanted to pull the data from this year’s list and publish it in a more usable way, as a tool for our reporters and our readers.
Here’s a little background on how we made the tool. (Be warned: it gets technical.)
We started by building a scraper, a program that trawls web pages for content and saves it in a more structured way than copy-paste. Using a coding language called Python, we built a universal scraper that could pull all the data back to 1997 – the first year the list was released.
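For readers curious about what a scraper looks like, here is a minimal sketch in Python. It is not our actual code: it assumes each record sits in a row of an HTML table, and the class name `TableRowParser`, the function `scrape_table` and the sample page are all made up for illustration. It uses only Python's built-in HTML parser.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being read (None outside a <tr>)
        self._cell = None  # text pieces of the cell being read (None outside a <td>)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a <td> cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows with no <td> cells (such as headers) are skipped
                self.rows.append(self._row)
            self._row = None


def scrape_table(page_html):
    """Return the table rows found in a page as lists of strings."""
    parser = TableRowParser()
    parser.feed(page_html)
    parser.close()
    return parser.rows


# Made-up sample page; the real pages' layout is an assumption here.
SAMPLE_PAGE = """
<table>
  <tr><th>Employer</th><th>Name</th><th>Salary</th></tr>
  <tr><td>Example Hospital</td><td>Doe, Jane</td><td>$123,456.78</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_PAGE)
```

In a sketch like this, the same function would be run once for each year's page, and the resulting rows saved to a file for cleaning.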
We cleaned this data using Google Refine, converting encoded HTML characters and renaming some categories. The next challenge was cutting the data down as much as possible. While there were only a few thousand records back in the 1990s, other years had as many as 79,000 records, making the file sizes very large. Chrome and Firefox handled the large files well, but Internet Explorer chugged more slowly with each new megabyte we pushed its way.
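We did this cleanup in Google Refine, but the same two steps can be sketched in Python for illustration. The field names (`name`, `sector`) and the category mapping below are hypothetical, since the real ones depend on the source pages; `html.unescape` is part of Python's standard library.

```python
import html

# Hypothetical mapping: the actual category names we renamed are not listed here.
CATEGORY_RENAMES = {
    "Hospitals & Boards of Public Health": "Hospitals",
}


def clean_record(record):
    """Decode HTML entities in every field, then rename the category if needed."""
    cleaned = {key: html.unescape(value).strip() for key, value in record.items()}
    cleaned["sector"] = CATEGORY_RENAMES.get(cleaned["sector"], cleaned["sector"])
    return cleaned


# Made-up record showing encoded characters as they might come off a web page.
raw = {
    "name": "O&#39;Brien, Pat",
    "sector": "Hospitals &amp; Boards of Public Health",
}
cleaned = clean_record(raw)
```

Decoding comes first so that the category lookup sees the plain-text name rather than the encoded one.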
Since we had data from the 2010 release, we added a pop-up that lets readers see how a salary increased or decreased. In an earlier version of the table, this pop-up also included the 2010 employer name and position. But we had to cut that detail late in development because it nearly doubled the size of the 2010 dataset.
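The comparison itself comes down to matching a person across two years and subtracting. The sketch below assumes records are matched on name and employer, which is an illustrative choice and not necessarily how our tool matches them; the function name `salary_changes` and the sample figures are made up.

```python
def salary_changes(current, previous):
    """Return current-minus-previous salary for every key present in both years.

    Both arguments map a (name, employer) pair to a salary. Matching on that
    pair is an assumption made for this sketch.
    """
    changes = {}
    for key, salary in current.items():
        if key in previous:
            changes[key] = salary - previous[key]
    return changes


# Made-up figures for illustration only.
this_year = {
    ("Doe, Jane", "Example Hospital"): 125000,
    ("Roe, Sam", "Example College"): 101000,
}
last_year = {
    ("Doe, Jane", "Example Hospital"): 120000,
}
changes = salary_changes(this_year, last_year)
```

Anyone who appears in only one of the two years is left out, since there is nothing to compare against.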
The final tool is very simple to use and, admittedly, not very flashy. But it lets readers dig a little deeper into the list, search specific jobs and find notable people.
If you have any questions, comments or suggestions about this interactive, reach us at email@example.com.