How crowd-sourcing will spark a data revolution

Frances Woolley is a professor of economics at Carleton University

Governments collect the information that seems important at the time. In 1911, the Dominion Bureau of Statistics duly recorded the total tons of freight passing through Canadian canals. There are no official statistics, however, on the date the first snowdrops appeared in Edmonton that year, or the frequency of goldfinch sightings. This lack of baseline data makes it hard to know if climate change is altering birds' migration patterns, or if the growing season is getting longer.

Yet governments have no monopoly on data. People across Canada keep detailed records of the minutiae of their daily lives - the birds at their birdfeeders or the flowers in their garden - for no reason other than personal satisfaction.

There are also thousands of organizations, small and large, with fascinating data to share. Hockey Canada, for example, keeps detailed statistics on minor hockey registration trends, which it publishes in its annual report. This information could be uploaded, merged with other demographic data, and used to see if the children's fitness tax credit had any measurable impact on hockey registration numbers.

Businesses will not put information on-line unless they can profit from doing so through advertising revenue or subscription fees. Governments, however, can take a broader perspective. Data infrastructure is like physical infrastructure. The Red River flood control scheme does not generate revenues directly. However, the benefits to people living near the river, and the economic activity the scheme makes possible, make flood control worthwhile.

The federal government is taking steps to build the country's data infrastructure. Last year saw the launch of the open data pilot project, data.gc.ca. Earlier this year the paywall in front of Statistics Canada's enormous CANSIM database was taken down. The National Research Council, together with the University of Guelph and Carleton University, has a new data registration service, DataCite, which allows Canadian researchers to give their data permanent names in the form of digital object identifiers. In the long run, these projects should, as the press releases claim, "support innovation," "add value-for-money for Canadians," and promote "the reuse of existing data in commercial applications."

Yet all of these initiatives are geared towards government data sets and professional researchers. Important private records – diaries of early settlers, for example – can find a home in Canada's National Archives. But the Archives do not have sufficient resources to process and document records of snowdrops or goldfinches. Moreover, the Archives keep records, not data sets – it is fascinating to look at census records from 120 years ago, but they aren't much use for statistical analysis.

There is a solution: crowd-sourcing. Across the country there are students, amateur and professional historians, policy analysts, bloggers and data nerds. I'm one of them. I've taken data collected by a notable Ottawa record keeper, Mr. Harry Thomson, and posted it on Worthwhile Canadian Initiative. Mr. Thomson's records go back to the 1960s, long before Environment Canada began collecting comparable hydrometric data. An analysis of the data shows a significant decline in peak water levels during the spring flood – with this year being no exception.

Yet Worthwhile Canadian Initiative is just one blog in the vast expanse of the World Wide Web, and might not even be there in five or ten years' time. We need a permanent site for all of this data, through which the collective power of the internet can be unleashed – editing, compiling, analyzing, telling stories and, above all, building understanding.
