Relatively small organizations can - and do - accumulate vast amounts of data. And that presents some big challenges.
Consider Delvinia Interactive Inc., a Toronto-based digital strategy firm that counts among its electronic assets the 160,000-plus member AskingCanadians online panel, and its French language sibling, Qu'en pensez vous.
Adam Froman, the company's CEO, says Delvinia manages 10 terabytes of storage to house panel-member information and survey data. This for a company with only 30 employees.
Small used to mean, relatively speaking, small profits, small staff and small buildings. But today, in the information age, small is irrelevant when it comes to data. Small or large, databases are growing with incredible speed, challenging small businesses to keep pace with that growth and to develop security policies to manage their sensitive data.
Just ask the Institute for Clinical Evaluative Sciences. With upwards of 150 faculty and staff, it houses multiple databases of medical information, from sources such as the Ontario Health Insurance Plan, the Ontario Drug Benefit plan, and the Canadian Institute for Health Information hospital discharge abstract database.
The organization's day-to-day working database is 1.5 terabytes, and it increases by 50 gigabytes per month, according to Ruth Croxford, senior research coordinator at ICES. This represents over 17 years of primary data, growing by 20 million records monthly.
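To put those figures in perspective, here is a rough back-of-envelope sketch. It assumes decimal units (1 GB = 10^9 bytes), that the 50 gigabytes and the 20 million records describe the same monthly growth, and that growth stays linear; none of this is stated by ICES, and the variable names are illustrative only.

```python
# Back-of-envelope arithmetic on the figures quoted above.
# Assumptions (not from ICES): decimal units, linear growth, and that
# the 50 GB/month and 20 million records/month describe the same data.
GB = 10**9

current_size_bytes = 1500 * GB      # 1.5 terabytes
monthly_growth_bytes = 50 * GB      # 50 gigabytes per month
monthly_new_records = 20_000_000    # 20 million records per month

# Average storage consumed per new record
bytes_per_record = monthly_growth_bytes / monthly_new_records

# Months until the working database doubles at the current rate
months_to_double = current_size_bytes / monthly_growth_bytes

print(f"About {bytes_per_record:.0f} bytes per record")
print(f"About {months_to_double:.0f} months to double in size")
```

Under those assumptions, each new record averages roughly 2.5 kilobytes, and the working database would double in about two and a half years.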
ICES has more than 100 projects under way at any given time, on topics such as the assessment of care delivery, patterns of service utilization, health technologies, drug therapies and treatment modalities. Subsets of the data are made available to external researchers as well.
Given the sensitivity of the information they hold, both organizations face larger-than-usual challenges in managing their terabytes. ICES, for example, has to anonymize all data before its researchers can work with it. Even dates are removed from records, replaced by the number of days elapsed since the patient's diagnosis, to ensure individuals can't be identified.
It uses SAS Data Management, which provides a single environment of solutions, tools, methodologies and workflows, to manage and manipulate the data. "We know a lot about people in the aggregate, but know nothing about individuals," says Ms. Croxford.
Mr. Froman's company also routinely deals with its respondents' personal information, whether it is basic demographics or responses to various surveys. But unlike ICES, whose data is hosted in-house and completely isolated from the outside world, Delvinia relies on its data collection and hosting vendor, ConfirmIT, to house its panel in the vendor's European datacentre.
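The idea of replacing dates with day counts can be illustrated with a minimal sketch. This is not ICES's actual process, which runs in SAS; the function name, record layout and sample dates below are hypothetical and chosen only to show the technique.

```python
from datetime import date

def to_day_offsets(diagnosis_date, event_dates):
    """Replace calendar dates with the number of days since diagnosis.

    The calendar dates themselves are dropped; only the relative
    timing of events is kept.
    """
    return [(d - diagnosis_date).days for d in event_dates]

# Hypothetical patient record, invented for illustration
diagnosis = date(2010, 3, 1)
events = [date(2010, 3, 1), date(2010, 3, 15), date(2011, 3, 1)]

offsets = to_day_offsets(diagnosis, events)
print(offsets)  # [0, 14, 365]
```

A researcher can still see that a follow-up came two weeks after diagnosis and another a year later, without knowing when any of it actually happened.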
"For companies today, there's so much great technology you can license. I don't recommend that any small business build its own technology," Mr. Froman says. "You have to find the right partners - you can't afford to be making mistakes with that."
Mr. Froman notes that, when considering hosting, data storage location can be important to clients, and that was one of his considerations.
Delvinia uses both ConfirmIT's panel management and SAS analytic software for its data collection and management, although Mr. Froman acknowledges that some small businesses could be scared off by the cost of solutions like these.
But, he points out, a lot of technology companies want to work with small business, and government programs such as the National Research Council's Industrial Research Assistance Program will help fund development projects. "If you're a Canadian small business, there are a lot of great programs out there," he says. "It's a case of choosing the right platform."
His team has built processes to connect ConfirmIT to Delvinia's internal databases, and to manage panel integrity and panelist privacy. "You can't overlook the importance of having documented policies and guidelines about data handling," he says. "Small companies often don't have a lot of policies and processes."
Without these policies, he explains, it's all too easy for someone to inadvertently corrupt or expose data. "Establish them while you're still small," Mr. Froman advises.
Ms. Croxford agrees. ICES security is multi-layered and well-defined, with strict rules and controls on who can physically access servers, and who can access any given dataset or data item.
She adds that there are no external connections to the data, to prevent inadvertent (or malicious) exposure of patient information. Any data released to external researchers is scrubbed to remove or obscure any items that could reveal the identities of individual patients.
In addition to security rules, Mr. Froman advises strong backup policies. There is a cost, but he points out that data loss can shut down a business, so the price is worth paying. Delvinia does nightly backups of its internal servers, and sends a copy of its database to a secure location offsite every week.
Mr. Froman says that he also regularly downloads and vaults a copy of his panel databases from ConfirmIT's servers as a supplement to ConfirmIT's own backups.
"You have to take the process seriously," he says. "It's become a world of digital data. If you're not thinking things through and working with a partner that's sound itself, you're setting yourself up for a lot of problems."