For Finlay Maguire, the story of Canada’s pandemic is written in a single multicoloured image taking shape on his computer screen month by month.
What it shows is not case numbers or vaccination rates, but lineages – part of a burgeoning family tree that reveals the genetic diversity of the virus that causes COVID-19.
So far, at least 541 of those lineages are known to have turned up in Canada. They represent nearly every major genetic branch of the virus and a sizable fraction of the 1,528 lineages that have been catalogued by scientists around the world. Each lineage carries its own version of the virus’ genome, in the form of RNA that is extracted from patient samples and then translated into digital sequences by machines working day and night in labs across the country. Canadian labs have sequenced a viral genome in this way more than 270,000 times.
Worldwide, more than 5.4 million genomes of the virus have been uploaded to data-sharing repositories. All of this has given infectious-disease experts something that was unattainable during previous pandemics: the power to watch a deadly pathogen evolve before their eyes.
“That’s the new thing,” said Dr. Maguire, a data scientist at Dalhousie University in Halifax who specializes in microbial bioinformatics. “It’s being able to look at what variations are occurring, at how they potentially impact vaccine efficacy or transmission rates … and being able to connect that to what’s happening globally in real time.”
Genomic sequencing is what enabled scientists to identify the virus’ four known variants of concern – Alpha through Delta – which have prolonged the pandemic and increased the human toll. This week it has turned up a possible fifth such variant in South Africa. As the virus is increasingly hemmed in by vaccines, genomics will be just as important for spotting mutations that can circumvent our hard-won immunity.
But while Canada is among the countries best equipped to investigate the coronavirus genome, Canadian researchers have found the path to utilizing this biological superpower convoluted and slow. The limiting factor is not scientific or technical. Rather, it is the difficulty in gaining access to the data in a useful and timely way.
“Canada has long had a data-sharing problem,” said Andrew McArthur, a computational biologist whose lab at McMaster University in Hamilton, in collaboration with Toronto’s Sunnybrook Health Sciences Centre, decoded the genome of the first live virus successfully isolated in Canada. Data-sharing barriers are a fact of life between provincial and federal agencies, and between government-run public health labs and their university counterparts, he said. And despite the urgency of the pandemic, “the needle really hasn’t moved much.”
Because of this, Canada’s COVID-19 virus-sequencing effort was described as only a “partial success” in a case study issued by the independent Public Policy Forum think tank in October. The report notes that privacy concerns were often cited as the reason for holding back data.
This is not because the RNA of a virus contains any genetic information about the person who provided the material. Rather it is the metadata associated with the sample – including patient age, gender and time and location of collection – that could make it possible to find out whose case of COVID-19 it was.
In principle, this concern has been addressed through a standard set of protocols and a national database for viral genomes launched earlier this year. But interviews conducted by The Globe and Mail suggest there are additional barriers that have blocked access. These come down to differences of purpose, priority and attitude among the various parties involved.
For example, provincial public health labs, which do the bulk of COVID-19 genome sequencing, have tended to be more focused on informing their own decision makers about the immediate pandemic response rather than on sharing data for scientific studies. In contrast, university-based researchers are seeking to discover and publish broader insights about the evolution and behaviour of the virus. Somewhere in between, the goal of giving scientists a play-by-play of the virus’ ever-shifting genetic profile in Canada has remained elusive.
“I think the culture in public health labs is not as familiar with the idea of open science and of data sharing being a priority,” said Yann Joly, research director of McGill University’s Centre of Genomics and Policy.
Dr. Joly chairs the data-sharing committee for CanCOGeN – the Canadian COVID Genomics Network – a $40-million federal project that was launched last year to support genome sequencing for the pandemic response. Earlier this year the project introduced an online portal called VirusSeq to make it easier for Canadian scientists to access Canadian data. After a slow start, the portal has now accumulated more than 100,000 viral genomes.
That’s a big improvement over where things stood two months ago, said Catalina Lopez-Correa, who leads the network. The question is whether the struggle to open lines of communication will lead to more lasting changes in how data on pathogens are managed across Canada’s fragmented public health landscape.
“This pandemic is pushing us to face those barriers and address the impediments to data-sharing,” Dr. Lopez-Correa said. “Hopefully that can have an impact across the whole health care system.”
COVID-19 cases and genome sequencing
After COVID-19 cases began climbing in Canada in 2020, only a relatively small number were genetically sequenced to identify separate lineages of the virus. The sequencing effort grew dramatically in 2021 after variants began to emerge. So far, viral genomes from about 10 per cent of all Canadian cases (approximately 179,000 genomes) have been made available for scientists to analyze.
Rise of the variants
A breakdown of publicly available viral genomes gathered in Canada shows waves of variants overtaking each other to fuel the pandemic. The earliest branches of the virus (19A and B) were soon overtaken by several others that spread from Europe and the U.S. (20A, B, C and others). This growing diversity was then crushed by the variants, particularly Delta.
Delta now dominates Canadian cases of COVID-19, but it has since diverged into several sublineages. None of these has displaced the others as dramatically as Delta displaced earlier variants, which suggests that more recent versions of Delta are only marginally better at infecting people. But further large leaps are possible.
The technology to read an organism’s full genetic sequence came of age during the U.S.-led Human Genome Project, a titanic undertaking that culminated in April 2003. Coincidentally, the world was then facing a global health threat from the SARS outbreak. But it took another 17 years and COVID-19 to demonstrate just how important genomic sequencing can be during a pandemic.
In late 2019, after patients in Wuhan, China, began showing up in hospital with pneumonia-like symptoms, Chinese labs were able to sequence the genome of an emerging infectious agent. It proved to be a novel coronavirus – a close relative of the virus that causes SARS.
On January 11, 2020, the new viral sequence was posted online, making it available to scientists around the world. That made possible two developments that were crucial to the global response. The first was a genetic test for the disease – the PCR test that remains the gold standard for identifying cases today. The second was the use of the genome to recreate the coronavirus spike protein, the target that would guide vaccine makers and give them a huge head start.
As COVID-19 spread around the globe, another reason for sequencing became apparent. Because viruses accumulate genetic changes at a steady rate – about once every two weeks for the virus that causes COVID-19 – genomes can be used to determine which cases are closely related and how various outbreaks are likely to have started. By March 2020, Canadian labs with sequencing capabilities were pivoting to provide this information to health authorities.
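The arithmetic behind that reasoning can be sketched in a few lines of Python. This is a simplified illustration, not the method Canadian labs actually use: it assumes the two genomes are already aligned to the same length, treats the rough figure of one mutation every two weeks as a fixed clock rate, and ignores the statistical uncertainty that real phylogenetic software accounts for. The function names are hypothetical.

```python
# Assumed clock rate, based on the rough figure of one mutation every
# two weeks along a single line of descent.
MUTATIONS_PER_WEEK = 0.5


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned genomes differ, skipping unknown 'N' bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("genomes must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a != "N" and b != "N"
    )


def weeks_since_common_ancestor(distance: int, rate: float = MUTATIONS_PER_WEEK) -> float:
    """Both lineages keep mutating independently after they split,
    so differences accumulate at twice the per-lineage rate."""
    return distance / (2 * rate)


# Toy example: two short aligned fragments that differ at two positions.
d = snp_distance("ACGTACGT", "ACGTTCGA")
print(d, weeks_since_common_ancestor(d))  # 2 2.0
```

Under this assumed rate, each difference between two genomes corresponds to roughly a week of separation, which is why cases whose genomes differ by zero or one mutation are plausibly part of the same chain of transmission.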
“The emphasis in the beginning was really on characterizing how many different versions of the virus we had and whether we could use this to try to understand the patterns of spread,” said Guillaume Bourque, director of bioinformatics at the McGill Genome Centre in Montreal.
It was through working with Quebec’s public health agency that Dr. Bourque and Jesse Shapiro, another McGill researcher, were able to show that COVID-19 emerged in the province as the result of approximately 600 separate introductions, mainly from travellers returning from Florida and New York before borders closed.
In April 2020 the federal government recognized the need for a national approach to genomic surveillance and gave Genome Canada, a research-funding organization, the money to launch CanCOGeN. The project was modelled after a parallel effort in the United Kingdom, which was already leading the world in sequencing the virus. But where the British version has the advantage of dealing with a single national health service to collect and interpret coronavirus genomes, CanCOGeN must co-ordinate with six regional and provincial centres and the National Microbiology Laboratory in Winnipeg.
Academic researchers pitched in early to get the project started. At McMaster, Dr. Maguire, who was then a post-doctoral researcher, worked with Jalees Nasir, a PhD candidate, and others to develop the software that would allow raw data from the sequencing machines to be shared and analyzed across Canada.
For labs that were used to sequencing the genomes of bacteria or larger pathogens, there was a learning curve in understanding how to perform the task for a virus. While the genome of the virus that causes COVID-19 is minuscule – just 30,000 bases (units of RNA) long, compared with the 3.2 billion DNA base pairs that make up the human genome – its brevity demands high precision. Because variants can differ from one another by only a handful of mutations, a single error in sequencing might be mistaken for a new variant.
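One common safeguard is to call each position of the genome only when many independent sequencing reads agree, and to mark everything else as unknown ("N") rather than guess. The sketch below illustrates that idea in Python; the depth and agreement thresholds are hypothetical values chosen for illustration, and real pipelines apply further quality filters.

```python
from collections import Counter

# Hypothetical thresholds for illustration only.
MIN_DEPTH = 20        # minimum number of reads covering a position
MIN_AGREEMENT = 0.75  # minimum share of reads that must agree on the base


def call_consensus(pileup: list[list[str]]) -> str:
    """Build a consensus genome from the bases each read reports per position.

    Positions with too few reads, or where reads disagree too much, are
    written as 'N' so that a sequencing error is not reported as a mutation.
    """
    consensus = []
    for reads in pileup:
        if len(reads) < MIN_DEPTH:
            consensus.append("N")
            continue
        base, count = Counter(reads).most_common(1)[0]
        consensus.append(base if count / len(reads) >= MIN_AGREEMENT else "N")
    return "".join(consensus)


pileup = [
    ["A"] * 30,               # unanimous and well covered
    ["C"] * 25 + ["T"] * 5,   # a few stray reads, still a confident call
    ["G"] * 5,                # too few reads to trust
    ["A"] * 15 + ["G"] * 15,  # reads split evenly, so no call
]
print(call_consensus(pileup))  # ACNN
```

Writing "N" costs a little information but avoids the worse outcome of a false mutation that could be mistaken for a new lineage.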
“The bar for getting it right is just fundamentally higher,” Dr. McArthur said.
Then, after months of working quietly, everything changed for CanCOGeN. In December 2020, the U.K. genomics program confirmed that a new variant had emerged that was apparently driving a steep surge in cases. Up to that point, Canada had only been sequencing genomes from a few per cent of the total number of COVID-19 cases. Suddenly, in labs across the country, holiday plans were scrapped as researchers began sequencing patient samples in a round-the-clock quest for signs of the new variant.
“That’s when it became frantic,” said Dr. McArthur, whose lab played a key role in Ontario’s genomic surveillance. “We had to boost the sampling rate. We had to work more closely with public health to find out if this variant was in our neighbourhood.”
The variant – now known as Alpha – quickly turned up in Canada and was soon joined by others. With most of the population still unvaccinated, a new and worrying phase of the pandemic was under way.
As variants of COVID-19 emerge, it is tempting to picture them as armies of conquest, rising up in disparate corners of the globe and sweeping into new territories they seek to dominate. Evolutionary biologists see a different picture – one in which the virus is on a journey of discovery across an abstract terrain known as the fitness landscape.
To move through the landscape, the virus needs to change its genetic code little by little. Whenever a genetic mutation shifts it toward higher ground, the virus gets better at infecting host cells. The mutation is rewarded as more copies of that particular version of the virus are made. But the landscape is complex. Sometimes the virus needs a few mutations working together to jump across a valley and find a more promising height that lies farther away. This is essentially what the variants have done.
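The idea can be made concrete with a toy model. The landscape below is invented for illustration rather than measured from the real virus, and actual evolution involves large populations and chance rather than a single greedy climber. The sketch only shows how a virus limited to one mutation at a time can get stuck on a minor peak, while a combination of two mutations can cross a valley to reach higher ground.

```python
from itertools import combinations

# A made-up fitness landscape over three genome sites (0 = original base,
# 1 = mutated). Higher numbers mean the virus copies itself more successfully.
FITNESS = {
    (0, 0, 0): 1.0,
    (1, 0, 0): 1.1,  # a small, easy step uphill
    (0, 1, 0): 0.8,  # each of these mutations alone is harmful...
    (0, 0, 1): 0.8,
    (0, 1, 1): 1.5,  # ...but together they reach a much higher peak
    (1, 1, 0): 0.9,
    (1, 0, 1): 0.9,
    (1, 1, 1): 1.0,
}


def neighbours(genotype, max_changes):
    """Yield every genotype reachable by changing 1..max_changes sites."""
    for k in range(1, max_changes + 1):
        for sites in combinations(range(len(genotype)), k):
            mutated = list(genotype)
            for site in sites:
                mutated[site] = 1 - mutated[site]
            yield tuple(mutated)


def climb(start, max_changes):
    """Greedy uphill walk: move to the fittest neighbour until none is better."""
    current = start
    while True:
        best = max(neighbours(current, max_changes), key=FITNESS.__getitem__)
        if FITNESS[best] <= FITNESS[current]:
            return current
        current = best


print(climb((0, 0, 0), max_changes=1))  # (1, 0, 0): stuck on the small hill
print(climb((0, 0, 0), max_changes=2))  # (0, 1, 1): crosses the valley
```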
More dramatic jumps are also possible. This is accomplished when two variants meet in the same host and swap genes – a process known as recombination. That prospect had scientists worried in British Columbia last spring, when the Alpha and Gamma variants were both on the rise in the province. The former variant is more transmissible but the latter is associated with cases of reinfection. The fear was that a new variant might appear with the worst features of both strains. Since then, evidence of recombination has been seen in other places, including the United States. This is one reason why Canadian public health labs are not simply tallying up how much of each variant they find. They are also looking for new lineages of the virus that no one has yet identified as a problem.
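The basic signal scientists look for can be shown with a toy example: a genome that matches one parent along part of its length and the other parent along the rest. The sketch below is a hypothetical illustration using short made-up sequences, not real variant genomes; dedicated recombination-detection tools add statistical tests to rule out coincidence and sequencing error.

```python
def recombine(parent_a: str, parent_b: str, breakpoint: int) -> str:
    """Simplified recombinant: parent A's genome up to the breakpoint, then parent B's."""
    return parent_a[:breakpoint] + parent_b[breakpoint:]


def parent_pattern(query: str, parent_a: str, parent_b: str) -> list[tuple[int, str]]:
    """At each site where the parents differ, record which parent the query matches."""
    pattern = []
    for pos, (q, a, b) in enumerate(zip(query, parent_a, parent_b)):
        if a == b:
            continue  # this site cannot tell the parents apart
        if q == a:
            pattern.append((pos, "A"))
        elif q == b:
            pattern.append((pos, "B"))
    return pattern


def count_switches(pattern: list[tuple[int, str]]) -> int:
    """Number of times the matching parent changes along the genome."""
    labels = [label for _, label in pattern]
    return sum(1 for x, y in zip(labels, labels[1:]) if x != y)


variant_a = "AAAAAAAA"  # hypothetical toy genomes
variant_b = "AATAATAT"
child = recombine(variant_a, variant_b, breakpoint=4)
print(child, count_switches(parent_pattern(child, variant_a, variant_b)))  # AAAAATAT 1
```

A genome that matches a single parent throughout shows no switches; one or more switches is a hint, not proof, that recombination has occurred.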
Samir Patel, Ontario’s chief microbiologist, said his lab is on the lookout for new variants that occur with a higher frequency than would be expected from mere chance. Since the start of the pandemic, the provincial lab in Toronto has sequenced more than 40,000 COVID-19 genomes – more than any other site in Canada – and it collects data from four other labs across the province. With case numbers relatively low compared with earlier this year, Ontario now has the capacity to randomly sequence one out of every four cases. If an unusual trend emerges, the lab can look at additional information about where the samples came from to see whether the virus has gained some new advantage, particularly among vaccinated individuals.
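Because cases are sampled at random, the share of sequenced genomes belonging to a lineage should track its share of all cases, give or take sampling noise. One textbook way to ask whether a rise is larger than that noise is a two-proportion z-test, sketched below in Python with invented counts. This is an illustrative choice, not a description of the Ontario lab's actual method, and a statistical flag like this is only a first screen that still has to be checked against other information.

```python
import math


def two_proportion_z(count_before: int, total_before: int,
                     count_after: int, total_after: int) -> float:
    """z-score for the change in a lineage's share of sequenced cases between
    two periods. Values beyond about +/-1.96 would be unusual under random
    sampling alone (two-sided test at the 5 per cent level)."""
    p_before = count_before / total_before
    p_after = count_after / total_after
    pooled = (count_before + count_after) / (total_before + total_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_before + 1 / total_after))
    if se == 0:
        raise ValueError("lineage share is 0 or 1 in both periods; z is undefined")
    return (p_after - p_before) / se


# Hypothetical counts: a lineage found in 20 of 1,000 sequenced cases one
# week and in 45 of 1,000 the next.
z = two_proportion_z(20, 1000, 45, 1000)
print(round(z, 2))
```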
“You need to analyze the genomic data with other information to see if it’s real or if it’s background noise,” Dr. Patel said.
Scientists outside of the provincial labs say that’s the kind of information they have been wanting to get their hands on, so that the value of Canada’s COVID-19 genomic sequencing effort can be fully realized. That includes combining data from all the provinces to improve confidence in what they reveal.
In one of the first such examples, a study posted online in November examines how two different lineages of the Delta variant, designated as AY.25 and AY.27, have together overtaken an earlier version that was first identified in India. AY.27 is of special interest because it is found almost exclusively in Canada. It also shares a mutation with another lineage, AY.4.2, sometimes nicknamed “Delta plus,” that has been growing in other countries.
Sarah Otto, a theoretical biologist at the University of British Columbia and one of the co-authors of the study, said it underscores why it is important to be looking at genomes gathered across the country rather than at the provincial level in isolation. But the situation is still far from ideal, she said, because provinces do not have to include the vaccination status of the individuals who were sampled, and scientists are unable to access that information. As a result, important questions – such as whether AY.27’s success is due to an improved ability to skirt vaccines – cannot be answered.
“I think we have to be humbled in the face of a new disease,” Dr. Otto said. “No one person, no one team can do all possible analyses. We really do benefit from having all eyes on the problem.”
Today, most eyes that are following COVID-19’s evolutionary voyage are fixed on the Global Initiative on Sharing Avian Influenza Data, better known by its acronym, GISAID. Based in Munich, Germany, it is an international repository for virus genomes that was launched in 2008 as a way to promote data sharing between countries that were combatting the rise of bird flu (H5N1).
When COVID-19 arrived, GISAID became the world’s central clearing house for data about emerging strains. The platform is known for its transparency, and for allowing contributors to share without surrendering the rights to their data. But relative to other countries, Canada has been slow to contribute.
That became obvious in August when an analysis published in the peer-reviewed journal Nature Biotechnology ranked 54 countries that submit data to GISAID based on their ability to minimize lag time – the delay between the sequencing of a coronavirus genome and its appearance in the database.
The U.K. topped the list with a median lag time of 16 days. The U.K. has led the world in genomic sequencing around COVID-19, and last December it became the first country to report the emergence of a variant of concern. In comparison, Canada ranked 48th, with a median lag time of 88 days.
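The lag-time measure can be illustrated with a short Python sketch. The records below are hypothetical (sequenced, released) date pairs, not data from the study, and the study's own definitions and data handling may differ.

```python
from datetime import date
from statistics import median


def median_lag_days(records: list[tuple[date, date]]) -> float:
    """Median delay, in days, between sequencing a genome and its public release.

    Each record is a hypothetical (sequenced_on, released_on) pair.
    """
    lags = [(released - sequenced).days for sequenced, released in records]
    return median(lags)


records = [
    (date(2021, 1, 1), date(2021, 1, 17)),  # 16 days
    (date(2021, 1, 1), date(2021, 3, 30)),  # 88 days
    (date(2021, 2, 1), date(2021, 2, 6)),   # 5 days
]
print(median_lag_days(records))  # 16
```

The median is a sensible summary here because a few large backlogged batches would drag an average upward: the mean of these three lags is about 36 days, while the median stays at 16.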
The reason for the delay boils down to individual provinces deciding when and how to share data, some of which is also curated by the National Microbiology Laboratory.
Anna Maddison, a spokesperson for the Public Health Agency of Canada, which operates the federal facility, said that Canada’s record has improved substantially since the summer because of investments in capacity and training across the system. This includes the creation of 16 “genomics liaison technical officers” to support the gathering and sharing of sequences.
By October, Canada’s median lag time for releasing genomic data was down to 35 days, and the overall proportion of data released is up to 60 per cent of all Canadian sequences, or about 170,000 viral genomes.
Given that GISAID now contains millions of COVID-19 genomes, it’s hard to know how much of a loss to the world Canada’s lack of sharing has been. On the other hand, it’s also clear that the U.K.’s own sequencing effort has provided a huge benefit. Even though the variants have not been contained, an understanding of how they have altered the dynamics of the pandemic has proved crucial for many jurisdictions – and its absence disastrous for others that did not make use of the available knowledge.
As of this week, Britain has sequenced more than 1.4 million COVID-19 virus genomes, or about five times as many as Canada. Nearly all of that information has been made available to researchers with minimal delay. But then, the U.K. program is led by academic researchers who are empowered to share data and committed to doing so, including the program’s director, Sharon Peacock, a professor of public health and microbiology at the University of Cambridge.
When asked last spring why it was important for health agencies to share genomic information about the coronavirus, Dr. Peacock told The Globe and Mail: “Why wouldn’t you share it, given that it’s such an important resource for your pandemic response?”
In Canada, where public health is largely a provincial matter, CanCOGeN has faced an uphill battle trying to foster a similar, unifying sensibility.
Nevertheless, recent progress suggests that the Canadian network is finally building momentum, along with a sense that genomics is coming into its own as a tool for public health.
As one example, Dr. Maguire has recently teamed up with Samira Mubareka, a microbiologist at Sunnybrook, in a project that uses genomic data to understand which public health interventions were most effective at controlling the spread of COVID-19 in vulnerable populations. More broadly, there is a sense among researchers that the lessons learned can be extended beyond COVID-19 to other pathogens, including influenza.
In the meantime, COVID-19 remains the focus as new lineages keep appearing on Dr. Maguire’s computer screen, showing that the virus has not yet found the highest peak in the fitness landscape.
“I think there’s a sense that there’s still some function to be gained by the virus,” Dr. Mubareka said. “Regardless of what we’re hoping will happen, we need to prepare for the virus to evolve, to become more transmissible and cause more severe disease.”