As mutations of the virus that causes COVID-19 emerge around the world, variant names such as B.1.1.7, B.1.617.2 and P.1 have become almost household terms. But where do the names come from and what do the letters and numbers mean?
SARS-CoV-2 has been fraught with naming controversies ever since the virus first took hold in China in late 2019. Terms including “Wuhan virus” and “China virus” became loaded descriptions, and even now health experts avoid using names such as “Vietnamese variant,” “Indian variant” or “British variant” for fear of stigmatizing countries and curbing scientific co-operation.
“We don’t want anything that might dissuade people from sharing data with the global community,” said Oliver Pybus, a professor of evolution and infectious disease at the University of Oxford. “If there is any barrier to people sharing that data immediately and openly, that’s a threat to all of us.”
Dr. Pybus was part of a group of 10 scientists in Britain who took a novel approach to the naming challenge. They recognized early in the pandemic that researchers would need a simple and adaptable way of identifying mutations as the virus evolved. That was not only critical for scientists; it would also be key in helping drug makers develop vaccines against variants.
The group came up with a system called the Pango nomenclature. It’s based on the way biologists categorize organisms by using a tree-like structure where each branch represents the descendants of a particular ancestor, such as all primates.
Since the origins of SARS-CoV-2 are still unclear, the group began with two main versions of the virus that emerged in Wuhan in late 2019 and early 2020. They named them “A” and “B.” Each significant mutation of those strands – such as the appearance of the virus in a new location, a rapid increase in cases, or the evolution of new genetic traits – creates a new lineage or branch on the tree. New lineages are designated A.1, A.2 or B.1, B.2 and so on.
The virus has evolved so rapidly that most lineages have developed their own mutations, or branches. To accommodate these changes, Pango designates sub-lineages with dots and numbers, such as B.1.1, which is a subset of the B.1 lineage. The name B.1.1.7, which describes the variant first detected in Britain, denotes the seventh descendant of the sub-lineage B.1.1, which in turn is the first descendant of the lineage B.1.
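For readers who think in code, the dotted naming can be sketched as a simple string operation. The snippet below is only an illustration, not the Pango team's software, and the function names `lineage_ancestry` and `is_descendant` are invented here for the example.

```python
def lineage_ancestry(name: str) -> list[str]:
    """Return the lineage followed by each of its ancestors, most specific first."""
    parts = name.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]


def is_descendant(child: str, ancestor: str) -> bool:
    """True if `child` sits somewhere below `ancestor` on the tree."""
    # Appending the dot keeps B.1.17 from being mistaken for a child of B.1.1.
    return child.startswith(ancestor + ".")


print(lineage_ancestry("B.1.1.7"))
print(is_descendant("B.1.1.7", "B.1.1"))
print(is_descendant("B.1.17", "B.1.1"))
```

Each dot marks one step down the tree, so reading a name from right to left walks back toward the root lineage.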
To prevent the names from becoming too long, Pango incorporates a series of substitutions. For example, a sub-lineage in South Africa called C.1 is derived from B.1.1.1.1, with C being a substitute for B.1.1.1. So C.1 is the first sub-lineage of B.1.1.1. The name P.1, associated with a variant first detected in Brazil, is an alias generated from a sub-lineage of B.1.1.28.
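The substitution step can be sketched the same way. This is a minimal illustration rather than the official tooling: the `ALIASES` table and the `expand_alias` function are hypothetical names, and the table holds only the two prefixes discussed here, on the understanding that C stands for B.1.1.1 and P stands for B.1.1.28.

```python
# Illustrative alias table; the real Pango list is much longer.
ALIASES = {
    "C": "B.1.1.1",
    "P": "B.1.1.28",
}


def expand_alias(name: str, aliases: dict[str, str] = ALIASES) -> str:
    """Replace a shorthand leading letter with the full lineage it stands for."""
    prefix, _, rest = name.partition(".")
    # Names whose prefix is not in the table (such as A or B) pass through unchanged.
    full_prefix = aliases.get(prefix, prefix)
    return f"{full_prefix}.{rest}" if rest else full_prefix


print(expand_alias("C.1"))
print(expand_alias("P.1"))
print(expand_alias("B.1.1.7"))
```

Because only the leading letter is swapped, the short name and the long name always point to the same branch of the tree.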
Pango “is pretty much infinitely extendable,” Dr. Pybus told a recent media briefing. “We can handle millions of lineage names without any problem at all.”
The team developed Pango in April, 2020. Since then the two root strands have evolved into an evolutionary tree with 1,260 lineages encompassing 1.7 million genome sequences of the virus, and the total keeps rising. B.1.1.7 is one of the largest lineages, with more than 600,000 genetic sequences. The vast majority of mutations, and Pango names, go unnoticed by the public. It’s only when some are designated as “variants of concern” by health agencies that the Pango term becomes topical.
Pango has proven so successful among researchers that it has become the global standard for tracking COVID-19 and many of its names are widely used by health officials, journalists and world leaders. Last week Pango’s founders announced a formal structure for the system to handle the growing number of mutations, with committees to verify new lineages and assign names.
Much of the work to develop the system and update the data has been done on a volunteer basis by a group of PhD students at Oxford and the University of Edinburgh, Dr. Pybus said. “I think when people step back and they look at what’s been done over the pandemic, the achievements of [the students] has been astonishing.” The technology the students created, which involved machine learning, is handling “orders of magnitude more data than we have ever had to process before.”
Aine O’Toole, a PhD student at Edinburgh, is among those who have been involved in the project from the start. Before the pandemic, Ms. O’Toole had been developing software tools for polio surveillance in hospitals. She switched to COVID-19 early in 2020 and joined Dr. Pybus’s team. “It has been a very busy year but really quite rewarding from an academic point of view,” she told the briefing. “I’ve been able to actually contribute quite a lot, which has been actually really nice.”
Dr. Pybus said Pango was developed for scientists and he never imagined the terms would be used in common parlance. While he’s happy to see the names popularized, he becomes mildly irritated when commentators get them wrong. For example, he pointed out that B.117 is not the same as B.1.1.7 or B.1.17. And the variant first detected in India actually has three subsets – B.1.617.1, B.1.617.2 and B.1.617.3.
The origins of some variants have also been misconstrued, he said. South Africa has been a leader in genome sequencing but when the country notified the world about B.1.351 last year it was quickly tagged the “South African variant” – even though the mutation could have been imported from somewhere else by travellers. “We don’t know if it originated in South Africa and the stigma associated with a variant originating in a place is a tricky thing to work with,” he added.
The World Health Organization has wrestled with the challenge of avoiding geographic descriptions for variants of concern, and on Monday it announced a naming scheme involving letters of the Greek alphabet. For example, the WHO has labelled B.1.1.7 as “Alpha,” B.1.351 as “Beta” and P.1 as “Gamma.”
The system won’t replace Pango, but the WHO said the scientific names “can be difficult to say and recall, and are prone to misreporting. As a result, people often resort to calling variants by the places where they are detected, which is stigmatizing and discriminatory.” It has urged national authorities and media outlets to use the new labels.
Dr. Pybus said the announcement may be too late because the Pango names have become so familiar. “Are names like B.1.1.7 already just sticking?” he said. “It’s very difficult to change that once everyone starts to use a name.”