Definitions for those complicated health and data science terms that experts use

ace2 receptor

ACE2 is a protein on the surface of many cell types. It is an enzyme that generates small proteins – by cutting up the larger protein angiotensinogen – that then go on to regulate functions in the cell. Using the spike-like protein on its surface, the SARS-CoV-2 virus binds to ACE2 – like a key being inserted into a lock – prior to entry and infection of cells. Hence, ACE2 acts as a cellular doorway – a receptor – for the virus that causes COVID-19.


bioinformatics

Bioinformatics is a field of computational science concerned with the analysis of sequences of biological molecules. It is a subdiscipline of biology and computer science that deals with the acquisition, storage, analysis, and dissemination of biological data, most often DNA and amino acid sequences. Bioinformatics uses computer programs for a variety of applications, including determining gene and protein functions, establishing evolutionary relationships, and predicting the three-dimensional shapes of proteins.


clade

A clade is a group of organisms that all originate from a common ancestor; the term is widely used in biology. In virology, a clade describes a group of similar viruses based on their genetic sequences, and changes in those viruses can also be tracked using phylogeny. SARS-CoV-2 is itself a clade within the family Coronaviridae and the genus Betacoronavirus. Generally, the genetic variations of a virus are grouped into clades, which can also be called subtypes, genotypes, or groups. There are multiple nomenclatures for the SARS-CoV-2 clades, and each health organization may use its own identifier for different variants. Among the clade naming methods are the PANGOLIN, GISAID, and Nextstrain nomenclatures.

clinical outcomes of sars-cov-2

Patients with SARS-CoV-2 infection can experience a range of clinical manifestations, which can be grouped into the following illness categories: asymptomatic or presymptomatic infection, mild illness, moderate illness, severe illness, and critical illness.

contact tracing

The practice of identifying, notifying, and monitoring individuals who may have had close contact with a person who has a confirmed or probable case of an infectious disease, as a means of controlling the spread of infection.


coronavirus disease (covid-19)

Coronavirus disease is an infectious disease caused by the SARS-CoV-2 virus. Coronaviruses are a group of RNA viruses that cause a variety of respiratory, gastrointestinal, and neurological diseases in humans and other animals.

delta variant

Pango Lineage: B.1.617.2 and AY lineages. May spread more easily than other variants. May cause more severe cases than other variants. Breakthrough infections in people who are vaccinated are expected.

disease severity

While all people are susceptible to SARS-CoV-2 infection, the nature and severity of the disease vary significantly among individuals and populations. Importantly, reported disease burdens, case fatality rates, utilization of resources, and comorbidities differ considerably from country to country. There are, however, still uncertainties about the severity of the disease among individuals and about the reasons for more severe disease in some cases. There is a strong possibility that the severity of this disease depends on a complicated interaction between the host, virus, and environment, which leads to different clinical outcomes.

disease transmissibility

A quality that describes how easy it is for a disease to spread from an infected person to a susceptible person. Transmissibility is determined by the infectivity of the pathogen, the contagiousness of the infected individual, the susceptibility of the exposed individual, the contact patterns between the infected individual and the exposed individual, and the environmental stress exerted on the pathogen during transmission. These will determine the scale and intensity of control measures needed to suppress transmission.

Nature Reviews


epidemiology

Epidemiology is the study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to the control of health problems. This method involves scientific, systematic, and data-driven analysis and interpretation of the frequency and patterns of health-related states and events, which are not limited to diseases. Epidemiology is often described as the basic science of public health and relies on evidence-based practice to direct prompt and effective public health control and prevention measures.

genetic epidemiology

Epidemiology draws on methods from other scientific fields. One of these is genetic epidemiology, a medical discipline that seeks to understand how genetic factors interact with the environment in the context of disease in populations. Areas of study include the causes of inherited disease and its distribution and control.


genome

A genome is an organism's complete set of DNA, including all of its genes. Each genome contains all of the information needed to build and maintain that organism.

genomic epidemiology

Genomic epidemiology makes use of the genomic data of pathogens to understand the distribution and spread of a disease in a specified population, and applies genomic data to the control of health challenges. The main difference between genomics and genetics is that genetics scrutinizes the functioning and composition of a single gene, whereas genomics addresses all genes and their interrelationships in order to identify their combined influence on the growth and development of the organism. Genomic epidemiology seeks to derive a statistical and quantitative analysis of how genetics work in populations, and plays an important role in understanding disease as a health-related state or event.


genomic sampling / sampling

A statistical analysis technique used to select, manipulate, and analyze a representative subset of coronavirus samples (specifically SARS-CoV-2) to identify patterns and trends in the context of the COVID-19 pandemic.


genomic sequencing / sequencing

According to the National Cancer Institute (NCI) at the NIH, genomic sequencing is a laboratory method that is used to determine the entire genetic makeup of a specific organism or cell type. This method can be used to find changes in areas of the genome. These changes may help scientists understand how specific diseases form. CDC defines genomic sequencing as a process scientists use to decipher the genetic material found in an organism or virus. Sequences from specimens can be compared to help scientists track the spread of a virus, how it is changing, and how those changes may affect public health.

genomic surveillance

Viruses can be tracked using genomic sequence data collected by CDC and its partners. Effective surveillance does not require the sequencing of a specimen from every COVID-19 case. Instead, scientists rely on collecting enough sequence data from representative populations to detect new variants and monitor trends in circulating variants.
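
To give a feel for what "enough sequence data" can mean, here is a minimal Python sketch. It is not CDC's method. It assumes sequenced specimens are a simple random sample of cases and that a variant is always recognized once sequenced, and it uses the standard binomial relationship: the chance of seeing at least one variant sequence among n specimens is 1 - (1 - p)^n, where p is the variant's share of cases. The function name and the default 95% confidence level are illustrative choices.

```python
import math


def min_sequences_to_detect(prevalence: float, confidence: float = 0.95) -> int:
    """Smallest number of randomly sampled specimens needed so that the
    probability of sequencing at least one variant case is >= confidence.

    Assumes simple random sampling and perfect detection (both are
    simplifying assumptions made for this illustration).
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be between 0 and 1 (exclusive)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    # Solve 1 - (1 - p)^n >= c for n, then round up to a whole specimen.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prevalence))


# A variant making up 1% of cases, with 95% confidence of seeing it once:
print(min_sequences_to_detect(0.01))  # 299
```

Under these assumptions, rarer variants need far more sequences, which is one reason sampling has to be representative rather than exhaustive.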


mutation

A mutation refers to a single change in a virus’s genome (genetic code). Mutations happen frequently, but only sometimes change the characteristics of the virus.

omicron variant

Pango Lineage: B.1.1.529 and BA lineages. The variant has been detected in at least 145 countries and is dominant globally. It is split into different lineages based on their mutational profiles. Data suggest that Omicron is less severe in general. However, a surge in cases may lead to significant increases in hospitalization and death. More data are needed to fully understand the severity of illness and death associated with this variant.

pango lineage nomenclature

Pango lineage names comprise an alphabetical prefix and a numerical suffix. The alphabetical prefix contains Latin characters only, which are case insensitive.

1) The letters I, O and X are not used in the prefix of the names of standard lineages.
2) Each dot in the numerical suffix means “descendant of” and is applied when one ancestor can be clearly identified. So lineage B.1.1.7 is the seventh named descendant of lineage B.1.1, and C.1 is the first named descendant of lineage C.
3) The suffix can contain a maximum of 3 hierarchical levels, referred to as the primary, secondary and tertiary suffixes.
4) In order to avoid four or more suffix levels, a new lineage prefix is introduced, which acts as an alias. For example, C is an alias of B.1.1.1, hence the descendant of B.1.1.1 is called C.1 (rather than a B.1.1.1 name with a fourth suffix level). Consequently the name C, by itself, is never directly applied to a sequence.
5) In some instances, it is not possible to unambiguously identify an ancestral lineage within the Pango nomenclature for a given lineage of interest. This is the case for lineages A and B, because of their position near the root of the phylogeny. For these “special case ancestors”, the alphabetical part alone can be applied directly to sequences. In all other cases the suffix is mandatory.
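
The naming rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the official Pango tooling: the function names and the alias table are hypothetical, and the table holds just the one alias mentioned in the rules (C for B.1.1.1). It covers standard lineage names only.

```python
import re
from typing import List, Tuple

# Hypothetical alias table for illustration; only the C -> B.1.1.1 entry
# comes from the rules above. The real list is maintained elsewhere.
ALIASES = {"C": "B.1.1.1"}
SPECIAL_CASE_ANCESTORS = {"A", "B"}    # may appear without a suffix
FORBIDDEN_PREFIX_LETTERS = set("IOX")  # not used in standard lineage prefixes
MAX_SUFFIX_LEVELS = 3

# Letters, then zero or more ".<number>" levels.
_NAME_PATTERN = re.compile(r"^([A-Za-z]+)((?:\.\d+)*)$")


def parse_lineage(name: str) -> Tuple[str, List[int]]:
    """Split a standard lineage name into (prefix, suffix levels) and
    check it against the rules described above."""
    match = _NAME_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a lineage name: {name!r}")
    prefix = match.group(1).upper()  # prefixes are case insensitive
    suffix = [int(part) for part in match.group(2).split(".") if part]
    if FORBIDDEN_PREFIX_LETTERS & set(prefix):
        raise ValueError(f"prefix {prefix!r} uses I, O or X")
    if len(suffix) > MAX_SUFFIX_LEVELS:
        raise ValueError(f"{name!r} has more than {MAX_SUFFIX_LEVELS} suffix levels")
    if not suffix and prefix not in SPECIAL_CASE_ANCESTORS:
        raise ValueError(f"{name!r} needs a numerical suffix")
    return prefix, suffix


def expand_alias(name: str) -> str:
    """Replace an aliased prefix with the lineage it stands for."""
    prefix, suffix = parse_lineage(name)
    base = ALIASES.get(prefix, prefix)
    return ".".join([base] + [str(level) for level in suffix])


print(parse_lineage("b.1.1.7"))  # ('B', [1, 1, 7])
print(expand_alias("C.1"))       # B.1.1.1.1
```

Expanding an alias shows the full ancestry that the short name hides, which is why the expanded form can exceed three suffix levels even though a valid name cannot.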



pathogen

A pathogen is an organism that causes disease. Pathogens differ from one another, but all can cause disease upon entering the body; all a pathogen needs to thrive and survive is a host. Once the pathogen sets itself up in a host’s body, it manages to avoid the body’s immune responses and uses the body’s resources to replicate before exiting and spreading to a new host. Pathogens can be transmitted in a few ways depending on the type. The four most common types of pathogens are viruses, bacteria, fungi, and parasites.

Healthline

phylogenetic tree

Phylogenetic trees are often constructed based on genetic (or genomic) data using modern computer algorithms. Several methods can be used to build trees, like parsimony, maximum likelihood, and Bayesian analysis. These methods all have distinct assumptions and can give different results.

Luke Harmon


sars-cov-2

Severe acute respiratory syndrome coronavirus 2, usually referred to as SARS-CoV-2, is the virus that causes COVID-19. The virus changes over time, and most changes have little to no impact on its properties. However, some changes may affect these properties, such as how easily the virus spreads, the associated disease severity, or the performance of vaccines, therapeutic medicines, diagnostic tools, or other public health and social measures.

sars-cov-2 rna

The genetic material of SARS-CoV-2, the coronavirus that causes COVID-19, is ribonucleic acid (RNA). To replicate, and therefore establish infection, SARS-CoV-2 must hijack a host cell and use the cell’s machinery to duplicate its RNA. Errors often occur during the process of duplicating the viral RNA. This results in viruses that are similar to, but not exact copies of, the original virus. These errors in the viral RNA are called mutations, and viruses with these mutations are called variants. Variants can differ by a single mutation or by many.

sars-cov-2 spike protein

The word corona means crown and refers to the appearance that coronaviruses get from the spike proteins sticking out of them. These spike proteins are important to the biology of this virus. The spike protein is the part of the virus that attaches to a human cell to infect it, allowing it to replicate inside of the cell and spread to other cells. Because of the importance of this specific part of the virus, scientists who sequence the virus for research constantly monitor mutations causing changes to the spike protein through a process called genomic surveillance.

sars-cov-2 strain

Any change in the viral genetic sequence during replication is known as a mutation, and descendants of the reference SARS-CoV-2 with new mutations are sometimes called variants. When a new variant with one or multiple mutations shows functional properties different from the original virus and becomes established in a population, it is referred to as a new strain of the virus. All strains are variants, but not all variants are strains.

Economic Times

sars-cov-2 variant

A variant is a viral genome (genetic code) that may contain one or more mutations. These mutations differentiate it from other variants of the SARS-CoV-2 virus. As expected, multiple variants of SARS-CoV-2 have been documented globally throughout the pandemic. To inform local outbreak investigations and understand national trends, scientists compare genetic differences between viruses to identify variants and how they are related to each other.
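
As a rough illustration of what comparing genetic differences involves, the Python sketch below lists single-letter substitutions between a reference sequence and a sample. It assumes the two sequences are already aligned and of equal length, which real pipelines achieve with dedicated alignment tools. The function name and the short toy sequences are invented for this example and are not real SARS-CoV-2 data.

```python
from typing import List

# Characters that mean "unknown base" or "alignment gap", not a real change.
UNINFORMATIVE = {"N", "-"}


def list_substitutions(reference: str, sample: str) -> List[str]:
    """Return substitutions in the form <reference base><position><sample base>,
    with positions counted from 1. Both sequences must already be aligned."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        if ref_base in UNINFORMATIVE or alt_base in UNINFORMATIVE:
            continue  # skip positions where a base is unknown or missing
        if ref_base != alt_base:
            substitutions.append(f"{ref_base}{position}{alt_base}")
    return substitutions


# Toy sequences, made up for illustration only.
reference = "AUGGCUAAC"
sample = "AUGACUAAU"
print(list_substitutions(reference, sample))  # ['G4A', 'C9U']
```

Two samples that share the same list of substitutions are likely to be closely related, which is the basic idea behind grouping sequences into variants.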

variant escalation

Evaluation is undertaken by a group of subject matter experts who assess available data, including variant proportions at the national and regional levels and the potential or known impact of the constellation of mutations on the effectiveness of medical countermeasures, severity of disease, and ability to spread from person to person. Given the continuous evolution of SARS-CoV-2 and our understanding of the impact of variants on public health, variants may be reclassified based on their attributes and prevalence. Variants may be designated by public health organizations as a Variant of Concern (VOC), a Variant of Interest (VOI), or a Variant of High Consequence (VOHC) due to shared attributes and characteristics that may require public health action.

variant of concern (voc)

A variant for which there is evidence of an increase in transmissibility, more severe disease (for example, increased hospitalizations or deaths), significant reduction in neutralization by antibodies generated during previous infection or vaccination, reduced effectiveness of treatments or vaccines, or diagnostic detection failures. In addition to the possible attributes of a variant of interest, a VOC may have the following attributes: 1) Evidence of impact on diagnostics, treatments, or vaccines. 2) Evidence of increased transmissibility. 3) Evidence of increased disease severity. The Delta variant and the Omicron variant have been declared variants of concern.

variant of high consequence (vohc)

A VOHC has clear evidence that prevention measures or medical countermeasures (MCMs) have significantly reduced effectiveness relative to previously circulating variants. In addition to the attributes used to classify a VOC, a VOHC has the following impact on medical countermeasures: 1) Failure of diagnostic test targets. 2) Evidence of a significant reduction in vaccine effectiveness, a disproportionately high number of infections in vaccinated persons, or very low vaccine-induced protection against severe disease. 3) Significantly reduced susceptibility to multiple therapeutics that are approved or have Emergency Use Authorization (EUA). 4) More severe clinical disease and increased hospitalizations. A variant of high consequence would require notification to WHO under the International Health Regulations, reporting to CDC, an announcement of strategies to prevent or contain transmission, and recommendations to update treatments and vaccines.

variant of interest (voi)

A variant with specific genetic markers that have been associated with changes to receptor binding, reduced neutralization by antibodies generated against previous infection or vaccination, reduced efficacy of treatments, potential diagnostic impact, or predicted increase in transmissibility or disease severity. Possible attributes of a Variant of Interest: 1) Specific genetic markers that are predicted to affect transmission, diagnostics, therapeutics, or immune escape. 2) Evidence that it is the cause of an increased proportion of cases or unique outbreak clusters. 3) Limited prevalence or expansion nationally or in other countries.

variants being monitored (vbm)

Variants designated as Variants Being Monitored include those for which data indicate a potential or clear impact on approved or authorized medical countermeasures, or those that have been associated with more severe disease or increased transmission but are no longer detected or are circulating at very low levels, and that pose no imminent risk to public health services. These variants are monitored closely, and their proportions are analyzed continually. If new genomic data warrant more concern, the classification of the variant may be changed.

viral load

When an individual is first infected with SARS-CoV-2, the virus starts replicating inside cells and then infects more cells. Viral load is a measure of the total number of viral particles inside the individual. The more replication that has occurred, the higher the viral load.

Alex Polyakov


virus

A virus is a small collection of genetic code, either DNA or RNA, surrounded by a protein coat. A virus cannot replicate alone. Viruses must infect cells and use components of the host cell to make copies of themselves. Often, they kill the host cell in the process and cause damage to the host organism. Because viruses don’t have the same components as bacteria, they cannot be killed by antibiotics; only antiviral medications or vaccines can eliminate or reduce the severity of viral diseases. COVID-19 is caused by a novel coronavirus known as SARS-CoV-2.

virus isolate

An isolate is the name for a virus that has been isolated from an infected host and propagated in culture. The first isolates of SARS-CoV-2 were obtained from patients with pneumonia in Wuhan in late 2019. A small amount of fluid was inserted into their lungs, withdrawn, and placed on cells in culture. An isolate comes from a single host or person.


virus lineage

A lineage is a group of closely related viruses with a common ancestor. SARS-CoV-2 has many lineages, and all of them cause COVID-19. Genetic lineages of SARS-CoV-2 have been emerging and circulating around the world since the beginning of the COVID-19 pandemic. SARS-CoV-2 genetic lineages are routinely monitored through epidemiological investigations, virus genetic sequence-based surveillance, and laboratory studies.