Purifying selection drives time-dependency of short-term mutation rates in SARS-CoV-2 and pH1N1 influenza

High-throughput sequencing is a rapid form of genomic sequencing that has been used extensively during infectious disease outbreaks, both to track and contain the infection and to monitor for future outbreaks. A valuable advantage of this technology is the ability to analyze pathogen genomes quantitatively and to estimate the number and rate of mutations in near real time.


A new paper, available on the medRxiv* preprint server, examines how the genomes of SARS-CoV-2 and the pH1N1 pandemic influenza strain each evolved during the first year of their spread. Using statistical and phylogenetic methods, the researchers describe the emergence, rates of evolutionary change, and growth of these two viruses over time.

They also assess how the sampling window and the sample size affect the estimates. The evolutionary parameters for both viruses depend on when the sampling is carried out, with the inferred rates declining by 50% and 100% over a year. The researchers also found that from four months after the onset of the pandemic, the estimated growth rates and dates of emergence show little change, allowing reasonable inferences to be drawn.

The findings also show that high substitution rates on the terminal branches of the phylogenetic tree drive the time-dependent nature of the mean substitution rate, with terminal-branch rates 2-4 times higher than those of the internal branches. The high substitution rates are closely associated with a rise in segregating non-neutral sites.

This shows “the role of purifying selection in generating the time-dependency of evolutionary parameters during pandemics,” say the researchers.


Rapid whole-genome sequencing is useful for inferring the time of origin, the growth rate of the outbreak, and the rate of evolution over short periods. The danger is that such analyses do not allow for the changes that occur in these parameters over time. Each analysis is a snapshot, so findings differ between studies performed at different times.

Other factors may also contribute, such as sequences that are nearly identical, which makes it difficult to derive a sound phylogeny.

Strong purifying selection has been demonstrated for specific species over periods of thousands to millions of years, based on the dN/dS ratio. Shorter periods have not allowed this, because fewer samples are available to track relatively uncommon variants. During the ongoing COVID-19 pandemic, however, large databases such as the Global Initiative on Sharing All Influenza Data (GISAID) and GenBank have grown rapidly, with millions of sequences shared from all over the world.

In addition, tracking sequence differences over short timescales identifies segregating polymorphisms rather than the fixed mutations that lead to the emergence of independent lineages. The two are associated with different dN/dS ratios, which therefore vary over time.

Study aims

The current study was aimed at using the polymorphisms observed among the genomic sequences in these two viruses to estimate the rate of substitution and the evolutionary rate ratio (dN/dS ratio).

The rationale was that most early mutations are likely to be deleterious and therefore remain segregating until purifying selection rapidly removes them. When the exposed and susceptible population is very large, such mutations may take longer to fade out because purifying selection has not yet fully acted on them.

The reason for the variation in virus substitution rates over time was explored in the current study. The scientists also estimated the “molecular clock rate, time of origin, growth rate, and number of non-neutral sites for different datasets that represent different timescales of genomic observation.”
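
As a rough illustration of the dN/dS ratio referred to above, the sketch below computes it from counts of nonsynonymous and synonymous differences and sites, applying the standard Jukes-Cantor correction for multiple substitutions at the same site. The function names and the example counts are hypothetical and are not taken from the study, which uses more elaborate methods.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites p for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, dN/dS) from counts of differences and sites."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn, ds, dn / ds


# Hypothetical counts for illustration only (not data from the study).
dn, ds, ratio = dn_ds(nonsyn_diffs=6, nonsyn_sites=1500,
                      syn_diffs=8, syn_sites=500)
print(f"dN={dn:.5f}  dS={ds:.5f}  dN/dS={ratio:.3f}")
```

A ratio below 1, as in this made-up example, is the usual signature of purifying selection: changes that alter the protein are retained less often than silent ones.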

What were the findings?

The investigators found that the temporal signal in the early pandemic data was too weak to support reliable evolutionary analysis over the first two months. As a result, the substitution rate was underestimated, and the estimates had wide confidence intervals.

With pH1N1, conversely, ample data were available. The estimated molecular clock rate was about twice as high over the first three months of the influenza pandemic as in the following period, after which it stabilized. This was not seen with SARS-CoV-2, where the inferred substitution rate continues to decline as the period of measurement increases.

The researchers compared two models, the logistic growth and exponential growth models, and showed the former to be more accurate in pinpointing the time of origin of both viruses. The estimated date of origin was January 18, 2009, for pH1N1 and October 28, 2019, for SARS-CoV-2, which agrees with earlier studies.

Since the growth rate declines after the first few months, the exponential growth model fails to provide a reasonable estimate for the time of origin of SARS-CoV-2 when inferred from samples collected six months or more from the start of the pandemic. The model underestimates the growth rate, which pushes the inferred time of origin further back.
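
The effect described above can be illustrated with a toy calculation. The sketch below is not the phylodynamic analysis used by the researchers; it simply simulates a logistic epidemic curve and fits a straight line to the logarithm of the counts, which is what an exponential model assumes. All parameter values (`n0`, `r`, `k`, and the two sampling windows) are hypothetical.

```python
import math


def logistic(t, n0=1.0, r=0.1, k=1e5):
    """Logistic growth: infections on day t, starting from n0 on day 0."""
    return k / (1 + (k - n0) / n0 * math.exp(-r * t))


def fit_exponential(days):
    """Least-squares fit of ln N = a + r*t to the simulated curve.

    Returns the fitted growth rate r and the day on which the fitted
    exponential equals one infection (the implied origin).
    """
    xs = list(days)
    ys = [math.log(logistic(t)) for t in xs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    r = sxy / sxx
    a = mean_y - r * mean_x
    return r, -a / r


# The true origin is day 0 and the true early growth rate is 0.1 per day.
early_r, early_origin = fit_exponential(range(0, 61))    # first two months
late_r, late_origin = fit_exponential(range(0, 241))     # first eight months

print(f"early window: r={early_r:.3f}, implied origin day={early_origin:.1f}")
print(f"late window:  r={late_r:.3f}, implied origin day={late_origin:.1f}")
```

With the longer window, the fitted growth rate is lower and the implied origin falls well before day 0, mirroring the bias reported for the exponential model once growth has slowed.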

What are the implications?

The researchers argue that low-frequency harmful mutations tend to segregate for a while and are then eliminated by purifying selection. Early in a pandemic, these mutations therefore make up a larger share of the observed variation. Their share shrinks over time as neutral or beneficial polymorphisms accumulate.

This hypothesis is supported by the findings of this study, where segregating mutations make up less than 15% in the first few months of sampling, for both viruses. In particular, the SARS-CoV-2 ORF1ab gene showed this pattern, with a fourfold fall in the number of low-frequency non-neutral sites over the first 4-5 months of the pandemic to about 15%, compared with a twofold fall for the hemagglutinin gene of the pH1N1 virus.

Later, the proportion of such sites remains roughly constant, but it actually rises in pH1N1, from 15% to 17% near the start of 2010. This is consistent with the greater number of pH1N1 strains identified during this period, which in turn means that more segregating deleterious mutations were to be expected in the population.

They also found that low-frequency replacement sites are more frequent than silent sites in ORF1ab, while the pattern is the opposite in the pH1N1 hemagglutinin (HA) gene. Moreover, they say, “The total number of non-neutral sites in ORF1ab is higher because it is a much longer gene (~21,000 nt) compared to HA (~1,800 nt).”

The study shows that the variation in estimates of the substitution rate over time is driven by the high substitution rate at the terminal but not internal branches, the latter being independent of time. Secondly, they show the existence of a strong association between the high terminal branch mutation rates and the increased number of uncommon non-neutral mutation sites in both viruses.

Most of the variation is in the low-frequency sites for both viruses. They also found obvious dips and peaks in the rate of substitution at internal branches. For pH1N1, a dip occurred in December 2009, followed by a rise from February 2010 onwards; for SARS-CoV-2, a rise was seen from August 2020 onwards.

“By dividing the data into successively longer temporal intervals, we showed that the over-abundance of deleterious mutations at terminal branches are the main reason behind the gradual decay in mean substitution rate over time in SARS-CoV-2 and pH1N1.”

This calls for further research to accurately quantify the impact of purifying selection over time on substitution rates.
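
One simple way to picture the decay in mean substitution rate described above is a toy model in which the apparent rate is the long-term rate plus a fixed load of transient, not-yet-purged mutations spread over the observation window. This is a deliberate simplification and not the model used in the study; the function name and both parameter values are hypothetical.

```python
def apparent_rate(window_years, long_term_rate=1.0e-3, transient_load=2.5e-4):
    """Apparent substitutions per site per year over an observation window.

    long_term_rate: assumed rate of substitutions that persist (per site per year).
    transient_load: assumed per-site divergence from short-lived deleterious
                    mutations, treated as constant regardless of the window.
    """
    if window_years <= 0:
        raise ValueError("window_years must be positive")
    return long_term_rate + transient_load / window_years


# Hypothetical observation windows, in months.
for months in (1, 2, 4, 8, 12):
    rate = apparent_rate(months / 12)
    print(f"{months:2d} months: {rate:.2e} substitutions/site/year")
```

Under these assumptions, the apparent rate is highest for the shortest window and falls toward the long-term rate as the window lengthens, because the same transient load is divided by a longer time span.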

*Important notice

medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Journal reference:
  • Ghafari, M. et al. (2021). Purifying selection determines the short-term time dependency of evolutionary rates in SARS-CoV-2 and pH1N1 influenza. medRxiv preprint doi:,  

Written by

Dr. Liji Thomas

Dr. Liji Thomas is an OB-GYN, who graduated from the Government Medical College, University of Calicut, Kerala, in 2001. Liji practiced as a full-time consultant in obstetrics/gynecology in a private hospital for a few years following her graduation. She has counseled hundreds of patients facing issues from pregnancy-related problems and infertility, and has been in charge of over 2,000 deliveries, striving always to achieve a normal delivery rather than operative.