Principles of Virology, Volume 1. Jane Flint
such as tRNAs and aminoacyl-tRNA synthetases. Tupanviruses, isolated from soda lakes in Brazil and deep ocean sediments, encode all 20 aminoacyl-tRNA synthetases, 70 tRNAs, multiple translation proteins, and more. Only the ribosome is lacking. Why would large viral genomes carry these genes when they are available in their cellular hosts? Perhaps by producing a large part of the translational machinery, viral mRNAs can be more efficiently translated. This explanation is consistent with the finding that the codon and amino acid usage of tupanvirus is different from that of the amoeba that it infects.
EXPERIMENTS
Planaria and mollusks yield the biggest RNA genomes
In the past 20 years the development of high-throughput nucleic acid sequencing methods has rapidly increased the pace of virus discovery. Yet in that time, while the largest DNA genomes have increased nearly ten times, the largest known RNA viral genome has only increased in size by ten percent. This situation has now changed with the discovery of new RNA viruses of planarians and mollusks.
Until very recently, the biggest RNA virus genome known was 33.5 kb (ball python nidovirus), which is much larger than the average sized RNA virus genome of 10 kb. The reason for the difference is that RNA polymerases make errors, and most do not have proofreading capabilities. Nidovirus genomes encode a proofreading exoribonuclease which improves replication fidelity and presumably allows for larger genomes. Even with a proofreading enzyme, the biggest RNA virus genome is much smaller than the minimal cellular DNA genome, which is 200 kb. The results of two new studies show that we can find larger virus RNAs, suggesting that we have not yet reached the size limit of RNA genomes.
A close study of the transcriptome of a planarian revealed a new nidovirus, planarian secretory cell nidovirus, with an RNA genome of 41,103 nucleotides. This viral genome is unusual because it encodes a single, long open reading frame (ORF) of 13,556 amino acids, the longest viral ORF discovered so far. All other known nidoviruses encode multiple ORFs. Phylogenetic analysis of known nidoviruses suggests that the planarian virus arose from viruses with multiple ORFs, after which its single ORF expanded in size.
The other nidovirus with a large RNA genome was discovered by searching all the available RNA sequences of the mollusk Aplysia californica. With a simple nervous system of 20,000 neurons, this mollusk has been studied as a model system in many laboratories. Aplysia californica nido-like virus has an RNA genome of 35,906 nucleotides with ORFs that encode two polyproteins.
From the perspective of genome size, the discovery of these nidovirus genomes suggests that viruses with even larger RNAs remain to be discovered. Both viruses were identified from sequences that had been deposited in public databases, although infectious virus was not reported in either case. Nevertheless, many organisms have not yet had their genomes sequenced and it is likely that many RNA viruses remain to be discovered. Declaring an upper limit on RNA genome size does not seem reasonable if we have not sampled every species.
Saberi A, Gulyaeva AA, Brubacher JL, Newmark PA, Gorbalenya AE. 2018. A planarian nidovirus expands the limits of RNA genome size. PLoS Pathog 14:e1007314.
Debat HJ. 2018. Expanding the size limit of RNA viruses: evidence of a novel divergent nidovirus in California sea hare, with a ∼39.5 kb virus genome. bioRxiv 307678.
Another intriguing set of genes belongs to tetraselmis virus 1, which infects green algae. These hosts, found in nutrient-rich marine and fresh waters, are photosynthetic. The viral genome encodes pyruvate formate-lyase and pyruvate formate-lyase-activating enzyme, which are key members of cellular anaerobic respiration pathways and allow energy production when no oxygen is available. Green algae may use this system in waters depleted of oxygen by exuberant algal growth. If this process occurs in cells, why does the viral genome carry some of the genes involved? The answer is not known, but it is possible that the extra metabolic demands placed on cells during virus replication, especially at night, require additional fermentation enzymes for energy production. The presence of these genes suggests that tetraselmis virus 1 can change host metabolism, perhaps facilitating its reproduction.
These large viruses therefore have sufficient coding capacity to escape some restrictions imposed by host cell biochemistry. The smallest genome of a free-living cell is predicted to comprise <300 genes (based on bacterial genome sequences). Remarkably, this number is smaller than the genetic content of large viral DNA genomes. Nevertheless, the big viruses are not cells: their reproduction absolutely requires the cellular translation machinery, as well as host cell systems to make membranes and generate energy.
The parameters that limit the size of viral genomes are largely unknown. There are cellular DNA and RNA molecules that are much longer than those found in virus particles. Consequently, the rate of nucleic acid synthesis is not likely to be limiting. Nor does the capsid volume appear to limit genome size: the icosahedral shell of Mimivirus, which houses a 1.2-million-base-pair DNA genome, is constructed mainly of a single major capsid protein. For larger genomes, the solution is helical symmetry, which can in principle accommodate very large genomes. The Pandoraviruses, with the largest known DNA viral genomes (2,500 kbp), are housed in decidedly nonisometric ovoid particles 1 μm in length and 0.5 μm in diameter.
There is no reason to believe that the upper limit in viral particle and genome size has been discovered. The core compartment of a mimivirus particle is larger than needed to accommodate the 1,200-kbp DNA genome. A particle of this size could, in principle, house a genome of 6 million bp if the DNA were packed at the same density as in polyomaviruses. Indeed, if the genome were packed into the particle at the density reached in some bacteriophages, it could be >12 million bp, the size of that of the smallest free-living unicellular eukaryote.
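The headroom implied by these figures can be made explicit with a little arithmetic. The sketch below uses only the genome and capacity values quoted above; the function and constant names are illustrative, and the ratios it produces are simply what those figures imply, not measured packing densities.

```python
# Figures quoted in the text (base pairs).
MIMIVIRUS_GENOME_BP = 1_200_000
CAPACITY_AT_POLYOMAVIRUS_DENSITY_BP = 6_000_000
# The text gives ">12 million bp", so this value is a lower bound.
CAPACITY_AT_PHAGE_DENSITY_BP = 12_000_000


def fold_headroom(capacity_bp: int, genome_bp: int) -> float:
    """How many times larger the genome could be at the given capacity."""
    return capacity_bp / genome_bp


def fraction_used(capacity_bp: int, genome_bp: int) -> float:
    """Fraction of the theoretical capacity occupied by the actual genome."""
    return genome_bp / capacity_bp


for label, capacity in (
    ("polyomavirus-like packing", CAPACITY_AT_POLYOMAVIRUS_DENSITY_BP),
    ("bacteriophage-like packing (lower bound)", CAPACITY_AT_PHAGE_DENSITY_BP),
):
    headroom = fold_headroom(capacity, MIMIVIRUS_GENOME_BP)
    used = fraction_used(capacity, MIMIVIRUS_GENOME_BP)
    print(f"{label}: {headroom:.0f}-fold headroom, {used:.0%} of capacity used")
```

On these numbers, the mimivirus genome would occupy about one-fifth of the particle's capacity at polyomavirus-like packing density, and no more than one-tenth at the density reached in some bacteriophages.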
In cells, DNAs are much longer than RNA molecules. RNA is less stable than DNA, but in the cell, much of the RNA is used for the synthesis of proteins and therefore need not exceed the size needed to specify the largest polypeptide. However, this constraint does not apply to viral genomes. Yet the largest viral single-molecule RNA genomes, the 41-kb (+) strand RNAs of the nidoviruses (Box 3.4), are dwarfed by the largest (2,500-kbp) DNA virus genomes. Susceptibility of RNA to chemical and nuclease attack might limit the size of viral RNA genomes. However, the most likely explanation is that there are few known enzymes that can correct errors introduced during RNA synthesis. An exonuclease encoded in the coronavirus genome is one exception: its presence could explain the large size of these RNAs. DNA polymerases can eliminate errors during polymerization, a process known as proofreading, and remaining errors can also be corrected after DNA synthesis is complete. The average error frequencies for RNA genomes are about 1 misincorporation in 10⁴ or 10⁵ nucleotides polymerized. In an RNA viral genome of 10 kb, a mutation frequency of 1 in 10⁴ would produce about 1 mutation in every replicated genome. Hence, very long viral RNA genomes, perhaps longer than 40 kb, would sustain too many lethal mutations. Even the 7.5-kb genome of poliovirus exists at the edge of infectivity: treatment of the virus with the RNA mutagen ribavirin causes a >99% loss of infectivity in a single round of replication.
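The relationship between genome length and error burden can be illustrated with a short calculation. This is a minimal sketch, assuming that misincorporations occur independently and at a uniform rate at every position; that is a simplification, and the function names are illustrative. It does not model which mutations are lethal, only how many are expected per copy and how often a copy is error-free.

```python
def expected_mutations(genome_length_nt: int, error_rate: float) -> float:
    """Mean number of misincorporations per replicated genome."""
    return genome_length_nt * error_rate


def error_free_fraction(genome_length_nt: int, error_rate: float) -> float:
    """Probability that a copy carries no errors.

    Assumes each nucleotide is copied independently with the same
    per-site error rate (an assumption of this sketch).
    """
    return (1.0 - error_rate) ** genome_length_nt


# Genome lengths taken from the text: poliovirus (7.5 kb), a typical
# RNA virus (10 kb), and the largest known nidovirus genomes (~41 kb).
for length_nt in (7_500, 10_000, 41_000):
    for rate in (1e-4, 1e-5):
        mean = expected_mutations(length_nt, rate)
        clean = error_free_fraction(length_nt, rate)
        print(
            f"{length_nt:>6} nt at 1 error per {round(1 / rate):,} nt: "
            f"{mean:.2f} mutations per copy, {clean:.1%} error-free copies"
        )
```

For a 10-kb genome copied at 1 error in 10⁴ nucleotides, the expected count is 1 mutation per copy and roughly 37% of copies are error-free. At the same rate, a 41-kb genome would carry about 4 mutations per copy on average, which is consistent with the argument that very long RNA genomes require some form of error correction.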
When new viral genomes are discovered, often many of the putative genes are previously unknown. For example, >93% of the >2,500 genes of Pandoravirus salinus resemble nothing known, and 453 of the 663 predicted open reading frames of tetraselmis virus 1 show no sequence similarity to known proteins. The implication of these findings is clear: our exploration of global genome sequences is far from complete, and viruses with larger genomes might yet be discovered.
The Origin of Viral Genomes
The absence of bona fide viral fossils, i.e., ancient material from which viral nucleic acids can be recovered, might appear to make the origin of viral genomes an impenetrable mystery. The oldest viruses recovered from environmental samples, the 30,000-year-old