Principles of Virology. Jane Flint
in which polysomes are treated with RNases and the 20- to 30-nucleotide ribosome-protected fragments are sequenced. The information provides insight into translational control of gene expression and the mechanism of protein synthesis and allows annotation of translated sequences.
A number of methods yield global views of protein-nucleic acid interactions at unprecedented levels of resolution. Chromatin immunoprecipitation sequencing (ChIP-seq) can localize protein-DNA interactions with single-nucleotide precision (Fig. 2.20). In this method, protein-DNA complexes are immunoprecipitated with antibodies to DNA binding proteins, such as transcription proteins, histones, or even specific methyl groups on histones. The DNAs are then subjected to high-throughput sequencing to identify the sites on DNA to which these proteins bind. An early variant called ChIP-on-chip employed microarrays to identify protein binding sites on DNA.
Figure 2.20 Chromatin immunoprecipitation and DNA sequencing, ChIP-seq. This technique is used to identify the precise binding sites of proteins on DNA. DNA is cross-linked to proteins by treating cells with formaldehyde, followed by sonication to shear the DNA into fragments of 200 to 1,000 bp. Beads coated with antibody to the DNA binding protein of interest are added and precipitated. The protein is removed, and the DNA is purified and subjected to high-throughput sequencing to identify protein binding sites on the DNA.
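The final step, locating binding sites from the sequenced fragments, can be illustrated with a toy calculation. The sketch below is not a real peak-calling program. It assumes that the fragments have already been aligned to a single reference sequence and are supplied as hypothetical (start, end) intervals; it counts how many fragments cover each position and reports the most heavily covered stretch. The function names `coverage` and `top_region` and the sample intervals are invented for illustration.

```python
def coverage(fragments, length):
    """Count how many aligned fragments cover each position.

    Intervals are 0-based and end-exclusive; `length` is the reference length.
    """
    depth = [0] * length
    for start, end in fragments:
        for pos in range(max(start, 0), min(end, length)):
            depth[pos] += 1
    return depth


def top_region(depth):
    """Return (start, end, max_depth) for the first run of positions at maximal depth."""
    best = max(depth)
    start = depth.index(best)
    end = start
    while end < len(depth) and depth[end] == best:
        end += 1
    return start, end, best


# Hypothetical aligned fragments; four of them overlap between positions 40 and 50.
fragments = [(10, 50), (30, 70), (40, 80), (35, 55), (90, 120)]
depth = coverage(fragments, 130)
print(top_region(depth))
```

In this made-up example the region of highest coverage is narrower than any single fragment, which is the intuition behind resolving a binding site more precisely than the 200- to 1,000-bp fragment size would suggest. Real analyses also compare the signal with a control sample and apply statistical tests, which this sketch omits.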
Many protocols have been devised for genome-wide analysis of RNA-protein interactions that are based on cross-linking immunoprecipitation (CLIP). In CLIP-seq, RNA-protein complexes are cross-linked in cells in culture with UV light. Cells are lysed and proteins of interest are immunoprecipitated. Proteins are removed by digestion with protease, DNA is synthesized from the previously bound RNA with reverse transcriptase, and the product is subjected to high-throughput sequence analysis. Interaction sites are identified by mapping the nucleic acid sequence reads to the transcriptome. A modification of this technique is called photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation, PAR-CLIP. In this method, photoreactive ribonucleoside analogs such as 4-thiouridine are incorporated into RNA transcripts in living cells. Irradiation with UV light induces efficient cross-linking of RNAs containing these analogs to interacting proteins. Immunoprecipitation and sequencing are then carried out as in other CLIP methods.
Other genome-wide mapping analyses that can be performed include identifying the binding sites for long noncoding RNAs (lncRNA) on chromatin using capture hybridization analysis of RNA targets (CHART). In this method, biotin-linked oligonucleotides that are complementary to the target RNA are designed. These are added to reversibly cross-linked chromatin extracts, and the target RNA is purified with streptavidin beads, which bind with high affinity to biotin. The sequences of the RNA targets identify the genomic binding sites of endogenous RNAs. A related method is chromatin isolation by RNA purification (ChIRP), in which tiled oligonucleotides labeled with biotin are used to retrieve specific lncRNA bound to protein and DNAs.
How DNA is organized in virus particles and in the cell nucleus is being studied using chromosome conformation capture technology, abbreviated as 3C, 4C, 5C, and Hi-C, which differ in scope. For example, 3C identifies interactions between a single pair of genomic loci. Chromosome conformation capture on chip (4C) studies the interactions between one genomic locus and all other genomic loci, while chromosome conformation capture carbon copy (5C) detects interactions between all restriction fragments in a given region. In Hi-C, high-throughput sequencing is used to identify the restriction fragments studied. These methods begin with cross-linking of cell genomes with formaldehyde and digestion with restriction endonucleases, followed by random ligation under conditions where joining of cross-linked fragments is favored over joining of fragments that are not cross-linked. PCR is then used to amplify ligated junctions and identify interacting loci.
The open or closed state of chromatin can be measured by DNaseI-seq (DNaseI hypersensitive sites sequencing) and FAIRE-seq (formaldehyde-assisted isolation of regulatory elements). These protocols are based on the use of formaldehyde to cross-link DNA: this reaction is more efficient in nucleosome-rich regions than in nucleosome-poor areas. The non-cross-linked DNA, typically from open chromatin, is then purified and its sequence is determined. The two protocols differ in that FAIRE-seq does not require permeabilization of cells or the isolation of nuclei.
The methylation state of DNA can be assessed using bisulfite sequencing. Treatment of DNA with bisulfite converts C to U but does not affect 5-methylated cytosines. A variety of sequencing methods that can use this change to provide single-nucleotide resolution information about DNA methylation have been developed. As might be expected, interpreting the growing sets of data on chromatin structure has required the development of new statistical and computational approaches.
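The logic of bisulfite sequencing can be made concrete with a short simulation. The sketch below is a simplified illustration, not a published analysis pipeline. It assumes a single read from the same strand as the reference, aligned without gaps or sequencing errors, and it relies on the fact that the U produced by bisulfite treatment is read as T after amplification and sequencing. The function names `bisulfite_convert` and `call_methylation` and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment and sequencing of one DNA strand.

    `methylated` is a set of 0-based positions carrying 5-methylcytosine.
    Unmethylated C is converted to U, which is read as T after amplification.
    """
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference, read):
    """Infer the methylation state of every C in the reference from a converted read."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = "methylated"      # protected from conversion
        elif read_base == "T":
            calls[i] = "unmethylated"    # converted by bisulfite
        else:
            calls[i] = "uncertain"       # mismatch not explained by conversion
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated={1, 5})
print(read)
print(call_methylation(reference, read))
```

Each cytosine in the reference is scored individually, which is why the approach yields single-nucleotide resolution. In practice many reads cover each position, and the fraction of reads retaining C is used to estimate the level of methylation at that site.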
Mass Spectrometry
Mass spectrometry (MS) is a technique that can identify the chemical constituents of complex and simple mixtures. It has emerged as a powerful tool for detecting and quantifying thousands of proteins in biological samples, including viruses and virus-infected cells.
A mass spectrometer ionizes the chemical constituents of a mixture and then sorts the ions based on their mass-to-charge ratio. Identification of the components is done by comparison with the patterns generated by known materials.
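The mass-to-charge ratio expected for a peptide can be worked out by hand or with a few lines of code. The sketch below is a minimal illustration that assumes ionization by protonation (as in positive-mode electrospray), so that a peptide of monoisotopic mass M carrying z protons appears at m/z = (M + z × 1.007276)/z. The residue masses are standard monoisotopic values in daltons; the function names, the candidate peptides, and the matching tolerance are hypothetical choices made for this example.

```python
# Standard monoisotopic residue masses (daltons) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free N and C termini
PROTON = 1.007276   # mass of each charge-carrying proton


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide given in one-letter code."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence, charge):
    """Mass-to-charge ratio of the peptide carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge


def match(observed_mz, charge, candidates, tol=0.01):
    """Return the candidate peptides whose predicted m/z lies within `tol` of the observation."""
    return [p for p in candidates if abs(mz(p, charge) - observed_mz) <= tol]


print(round(peptide_mass("PEPTIDE"), 4))
print(round(mz("PEPTIDE", 2), 4))
print(match(400.687, 2, ["GG", "PEPTIDE", "PEPTIDES"]))
```

The `match` function is a deliberately crude stand-in for comparison with the patterns generated by known materials. Real identification software compares entire fragmentation spectra against databases of predicted or measured spectra rather than a single m/z value, and it must also account for modifications and isotopes, none of which is modeled here.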
The total protein content of a cell or a virus particle is called the proteome. Human cells have been estimated to contain from 500,000 to 3,000,000 proteins per cubic micrometer, encoded by ∼20,000 open reading frames, and their products are further diversified by transcriptional, posttranscriptional, translational, and posttranslational regulation. The cell proteome may be further altered during virus infection. The proteome of virus particles is far less complex, but the very largest viruses can still contain hundreds of proteins. Mass spectrometry can be used to identify proteins and their concentrations in cells and in virus particles and also to reveal protein localization, protein-protein interactions, and posttranslational modifications in infected and uninfected cells.
Mass spectrometry may be combined with biochemical and genomic techniques to provide global views of viral reproduction cycles. For example, changes in proteins secreted by host cells upon virus infection can be readily characterized by performing mass spectrometry on supernatants from infected cells. Another application is to identify protein-protein interactions in virus-infected cells: a promiscuous biotinylating enzyme can be directed to a subcellular compartment, where it biotinylates adjacent molecules. These can be purified by attachment to streptavidin-containing beads and identified by mass spectrometry. Integration of mass spectrometry with some of the methods described above for genome analysis can be used to identify proteins that participate in the regulation of gene expression.
At one time the mass spectrometer was a very expensive instrument restricted to chemistry laboratories. Recent advances in instrumentation, including cost reduction, as well as in sample preparation and computational biology, have propelled this technology into the virology research laboratory.
Protein-Protein Interactions
A major goal of virology research is to understand how protein-protein interactions modulate reproduction cycles and pathogenesis. Consequently, multiple experimental approaches have been devised to identify the entire set of interactions among viral proteins and between viral and cell proteins. The yeast two-hybrid screen, a complementation assay that was designed to discover protein-protein interactions, has been adapted to high-throughput applications. In this assay, a transcriptional regulatory protein is split into two fragments, the DNA-binding domain and the activating domain. The coding sequences of two different proteins are fused with the two domains. If the two proteins interact when the fusion proteins are produced in cells, transcriptional activation (leading to the transcription of a reporter gene) will take place. For high-throughput applications, libraries of protein-coding DNAs are screened against a single viral protein