Molecular Biotechnology. Bernard R. Glick

or the end(s) of the cloned inserts (expressed sequence tags), using the dideoxynucleotide method. New developments in sequencing technologies circumvent the requirement for preparation of a clone library and enable high-throughput sequencing of cDNA.

For high-throughput RNA sequencing, total RNA is isolated and converted to cDNA using reverse transcriptase and a mixture of oligonucleotide primers composed of six random bases (random hexamers) that bind to multiple sites on all of the template RNA molecules (Fig. 2.48A). Because rRNA makes up a large fraction (>80%) of the total cellular RNA and its levels are not expected to change significantly under different conditions, these molecules are often removed prior to cDNA synthesis by hybridization to complementary oligonucleotides that are covalently linked to magnetic beads. Long molecules are fragmented to pieces of about 200 bp by physical (e.g., nebulization), chemical (e.g., metal ion hydrolysis), or enzymatic (e.g., controlled RNase digestion) methods, either before cDNA synthesis (RNA fragmentation) or after cDNA synthesis (cDNA fragmentation).

Figure 2.48 High-throughput RNA sequencing. (A) Total RNA is extracted from a sample and rRNA may be removed. The RNA is fragmented and then converted to cDNA using reverse transcriptase. Adaptors are added to the ends of the cDNA to provide binding sites for sequencing primers. High-throughput next-generation sequencing technologies are used to determine the sequences at the ends of the cDNA molecules (paired end reads). The sequence reads are aligned to a reference genome or assembled into contigs using the overlapping sequences. Shown is the alignment of paired end reads to a gene containing one intron. (B) RNA expression levels are determined by counting the reads that correspond to a gene. Adapted with permission from Wang et al., Nat Rev Genet. 10:57–63, 2009.

The cDNA fragments are ligated at one or both ends to an adaptor that serves as a binding site for a sequencing primer (Fig. 2.48A). High-throughput next-generation sequencing technologies are employed to sequence the cDNA fragments. The sequence reads are assembled in a manner similar to that for genomic DNA, that is, by aligning the reads to a reference genome or, when a reference genome is not available, by aligning overlapping sequences to generate contigs for de novo assembly. The reads are expected to align uniformly across the transcript (Fig. 2.48A). Gene expression levels are determined by counting the reads that correspond to each nucleotide position in a gene and averaging these across the length of the transcript (Fig. 2.48B). Expression levels are typically normalized between samples by scaling to the total number of reads per sample (e.g., reads per kilobase pair of transcript per million reads). Appropriate coverage (i.e., the number of cDNA fragments sequenced) is more difficult to determine for RNA sequencing than for genome sequencing because the total complexity of the transcriptome is not known before the experiment. In general, larger genomes and genomes that have more RNA splicing variants have greater transcriptome complexity and therefore require greater coverage. Also, accurate measurement of transcripts from genes with low expression levels requires sequencing of a greater number of transcripts. Quantification may be confounded by the high GC content of some cDNA fragments, which have a higher melting temperature and therefore are sequenced inefficiently; by overrepresentation of cDNA fragments from the 5′ end of transcripts due to the use of random hexamers; and by reads that map to more than one site in a genome due to the presence of repeated sequences. However, because each transcript is represented by many different reads, these biases are expected to have minimal effects on quantification of a transcript.
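The normalization described above can be illustrated with a minimal Python sketch. The function name rpkm and all read counts, transcript lengths, and library sizes below are hypothetical values chosen for illustration. The sketch also assumes that the total number of reads mapped to a gene is used in place of per-position averaging, which is a common simplification.

```python
def rpkm(read_count, transcript_length_bp, total_reads):
    """Reads per kilobase pair of transcript per million reads.

    Hypothetical helper for illustration: scales a raw read count by
    transcript length (in kilobase pairs) and by sequencing depth
    (in millions of reads) so that samples can be compared.
    """
    if transcript_length_bp <= 0 or total_reads <= 0:
        raise ValueError("length and total read count must be positive")
    kilobases = transcript_length_bp / 1_000
    millions_of_reads = total_reads / 1_000_000
    return read_count / kilobases / millions_of_reads


# Hypothetical example: the same 2,000-bp transcript in two samples
# sequenced to different depths.
sample_a = rpkm(read_count=500, transcript_length_bp=2_000, total_reads=10_000_000)
sample_b = rpkm(read_count=1_000, transcript_length_bp=2_000, total_reads=40_000_000)

print(sample_a, sample_b)
```

In this made-up example, sample B has twice as many raw reads for the gene as sample A but a lower normalized value, because its library contains four times as many reads in total.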

Proteomics

Proteins are the molecular machines of cells. They catalyze biochemical reactions, monitor the internal and external environments of the cell and mediate responses to perturbations, and make up the structural components of cells. Some proteins are present at more or less the same levels in all cells of a multicellular individual or a population of unicellular organisms under most conditions, for example, proteins that make up ribosomes or the cytoskeleton. The levels of other proteins differ among cells according to the cells’ functions or change in response to developmental or environmental cues. Thus, analysis of the proteins that are present under particular biological conditions can provide insight into the activities of a cell or tissue.

Proteomics is the comprehensive study of all the proteins of a cell, tissue, body fluid, or organism from a variety of perspectives, including structure, function, expression profiling, and protein–protein interactions. There are several advantages to studying the protein complement (proteome) of cells or tissues compared to other genomic approaches. Although analysis of genomic sequences can often identify protein coding sequences, in many cases the function of a protein, and the posttranslational modifications that influence protein activity and cellular localization, cannot be predicted from the sequence. On the other hand, it may be possible to infer a protein’s function by determining the conditions under which it is expressed and active. While expression profiles of protein coding sequences can be determined using transcriptomics, mRNA levels do not always correlate with protein levels and do not indicate the presence of active proteins, and interactions between proteins cannot be assessed by these methods. Generally, mRNA is turned over rapidly, and therefore, transcriptomics measures actively transcribed genes, whereas proteomics monitors relatively more stable proteins. From a practical standpoint, proteomics can be used to identify proteins associated with a clinical disorder (protein biomarkers), especially in the early stages of disease development, that can aid in disease diagnosis or provide targets for treatment of disease.

Identification of Proteins

A cell produces a large number of different proteins that must first be separated in order to identify individual components of the proteome. To reduce the complexity, proteins are sometimes extracted from particular subcellular locations such as the cell membrane, nucleus, Golgi apparatus, endosomes, or mitochondria. Two-dimensional polyacrylamide gel electrophoresis (2D PAGE) is an effective method to separate proteins in a population (Fig. 2.49A). Proteins in a sample are first separated on the basis of their net charge by electrophoresis through an immobilized pH gradient in one dimension (the first dimension) (Fig. 2.49A). Some amino acids in a polypeptide have side chains with ionizable groups that contribute to the net charge of a protein; the degree of ionization (protonation) is influenced by the pH of the solution. In a gel to which an electric current is applied, proteins migrate through a pH gradient until they reach a specific pH (the isoelectric point) where the overall charge of the protein is zero and they no longer move. A particular position in the pH gradient may be occupied by two or more proteins that have the same isoelectric point. However, the proteins often have different molecular weights and can be further separated according to their molecular mass by electrophoresis at right angles to the first dimension (the second dimension) through a sodium dodecyl sulfate (SDS)-polyacrylamide gel (Fig. 2.49B). The separated proteins form an array of spots in the gel that is visualized using Coomassie blue, silver, or fluorescent protein stains.
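The dependence of net charge on pH, and the pH at which it reaches zero, can be sketched in Python. The sketch treats each ionizable group independently using the Henderson-Hasselbalch relationship. The pKa values are approximate textbook figures that differ between published tables, and the function names and example peptides are hypothetical. Real proteins deviate from this simple model because neighboring residues and folding shift the pKa of individual groups.

```python
# Approximate side-chain pKa values (assumed; published tables differ).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}            # protonated form is +1
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}   # deprotonated form is -1
N_TERMINUS_PKA = 9.0   # assumed value for the free amino group
C_TERMINUS_PKA = 2.0   # assumed value for the free carboxyl group


def net_charge(sequence, ph):
    """Estimate the net charge of a polypeptide at a given pH."""
    # Fraction of each basic group that is protonated (+1).
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERMINUS_PKA))
    # Fraction of each acidic group that is deprotonated (-1).
    charge -= 1.0 / (1.0 + 10 ** (C_TERMINUS_PKA - ph))
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, low=0.0, high=14.0, tolerance=1e-4):
    """Find the pH at which the estimated net charge is zero.

    Net charge falls steadily as pH rises, so bisection converges.
    """
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive: the isoelectric point is at higher pH
        else:
            high = mid   # zero or negative: the isoelectric point is at lower pH
    return (low + high) / 2


# Hypothetical peptides: one rich in lysine (basic), one rich in aspartate (acidic).
basic_pi = isoelectric_point("KKKK")
acidic_pi = isoelectric_point("DDDD")
print(round(basic_pi, 2), round(acidic_pi, 2))
```

Under these assumptions the lysine-rich peptide focuses at a high pH and the aspartate-rich peptide at a low pH, which is why the two would come to rest at opposite ends of the pH gradient in the first dimension.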

Figure 2.49 2D PAGE for separation of proteins. (A) First dimension. Isoelectric focusing is performed to first separate proteins in a mixture on the basis of their net charge. The protein mixture is applied to a pH gradient gel. When an electric current is applied, proteins will migrate either toward the anode (+) or cathode (–) depending on their net charge. As proteins move through the pH gradient, they will gain or lose protons until they reach a point in the gel where their net charge is zero. The pH in this position of the gel is known as the isoelectric point and is characteristic of a given protein. At that point, a protein no longer moves in the electric current. (B) Second dimension. Several proteins in a sample may have the same isoelectric point and therefore migrate to the same position in the gel in the first dimension.
