Molecular Biotechnology. Bernard R. Glick


      Figure 2.38 Real-time single-molecule sequencing. One molecule of DNA polymerase (orange shape) is attached to the bottom of a nanoscale well. A single-stranded DNA molecule (grey strand) bound to a primer (blue strand) is captured in the active site of the polymerase (A). Each of the four different nucleoside triphosphates is attached to a different fluorophore (colored stars) at the terminal phosphate, which is released during template-dependent nucleotide incorporation into the growing DNA strand. Fluorescence emission from a zeptoliter (10⁻²¹ L) volume at the bottom of the well is detected by a laser before the cleaved pyrophosphate with attached fluorophore diffuses away (B).

      The nucleotide added during the extension phase is detected in real time, as it is incorporated. For real-time sequencing, the nucleotides do not carry a blocking group on the 3′ hydroxyl group, and therefore DNA synthesis is continuous. A different fluorescent tag is attached to the terminal phosphate of each nucleoside triphosphate, in a manner that does not interfere with the activity of the DNA polymerase. With each nucleotide addition to the growing DNA chain, pyrophosphate is cleaved and with it the fluorescent tag. Tag cleavage therefore corresponds to nucleotide addition. The laser used to measure fluorescence is narrowly focused on the immobilized DNA polymerase and therefore records a pulse of fluorescence only in the brief time (tens of milliseconds) when the tagged nucleotide is held in the enzyme’s active site (Fig. 2.38B). Following formation of a phosphodiester bond, the fluorescent tag cleaved from the nucleotide rapidly diffuses out of the range of the detector. Translocation of the DNA template positions DNA polymerase to accept the next nucleotide into the active site. Long sequence reads (greater than 10 kbp on average) can be generated rapidly by this method; however, accuracy is generally lower than that of other methods because of the short time interval between nucleotide additions, dissociation of a nucleotide before a phosphodiester bond forms, and simultaneous measurement of fluorescence from more than one nucleotide.
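
The link between fluorescence pulses and the sequence read can be illustrated with a toy base caller. This is only a sketch under stated assumptions, not how any real instrument processes its signal: the channel-to-base mapping, the `(channel, duration_ms)` pulse format, and the `min_duration_ms` cutoff are all hypothetical. The cutoff loosely mimics discarding very brief pulses, such as those from a nucleotide that dissociates before a phosphodiester bond forms.

```python
# Hypothetical mapping from detection channel (fluorophore color) to base.
CHANNEL_TO_BASE = {0: "A", 1: "C", 2: "G", 3: "T"}


def call_bases(pulses, min_duration_ms=10.0):
    """Convert time-ordered (channel, duration_ms) pulses into a read.

    Pulses shorter than min_duration_ms are skipped; this threshold is an
    illustrative assumption, not a documented instrument parameter.
    """
    read = []
    for channel, duration_ms in pulses:
        if duration_ms < min_duration_ms:
            continue  # too brief to count as an incorporation event
        read.append(CHANNEL_TO_BASE[channel])
    return "".join(read)


# One pulse per detected nucleotide; the 4 ms pulse is filtered out.
pulses = [(0, 35.0), (2, 4.0), (1, 28.0), (3, 41.0), (3, 22.0)]
print(call_bases(pulses))  # ACTT
```

The read produced this way is the sequence of the newly synthesized strand, which is complementary to the template.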

      Just as the sequence of a gene can provide information about the function of the encoded protein, the sequence of an entire genome can contribute to our understanding of the nature of an organism. Thousands of whole genomes have now been sequenced, from organisms of all domains of life. Initially, the sequenced genomes were relatively small, limited by the early sequencing technologies. The first DNA genome to be sequenced was from the E. coli bacteriophage ΦX174 (5,375 bp) in 1977, while the first sequenced genome from a cellular organism was that of the bacterium Haemophilus influenzae (1.8 Mbp) in 1995. Within 2 years, the sequence of the larger E. coli genome (4.6 Mbp) was reported, and the sequence of the human genome (3,000 Mbp), the first vertebrate genome, was completed in 2003.

      Most of these first genome sequences were generated using a shotgun cloning approach. In this strategy, a clone library of randomly generated, overlapping genomic DNA fragments is constructed in a bacterial host. The plasmids are isolated, and then the cloned inserts are sequenced using the dideoxynucleotide method. Using this approach, the first human genome was sequenced in 13 years at a cost of $2.7 billion. The aspiration to acquire genome sequences faster and at a much lower cost has driven the development of new genome sequencing strategies. Today, many large-scale sequencing projects have been completed and many more are under way, motivated by compelling biological questions. Some will contribute to our understanding of the microorganisms that cause infectious diseases and to the development of new techniques for their detection and treatment. Others are aimed at helping us to understand what it means to be human and how we evolved. Understanding the nucleotide polymorphisms among individuals with and without a specific disease will help us to determine the genetic basis of disease.

      Generally, DNA sequencing projects fall into two categories: de novo genome sequencing and resequencing. Sequencing the genome of an organism that has not previously been sequenced is de novo genome sequencing, whereas resequencing involves comparing a newly determined sequence with a known reference sequence. A large-scale sequencing project typically entails (i) preparing a library of template DNA fragments, (ii) amplifying the DNA fragments, which increases the detection signal from nucleotide addition during the sequencing reaction, (iii) sequencing the template DNA using one of the sequencing techniques described above, and (iv) assembling the sequences generated from the fragments in the order in which they are found in the original genome. Sequencing massive amounts of DNA required not only the development of new technologies for nucleotide sequence determination but also new methods to reduce the time for preparation and processing of large libraries of sequencing templates. High-throughput next-generation sequencing approaches have circumvented the cloning steps of the shotgun sequencing strategy by attaching, amplifying, and sequencing the genomic DNA fragments directly on a solid support. Single-molecule sequencing eliminates the need for an amplification step, further reducing the time for template preparation. In both cases, all of the templates are sequenced at the same time, an approach known as massive parallelization.
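
Step (iv), ordering overlapping fragment sequences as they occur in the genome, can be sketched with a greedy overlap merge. This is a deliberately simplified illustration and not the assembly method described in this book: it assumes error-free reads from a single strand with no repeats, and the function names and the `min_len` parameter are hypothetical. Real assemblers must also handle sequencing errors, reverse complements, and repeated regions.

```python
def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


# Three overlapping reads drawn from the toy "genome" ATGCGTACGTTAGC.
reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

If some reads share no overlap of at least `min_len` bases, the function returns several contigs rather than a single sequence.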

      Although the shotgun cloning strategy has been used successfully to obtain the sequences of many whole genomes, preparation of clone libraries in bacterial cells is costly and time-consuming for routine sequencing of the large amounts of genomic DNA that are required for many research and clinical applications. To reduce the time and cost of large-scale sequencing, high-throughput next-generation sequencing strategies have been developed that use cell-free methods to generate a library of genomic DNA fragments. First, purified genomic DNA is fragmented either mechanically by sonication (applying high-frequency sound energy) or nebulization (forcing DNA through a small hole using compressed air), or by enzymatic digestion. Physical fragmentation tends to leave extended single-stranded ends that must be blunted (end repaired or polished) by filling in 3′ recessed ends with DNA polymerase in the presence of the four deoxyribonucleotides (Fig. 2.39A) and removing protruding 3′ ends with an exonuclease (Fig. 2.39B). Next, different oligonucleotide adaptors are ligated to each end of the polished genomic fragments. To facilitate ligation, the 5′ ends of the DNA fragments are phosphorylated with T4 polynucleotide kinase (Fig. 2.39C). The 3′ ends may be adenylated (A-tailed) by enzymatic addition of a single deoxyadenosine monophosphate to facilitate ligation of adaptors that have a single complementary 3′ deoxythymidine monophosphate overhang (Fig. 2.39D). The adaptors have sequences that anneal to PCR primers for amplification of the genomic sequence and to sequencing primers that prime the sequencing reaction (Fig. 2.40). In addition, adaptors may contain an index (barcode) sequence for tagging a genomic library. A barcode is a short (usually 8–12 nucleotides), unique nucleotide sequence that is used to identify and sort sequence reads generated from a genomic library when multiple libraries are pooled prior to sequencing (multiplexing). Generally, genomic DNA fragments are size selected by removing fragments above and below a certain size range (typically 200–500 bp) to facilitate assembly of the genome sequence (described below). Following size selection, the libraries may be amplified by PCR to enrich for genomic DNA fragments that have adaptors ligated to both ends and to increase the amount of template for sequencing.
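
Sorting pooled reads by barcode (demultiplexing) and size selection can both be sketched in a few lines. The sketch below rests on simplifying assumptions that are not taken from the text: the barcode is taken to be the first 8 nucleotides of each read and must match exactly, and the barcode sequences, library names, and function names are hypothetical. Sequencing platforms differ in where the index is read and how mismatches are tolerated.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_library, barcode_len=8):
    """Group reads by library using the barcode at the start of each read.

    Assumes (for illustration only) an exact-match barcode prefix; reads
    with an unknown barcode are placed in an 'unassigned' bin.
    """
    bins = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        library = barcode_to_library.get(barcode, "unassigned")
        bins[library].append(insert)  # store the read without its barcode
    return dict(bins)


def size_select(fragments, min_bp=200, max_bp=500):
    """Keep only fragments whose length lies within [min_bp, max_bp]."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]


# Hypothetical barcodes for two pooled libraries.
barcodes = {"ACGTACGT": "sample_1", "TTGGCCAA": "sample_2"}
pooled = [
    "ACGTACGT" + "GGGAAA",
    "TTGGCCAA" + "CCCTTT",
    "ACGTACGT" + "TTTT",
    "N" * 8 + "AAAA",  # barcode not recognized
]
print(demultiplex(pooled, barcodes))

# Fragments of 150, 200, 350, 500, and 650 bp; only the middle three pass.
fragments = ["A" * n for n in (150, 200, 350, 500, 650)]
print([len(f) for f in size_select(fragments)])  # [200, 350, 500]
```

In the laboratory, size selection is a physical step performed on the DNA fragments before sequencing; the `size_select` function only mirrors the selection rule on sequence lengths.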

      Figure 2.39 Preparation of genomic DNA fragments for ligation of adaptors. Ends of frayed DNA are repaired (end repaired) using DNA polymerase to fill in from recessed 3′ ends (A) and a 3′ exonuclease to degrade 3′ extensions (B). Note: Fragments with different combinations of extensions and recessed ends are not shown here. In all of these cases, the polymerase and/or 3′ exonuclease activities produce blunt-end DNA molecules. T4 polynucleotide kinase phosphorylates the 5′ ends of the blunt-end fragments (C). A single deoxyadenosine monophosphate may be added to the 3′ ends of blunted fragments (A-tailed) by DNA polymerase to facilitate ligation of adaptors that have a single complementary 3′ deoxythymidine monophosphate overhang (D).

