Molecular Biotechnology. Bernard R. Glick
2.14). This is achieved by using either a low concentration of restriction endonuclease or shortened incubation times, and usually some optimization of these parameters is required to determine the conditions that yield fragments of suitable size. The range of fragment sizes depends on the goal of the experiment. For example, for genome sequencing, large (100- to 200-kb) fragments are often desirable. To identify genes that encode a particular enzymatic function, that is, genes that are expressed to produce proteins in the size range of an average protein, smaller (∼3- to 40-kb) fragments are cloned.
Figure 2.14 Construction of a genomic DNA library. Genomic DNA extracted from cells or tissues is partially digested with a restriction endonuclease. Conditions are set so that the enzyme does not cleave at all possible sites. This generates overlapping DNA fragments of various lengths that are cloned into a vector.
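The effect of limiting the digestion can be illustrated with a small simulation. The following is a minimal sketch, not a model of any real enzyme: it assumes that each recognition site is cut independently with a fixed probability, and the genome length, number of sites, and function name `partial_digest` are all hypothetical values chosen for illustration.

```python
import random

def partial_digest(genome_len, site_positions, cut_probability, rng):
    """Return (start, end) fragments from one copy of the genome.

    Assumption: each recognition site is cut independently with
    probability `cut_probability` (1.0 corresponds to a complete digest).
    """
    cuts = [pos for pos in site_positions if rng.random() < cut_probability]
    boundaries = [0] + cuts + [genome_len]
    return list(zip(boundaries, boundaries[1:]))

# Hypothetical 1-Mb genome with 250 randomly placed recognition sites.
GENOME_LEN = 1_000_000
SITES = sorted(random.Random(0).sample(range(1, GENOME_LEN), 250))

if __name__ == "__main__":
    rng = random.Random(1)
    for p in (1.0, 0.3, 0.1):
        fragments = partial_digest(GENOME_LEN, SITES, p, rng)
        mean_kb = GENOME_LEN / len(fragments) / 1000
        print(f"cut probability {p}: {len(fragments)} fragments, "
              f"mean size {mean_kb:.1f} kb")
```

Lowering the cut probability, which stands in for a lower enzyme concentration or a shorter incubation, leaves more sites uncut and therefore yields fewer, longer fragments. Because different copies of the genome are cut at different subsets of sites, fragments from separate copies overlap one another.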
The number of clones in a genomic library depends on the size of the genome of the organism, the average size of the insert in the vector, and the average number of times each sequence is represented in the library (sequence coverage). To ensure that the entire genome, or most of it, is contained within the clones of a library, the sum of the inserted DNA in all of the clones of the library should be at least three times the amount of DNA in the genome. For example, the size of the E. coli genome is approximately 4.6 × 10⁶ bp; if inserts of an average size of 1,000 bp are desired, then 13,800 clones are required for threefold coverage (i.e., 3[(4.6 × 10⁶)/10³]). For the human genome, which contains 3.3 × 10⁹ bp, about 88,000 clones with an average insert size of 150,000 bp are required for fourfold coverage (i.e., 4[(3.3 × 10⁹)/(15 × 10⁴)]). Statistically, the number of clones required for a comprehensive genomic library can be estimated from the relationship N = ln(1 − P)/ln(1 − f), where N is the number of clones, P is the probability of finding a specific gene, and f is the ratio of the length of the average insert to the size of the entire genome. On this basis, about 760,000 clones are required for a 99% chance of discovering a particular sequence in a human genomic library with an average insert size of 20 kb.
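Both estimates are easy to check numerically. The sketch below is only an illustration of the two relationships given above; the function names are hypothetical. With the genome and insert sizes quoted in the text, the coverage estimates evaluate to 13,800 and 88,000 clones, and the probability formula gives roughly 760,000 clones for a 3.3 × 10⁹-bp genome (about 690,000 if the genome size is rounded to 3 × 10⁹ bp).

```python
import math

def clones_for_coverage(genome_bp, insert_bp, coverage):
    """Clones needed so that total insert DNA equals `coverage` genomes."""
    return math.ceil(coverage * genome_bp / insert_bp)

def clones_for_probability(genome_bp, insert_bp, p):
    """N = ln(1 - P) / ln(1 - f), where f = insert size / genome size."""
    f = insert_bp / genome_bp
    # log1p(-f) computes ln(1 - f) accurately when f is very small.
    return math.ceil(math.log(1.0 - p) / math.log1p(-f))

if __name__ == "__main__":
    # E. coli, 1-kb inserts, threefold coverage
    print(clones_for_coverage(4_600_000, 1_000, 3))
    # Human genome, 150-kb inserts, fourfold coverage
    print(clones_for_coverage(3_300_000_000, 150_000, 4))
    # Human genome, 20-kb inserts, 99% chance of finding a given sequence
    print(clones_for_probability(3_300_000_000, 20_000, 0.99))
```

The probability-based estimate is much larger than a simple coverage calculation because clones sample the genome at random: some sequences are represented many times before others appear at all.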
Several strategies can be used to identify target DNA in a genomic library. Genomic or metagenomic libraries can be screened to identify members of the library that carry a gene encoding a particular protein function. In functional complementation, the host cell does not have the protein activity of interest, in some cases because the host gene encoding the protein carries a mutation that abolishes the activity of the protein. A DNA library is constructed that carries fragments of genomic DNA from an organism that has the desired protein activity. Host cells with the genetic deficiency are transformed with plasmids of the DNA library, and transformed cells that have restored normal protein function are selected. The genomic DNA that is used to prepare the library can be from a variety of donor organisms, such as the wild-type strain of the host bacterium that carries a functional copy of the gene encoding the protein, a different organism that can be either another prokaryote or perhaps a eukaryote, or uncultured organisms that are present in an environmental sample.
Many genes encoding enzymes that catalyze specific reactions have been isolated from a variety of organisms by plating the cells of a genomic library on medium supplemented with a specific substrate. For efficient screening of thousands of clones, colonies that carry a cloned gene encoding a functional catabolic enzyme must be readily identifiable, often by production of a colored product or a zone of substrate clearance around the colony. Alternatively, genes may be identified that confer desired properties on host cells. In one example, genes encoding proteins that confer resistance to salt were isolated from a metagenomic library of DNA from microbes present in a hypersaline environment (Fig. 2.15). A library of 192,000 clones with an average insert size of 3 kb representing 1.2 × 10⁹ bp of metagenomic DNA was constructed in an osmosensitive strain of E. coli from DNA extracted from bacteria collected from the saline soil around the roots of the halophyte Arthrocnemum macrostachyum. The library was screened by plating the transformed E. coli cells on media containing 3% sodium chloride, which is normally lethal for the host E. coli strain. Eight different genes that conferred salt resistance were identified. Based on sequence similarity to known genes, these are predicted to encode enzymes involved in nucleic acid structure and repair, an outer membrane protein, a glycerol permease, a proton pump, and two hypothetical proteins.
Figure 2.15 Isolation of genes involved in salt resistance from a metagenomic DNA library of halophilic bacteria (bacteria that thrive in high salt concentrations). Genomic DNA extracted from bacteria found in the soil around the roots of a plant (rhizosphere) growing in a hypersaline soil was fragmented and cloned into a plasmid to generate a metagenomic library. E. coli host cells carrying the cloned DNA were plated on a solid medium containing 3% sodium chloride. The E. coli strain normally does not grow on 3% sodium chloride. The cloned DNA fragments from colonies that developed on the plates were sequenced to identify genes that confer resistance to salt. Mirete et al., Front. Microbiol. 6:1121, 2015.
The presence of particular proteins produced by a genomic library can also be detected using an immunological assay. Rather than screening for the function of a protein, the library is screened using an antibody that specifically binds to the protein encoded by a target gene. The colonies are arrayed on a solid medium, transferred to a matrix, and then lysed to release the cellular proteins (Fig. 2.16). An antibody (primary antibody) is applied that specifically binds to the target protein on the matrix. Following the interaction of the primary antibody with the target protein, any unbound antibody is washed away, and the matrix is treated with a second antibody (secondary antibody) that is specific for the primary antibody. The secondary antibody is attached to an enzyme, such as alkaline phosphatase, that converts a colorless substrate to a colored or light-emitting (chemiluminescent) product that can readily identify positive interactions.
Figure 2.16 Screening of a genomic DNA library using an immunological assay. Transformed cells are plated onto solid agar medium under conditions that permit transformed but not nontransformed cells to grow. (1) From the discrete colonies formed on this master plate, a sample from each colony is transferred to a solid matrix such as a nylon membrane. (2) The cells on the matrix are lysed, and their proteins are bound to the matrix. (3) The matrix is treated with a primary antibody that binds only to the target protein. (4) Unbound primary antibody is washed away, and the matrix is treated with a secondary antibody that binds only to the primary antibody. (5) Any unbound secondary antibody is washed away, and a colorimetric (or chemiluminescent) reaction is carried out. The reaction can occur only if the secondary antibody, which is attached to an enzyme (E) that performs the reaction, is present. (6) A colony on the master plate that corresponds to a positive response on the matrix is identified. Cells from the positive colony on the master plate are subcultured because they may carry the plasmid–insert DNA construct that encodes the protein that binds the primary antibody.
Genome Engineering Using CRISPR Technology
Recently, researchers have designed strategies to insert, replace, or disrupt sequences at targeted sites in intact genomes in vivo. The method is based on a prokaryotic system that protects bacteria against invasion by foreign DNA such as bacteriophage genomic DNA and plasmids. This is a type of bacterial adaptive immune system that consists of genomic clustered regularly interspaced short palindromic repeats (CRISPR) containing fragments of foreign DNA molecules that the bacterium was previously exposed to and CRISPR-associated (Cas) proteins, including an endonuclease that cleaves homologous foreign DNA upon subsequent exposures. The CRISPR-Cas system has been adapted to introduce or replace genes in the genomes of a variety of organisms, both prokaryotes and eukaryotes, and also to edit genomes, that is, remove or alter targeted nucleotides (chapters 6 and 12).