Molecular Biotechnology. Bernard R. Glick
cells, various proteins are not necessarily synthesized with the same efficiency. In fact, they may be produced at very different levels (up to several hundred-fold) even if they are encoded within the same polycistronic messenger RNA (mRNA). Differences in translational efficiency, as well as differences in transcriptional regulation, enable the cell to have hundreds or even thousands of copies of some proteins and only a few copies of others.
The molecular basis for differential translation is, in part, the sequence of a translational initiation signal called a ribosome-binding site in the transcribed mRNA. A ribosome-binding site is a sequence of 6 to 8 nucleotides (e.g., 5′-UAAGGAGG-3′) in mRNA that can base-pair with a complementary sequence (3′-AUUCCUCC-5′ for E. coli) on the RNA component of the small ribosomal subunit. Generally, the greater the complementarity between the two sequences, the stronger the binding of the mRNA to the ribosomal RNA and the greater the efficiency of translational initiation.
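The relationship between complementarity and binding can be illustrated with a minimal Python sketch that counts how many positions of a candidate ribosome-binding site can form Watson-Crick pairs with the E. coli sequence quoted above. It assumes that the ribosomal sequence AUUCCUCC is written 3′ to 5′, so that it lines up position by position with an mRNA sequence written 5′ to 3′. The function name and the simple pair count are illustrative choices only; actual binding strength depends on the free energy of the duplex, which this sketch does not compute.

```python
# Watson-Crick pairs only; G-U wobble pairs are ignored in this simplified count.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# Ribosomal sequence from the text, assumed here to be written 3'->5' so that
# position i pairs with position i of an mRNA site written 5'->3'.
ANTI_RBS_3_TO_5 = "AUUCCUCC"


def best_complementarity(rbs, target=ANTI_RBS_3_TO_5):
    """Return the largest number of paired positions over all ungapped
    alignments of `rbs` (mRNA, 5'->3') along `target` (rRNA, 3'->5')."""
    rbs = rbs.upper()
    if not rbs or len(rbs) > len(target):
        raise ValueError("rbs must be 1 to len(target) nucleotides long")
    best = 0
    for offset in range(len(target) - len(rbs) + 1):
        window = target[offset:offset + len(rbs)]
        paired = sum((m, r) in WATSON_CRICK for m, r in zip(rbs, window))
        best = max(best, paired)
    return best


if __name__ == "__main__":
    for site in ("UAAGGAGG", "AGGAGG", "GGGGG"):
        print(site, best_complementarity(site))
```

By this count, the full 8-nucleotide example from the text pairs at every position, whereas a short G-rich site pairs at fewer positions and would be expected to bind less strongly.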
Many E. coli expression vectors have been designed to ensure that the mRNA of a cloned gene contains a strong ribosome-binding site. However, for proper translation of heterologous genes in E. coli, certain other conditions must also be satisfied. First, the ribosome-binding sequence must be located within a short distance (usually 2–20 nucleotides) from the translational start codon of the cloned gene. Second, after transcription, the 5′ end of the mRNA sequence that includes the ribosome-binding site through the first few codons of the gene of interest must not contain nucleotide sequences that have regions of complementarity. Intrastrand base-pairing that leads to the formation of secondary structure in this region may shield the ribosome-binding site (Fig. 3.5) and therefore affect the extent to which the mRNA can bind to the appropriate sequence on the ribosome and initiate translation. Thus, for each cloned gene, it is important to establish that the ribosome-binding site is properly placed and that the secondary structure of the mRNA does not prevent its access to the ribosome.
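The first condition, the spacing between the ribosome-binding site and the start codon, is easy to check computationally. The following Python sketch measures the number of nucleotides between the end of a given ribosome-binding sequence and the first AUG downstream of it, and tests that value against the 2-to-20-nucleotide range stated above. The function names and the example mRNA are hypothetical, and the sketch assumes that the first downstream AUG is the intended start codon, which would need to be confirmed for a real construct.

```python
def rbs_to_start_spacing(mrna, rbs, start_codon="AUG"):
    """Return the number of nucleotides between the end of the first
    occurrence of `rbs` and the first `start_codon` downstream of it,
    or None if either element is not found."""
    mrna, rbs = mrna.upper(), rbs.upper()
    rbs_pos = mrna.find(rbs)
    if rbs_pos < 0:
        return None
    rbs_end = rbs_pos + len(rbs)
    start_pos = mrna.find(start_codon, rbs_end)
    if start_pos < 0:
        return None
    return start_pos - rbs_end


def spacing_is_acceptable(spacing, minimum=2, maximum=20):
    """Apply the 2-to-20-nucleotide guideline quoted in the text."""
    return spacing is not None and minimum <= spacing <= maximum


if __name__ == "__main__":
    # Hypothetical 5' end of an mRNA, written 5'->3'.
    example = "GCUAAGGAGGUUAACCUAUGCAGCAU"
    spacing = rbs_to_start_spacing(example, "UAAGGAGG")
    print(spacing, spacing_is_acceptable(spacing))
```

The second condition, the absence of secondary structure around the ribosome-binding site, cannot be judged from spacing alone and requires an analysis of possible intrastrand base pairing.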
Figure 3.5 Example of secondary structure of the 5′ end of an mRNA that would prevent efficient translation. The ribosome-binding site is GGGGG, the start codon is AUG (shown in red), and the first few codons are CAG-CAU-GAU-UUA-UUU. The mRNA is oriented with its 5′ end to the left and its 3′ end to the right. Note that in addition to the traditional A · U and G · C base pairs in mRNA, G can also base-pair to some extent with U.
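A crude way to screen for the kind of structure shown in Fig. 3.5 is to search the 5′ region of the mRNA for the longest stretch that can pair with an antiparallel stretch further downstream, allowing G · U pairs in addition to A · U and G · C pairs. The Python sketch below does this by brute force and reports whether the ribosome-binding site overlaps either arm of that stem. The function names, the minimum loop size of 3 nucleotides, the minimum stem length of 4 base pairs, and the test sequences are all assumptions made for illustration. The sketch considers only the single longest uninterrupted stem and does not estimate folding energies, so it is not a substitute for a thermodynamic folding program.

```python
# Allowed base pairs: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_stem(rna, min_loop=3):
    """Return (length, i, j) for the longest run of consecutive base pairs
    rna[i+k] with rna[j-k], leaving at least `min_loop` unpaired
    nucleotides between the two arms. Returns (0, -1, -1) if none exists."""
    rna = rna.upper()
    best = (0, -1, -1)
    n = len(rna)
    for i in range(n):
        for j in range(i + 1, n):
            k = 0
            # Extend the stem inward while the next two bases can pair and
            # the loop enclosed by them stays at least `min_loop` long.
            while (j - k) - (i + k) - 1 >= min_loop and \
                    (rna[i + k], rna[j - k]) in PAIRS:
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best


def rbs_in_stem(rna, rbs, min_stem=4, min_loop=3):
    """Return True if the first occurrence of `rbs` overlaps either arm of
    the longest stem and that stem has at least `min_stem` base pairs."""
    rna, rbs = rna.upper(), rbs.upper()
    pos = rna.find(rbs)
    if pos < 0:
        raise ValueError("rbs not found in rna")
    length, i, j = longest_stem(rna, min_loop)
    if length < min_stem:
        return False
    rbs_span = set(range(pos, pos + len(rbs)))
    arm_5prime = set(range(i, i + length))
    arm_3prime = set(range(j - length + 1, j + 1))
    return bool(rbs_span & (arm_5prime | arm_3prime))


if __name__ == "__main__":
    # Hypothetical sequences; neither is the sequence drawn in Fig. 3.5.
    print(rbs_in_stem("GGGGGAAAACCCCC", "GGGGG"))  # site can pair with the C run
    print(rbs_in_stem("GGGGGAAAAAAAAA", "GGGGG"))  # no partner strand available
```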
A number of convenient vectors that incorporate both transcriptional and translational signals for the expression of cloned genes in E. coli have been developed. An example is the expression vector pKK233-2 that includes the tac promoter (a hybrid of the strong trp promoter and the regulatable lac promoter; see Milestone box on page 100), the lacZ ribosome-binding site, an ATG start codon located 8 nucleotides downstream from the ribosome-binding site, the transcription terminators T1 and T2 from bacteriophage λ, and an ampicillin resistance gene as a selectable marker (Fig. 3.6). The cloned gene is inserted into an NcoI, PstI, or HindIII site that lies between the ribosome-binding site and the transcription terminators so that it is in the same reading frame as the ATG start codon. After induction with IPTG and transcription, the mRNA of a cloned gene is efficiently translated. However, since the nucleotide sequences that encode the amino acids in the N-terminal region of the target protein vary from one gene to another, it is not possible to design a vector that will eliminate the possibility of mRNA secondary structure in all instances. Therefore, no single optimized translational initiation region can guarantee a high rate of translation initiation for all cloned genes. Consequently, the expression vectors described above are merely starting points for the optimization of translation initiation.
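The requirement that the cloned gene be in the same reading frame as the vector ATG can be checked before a construct is built. The Python sketch below tests whether the first nucleotide of an insert lies a multiple of 3 nucleotides from the ATG and locates the first in-frame stop codon, which reveals a frameshift as a premature stop. The function names and the DNA sequences are hypothetical and are not taken from pKK233-2.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def insert_is_in_frame(atg_index, insert_start):
    """True if the insert begins a whole number of codons after the ATG."""
    return (insert_start - atg_index) % 3 == 0


def first_in_frame_stop(dna, atg_index):
    """Return the index of the first stop codon in the reading frame set by
    the ATG at `atg_index`, or None if no in-frame stop codon is present."""
    dna = dna.upper()
    for pos in range(atg_index, len(dna) - 2, 3):
        if dna[pos:pos + 3] in STOP_CODONS:
            return pos
    return None


if __name__ == "__main__":
    # Hypothetical construct: leader, ATG, three sense codons, stop codon.
    construct = "GGAGGTTTAACCATGGCTAAAGGTTAA"
    atg = construct.find("ATG")
    print(atg, insert_is_in_frame(atg, atg + 3), first_in_frame_stop(construct, atg))

    # The same construct with one extra nucleotide after the ATG: the frame
    # shifts and a stop codon is read much earlier.
    shifted = "GGAGGTTTAACCATGCGCTAAAGGTTAA"
    print(first_in_frame_stop(shifted, shifted.find("ATG")))
```

A check of this kind confirms only that the coding sequence is read in the intended frame. It says nothing about how efficiently translation is initiated.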
Milestone. The tac Promoter: a Functional Hybrid Derived from the trp and lac Promoters
De Boer and his colleagues began their efforts to construct the tac promoter with the idea of combining portions of two different strong and regulatable promoters to create an even stronger promoter that would direct very high levels of foreign-gene expression. When they undertook their studies, the DNA sequences of a number of prokaryotic promoters, mostly from E. coli, were known, but the precise features that enabled a promoter to direct transcription efficiently were not well understood. It was known that almost all mutations that affected the strength of a prokaryotic promoter were found in either the −10 region or the −35 region, which lie approximately 10 and 35 bp upstream of the transcription start site, respectively. Moreover, only mutations that made an existing promoter more like the consensus sequences for these regions (i.e., 5′-TATAAT-3′ for the −10 region and 5′-TTGACA-3′ for the −35 region) increased the strength of the promoter. The consensus sequences had been deduced by comparing the DNA sequences of all known promoters and determining which nucleotides occurred most often at each position. De Boer and his colleagues also knew that the lacUV5 promoter, a stronger variant of the lac promoter, had a consensus sequence at its −10 region but not at its −35 region, while the trp promoter, which normally controls the transcription of genes involved in the biosynthesis of tryptophan, had a consensus sequence at its −35 region but not at its −10 region. They decided to create a fusion promoter that included the −10 region from the lacUV5 promoter and the −35 region from the trp promoter. They tested this new “tac” promoter, as they called it, for its ability to direct the synthesis of the enzyme galactokinase in E. coli and compared it in the same assay system with the trp and lac promoters.
In agreement with their initial idea, the tac promoter was found to be approximately 5 times stronger than the trp promoter and 10 times stronger than the lac promoter (de Boer et al., Proc. Natl. Acad. Sci. USA 80:21–25, 1983). In addition, like the lac promoter, the tac promoter was repressed by the lac repressor and derepressed by IPTG. Thus, this new promoter was not only strong, but also regulatable.
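The reasoning behind the hybrid can be made concrete by counting, for each promoter, how many positions of its −35 and −10 hexamers match the consensus sequences TTGACA and TATAAT. In the Python sketch below, the consensus hexamers come from the text. The nonconsensus hexamers assigned to lacUV5 (−35, TTTACA) and trp (−10, TTAACT) are commonly quoted values that are included here as assumptions, and the match count is only an illustration, not a measure of promoter strength.

```python
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"

# (-35 hexamer, -10 hexamer). The nonconsensus hexamers are assumed values.
PROMOTERS = {
    "lacUV5": ("TTTACA", "TATAAT"),
    "trp": ("TTGACA", "TTAACT"),
    "tac": ("TTGACA", "TATAAT"),  # -35 from trp, -10 from lacUV5
}


def matches(hexamer, consensus):
    """Count the positions at which two equal-length sequences agree."""
    if len(hexamer) != len(consensus):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(hexamer.upper(), consensus))


def consensus_score(name):
    """Return (matches at -35, matches at -10) for a promoter in PROMOTERS."""
    minus35, minus10 = PROMOTERS[name]
    return matches(minus35, CONSENSUS_35), matches(minus10, CONSENSUS_10)


if __name__ == "__main__":
    for name in PROMOTERS:
        print(name, consensus_score(name))
```

Only the hybrid matches the consensus at all 12 positions, which is consistent with the observation that changes toward the consensus increase promoter strength.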
Figure 3.6 The expression vector pKK233-2. The plasmid pKK233-2 carries an ampicillin resistance (Ampʳ) gene as a selectable marker, the tac promoter (ptac), the lacZ ribosome-binding site (rbs), three restriction endonuclease cloning sites (NcoI, PstI, and HindIII), and two transcription termination sequences (T1 and T2). The arrow indicates the direction of transcription. The plasmid is not drawn to scale.
While the genetic code is universal among organisms, the codons that specify amino acids are used to different extents in various organisms (Table 3.4). Cells produce transfer RNAs (tRNAs) corresponding to a specific codon in approximately the same relative amount as that particular codon is used in the production of proteins. When a cloned gene has codons that are rarely used by the host cell, a cellular incompatibility occurs that decreases translation efficiency. For example, AGG, AGA, AUA, CUA, and CGA are the least-used codons in E. coli. The host cell may not produce enough of the tRNAs that recognize these rarely used codons, and consequently the yield of the protein produced from the cloned gene will be much lower than expected. Any codon that is used less than 5% to 10% of the time by the host organism may cause problems. An insufficient supply of tRNAs may also lead to the incorporation of incorrect amino acids into the protein. Errors in the amino acid sequence of a protein may diminish its usefulness if the specific activity and stability are reduced.
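A cloned coding sequence can be screened for such codons before expression is attempted. The Python sketch below splits a coding sequence into codons and reports the position of every codon that belongs to the set of least-used E. coli codons listed above, together with the fraction of the sequence that these codons make up. The function names and the example sequence are hypothetical. The set of rare codons is taken directly from the text and would need to be replaced with a full codon usage table for a quantitative analysis.

```python
# Least-used E. coli codons as listed in the text (RNA alphabet).
RARE_E_COLI_CODONS = {"AGG", "AGA", "AUA", "CUA", "CGA"}


def split_codons(cds):
    """Split a coding sequence (DNA or RNA) into a list of RNA codons."""
    cds = cds.upper().replace("T", "U")
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a nonzero multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def rare_codon_report(cds, rare=RARE_E_COLI_CODONS):
    """Return (hits, fraction), where hits is a list of (codon number, codon)
    for each rare codon, numbered from 1, and fraction is the share of all
    codons in the sequence that are rare."""
    codons = split_codons(cds)
    hits = [(number, codon)
            for number, codon in enumerate(codons, start=1)
            if codon in rare]
    return hits, len(hits) / len(codons)


if __name__ == "__main__":
    # Hypothetical six-codon coding sequence.
    hits, fraction = rare_codon_report("AUGAGGCUAAAAAGAUAA")
    print(hits, fraction)
```

A sequence in which a large share of codons is flagged, or in which flagged codons are clustered, is a candidate for expression in a different host or for redesign with more commonly used codons.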
Table 3.4 Genetic code and codon usage in E. coli and humans
To alleviate this problem, the target gene may be expressed in a different host, or chemically synthesized (chapter 2) or engineered by directed mutagenesis (described later in this chapter) to contain codons that are more commonly used by the host cell. Codon optimization has enabled the production of large quantities of a variety of heterologous proteins that are otherwise difficult to express in E. coli (Table 3.5). Alternatively, a host cell that has been engineered to overexpress several