Snyder and Champness Molecular Genetics of Bacteria. Tina M. Henkin
Figure 2.44 Relationship between gene structure in DNA and the coding sequence in mRNA. Each of the different reading frames in the two strands of DNA may contain open reading frames (ORFs), but generally, only one ORF in each region (reading frame 6 in the diagram) is translated to yield a polypeptide product.
What You Need To Know
We have introduced a lot of detail in this chapter, so it is worth reviewing some of the most important concepts and terms. As with any field, molecular genetics has its own jargon, and in order to follow a paper or seminar that includes some molecular genetics, familiarity with this jargon is very helpful.
Figure 2.44 shows a typical gene with a promoter and transcription terminator. The mRNA is transcribed beginning at the promoter and ending at the transcription terminator. The direction on the DNA or RNA is indicated by the direction of the phosphate bonds between the carbons on the ribose or deoxyribose sugars in the backbone of the polynucleotide. These carbons are labeled with a prime to distinguish them from the carbons in the bases of the nucleotides. On one end of the RNA, the 5′ carbon of the terminal nucleotide is not joined to another nucleotide by a phosphate bond. Therefore, this is called the 5′ end. Similarly, the other end is called the 3′ end, because the 3′ carbon of the last nucleotide on this end is not joined to another nucleotide by a phosphate bond. The direction on DNA or RNA from the 5′ end to the 3′ end is called the 5′-to-3′ direction. An RNA polymerase molecule synthesizes mRNA in the 5′-to-3′ direction, moving 3′ to 5′ on the transcribed strand (or template strand) of DNA. The opposite strand of DNA from the transcribed strand has the same sequence and 5′-to-3′ polarity as the RNA, so it is called the coding strand (or nontemplate strand). Sequences of DNA in the region of a gene are usually shown as the sequence of the coding strand. A sequence that is located in the 5′ direction of another sequence on the coding strand is upstream of that sequence, while a sequence in the 3′ direction is downstream. Therefore, the promoter for a gene and the S-D sequences are both upstream of the initiation codon, while the termination codon and the transcription termination sites are both downstream.
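The relationship between the coding strand, the template strand, and the mRNA can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: the function names and the nine-nucleotide example sequence are hypothetical, and all sequences are written in the 5′-to-3′ direction.

```python
# Complement table for the four DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(dna: str) -> str:
    """Return the opposite DNA strand, written 5'-to-3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))

def transcribe_coding(coding_strand: str) -> str:
    """mRNA has the same sequence and polarity as the coding strand, with U for T."""
    return coding_strand.upper().replace("T", "U")

def transcribe_template(template_strand: str) -> str:
    """Build the mRNA by pairing with the template strand.

    The template is read 3'-to-5', so the RNA is the reverse complement
    of the template as written 5'-to-3', with U in place of T.
    """
    return reverse_complement(template_strand).replace("T", "U")

coding = "ATGGCTTAA"                    # hypothetical coding strand, 5'-to-3'
template = reverse_complement(coding)   # transcribed (template) strand, 5'-to-3'
mrna = transcribe_coding(coding)

print(template)                         # TTAAGCCAT
print(mrna)                             # AUGGCUUAA
print(transcribe_template(template))    # AUGGCUUAA
```

Both routes give the same RNA, which is why gene sequences can be shown as the coding strand alone.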
The positions of nucleotides in a promoter region are numbered as shown in Figure 2.6. The position of the first nucleotide in the RNA is called the start point and is given the number +1; the distance in nucleotides from this point to another point is numbered negatively or positively, depending on whether the second site is upstream or downstream of the start point, respectively. Note that these definitions can be used to describe only a region of DNA that is known to encode an RNA or protein, where we know which is the coding strand and which is the transcribed strand. Otherwise, what is upstream on one strand of DNA is downstream on the other strand.
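As a small illustration of this numbering, the sketch below converts an index on the coding strand into a position relative to the start point. The function name and the example indices are hypothetical, and the sketch assumes the common convention that there is no position 0, so that the nucleotide just upstream of +1 is −1.

```python
def promoter_position(index: int, start_index: int) -> int:
    """Convert a 0-based index on the coding strand to start-point numbering.

    start_index is the 0-based index of the first transcribed nucleotide (+1).
    Assumption: there is no position 0; upstream positions are negative.
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset

start = 40                              # hypothetical index of the start point
print(promoter_position(40, start))     # 1   (the start point itself)
print(promoter_position(39, start))     # -1  (first nucleotide upstream)
print(promoter_position(30, start))     # -10
print(promoter_position(5, start))      # -35
```

The calculation is meaningful only once the coding strand is known; on the other strand the same nucleotides would be numbered in the opposite sense.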
Because mRNAs are both made and translated in the 5′-to-3′ direction, an mRNA can (and usually will) be translated while it is still being made, at least in bacteria and archaea, in which there is no nuclear membrane separating the DNA from the cytoplasm, where the ribosomes reside. We have discussed how this can lead to phenomena unique to bacteria, such as ρ-dependent polarity, and it is used to regulate expression of some genes in bacteria (see chapter 11).
It is important to distinguish promoters from TIRs and to distinguish transcription termination sites from translation termination sites. Figure 2.44 illustrates this difference. Transcription begins at the promoter and defines the 5′ end of the mRNA, but the place where translation begins, the TIR, can be some distance from the 5′ end. The untranslated region on the 5′ end of an mRNA upstream of the TIR is called the 5′ untranslated region or leader region and can be quite long. Similarly, a nonsense codon in the reading frame for the protein is a translation terminator, not a transcription terminator. The transcription terminator, and therefore the 3′ end of the mRNA, may be some distance downstream from the nonsense codon that terminates translation of the mRNA. The distance from the last termination codon to the 3′ end of the mRNA is the 3′ untranslated region. Polycistronic mRNAs encode more than one polypeptide. These mRNAs have a separate TIR and termination codon for each gene and can have noncoding or untranslated sequences upstream of, downstream of, and between the genes. Eukaryotes generally do not have polycistronic mRNAs, which is related to the dependence on ribosome binding to the 5′ end of the mRNA for translation initiation.
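The distinction between the ends of the mRNA and the ends of the coding sequence can be made concrete with a short sketch for an mRNA carrying a single gene. The function name and the example mRNA are hypothetical, and the position of the initiation codon is assumed to be known already, since recognizing a TIR involves more than finding an AUG.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna: str, start: int) -> tuple[str, str, str]:
    """Split an mRNA into (5' untranslated region, coding sequence, 3' untranslated region).

    start is the 0-based index of the initiation codon (assumed known).
    The coding sequence returned includes the termination codon.
    """
    # Step through the mRNA one codon at a time in the frame set by the start codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame termination codon found")

mrna = "GGAGGUUAACAUGGCUUAAGCCGC"       # hypothetical mRNA, 5'-to-3'
leader, coding, trailer = split_mrna(mrna, start=10)

print(leader)    # GGAGGUUAAC  (5' untranslated region)
print(coding)    # AUGGCUUAA   (initiation codon through termination codon)
print(trailer)   # GCCGC       (3' untranslated region)
```

Note that the UAA within the leader of this example does not terminate anything, because it lies upstream of the initiation codon and outside the reading frame being translated.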
Open Reading Frames
The concept of an open reading frame, or ORF, is very important, particularly in this age of genomics. As discussed above, a reading frame in DNA is a succession of nucleotides in the DNA taken three at a time, the same way the genetic code is translated. Each DNA sequence has six reading frames, three on each strand, as illustrated in Figure 2.44. An ORF is a string of potential codons for amino acids in DNA unbroken by termination codons in one of the reading frames. Computer software can show where all the ORFs in a sequence are located, and most DNA sequences have many ORFs on both strands, although most of them are short. The region shown in Figure 2.44 contains many ORFs, but only the longest, in frame 6, is likely to encode a polypeptide. However, the presence of even a long ORF in a DNA sequence does not necessarily indicate that the sequence encodes a protein, and fairly long ORFs often occur by chance. Furthermore, it has become evident recently that even very short ORFs can encode short peptides with important biological functions.
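The kind of search such software performs can be sketched as follows. This is a simplified illustration under stated assumptions rather than a description of any particular program: an ORF is taken, as defined above, to be any run of codons unbroken by a termination codon, with no requirement for a TIR; the names `find_orfs` and `open_runs` and the `min_codons` cutoff are hypothetical; frames are labeled by strand and offset (0 to 2) rather than 1 to 6 as in Figure 2.44; and coordinates for the "-" strand refer to the reverse complement of the input.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the opposite strand, written 5'-to-3'."""
    return dna.translate(COMPLEMENT)[::-1]

def open_runs(seq, frame, min_codons):
    """Yield (start, end) indices of stop-free codon runs in one frame of seq."""
    run_start = frame
    last = frame + (len(seq) - frame) // 3 * 3   # end of the last complete codon
    for i in range(frame, last, 3):
        if seq[i:i + 3] in STOPS:
            if (i - run_start) // 3 >= min_codons:
                yield run_start, i               # run ends just before the stop
            run_start = i + 3                    # next run begins after the stop
    if (last - run_start) // 3 >= min_codons:
        yield run_start, last                    # run still open at the end of seq

def find_orfs(dna, min_codons=3):
    """Return (strand, frame, start, end) for ORFs in all six reading frames."""
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for start, end in open_runs(seq, frame, min_codons):
                orfs.append((strand, frame, start, end))
    return orfs

for orf in find_orfs("ATGAAACCCGGGTAA", min_codons=4):
    print(orf)
```

Even this 15-nucleotide example yields stop-free runs in several frames on both strands, which illustrates why the mere presence of an ORF is weak evidence that a polypeptide is encoded.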
If an ORF does encode a polypeptide, it will begin with a TIR, but as discussed above, TIRs are sometimes difficult to identify. Clues to whether an ORF is likely to encode a protein may come from the choice of the third base in the codon for each amino acid in the ORF. Because of the redundancy of the code, an organism has many choices of codons for each amino acid, but each organism prefers to use some codons over others (see “Codon Usage” above) (Table 2.2).
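One simple way to look at third-base choice is to tally the base found at the third position of each codon in a candidate ORF. The sketch below does only the counting; the function name and example sequence are hypothetical, and interpreting the counts would require comparing them with the codon preferences of the organism in question, which are not reproduced here.

```python
from collections import Counter

def third_base_counts(orf: str) -> Counter:
    """Count the base at the third position of each complete codon in an ORF."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3          # ignore any incomplete final codon
    return Counter(orf[i + 2] for i in range(0, usable, 3))

counts = third_base_counts("ATGGCGCTGAAA")    # codons: ATG GCG CTG AAA
print(counts["G"], counts["A"])               # 3 1
```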
A more direct way to determine if an ORF actually encodes a protein is to ask which polypeptides are made from the DNA in an in vitro transcription-translation system. These systems use extracts of cells, typically of E. coli, from which the DNA has been removed but the RNA polymerase, ribosomes, and other components of the translation apparatus remain. When DNA with the ORFs under investigation is added to these extracts, polypeptides can be synthesized from the added DNA. If the size of one of these polypeptides corresponds to the size of an ORF on the DNA, the ORF probably encodes a protein. Another way to determine if an ORF encodes a protein is to make a translational fusion of a reporter gene to the ORF and to determine whether the reporter gene is expressed (see below).
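Comparing the size of a polypeptide with the size of an ORF requires converting nucleotides to an expected mass. The sketch below shows the arithmetic. It assumes an average of about 110 daltons per amino acid, a widely used rule of thumb rather than a value from this chapter; the function name is hypothetical, and the real mass depends on the amino acid composition.

```python
AVERAGE_RESIDUE_MASS_DA = 110   # assumption: rough average mass per amino acid

def approx_mass_kda(coding_nucleotides: int) -> float:
    """Estimate polypeptide mass in kilodaltons from ORF length in nucleotides.

    coding_nucleotides should exclude the termination codon.
    """
    amino_acids = coding_nucleotides // 3    # three nucleotides per codon
    return amino_acids * AVERAGE_RESIDUE_MASS_DA / 1000

print(approx_mass_kda(900))   # 900 nucleotides = 300 codons, roughly 33 kDa
```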
Transcriptional and Translational Fusions
Probably the most convenient way to determine which of the possible ORFs on the two strands of DNA in a given region are translated into proteins is to make transcriptional and translational fusions to the ORFs. These methods make use of reporter genes, such as lacZ (β-galactosidase), gfp (green fluorescent protein), lux (luciferase), or other genes whose products are easy to detect. Figure 2.45 illustrates the concepts of transcriptional and translational fusions.
An ORF can be translated only if it is transcribed into RNA. Transcriptional fusions can be used to determine whether this has occurred. To make a transcriptional fusion, a reporter gene containing its TIR sequence but without its own promoter is fused immediately downstream of the promoter of the gene to be tested. If the promoter is active, and its