Synthesis of RNA from a DNA template is called transcription. Genes are transcribed by enzymes called RNA polymerases that generate a single-stranded RNA identical in sequence (with the exception of U in place of T) to one of the strands of the double-stranded DNA. The DNA strand that directs the sequence of nucleotides in the RNA by complementary base pairing is the template strand. The RNA strand that is initially generated is the primary transcript. The DNA template is copied in the 3′-to-5′ direction, and the RNA transcript is synthesized in the 5′-to-3′ direction. RNA polymerases differ from DNA polymerases in that they can initiate the synthesis of new strands in the absence of a primer.
In addition to catalyzing the polymerization of ribonucleotides, RNA polymerases must be able to recognize the appropriate gene to transcribe, the appropriate strand of the double-stranded DNA to copy, and the start point of transcription (Fig. 13.1). Specific sequences on DNA, called promoters, determine where the RNA polymerase binds and how frequently it initiates transcription. Other regulatory sequences, such as promoter-proximal elements and enhancers, also affect the frequency of transcription.
In bacteria, a single RNA polymerase produces the primary transcript precursors for all three major classes of RNA: messenger RNA (mRNA), ribosomal RNA (rRNA), and transfer RNA (tRNA). Because bacteria do not contain nuclei, ribosomes bind to mRNA as it is being transcribed, and protein synthesis occurs simultaneously with transcription.
Eukaryotic genes are transcribed in the nucleus by three different RNA polymerases, each principally responsible for one of the major classes of RNA. The primary transcripts are modified and trimmed to produce the mature RNAs. The precursors of mRNA (called pre-mRNA) have a guanosine “cap” added at the 5′-end and a poly(A) “tail” at the 3′-end. Exons, which contain the coding sequences for the proteins, are separated in pre-mRNA by introns, regions that have no coding function. During splicing reactions, introns are removed and the exons connected to form the mature mRNA. In eukaryotes, tRNA and rRNA precursors are also modified and trimmed, although not as extensively as pre-mRNA.
THE WAITING ROOM
Lisa N. is a 4-year-old girl of Mediterranean ancestry whose height and body weight are below the 20th percentile for girls of her age. She tires easily and complains of loss of appetite and shortness of breath on exertion. A dull pain has been present in her right upper quadrant for the last 3 months and she appears pale. Initial laboratory studies indicate a severe anemia (decreased red blood cell count) with a hemoglobin of 7.0 g/dL (reference range, 12 to 16 g/dL). A battery of additional hematologic tests reveals that Lisa N. has β+-thalassemia, intermediate type.
Isabel S., a patient with HIV (see Chapters 11 and 12), has developed a cough with gray, slightly blood-tinged sputum. A chest X-ray indicates a cavitary infiltrate in the right upper lung field. A stain of sputum shows the presence of acid-fast bacilli, suggesting a diagnosis of pulmonary tuberculosis caused by Mycobacterium tuberculosis.
Catherine T. picked mushrooms in a wooded area near her home. A few hours after eating one small mushroom, she experienced mild nausea and diarrhea. She brought a mushroom with her to the hospital emergency room. A poison expert identified it as Amanita phalloides (the “death cap”). These mushrooms contain the toxin α-amanitin.
Sarah L., a 28-year-old computer programmer, notes increasing fatigue, pleuritic chest pain, and a nonproductive cough. In addition, she complains of joint pains, especially in her hands. A rash on both cheeks and the bridge of her nose (“butterfly rash”) has been present for the last 6 months. Initial laboratory studies reveal a subnormal white blood cell count and a mild reduction in hemoglobin. Tests result in a diagnosis of systemic lupus erythematosus (SLE) (frequently called lupus).
I. Action of RNA Polymerase
Transcription, the synthesis of RNA from a DNA template, is carried out by RNA polymerases (Fig. 13.2). Like DNA polymerases, RNA polymerases catalyze the formation of ester bonds between nucleotides that base-pair with the complementary nucleotides on the DNA template. Unlike DNA polymerases, RNA polymerases can initiate the synthesis of new chains in the absence of primers. They also lack the 3′-to-5′ exonuclease activity found in DNA polymerases, although they do perform rudimentary error-checking through a different mechanism. A strand of DNA serves as the template for RNA synthesis and is copied in the 3′-to-5′ direction. Synthesis of the new RNA molecule occurs in the 5′-to-3′ direction. The ribonucleoside triphosphates adenosine triphosphate (ATP), guanosine triphosphate (GTP), cytidine triphosphate (CTP), and uridine triphosphate (UTP) serve as the precursors. Each nucleotide base sequentially pairs with the complementary deoxyribonucleotide base on the DNA template (A, G, C, and U pair with T, C, G, and A, respectively). The polymerase forms an ester bond between the α-phosphate on the ribose 5′-hydroxyl of the nucleotide precursor and the ribose 3′-hydroxyl at the end of the growing RNA chain. The cleavage of a high-energy phosphate bond in the nucleoside triphosphate and the release of pyrophosphate (from the β- and γ-phosphates) provide the energy for this polymerization reaction. Subsequent cleavage of the pyrophosphate by a pyrophosphatase also helps to drive the polymerization reaction forward by removing a product. The overall error rate of RNA polymerase is 1 in 100,000 bases.
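The pairing rules and strand directions described above can be illustrated with a minimal sketch. The function name `transcribe` and the nine-base template are invented for illustration; the only biology encoded is the pairing rule stated in the text (A, G, C, and U in RNA pair with T, C, G, and A in the DNA template).

```python
# Pairing rule from the text, keyed by the DNA template base:
# template T, C, G, A direct incorporation of RNA A, G, C, U.
TEMPLATE_TO_RNA = {"T": "A", "C": "G", "G": "C", "A": "U"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') copied from a DNA template.

    The template must be written in the 3'->5' direction, the direction
    in which the polymerase reads it, so the RNA comes out 5'->3'.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5.upper())


# Hypothetical template strand, written 3'->5'.
rna = transcribe("TACGGATTC")
print(rna)  # AUGCCUAAG
```

Because the template is read 3′ to 5′ while the RNA grows 5′ to 3′, no reversal step is needed as long as the input is written in the reading direction.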
RNA polymerases must be able to recognize the start point for transcription of each gene and the appropriate strand of DNA to use as a template. They also must be sensitive to signals that reflect the need for the gene product and control the frequency of transcription. A region of regulatory sequences called the promoter (often composed of smaller sequences called boxes or elements), usually contiguous with the transcribed region, controls the binding of RNA polymerase to DNA and identifies the start point (see Fig. 13.1). The frequency of transcription is controlled by regulatory sequences within the promoter and near the promoter (promoter-proximal elements) and by other regulatory sequences, such as enhancers (also called distal-promoter elements), that may be located at considerable distances (sometimes thousands of nucleotides) from the start point. Both the promoter-proximal elements and the enhancers interact with proteins that stabilize RNA polymerase binding to the promoter.
II. Types of RNA Polymerases
Bacterial cells have a single RNA polymerase that transcribes DNA to generate all of the different types of RNA (mRNA, rRNA, and tRNA). The RNA polymerase of Escherichia coli contains five subunits (2α, β, β′, and ω), which form the core enzyme. Another protein called a σ (sigma) factor binds the core enzyme and directs binding of RNA polymerase to specific promoter regions of the DNA template. The σ factor dissociates shortly after transcription begins. E. coli has a number of different σ factors that recognize the promoter regions of different groups of genes. The major σ factor is σ70, a designation related to its molecular weight of 70,000 Da.
In contrast to prokaryotes, eukaryotic cells have three nuclear RNA polymerases (Table 13.1). Polymerase I produces most of the rRNAs, polymerase II produces mRNA and microRNAs (microRNAs regulate gene expression and are discussed in more detail in Chapter 15), and polymerase III produces small RNAs, such as tRNA and 5S rRNA. All of these RNA polymerases have the same mechanism of action. However, they recognize different types of promoters. Mitochondria have their own RNA polymerase to transcribe genes located in the mitochondrial genome.
POLYMERASE | PRODUCT
RNA polymerase I | rRNA
RNA polymerase II | mRNA + microRNA (miRNA)
RNA polymerase III | tRNA + other small RNAs
A. Sequences of Genes
Double-stranded DNA consists of a coding strand and a template strand (Fig. 13.3). The DNA template strand is the strand that is actually used by RNA polymerase during the process of transcription. It is complementary and antiparallel both to the coding (nontemplate) strand of the DNA and to the RNA transcript produced from the template. Thus, the coding strand of the DNA is identical in base sequence and direction to the RNA transcript, except, of course, that wherever this DNA strand contains a T, the RNA transcript contains a U. By convention, the nucleotide sequence of a gene is represented by the letters of the nitrogenous bases of the coding strand of the DNA duplex. It is written from left to right in the 5′-to-3′ direction.
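The relationship among the three sequences can be sketched as follows. The function names and the nine-base coding strand are hypothetical; the sketch assumes only standard Watson-Crick pairing and the T-to-U substitution described above.

```python
# Watson-Crick pairing within the DNA duplex.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5_to_3: str) -> str:
    """Template strand written 5'->3': the reverse complement of the coding strand."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(coding_5_to_3.upper()))


def rna_from_coding(coding_5_to_3: str) -> str:
    """RNA transcript written 5'->3': the coding strand with U in place of T."""
    return coding_5_to_3.upper().replace("T", "U")


coding = "ATGCCTAAG"  # hypothetical gene fragment, coding strand, 5'->3'
print(template_from_coding(coding))  # CTTAGGCAT
print(rna_from_coding(coding))       # AUGCCUAAG
```

The template must be reversed as well as complemented because it is antiparallel to the coding strand; the transcript needs no reversal because it runs in the same direction as the coding strand.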
During translation, mRNA is read 5′ to 3′ in sets of three bases, called codons, that determine the amino acid sequence of the protein (see Fig. 13.3). Thus, the base sequence of the coding strand of the DNA can be used to determine the amino acid sequence of the protein. For this reason, when gene sequences are given, they refer to the coding strand.
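Reading a sequence in sets of three bases can be sketched as shown here. The function name `codons`, the `start` parameter, and the example mRNA are assumptions made for illustration. The sketch only splits the sequence into triplets and does not assign amino acids.

```python
def codons(mrna: str, start: int = 0) -> list[str]:
    """Split an mRNA (written 5'->3') into consecutive three-base codons.

    Reading begins at index `start`; an incomplete group of one or two
    bases left over at the 3'-end is ignored.
    """
    mrna = mrna.upper()
    return [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]


# Hypothetical mRNA fragment; the final two bases do not form a full codon.
print(codons("AUGCCUAAGUA"))  # ['AUG', 'CCU', 'AAG']

# Starting from the coding strand instead: replace T with U, then split.
print(codons("ATGCCTAAG".replace("T", "U")))  # ['AUG', 'CCU', 'AAG']
```

Changing `start` by one or two bases shifts the reading frame and produces an entirely different set of codons, which is why the start point of reading matters.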
A gene consists of the transcribed region and the regions that regulate transcription of the gene (e.g., promoter and enhancer regions) (Fig. 13.4). The base in the coding strand of the gene serving as the start point for transcription is numbered +1. This nucleotide corresponds to the first nucleotide incorporated into the RNA at the 5′-end of the transcript. Subsequent nucleotides within the transcribed region of the gene are numbered +2, +3, and so on, toward the 3′-end of the gene. Untranscribed sequences to the left of the start point, known as the 5′-flanking region of the gene, are numbered −1, −2, −3, and so on, starting with the nucleotide (−1) immediately to the left of the start point (+1) and moving from right to left. By analogy to a river, the sequences to the left of the start point are said to be upstream from the start point and those to the right are said to be downstream.
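The numbering convention, in which there is no position 0, can be made concrete with a short sketch. The function names and the chosen start index are hypothetical; the rule itself (start point is +1, the base immediately to its left is −1) is taken from the paragraph above.

```python
def gene_position(index: int, start_index: int) -> int:
    """Convert a 0-based index in the coding strand to gene numbering.

    `start_index` is the 0-based index of the transcription start point,
    which is numbered +1. There is no position 0: the base immediately
    upstream (to the left) of +1 is numbered -1.
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def label(position: int) -> str:
    """Format a gene position with an explicit sign, for example '+1' or '-10'."""
    return f"{position:+d}"


# Assume, for illustration, that the start point is at index 3 of the sequence.
print([label(gene_position(i, 3)) for i in range(6)])
# ['-3', '-2', '-1', '+1', '+2', '+3']
```

Negative results are upstream of the start point and positive results are downstream.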
B. Recognition of Genes by RNA Polymerase
For genes to be expressed, RNA polymerase must recognize the appropriate point at which to start transcription and the strand of the DNA to transcribe (the template strand). RNA polymerase also must recognize which genes to transcribe because transcribed genes are only a small fraction of the total DNA. The genes that are transcribed differ from one type of cell to another and change with alterations in physiologic conditions. These signals in DNA that RNA polymerase recognizes are called promoters. Promoters are sequences in DNA (often composed of smaller sequences called boxes or elements) that determine the start point and the frequency of transcription. Because they are located on the same molecule of DNA and near the gene they regulate, they are said to be cis-acting (i.e., “cis” refers to acting on the same side). Proteins that bind to these DNA sequences and facilitate or prevent the binding of RNA polymerase are said to be trans-acting.
C. Promoter Regions of Genes for mRNA
The binding of RNA polymerase and the subsequent initiation of gene transcription involves a number of consensus sequences in the promoter regions of the gene (Fig. 13.5). A consensus sequence is the sequence that is most commonly found in a given region when many genes are examined. In prokaryotes, an adenine- and thymine-rich consensus sequence in the promoter determines the start point of transcription by binding proteins that facilitate the binding of RNA polymerase. In the prokaryote E. coli, this consensus sequence is TATAAT, which is known as the TATA or Pribnow box. It is centered about −10 and is recognized by the sigma factor σ70. A similar sequence in the −25 region of about 12.5% of eukaryotic genes has a consensus sequence of TATA(A/T)A. (The [A/T] in the fifth position indicates that either A or T occurs with equal frequency.) This eukaryotic sequence is also known as a TATA box, but it is sometimes named the Hogness or Hogness–Goldberg box after its discoverers. Other consensus sequences involved in binding of RNA polymerase are found farther upstream in the promoter region (see Fig. 13.5) or downstream after the transcriptional start signal. Bacterial promoters contain a sequence TTGACA in the −35 region. Eukaryotes frequently have disparate sequences, such as the TFIIB-recognition element (a GC-rich sequence, abbreviated as BRE), the initiator element, the downstream promoter element (DPE), and the motif ten element (MTE). The DPE and MTE are found downstream from the transcription start site. Eukaryotic genes also contain promoter-proximal elements (in the region of −100 to −200), which are sites that bind other gene regulatory proteins. Genes vary in the number of such sequences present. 
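The idea of a consensus sequence can be sketched as taking the most common base at each position of a set of aligned sequences. The five hexamers below are invented for illustration and are not real promoter data; the function name `consensus` is likewise hypothetical. The regular expression encodes the eukaryotic pattern TATA(A/T)A given in the text.

```python
import re
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Most common base at each position of equal-length, aligned sequences.

    Ties are broken in favor of the base seen first at that position.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


# Invented -10 region hexamers, each differing from the others at one position.
examples = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATAGT"]
print(consensus(examples))  # TATAAT

# TATA(A/T)A: the fifth position may be either A or T.
tata_box = re.compile(r"TATA[AT]A")
print(bool(tata_box.fullmatch("TATAAA")))  # True
print(bool(tata_box.fullmatch("TATAGA")))  # False
```

In this invented set only one of the five sequences matches the consensus exactly, which illustrates that a consensus describes the most frequent base at each position rather than a sequence every promoter must contain.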
An analysis of close to 10,000 promoter sequences indicated that the initiator element was the most common element in these promoters (about 50%), whereas BRE and DPE were present in about 15% of the promoters, and the TATA box was the least abundant, at 12.5% of the promoters.
In bacteria, a number of protein-producing genes may be linked together and controlled by a single promoter. This genetic unit is called an operon (Fig. 13.6). One mRNA is produced that contains the coding information for all of the proteins encoded by the operon. Proteins bind to the promoter and either inhibit or facilitate transcription of the operon. Repressors are proteins that bind to a region in the promoter known as the operator and inhibit transcription by preventing the binding of RNA polymerase to DNA. Activators are proteins that stimulate transcription by binding within the −35 region or upstream from it, facilitating the binding of RNA polymerase. (Operons are described in more detail in Chapter 15.)
In eukaryotes, proteins known as general transcription factors (or basal factors) bind to the TATA box (or other promoter elements, in the case of TATA-less promoters) and facilitate the binding of RNA polymerase II, the polymerase that transcribes mRNA (Fig. 13.7). This binding process involves at least six basal transcription factors (labeled as TFIIs, transcription factors for RNA polymerase II). The TATA-binding protein (TBP), which is a component of TFIID, initially binds to the TATA box. TFIID consists of both the TBP and a number of transcriptional coactivators. Components of TFIID will also recognize initiator and DPE boxes in the absence of a TATA box. TFIIA and TFIIB interact with TBP. RNA polymerase II binds to the complex of transcription factors and to DNA and is aligned at the start point for transcription. TFIIE, TFIIF, and TFIIH subsequently bind, cleaving adenosine triphosphate (ATP), and transcription of the gene is initiated.
With only these transcription (or basal) factors and RNA polymerase II attached (the basal transcription complex), the gene is transcribed at a low or basal rate. TFIIH plays a number of roles in both transcription and DNA repair. In both processes, it acts as an ATP-dependent DNA helicase, unwinding DNA for either transcription or repair to occur. Two of the forms of xeroderma pigmentosum (XPB and XPD; see Chapter 12) arise from mutations within two different helicase subunits of TFIIH. TFIIH also contains a kinase activity, and RNA polymerase II is phosphorylated by this factor during certain phases of transcription.
The rate of transcription can be further increased by binding of other regulatory DNA-binding proteins to additional gene regulatory sequences (such as the promoter-proximal or enhancer regions). These regulatory DNA-binding proteins are called gene-specific transcription factors (or transactivators) because they are specific to the gene involved (see Chapter 15). They interact with coactivators in the basal transcription complex. These are depicted in Figure 13.7 under the general term “coactivators.” Coactivators consist of transcription-associated factors (TAFs) that interact with transcription factors through an activation domain on the transcription factor (which is bound to DNA). The TAFs interact with other factors (described as the mediator proteins), which in turn interact with the RNA polymerase complex. These interactions are further discussed in Chapter 15.
III. Transcription of Bacterial Genes
In bacteria, binding of RNA polymerase with a σ factor to the promoter region of DNA causes the two DNA strands to unwind and separate within a region approximately 10 to 20 nucleotides in length. As the polymerase transcribes the DNA, the untranscribed region of the helix continues to separate, whereas the transcribed region of the DNA template rejoins its DNA partner (Fig. 13.8). The σ factor is released when the growing RNA chain is approximately 10 nucleotides long. The elongation reactions continue until the RNA polymerase encounters a transcription termination signal. One type of termination signal involves the formation of a hairpin loop in the transcript, preceding a number of U residues. The second type of mechanism for termination involves the binding of a protein, the rho factor, which causes release of the RNA transcript from the template in an energy-requiring mechanism. The signal for both termination processes is the sequence of bases in the newly synthesized RNA.
A cistron is a region of DNA that encodes a single polypeptide chain. In bacteria, mRNA is usually generated from an operon as a polycistronic transcript (one that contains the information to produce a number of different proteins). Because bacteria do not contain a nucleus, the polycistronic transcript is translated as it is being transcribed. This process is known as coupled transcription–translation. This transcript is not modified and trimmed, and it does not contain introns (regions within the coding sequence of a transcript that are removed before translation occurs). Several different proteins are produced during translation of the polycistronic transcript, one from each cistron (see Fig. 13.6).
In prokaryotes, rRNA is produced as a single, long transcript that is cleaved to produce the 16S, 23S, and 5S ribosomal RNAs. tRNA is also cleaved from larger transcripts (Fig. 13.9). One of the cleavage enzymes, RNase P, is a protein containing an RNA molecule. This RNA actually catalyzes the cleavage reaction.
IV. Transcription of Eukaryotic Genes
The process of transcription in eukaryotes is similar to that in prokaryotes. RNA polymerase binds to the transcription factor complex in the promoter region and to the DNA, the helix unwinds within a region near the start point of transcription, DNA strand separation occurs, synthesis of the RNA transcript is initiated, and the RNA transcript is elongated, copying the DNA template. The DNA strands separate as the polymerase approaches and rejoin as the polymerase passes.
One of the major differences between eukaryotes and prokaryotes is that eukaryotes have more elaborate mechanisms for processing the transcripts, particularly the precursors of mRNA (pre-mRNA). Eukaryotes also have three polymerases, rather than just the one present in prokaryotes. Other differences include the facts that eukaryotic mRNA usually contains the coding information for only one polypeptide chain and that eukaryotic RNA is transcribed in the nucleus and migrates to the cytoplasm where translation occurs. Thus, coupled transcription–translation does not occur in eukaryotes.
A. Synthesis of Eukaryotic mRNA
In eukaryotes, extensive processing of the primary transcript occurs before the mature mRNA is formed and can migrate to the cytosol where it is translated into a protein product. RNA polymerase II synthesizes a large primary transcript from the template strand that is capped at the 5′-end as it is transcribed (Fig. 13.10). The transcript also rapidly acquires a poly(A) tail at the 3′-end. Pre-mRNAs thus contain untranslated regions at both the 5′- and 3′-ends (the leader and trailing sequences, respectively). These untranslated regions are retained in the mature mRNA. The coding region of the pre-mRNA, which begins with the start codon for protein synthesis and ends with the stop codon, contains both exons and introns. Exons consist of the nucleotide codons that dictate the amino acid sequence of the eventual protein product. Between the exons, interspersing regions called introns contain nucleotide sequences that are removed by splicing reactions to form the mature RNA. The mature RNA thus contains a leader sequence (that includes the cap), a coding region comprising exons, and a trailing sequence that includes the poly(A) tail.
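The removal of introns and joining of exons can be sketched as simple string slicing. The function name `splice`, the short pre-mRNA sequence, and the intron coordinates are all invented for illustration. The sketch models only the outcome of splicing, not its mechanism, and leaves out the cap and the poly(A) tail.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns and join the remaining exons in their original order.

    Each intron is a (start, end) pair of 0-based indices, with `end`
    exclusive. Introns are assumed not to overlap.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon preceding this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


# Hypothetical pre-mRNA: exon 1 (AUGGCC), one intron, exon 2 (AAAUAA).
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "AAAUAA"
print(splice(pre_mrna, [(6, 18)]))  # AUGGCCAAAUAA
```

The exons keep their original 5′-to-3′ order in the spliced product; only the intervening sequence is lost.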