CHAPTER OUTLINE
Classes of RNA and RNA Polymerases
Catalytically Active RNAs: Ribozymes
Regulation of Eukaryotic Transcription
High-Yield Terms
Transcription: describes the process by which the genetic information in a gene is converted into polyribonucleotides, the RNAs
Promoter: sequences in the gene that promote the ability of RNA polymerases to recognize the nucleotide at which initiation begins; must reside close to the gene and in a specific orientation for activity
Enhancer: sequences that act in cis by binding proteins that result in enhancement of transcription; can be located a long distance from the gene or even on different chromosomes, and do not require a specific orientation for activity
Splicing: the process whereby the introns are removed from heterogeneous nuclear RNA (hnRNA)
RNA editing: a novel enzymatic mechanism for the modification of nucleotide sequences of RNA, resulting in altered coding capacity
Small noncoding RNAs: includes the U small nuclear RNAs (snRNAs) of the splicing machinery, the microRNAs (miRNAs) derived from noncoding genes that are involved in processes of expression control, and the small interfering RNAs (siRNAs) responsible for the process of RNA-mediated interference (RNAi) in gene expression
Ribozyme: an enzyme activity solely associated with an RNA molecule, such as the peptidyltransferase activity of the large ribosomal subunit
Classes of RNA and RNA Polymerases
Transcription is the mechanism by which a template strand of DNA is utilized by specific RNA polymerases to generate 1 of the 4 distinct classifications of RNA (Table 36-1).
In eukaryotic cells, there are 3 distinct classes of RNA polymerase: RNA polymerase (pol) I, II, and III. Each polymerase is responsible for the synthesis of a different class of RNA. RNA pol I is responsible for rRNA synthesis (excluding the 5S rRNA). There are 4 major rRNAs in eukaryotic cells, designated by their sedimentation coefficients. The 28S, 5S, and 5.8S rRNAs are associated with the large ribosomal subunit and the 18S rRNA is associated with the small ribosomal subunit. The rRNAs are synthesized as long precursors termed preribosomal RNAs, and the 45S preribosomal RNA serves as the precursor for the 18S, 28S, and 5.8S rRNAs. RNA pol II synthesizes the mRNAs, some of the small nuclear RNAs (snRNAs), and some of the miRNAs. RNA pol III synthesizes the tRNAs, the 5S rRNA, some snRNAs, and some miRNAs.
RNA Transcription
Synthesis of RNA exhibits several features that are similar to those of DNA replication. RNA synthesis requires accurate and efficient initiation, elongation proceeds in the 5′→3′ direction (ie, the polymerase moves along the template strand of DNA in the 3′→5′ direction), and RNA synthesis requires distinct and accurate termination. In contrast to DNA polymerases, RNA polymerases need not exhibit the same high level of fidelity. This is tolerable because aberrant RNA molecules can simply be turned over and new, correct RNAs made.
Initiation of transcription, particularly the transcription of mRNA genes, is tightly regulated. Sequences in the template DNA act in cis to stimulate the initiation of transcription (Figure 36-1). These sequence elements are termed promoters. Promoter sequences promote the ability of RNA polymerases to recognize the nucleotide at which initiation begins. Additional sequence elements are present within genes that act in cis to enhance polymerase activity even further. These sequence elements are termed enhancers (Table 36-2). Transcriptional promoter and enhancer elements are important sequences used in the control of gene expression.
FIGURE 36-1: Schematic diagram showing the transcription control regions in a hypothetical mRNA-producing, eukaryotic gene transcribed by RNA polymerase II. Such a gene can be divided into its coding and regulatory regions, as defined by the transcription start site (arrow; +1). The coding region contains the DNA sequence that is transcribed into mRNA, which is ultimately translated into protein. The regulatory region consists of 2 classes of elements. One class is responsible for ensuring basal expression. The “promoter,” often composed of the TATA box and/or INR and/or DPE elements, directs RNA polymerase II to the correct site (fidelity). However, in certain genes that lack TATA, the so-called TATA-less promoters, an initiator (INR) and/or DPE elements may direct the polymerase to this site. Another component, the upstream elements, specifies the frequency of initiation; such elements can either be proximal (50–200 bp) or distal (1000–100,000 bp) to the promoter as shown. Among the best studied of the proximal elements is the CAAT box, but several other elements (bound by the transactivator proteins Sp1, NF1, AP1, etc) may be used in various genes. The distal elements enhance or repress expression, several of which mediate the response to various signals, including hormones, heat shock, heavy metals, and chemicals. Tissue-specific expression also involves specific sequences of this sort. The orientation dependence of all the elements is indicated by the arrows within the boxes. For example, the proximal promoter elements (TATA box, INR, DPE) must be in the 5′–3′ orientation, while the proximal upstream elements often work best in the 5′–3′ orientation, but some can be reversed. The locations of some elements are not fixed with respect to the transcription start site. Indeed, some elements responsible for regulated expression can be located interspersed with the upstream elements or can be located downstream from the start site.
Murray RK, Bender DA, Botham KM, Kennelly PJ, Rodwell VW, Weil PA. Harper’s Illustrated Biochemistry. 29th ed. New York, NY: McGraw-Hill; 2012.
The process of eukaryotic mRNA transcriptional initiation is an extremely complex event. There are numerous protein factors controlling initiation, some of which are basal factors (Figure 36-2) present in all cells and others are specific to cell type and/or the differentiation state of the cell. Two basal promoter elements that are found in essentially all eukaryotic mRNA genes are the TATA-box and the CAAT-box (Figure 36-1). These elements are so called because of the DNA sequences that constitute the promoter element. The TATA-box is found approximately 20-25 bases upstream of the start site for transcription and the CAAT-box is around 100 bases upstream. Many of the basal transcription factors are identified by the fact that they control the activity of RNA pol II. Thus, the nomenclature of these proteins is TFII, for transcription factor of RNA pol II (Figures 36-2 and 36-3). TFIID is the factor that binds to the TATA-box and its binding is facilitated by another factor called TFIIA.
FIGURE 36-2: The eukaryotic basal transcription complex. Formation of the basal transcription complex begins when TFIID binds to the TATA box. It directs the assembly of several other components by protein-DNA and protein-protein interactions; TFIIA, B, E, F, H, and polymerase II (pol II). The entire complex spans DNA from position −30 to +30 relative to the transcription start site (TSS; +1, marked by bent arrow). The atomic level, x-ray-derived structures of RNA polymerase II alone and of the TBP subunit of TFIID bound to TATA promoter DNA in the presence of either TFIIB or TFIIA have all been solved at 3 Å resolution. The structures of TFIID and TFIIH complexes have been determined by electron microscopy at 30 Å resolution. Thus, the molecular structures of the transcription machinery are beginning to be elucidated. Much of this structural information is consistent with the models presented here. Murray RK, Bender DA, Botham KM, Kennelly PJ, Rodwell VW, Weil PA. Harper’s Illustrated Biochemistry. 29th ed. New York, NY: McGraw-Hill; 2012.
FIGURE 36-3: Models for the formation of an RNA polymerase II preinitiation complex. Shown at top is a typical mRNA encoding transcription unit: enhancer-promoter (TATA)-initiation site (bent arrow) and transcribed region (ORF; open reading frame). PICs have been shown to form by at least 2 distinct mechanisms: (A) the stepwise binding of GTFs, pol II, and mediator, or (B) by the binding of a single multiprotein complex composed of pol II, Med, and the 6 GTFs. DNA-binding transactivator proteins specifically bind enhancers and in part facilitate PIC formation (or PIC function) by binding directly to the TFIID-TAF subunits or Med subunits of mediator (not shown, see Figure 36-10); the molecular mechanism(s) by which such protein-protein interactions stimulate transcription remain a subject of intense investigation. Murray RK, Bender DA, Botham KM, Kennelly PJ, Rodwell VW, Weil PA. Harper’s Illustrated Biochemistry. 29th ed. New York, NY: McGraw-Hill; 2012.
High-Yield Concept
All RNA polymerases are dependent upon a DNA template in order to synthesize RNA. The resultant RNA is, therefore, complementary to the template strand of the DNA duplex and identical to the nontemplate strand. The nontemplate strand is called the coding strand because its sequences are identical to those of the mRNA. However, in RNA, U is substituted for T.
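The relationship between the template strand, the coding strand, and the RNA can be illustrated with a minimal Python sketch. It models base pairing only, not the enzymology of RNA polymerase; the function names and the 9-base template are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch: base pairing only, not a model of RNA polymerase itself.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5):
    """Return the RNA (read 5'->3') copied from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


def coding_strand(template_3to5):
    """Return the nontemplate (coding) DNA strand, read 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in template_3to5)


# Hypothetical template strand, written in the 3'->5' direction.
template = "TACGGCATT"
rna = transcribe(template)
coding = coding_strand(template)

print(rna)     # AUGCCGUAA
print(coding)  # ATGCCGTAA
```

Replacing every T in the coding strand with U gives the RNA sequence, which is the point made in the text above.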
Another critical TFII is TFIIH, which is in fact a complex of proteins. The TFIIH complex is not only involved in transcription but also in certain steps of DNA repair. The role of TFIIH in DNA repair can be seen as critical since defects in its function are responsible for certain forms of xeroderma pigmentosum (see Clinical Box 35-2). The critical role of TFIIH in transcription initiation is due to the fact that one of the proteins of the complex is a kinase that phosphorylates serine residues in the C-terminal domain (CTD) of the large subunit of RNA pol II. The CTD contains a tandem repeat sequence that is composed of the consensus heptad of amino acids Y1-S2-P3-T4-S5-P6-S7, which can be repeated from 25 to 52 times. It is Ser5 and Ser7 that become phosphorylated during transcriptional initiation. These serines are different from the serine (Ser2) phosphorylated in the CTD by pTEF-b involved in the capping process as discussed later.
Elongation involves the addition of the 5′ phosphate of ribonucleotides to the 3′–OH of the elongating RNA with the concomitant release of pyrophosphate. Nucleotide addition continues until specific termination signals are encountered. Following termination, the core polymerase dissociates from the template (Figure 36-4).
FIGURE 36-4: The transcription cycle. Transcription can be described in 6 steps: (1) Template binding and closed RNA polymerase promoter complex formation: RNA polymerase (RNAP) binds to DNA and then locates a promoter (P), (2) Open promoter complex formation: once bound to the promoter, RNAP melts the 2 DNA strands to form an open promoter complex; this complex is also referred to as the preinitiation complex or PIC. Strand separation allows the polymerase to access the coding information in the template strand of DNA, (3) Chain initiation: using the coding information of the template RNAP catalyzes the coupling of the first base (often a purine) to the second, template-directed ribonucleoside triphosphate to form a dinucleotide (in this example forming the dinucleotide 5′ pppApNOH 3′). (4) Promoter clearance: after RNA chain length reaches ~10–20 nt, the polymerase undergoes a conformational change and then is able to move away from the promoter, transcribing down the transcription unit. (5) Chain elongation: Successive residues are added to the 3′-OH terminus of the nascent RNA molecule until a transcription termination signal (T) is encountered. (6) Chain termination and RNAP release: Upon encountering the transcription termination site, RNAP undergoes an additional conformational change that leads to release of the completed RNA chain, the DNA template and RNAP. RNAP can rebind to DNA beginning the promoter search process and the cycle is repeated. Note that all of the steps in the transcription cycle are facilitated by additional proteins, and indeed are often subjected to regulation by positive and/or negative-acting factors. Murray RK, Bender DA, Botham KM, Kennelly PJ, Rodwell VW, Weil PA. Harper’s Illustrated Biochemistry. 29th ed. New York, NY: McGraw-Hill; 2012.
Processing of RNAs
Eukaryotic RNAs undergo significant co- and posttranscriptional processing. Almost all mRNAs, tRNAs, and rRNAs are transcribed from genes that contain introns. The sequences encoded by the intronic DNA must be removed from the primary transcript before the RNAs become biologically active. The process of intron removal is called RNA splicing. In addition to intron removal in tRNAs, extra nucleotides at both the 5′ and 3′ ends are cleaved, the sequence 5′–CCA–3′ is added to the 3′ end of all tRNAs, and several nucleotides undergo modification. Additional processing occurs to mRNAs. The 5′ end of all eukaryotic mRNAs is capped with a unique 5′→5′ linkage to a 7-methylguanosine residue (Figure 36-5). The capped end of the mRNA is thus protected from exonucleases and, more importantly, is recognized by specific proteins of the translational machinery (Chapter 37).
FIGURE 36-5: Structure of the 5′ cap found on eukaryotic mRNAs.
The capping process occurs after the newly synthesizing mRNA is around 20-30 bases long. At this point, RNA pol II pauses and the kinase, positive transcription elongation factor b (pTEF-b), phosphorylates RNA pol II on the serine-2 residues (Ser2) in the repeat unit of the CTD of the large subunit of the enzyme. The pTEF-b complex is also called C-terminal domain kinase 1 (CTDK1). This pausing and regulatory phosphorylation event allows for the potential of attenuation in the rate of transcription.
Most eukaryotic mRNAs are also polyadenylated at the 3′ end. A specific sequence, AAUAAA, is recognized by the endonuclease activity of polyadenylate polymerase, which cleaves the primary transcript approximately 11-30 bases 3′ of the sequence element. A stretch of 20-250 A residues is then added to the 3′ end by the polyadenylate polymerase activity.
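The cleavage and polyadenylation steps described above can be sketched as a simple string operation. This is only an illustration under stated assumptions: the function name, the default cleavage offset of 20 bases, and the default tail length of 50 are hypothetical values chosen from within the ranges given in the text (11-30 bases and 20-250 A residues).

```python
POLY_A_SIGNAL = "AAUAAA"


def polyadenylate(pre_mrna, offset=20, tail_length=50):
    """Cleave `offset` bases 3' of the first AAUAAA and append a poly(A) tail.

    Sketch only: `offset` and `tail_length` are illustrative parameters,
    restricted to the ranges quoted in the text.
    """
    if not 11 <= offset <= 30:
        raise ValueError("offset must lie within 11-30 bases")
    if not 20 <= tail_length <= 250:
        raise ValueError("tail_length must lie within 20-250 residues")
    start = pre_mrna.find(POLY_A_SIGNAL)
    if start == -1:
        return pre_mrna  # no signal found: left unprocessed in this sketch
    # Cleavage site is counted from the 3' end of the signal sequence.
    cut = min(start + len(POLY_A_SIGNAL) + offset, len(pre_mrna))
    return pre_mrna[:cut] + "A" * tail_length


# Hypothetical primary transcript: signal followed by 40 downstream bases.
transcript = "GGGCCC" + POLY_A_SIGNAL + "C" * 40
processed = polyadenylate(transcript)
print(len(transcript), len(processed))  # 52 82
```

In this toy transcript, the 20 downstream bases beyond the cleavage site are discarded and replaced by the tail.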
Splicing of RNAs
Mammalian mRNA genes contain highly conserved consensus sequences that ultimately reside at the 5′ and 3′ ends of the introns in the mRNA (Figure 36-6). Intron removal from nascent RNA molecules involves a process termed splicing (Figure 36-7). The process of intron removal is catalyzed by a specialized RNA-protein complex called the spliceosome. The spliceosome is composed of small nuclear ribonucleoprotein particles (snRNPs, pronounced snurps) that contain proteins and several snRNAs identified as U1, U2, U4, U5, and U6. The U1 RNA has sequences that are complementary to sequences near the 5′ end of the intron. The binding of U1 snRNA distinguishes the GU at the 5′ end of the intron from other randomly placed GU sequences in mRNAs. The U2 RNA also recognizes sequences in the intron, in this case near the 3′ end.
FIGURE 36-6: Consensus sequences at splice junctions. The 5′ (donor; left) and 3′ (acceptor; right) sequences are shown. Also shown is the yeast consensus sequence (UACUAAC) for the branch site. In mammalian cells, this consensus sequence is PyNPyPyPuAPy, where Py is a pyrimidine, Pu is a purine, and N is any nucleotide. The branch site is located 20-40 nucleotides upstream from the 3′ splice site. Murray RK, Bender DA, Botham KM, Kennelly PJ, Rodwell VW, Weil PA. Harper’s Illustrated Biochemistry. 29th ed. New York, NY: McGraw-Hill; 2012.
FIGURE 36-7: The processing of the primary transcript to mRNA. In this hypothetical transcript, the 5′ (left) end of the intron is cut (↓) and a lariat forms between the G at the 5′ end of the intron and an A near the 3′ end, in the consensus sequence UACUAAC. This sequence is called the branch site, and it is the 3′ most A that forms the 5′–2′ bond with the G. The 3′ (right) end of the intron is then cut (⇓). This releases the lariat, which is digested, and exon 1 is joined to exon 2 at G residues. Murray RK, Bender DA, Botham KM, Kennelly PJ, Rodwell VW, Weil PA. Harper’s Illustrated Biochemistry. 29th ed. New York, NY: McGraw-Hill; 2012.
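The net outcome of splicing, with introns bounded by GU at the 5′ end and AG at the 3′ end, can be sketched in a few lines of Python. This is a simplified illustration: the function name, the intron coordinates, and the toy sequences are hypothetical, and the lariat intermediate and the snRNPs are not modeled.

```python
def splice(pre_mrna, introns):
    """Join exons after removing introns given as (start, end) index pairs.

    Sketch only: `end` is exclusive, and each intron must begin with GU and
    end with AG, mirroring the consensus boundaries described in the text.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


# Hypothetical transcript: exon 1, one intron with a UACUAAC branch site, exon 2.
exon1 = "AUGGCCAG"
intron = "GUAAGU" + "C" * 10 + "UACUAAC" + "U" * 20 + "CAG"
exon2 = "GUUCUGA"
pre_mrna = exon1 + intron + exon2

mature = splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))])
print(mature)  # AUGGCCAGGUUCUGA
```

Note that exon 2 in this toy example also begins with GU; the coordinates, not the dinucleotide alone, define the intron, which echoes the point that U1 snRNA binding is needed to distinguish the true 5′ splice site from other GU sequences.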
Alternative Splicing
The presence of introns in RNAs allows for alternative splicing to occur, the result of which is the potential for an increase in phenotypic diversity without increasing the overall number of genes. By altering the pattern of exons that are spliced together, different proteins can arise from the processed mRNA from a single gene (Figure 36-8). Alternative splicing can occur either at specific developmental stages or in different cell types. Depending upon the site of transcription, the calcitonin gene yields an mRNA that encodes calcitonin (thyroid) or calcitonin gene–related peptide (CGRP, brain): 2 proteins with distinctly different functions. Even more complex alternative splicing occurs in the α-tropomyosin mRNA such that at least 8 different alternatively spliced α-tropomyosin mRNAs are formed.
FIGURE 36-8: Mechanisms of alternative processing of mRNA precursors. This form of mRNA processing involves the selective inclusion or exclusion of exons, the use of alternative 5′ donor or 3′ acceptor sites, and the use of different polyadenylation sites. Murray RK, Bender DA, Botham KM, Kennelly PJ, Rodwell VW, Weil PA. Harper’s Illustrated Biochemistry. 29th ed. New York, NY: McGraw-Hill; 2012.
Abnormalities in the splicing process can lead to various disease states. Many defects in the β-globin genes are known to exist leading to β-thalassemias. Some of these defects are caused by mutations in the sequences of the mRNA required for intron recognition and, therefore, result in abnormal processing of the β-globin primary transcript.
RNA Editing
RNA editing refers to the enzymatic, posttranscriptional modification of the nucleotide sequence of a given RNA transcript. There are 2 types of RNA editing reactions that occur in mammalian cells. One involves the substitution of a uridine (U) for a cytidine (C) and the other involves the substitution of an inosine (I) for an adenosine (A). The C-to-U substitutions involve a cytidine deaminase that deaminates a cytidine base into a uridine base. The A-to-I substitutions involve an adenosine deaminase acting on RNA (ADAR). This enzyme is not the same as the adenosine deaminase involved in the salvage of purine nucleotides (Chapter 32).
An example of a physiologically and clinically significant C-to-U editing involves the apolipoprotein B (apoB) gene in humans (Figure 36-9). ApoB-100 is expressed in the liver and apoB-48 is expressed in the intestines. The apoB-100 form has a CAA sequence that is edited to UAA, a stop codon, in the intestines but is left unedited in the liver. The process of A-to-I editing is the most common form of RNA editing in humans. Due to the fact that I behaves as if it is G both in translation and when forming secondary structures, the effects of these types of RNA edits include alteration of coding capacity, altered splicing, cytoplasmic sequestration, endonucleolytic cleavage, and inhibition of miRNA and siRNA processing.
FIGURE 36-9: C-to-U editing of the apoB mRNA. In humans (and other mammals) the apoB gene is expressed in both hepatocytes and intestinal epithelial cells. In liver cells, the protein product is a 500-kD protein identified as apoB-100, whereas in intestinal cells the protein product is a smaller protein identified as apoB-48. The apoB-100 protein is translated from a nonedited mRNA, but the apoB-48 is translated from an edited mRNA. The C-to-U editing occurs in a CAA codon in exon 26 that results in the generation of a stop codon at that location in the edited mRNA, resulting in the truncated apoB-48 protein. The editing of the apoB mRNA is catalyzed by a cytidine deaminase. Reproduced with permission of themedicalbiochemistrypage, LLC
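The effect of a single C-to-U edit on coding capacity can be illustrated with a toy reading frame. The sketch below is not the apoB sequence; the 15-base mRNA, the function names, and the edited position are hypothetical, and only the CAA-to-UAA change itself is taken from the text.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def edit_c_to_u(mrna, position):
    """Return the mRNA with the C at `position` (0-based) deaminated to U."""
    if mrna[position] != "C":
        raise ValueError("C-to-U editing requires a cytidine at this position")
    return mrna[:position] + "U" + mrna[position + 1:]


def codons_before_stop(mrna):
    """Count sense codons read in frame from base 0 up to the first stop codon."""
    count = 0
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


# Toy mRNA (not apoB): AUG GCU CAA GGU UAA
unedited = "AUGGCUCAAGGUUAA"
edited = edit_c_to_u(unedited, 6)  # CAA -> UAA

print(codons_before_stop(unedited))  # 4
print(codons_before_stop(edited))    # 2
```

The edited message is read for fewer codons before a stop is reached, which parallels the way the edited apoB mRNA yields the shorter apoB-48 protein.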
Catalytically Active RNAs: Ribozymes
Ribozymes are RNAs with catalytic activity. The catalytic properties of ribozymes are exclusively due to the capacity of these RNA molecules to assume particular structures. Ribozymes function during protein synthesis, in RNA processing reactions, and in the regulation of gene expression. The processes of RNA-mediated catalysis were first identified by the discovery of self-splicing RNAs. Subsequently, numerous natural RNA motifs endowed with catalytic activity have been identified in mammalian tissues. Almost all ribozymes carry out a phosphoryl transfer reaction, catalyzing the cleavage or ligation of the RNA phosphodiester backbone. A notable exception is the peptidyl transferase activity of the ribozyme associated with the 28S rRNA of the large ribosomal subunit.
The mammalian cytoplasmic polyadenylation element–binding protein 3 (CPEB3) ribozyme is a self-cleaving noncoding RNA located in the second intron of the CPEB3 gene. This ribozyme is involved in the process of mRNA polyadenylation. Other mammalian ribozymes have been characterized that are involved in regulating gene expression by inducing RNA cleavage. The ribozymes of this latter class are called hammerhead ribozymes because their secondary structure resembles the head of a hammerhead shark. Hammerhead ribozymes can cleave mRNAs and they can remove the polyA tail from mRNAs, resulting in defective nuclear export. Both processes ultimately lead to reduced protein expression.
Regulation of Eukaryotic Transcription
The controls that act on eukaryotic gene expression are highly complex and occur at multiple points from the chromosome to ultimately the biologically active protein (Table 36-3).
In order for RNA polymerases to access the transcriptional start site of a given gene, they must obtain access to the DNA template. Due to the organization of chromatin, the state of histone modification, and the presence or absence of transcriptional accessory proteins (transcription factors), RNA polymerase may or may not be able to activate transcription.
With respect to chromatin, the 2 broadly descriptive forms, heterochromatin and euchromatin, determine overall transcriptional potential. Heterochromatin is more densely packed and is generally transcriptionally silent, whereas euchromatin is more loosely packed and is where active gene transcription takes place. As discussed in Chapter 35, chromatin structure can be modified from a state which prevents transcription to one which promotes transcription by either of 2 major mechanisms: DNA methylation and histone modification. DNA methylation occurs on cytosine present in CpG dinucleotides and the promoter regions of genes contain 10-20 times as many CpGs when compared to the rest of the genome. Generally, the higher the state of CpG methylation of a gene, the less transcriptionally active it will be. Histone acetylation is another major regulator of transcriptional initiation, where the presence of acetylation is associated with higher transcriptional activity. Many transcriptional activator complexes contain histone acetylase activity, whereas transcriptional repressor complexes contain histone deacetylase activity.
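The enrichment of CpG dinucleotides in promoter regions can be made concrete by counting them in a sequence. The short sketch below is illustrative only: the function names and both example sequences are hypothetical, and the comparison does not reproduce the 10- to 20-fold figure quoted above.

```python
def cpg_count(dna):
    """Count CpG dinucleotides (a C followed immediately by a G, read 5'->3')."""
    return dna.upper().count("CG")


def cpg_density(dna):
    """Return CpG dinucleotides per base; 0.0 for an empty sequence."""
    return cpg_count(dna) / len(dna) if dna else 0.0


# Hypothetical sequences chosen only to contrast CpG-rich and CpG-poor DNA.
promoter_like = "CGCGTTACGGCGCCGA"
bulk_like = "ATTGCATTAGCTTACA"

print(cpg_count(promoter_like), cpg_count(bulk_like))  # 5 0
print(cpg_density(promoter_like))                      # 0.3125
```

Each counted CpG is a potential site of cytosine methylation, so a denser cluster gives methylation more opportunity to influence the transcriptional activity of the nearby gene.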
The most complex controls observed in eukaryotic genes are those that regulate the expression of RNA pol II–transcribed genes, the mRNA genes. Eukaryotic mRNA genes contain a basic structure (see Figure 36-1) consisting of coding exons and noncoding introns and any number of different transcriptional regulatory domains (Table 36-4) that are sites for the interaction of various classes of transcription factor (Table 36-5). Several conserved structural motifs (Table 36-6) are found in most transcription factors including the helix-turn-helix, the zinc finger (Figure 36-10), and the leucine zipper (Figure 36-11).
TABLE 36-4: Some of the Mammalian RNA Polymerase II Transcription Control Elements, Their Consensus Sequences, and the Factors That Bind to Them