In accordance with this convention, the complete sequence of approximately 2.0 kb of chromosome 11 that includes the β-globin gene is shown in Figure 3-7. (It is sobering to reflect that a printout of the entire human genome at this scale would require over 300 books the size of this textbook!) These 2.0 kb contain most, but not all, of the sequence elements required to encode and regulate the expression of this gene. Figure 3-7 indicates many of the important structural features of the β-globin gene, including conserved promoter sequence elements, intron and exon boundaries, 5′ and 3′ UTRs, RNA splice sites, the initiator and termination codons, and the polyadenylation signal, all of which are known to be mutated in various inherited defects of the β-globin gene (see Chapter 11).
Initiation of Transcription
The β-globin promoter, like many other gene promoters, consists of a series of relatively short functional elements that interact with specific regulatory proteins (generically called transcription factors) that control transcription. In the case of the globin genes, these include proteins that restrict expression to erythroid cells, the cells in which hemoglobin is produced. The genome encodes well over a thousand sequence-specific, DNA-binding transcription factors; some are expressed ubiquitously, whereas others are cell type– or tissue-specific.
One important promoter sequence found in many, but not all, genes is the TATA box, a conserved region rich in adenines and thymines that is approximately 25 to 30 bp upstream of the start site of transcription (see Figs. 3-4 and 3-7). The TATA box appears to be important for determining the position of the start of transcription, which in the β-globin gene is approximately 50 bp upstream from the translation initiation site (see Fig. 3-6). Thus in this gene, there are approximately 50 bp of sequence at the 5′ end that are transcribed but are not translated; in other genes, the 5′ UTR can be much longer and can even be interrupted by one or more introns. A second conserved region, the so-called CAT box (actually CCAAT), is a few dozen base pairs farther upstream (see Fig. 3-7). Both experimentally induced and naturally occurring mutations in either of these sequence elements, as well as in other regulatory sequences even farther upstream, lead to a sharp reduction in the level of transcription, thereby demonstrating the importance of these elements for normal gene expression. Many mutations in these regulatory elements have been identified in patients with the hemoglobin disorder β-thalassemia (see Chapter 11).
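As a rough illustration of how promoter elements of this kind might be located in a sequence, the following Python sketch scans the region upstream of a transcription start site for a TATA-like motif and a CCAAT motif. The function name find_promoter_motifs, the simplified TATA pattern, and the toy sequence are assumptions made for this example only; the sequence is invented and is not the β-globin sequence of Figure 3-7. Real promoter analysis relies on position weight matrices and experimental evidence rather than exact string matches.

```python
import re

# Simplified consensus patterns (assumptions for illustration only).
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")
CCAAT_PATTERN = re.compile(r"CCAAT")


def find_promoter_motifs(sequence, tss_index):
    """Return motif start positions relative to the transcription start site.

    sequence  -- DNA string written 5' to 3' (coding strand)
    tss_index -- 0-based index of the first transcribed base
    Positions are negative: -1 is the base immediately upstream
    of the start site.
    """
    upstream = sequence[:tss_index].upper()
    hits = {"TATA": [], "CCAAT": []}
    for name, pattern in (("TATA", TATA_PATTERN), ("CCAAT", CCAAT_PATTERN)):
        for match in pattern.finditer(upstream):
            hits[name].append(match.start() - tss_index)
    return hits


# Invented toy sequence: a CCAAT motif at -75 and a TATA-like motif at -30,
# followed by a few transcribed bases.
toy_sequence = "G" * 25 + "CCAAT" + "G" * 40 + "TATAAAA" + "G" * 23 + "ACATTTG"
print(find_promoter_motifs(toy_sequence, tss_index=100))
# prints {'TATA': [-30], 'CCAAT': [-75]}
```

In this toy case the TATA-like motif begins 30 bp upstream of the start site and the CCAAT motif lies a few dozen base pairs farther upstream, mirroring the arrangement described above.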
Not all gene promoters contain the two specific elements just described. In particular, genes that are constitutively expressed in most or all tissues (so-called housekeeping genes) often lack the CAT and TATA boxes, which are more typical of tissue-specific genes. Promoters of many housekeeping genes contain a high proportion of cytosines and guanines in relation to the surrounding DNA (see the promoter of the BRCA1 breast cancer gene in Fig. 3-4). Such CG-rich promoters are often located in regions of the genome called CpG islands, so named because of the unusually high concentration of the dinucleotide 5′-CpG-3′ (the p representing the phosphate group between adjacent bases; see Fig. 2-3) that stands out from the more general AT-rich genomic landscape. Some of the CG-rich sequence elements found in these promoters are thought to serve as binding sites for specific transcription factors. CpG islands are also important because they are targets for DNA methylation. Extensive DNA methylation at CpG islands is usually associated with repression of gene transcription, as discussed later in the context of chromatin and its role in the control of gene expression.
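The "unusually high concentration" of CpG dinucleotides can be made concrete by comparing the number of CpGs observed in a stretch of DNA with the number expected from its C and G content alone. The Python sketch below computes the GC fraction and this observed-to-expected ratio. The function names and the thresholds (at least 200 bp, GC fraction above 0.5, ratio above 0.6) are not taken from this chapter; the thresholds are commonly cited working criteria and should be treated here as adjustable assumptions.

```python
def cpg_stats(sequence):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string.

    Expected CpG count = (number of C * number of G) / sequence length.
    """
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # 5'-CpG-3' dinucleotides on this strand
    gc_fraction = (c_count + g_count) / n
    expected = (c_count * g_count) / n
    ratio = cpg_count / expected if expected else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(sequence, min_length=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed thresholds to a candidate region."""
    gc_fraction, ratio = cpg_stats(sequence)
    return len(sequence) >= min_length and gc_fraction > min_gc and ratio > min_ratio


# Invented examples: a CG-dense region and an AT-rich region.
print(looks_like_cpg_island("CG" * 100))  # prints True
print(looks_like_cpg_island("AT" * 100))  # prints False
```

A whole-genome analysis would slide a window of this kind along each chromosome; the sketch evaluates only a single candidate region.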
Transcription by RNA polymerase II (RNA pol II) is subject to regulation at multiple levels, including binding to the promoter, initiation of transcription, unwinding of the DNA double helix to expose the template strand, and elongation as RNA pol II moves along the DNA. Although some silenced genes are devoid of RNA pol II binding altogether, consistent with their inability to be transcribed in a given cell type, others have RNA pol II poised bidirectionally at the transcriptional start site, perhaps as a means of fine-tuning transcription in response to particular cellular signals.
In addition to the sequences that constitute a promoter itself, there are other sequence elements that can markedly alter the efficiency of transcription. The best characterized of these “activating” sequences are called enhancers. Enhancers are sequence elements that can act at a distance from a gene (often several or even hundreds of kilobases away) to stimulate transcription. Unlike promoters, enhancers are both position and orientation independent and can be located either 5′ or 3′ of the transcription start site. Specific enhancer elements function only in certain cell types and thus appear to be involved in establishing the tissue specificity or level of expression of many genes, in concert with one or more transcription factors. In the case of the β-globin gene, several tissue-specific enhancers are present both within the gene itself and in its flanking regions. The interaction of enhancers with specific regulatory proteins leads to increased levels of transcription.
Normal expression of the β-globin gene during development also requires a more distant set of sequences, the locus control region (LCR), located upstream of the ε-globin gene (see Fig. 3-2); the LCR establishes the proper chromatin context needed for appropriate high-level expression. As expected, mutations that disrupt or delete either enhancer or LCR sequences interfere with or prevent β-globin gene expression (see Chapter 11).