Gene expression, the generation of a protein or RNA product from a particular gene, is controlled by complex mechanisms. Normally, only a fraction of the genes in a cell are expressed at any time. Gene expression is regulated differently in prokaryotes and eukaryotes.
Regulation of Gene Expression in Prokaryotes. In prokaryotes, gene expression is regulated mainly by controlling the initiation of gene transcription. Sets of genes that encode proteins with related functions are organized into operons, and each operon is under the control of a single promoter (or regulatory region). Regulatory proteins called repressors bind to the promoter and inhibit the binding of RNA polymerase (negative control), whereas activator proteins facilitate RNA polymerase binding (positive control). Repressors are controlled by nutrients or their metabolites, which are classified as inducers or corepressors. Regulation also may occur through attenuation of transcription.
Eukaryotes: Regulation of Gene Expression at the Level of DNA. In eukaryotes, activation of a gene requires changes in the state of chromatin (chromatin remodeling) that are influenced by acetylation of histones and methylation of DNA bases. These changes in chromatin determine which genes are available for transcription.
Regulation of Eukaryotic Gene Transcription. Transcription of specific genes is regulated by proteins (called specific transcription factors or transactivators) that bind to gene regulatory sequences (called promoter-proximal elements, response elements, or enhancers) that activate or inhibit assembly of the basal transcription complex and RNA polymerase at a TATA box or similar regulatory element. These specific transcription factors, which may bind to DNA sequences some distance from the promoter, interact with coactivators or corepressors that bind to components of the basal transcription complex. These protein factors are said to work in “trans”; the DNA sequences to which they bind are said to work in “cis.”
Other Sites for Regulation of Eukaryotic Gene Expression. Regulation also occurs during the processing of RNA, during RNA transport from the nucleus to the cytoplasm, and at the level of translation in the cytoplasm. Regulation can occur simultaneously at multiple levels for a specific gene, and many factors act in concert to stimulate or inhibit the expression of a gene.
THE WAITING ROOM
Charles F., a 68-year-old man, complained of fatigue, loss of appetite, and a low-grade fever. An open biopsy of a lymph node indicated the presence of non-Hodgkin lymphoma, follicular type. Computed tomography and other noninvasive procedures showed a diffuse process with bone marrow involvement. He is receiving multidrug chemotherapy with R-CHOP (rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone).
Mannie W. is a 56-year-old man who complains of weight loss related to a decreased appetite and increased fatigue. He notes discomfort in the left upper quadrant of his abdomen. On physical examination, he is noted to be pale and to have ecchymoses (bruises) on his arms and legs. His spleen is markedly enlarged.
Initial laboratory studies show a hemoglobin of 10.4 g/dL (normal, 13.5 to 17.5 g/dL) and a leukocyte (white blood cell) count of 106,000 cells/mm³ (normal, 4,500 to 11,000 cells/mm³). The majority of the leukocytes are granulocytes (white blood cells arising from the myeloid lineage), some of which have an "immature" appearance. The percentage of lymphocytes in the peripheral blood is decreased. A bone marrow aspiration and biopsy show the presence of an abnormal chromosome (the Philadelphia chromosome) in dividing marrow cells.
Ann R., who has anorexia nervosa, has continued on an almost meat-free diet (see Chapters 1, 3, 9, and 10). She now appears emaciated and pale. Her hemoglobin is 9.7 g/dL (normal = 12 to 15 g/dL), her hematocrit (volume of packed red cells) is 31% (reference range for women, 36% to 46%), and her mean corpuscular volume (the average volume of a red cell) is 70 femtoliters (fL; 1 fL is 10⁻¹⁵ L) (reference range = 80 to 100 fL). These values indicate an anemia that is microcytic (small red cells) and hypochromic (light in color, indicating a reduced amount of hemoglobin per red cell). Her serum ferritin (the cellular storage form of iron) is also subnormal. Her plasma level of transferrin (the iron transport protein in plasma) is higher than normal, but its percentage saturation with iron is below normal. This laboratory profile is consistent with changes that occur in an iron deficiency state.
I. Gene Expression Is Regulated for Adaptation and Differentiation
Virtually all cells of an organism contain identical sets of genes. However, at any given time, only a small number of the total genes in each cell are expressed (i.e., generate a protein or RNA product). The remaining genes are inactive. Organisms gain a number of advantages by regulating the activity of their genes. For example, both prokaryotic and eukaryotic cells adapt to changes in their environment by turning the expression of genes on and off. Because the processes of RNA transcription and protein synthesis consume a considerable amount of energy, cells conserve fuel by making proteins only when they are needed.
In addition to regulating gene expression to adapt to environmental changes, eukaryotic organisms alter the expression of their genes during development. As a fertilized egg becomes a multicellular organism, different kinds of proteins are synthesized in varying quantities. In the human, as the child progresses through adolescence and then into adulthood, physical and physiologic changes result from variations in gene expression and, therefore, in protein synthesis. Even after an organism has reached the adult stage, regulation of gene expression enables certain cells to undergo differentiation to assume new functions.
II. Regulation of Gene Expression in Prokaryotes
Prokaryotes are single-celled organisms and therefore require less complex regulatory mechanisms than multicellular eukaryotes (Fig. 15.1). The most extensively studied prokaryote is the bacterium Escherichia coli, an organism that thrives in the human colon, usually enjoying a symbiotic relationship with its host. Based on the size of its genome (4 × 10⁶ base pairs), E. coli should be capable of making several thousand proteins. However, under normal growth conditions, E. coli synthesizes only about 600 to 800 different proteins. Thus, many genes are inactive, and E. coli expresses only those genes that generate the proteins required for growth in that particular environment.
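The "several thousand proteins" estimate comes from simple arithmetic. The back-of-the-envelope sketch below makes two simplifying assumptions that are not figures from the text: an average gene is about 1,000 bp long (roughly 330 codons), and the whole genome is protein-coding.

```python
# Back-of-the-envelope coding capacity of E. coli.
# Assumptions (illustrative, not from the text): ~1,000 bp per average gene,
# and the entire genome is treated as protein-coding.
GENOME_BP = 4_000_000   # 4 x 10^6 base pairs, as given in the text
BP_PER_GENE = 1_000     # assumed average gene length


def potential_genes(genome_bp: int, bp_per_gene: int) -> int:
    """How many average-sized genes would fit in the genome."""
    return genome_bp // bp_per_gene


def fraction_expressed(expressed: int, total: int) -> float:
    """Fraction of the potential genes that are actually expressed."""
    return expressed / total


total = potential_genes(GENOME_BP, BP_PER_GENE)
print(f"potential genes: {total}")
print(f"expressed: {fraction_expressed(600, total):.0%} to "
      f"{fraction_expressed(800, total):.0%}")
```

Under these assumptions, about 4,000 genes fit in the genome. The 600 to 800 proteins made under normal growth conditions therefore correspond to only about 15% to 20% of the coding capacity.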
All E. coli cells of the same strain are morphologically similar and contain an identical circular chromosome (see Fig. 15.1). As in other prokaryotes, DNA is not complexed with histones, no nuclear envelope separates the genes from the contents of the cytoplasm, and gene transcripts do not contain introns. In fact, as messenger RNA (mRNA) is being synthesized, ribosomes bind and begin to produce proteins, so that transcription and translation occur simultaneously (known as coupled transcription–translation). The mRNA molecules in E. coli have a very short half-life and are degraded within a few minutes, so new mRNA must be transcribed continually to maintain synthesis of the corresponding proteins. Thus, regulation of transcription, principally at the level of initiation, is sufficient to regulate the level of proteins within the cell.
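Transcriptional control is sufficient because mRNA decays by first-order kinetics. Once transcription stops, the existing mRNA pool halves with every half-life. The sketch below assumes a 2-minute half-life, an illustrative value in line with "a few minutes" rather than a measured one. It also ignores any new synthesis.

```python
def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of an mRNA pool left after t minutes of first-order decay
    with no new transcription: N(t) / N0 = 0.5 ** (t / t_half)."""
    return 0.5 ** (t_min / half_life_min)


HALF_LIFE_MIN = 2.0  # assumed, illustrative half-life in minutes

for t in (0, 2, 4, 10):
    print(f"{t:>2} min: {fraction_remaining(t, HALF_LIFE_MIN):.3f}")
```

With this half-life, only about 3% of the original mRNA remains after 10 minutes. The mRNA available for translation therefore mainly reflects recent transcription.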
A. Operons
The genes encoding proteins are called structural genes. In the bacterial genome, the structural genes for proteins involved in performing a related function (such as the enzymes of a biosynthetic pathway) are often grouped sequentially into units called operons (Fig. 15.2, and see Fig. 13.6). The genes in an operon are coordinately expressed; that is, they are either all turned on or all turned off. When an operon is expressed, all of its genes are transcribed (refer to Chapter 13, Section III). A single polycistronic mRNA that codes for all the proteins of the operon is produced. This polycistronic mRNA contains multiple sets of start and stop codons that allow a number of different proteins to be produced from this single transcript at the translational level. Transcription of the genes in an operon is regulated by the promoter, which is located in the operon at the 5′-end, upstream from the structural genes.
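The idea that one polycistronic transcript carries several start and stop codons can be made concrete. The minimal sketch below scans an mRNA for open reading frames, each running from AUG to the first in-frame stop codon. The transcript is invented for illustration. Real bacterial translation initiation also depends on a ribosome-binding (Shine-Dalgarno) sequence upstream of each start codon, which this sketch ignores.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(mrna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs for open reading frames in an mRNA.

    An ORF starts at AUG and ends at the first in-frame stop codon; `end` is
    exclusive and includes the stop. `min_codons` counts sense codons,
    including AUG. All three frames are scanned, and AUGs inside an ORF
    already found in the same frame are skipped."""
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(mrna):
            if mrna[i:i + 3] == "AUG":
                # Walk codon by codon until an in-frame stop codon appears.
                for j in range(i, len(mrna) - 2, 3):
                    if mrna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # the i += 3 below then moves past the stop
                        break
                else:
                    break  # no stop codon left in this frame
            i += 3
    return sorted(orfs)


mrna = "GGAUGGCUAAAUAACCAUGUUUGGCUGAA"  # hypothetical two-gene transcript
for start, end in find_orfs(mrna):
    print(start, end, mrna[start:end])
```

The hypothetical transcript yields two separate reading frames, AUGGCUAAAUAA and AUGUUUGGCUGA. Each would be translated into its own protein, even though both come from a single mRNA.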
B. Regulation of RNA Polymerase Binding by Repressors
In bacteria, the principal means of regulating gene transcription is through repressors, which are regulatory proteins that prevent the binding of RNA polymerase to the promoter and thus act on initiation of transcription (Fig. 15.3). In general, regulatory mechanisms such as repressors, which work through inhibition of gene transcription, are referred to as negative control, and mechanisms that work through stimulation of gene transcription are called positive control.
The repressor is encoded by a regulatory gene (see Fig. 15.3). Although this gene is considered part of the operon, it is not always located near the remainder of the operon. Its product, the repressor protein, diffuses to the promoter and binds to a region of the operon called the operator. The operator is located within the promoter or near its 3′-end, just upstream from the transcription start point. When a repressor is bound to the operator, the operon is not transcribed because the repressor protein either physically blocks the binding of RNA polymerase to the promoter or prevents the RNA polymerase from initiating transcription. Two regulatory mechanisms work through controlling repressors: induction (an inducer inactivates the repressor) and repression (a corepressor is required to activate the repressor).
1. Inducers
Induction involves a small molecule, known as an inducer, which stimulates expression of the operon by binding to the repressor and changing its conformation so that it can no longer bind to the operator (Fig. 15.4). The inducer is either a nutrient or a metabolite of the nutrient. In the presence of the inducer, RNA polymerase can therefore bind to the promoter and transcribe the operon. The key to this mechanism is that in the absence of the inducer, the repressor is active, transcription is repressed, and the genes of the operon are not expressed.
Consider, for example, induction of the lac operon of E. coli by lactose (Fig. 15.5). The enzymes for metabolizing glucose by glycolysis are produced constitutively; that is, they are constantly being made. If the milk sugar lactose is available, the cells adapt and begin to produce the three additional enzymes required for lactose metabolism, which are encoded by the lac operon. A metabolite of lactose (allolactose) serves as an inducer, binding to the repressor and inactivating it. Because the inactive repressor no longer binds to the operator, RNA polymerase can bind to the promoter and transcribe the structural genes of the lac operon, producing a polycistronic mRNA that encodes the three additional proteins. However, the presence of glucose can prevent activation of the lac operon (see Section II.C). It is important to realize that the lac operon is expressed at very low levels (basal levels) even in the absence of inducer. Thus, even in the absence of lactose, a small amount of permease is present in the cell membrane. Therefore, when lactose does become available in the environment, a few molecules of lactose are able to enter the cell and can be metabolized to allolactose. The few molecules of allolactose produced are sufficient to induce the operon. As the amount of permease increases, more lactose can be transported into the cell to be used as an energy source.
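The role of basal expression can be illustrated with a toy positive-feedback loop. Permease admits lactose, allolactose inactivates the repressor, expression rises, and the extra permease admits more lactose. Levels are in arbitrary units from 0 to 1. The update rule is a hypothetical simplification chosen for clarity, not a kinetic model of the lac operon.

```python
def simulate_lac_induction(steps: int, basal: float = 0.01,
                           lactose_outside: bool = True) -> list[float]:
    """Toy model (hypothetical update rule) of lac operon positive feedback.

    `permease` is the operon's expression level in arbitrary units (0..1).
    Each step, lactose uptake is proportional to the permease level, and the
    allolactose formed pushes expression toward full induction (1.0)."""
    permease = basal
    history = [permease]
    for _ in range(steps):
        # Lactose can enter only through permease that is already present.
        uptake = permease if lactose_outside else 0.0
        # Induction raises expression; the increase shrinks near full induction.
        permease = min(1.0, permease + uptake * (1.0 - permease))
        history.append(permease)
    return history


print([round(x, 3) for x in simulate_lac_induction(8)])
print(simulate_lac_induction(8, basal=0.0)[-1])  # no basal permease: stays 0.0
```

Starting from the basal level, expression climbs toward full induction within a few steps. With `basal=0.0` it stays at zero, which is the sketch's way of showing why a small constitutive leak matters.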
2. Corepressors
In a regulatory model called repression, the repressor is inactive until a small molecule called a corepressor (a nutrient or its metabolite) binds to the repressor, activating it (Fig. 15.6). The repressor–corepressor complex then binds to the operator, preventing binding of RNA polymerase and gene transcription. Consider, for example, the trp operon, which encodes the five enzymes required for the synthesis of the amino acid tryptophan. When tryptophan is available, E. coli cells save energy by no longer making these enzymes. Tryptophan is a corepressor that binds to the inactive repressor, causing it to change conformation and bind to the operator, thereby inhibiting transcription of the operon. Thus, in the repression model, the repressor is inactive without a corepressor; in the induction model, the repressor is active unless an inducer is present.
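The contrast between induction and repression reduces to two small truth tables. The sketch below encodes only the repressor logic described above and ignores activators such as CRP as well as attenuation. The names `Mode`, `repressor_active`, and `operon_transcribed` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    INDUCIBLE = "inducible"      # e.g., lac: the small molecule is an inducer
    REPRESSIBLE = "repressible"  # e.g., trp: the small molecule is a corepressor


def repressor_active(mode: Mode, ligand_present: bool) -> bool:
    """True if the repressor is in its operator-binding conformation."""
    if mode is Mode.INDUCIBLE:
        return not ligand_present  # an inducer inactivates the repressor
    return ligand_present          # a corepressor activates the repressor


def operon_transcribed(mode: Mode, ligand_present: bool) -> bool:
    """Repressor control only: transcription occurs when the operator is free."""
    return not repressor_active(mode, ligand_present)


for mode in Mode:
    for ligand in (False, True):
        print(f"{mode.value:<11} ligand={ligand!s:<5} "
              f"transcribed={operon_transcribed(mode, ligand)}")
```

The inducible operon is on only when the small molecule is present, and the repressible operon is on only when it is absent. This matches the summary sentence above.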
C. Stimulation of RNA Polymerase Binding
In addition to regulating transcription by means of repressors that inhibit RNA polymerase binding to promoters (negative control), bacteria regulate transcription by means of activating proteins that bind to the promoter and stimulate the binding of RNA polymerase (positive control). Transcription of the lac operon, for example, can be induced by allolactose only if glucose is absent. The presence or absence of glucose is communicated to the promoter by a regulatory protein named the cyclic adenosine monophosphate (cAMP) receptor protein (CRP) (Fig. 15.7). This regulatory protein is also called a catabolite activator protein (CAP). A decrease in glucose levels increases levels of the intracellular second messenger cAMP by a mechanism that involves glucose transport into the bacteria. cAMP binds to CRP and the cAMP–CRP complex binds to a regulatory region of the operon, stimulating binding of RNA polymerase to the promoter and transcription. When glucose is present, cAMP levels decrease, CRP assumes an inactive conformation that does not bind to the operon, and the recruitment of RNA polymerase to the promoter is reduced, resulting in inhibition of transcription. Thus, the enzymes encoded by the lac operon are not produced if cells have an adequate supply of glucose, even if lactose is present at very high levels.
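Negative control by the repressor and positive control by cAMP–CRP can be combined into one qualitative rule. The on/off sketch below is a simplification, because real lac expression is graded rather than binary. The function name is hypothetical.

```python
def lac_operon_induced(lactose_present: bool, glucose_present: bool) -> bool:
    """True when the lac operon is transcribed at high (induced) levels."""
    # Negative control: allolactose, made from lactose, inactivates the repressor.
    operator_free = lactose_present
    # Positive control: low glucose -> high cAMP -> cAMP-CRP binds near the promoter.
    camp_crp_bound = not glucose_present
    # Induction needs both; otherwise only basal or reduced transcription occurs.
    return operator_free and camp_crp_bound


for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose!s:<5} glucose={glucose!s:<5} "
              f"induced={lac_operon_induced(lactose, glucose)}")
```

Only the case with lactose present and glucose absent returns True. This matches the statement that the lac enzymes are not produced when glucose is adequate, even if lactose is abundant.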
D. Regulation of RNA Polymerase Binding by Sigma Factors
E. coli has only one RNA polymerase. Sigma factors bind to this RNA polymerase, stimulating its binding to certain sets of promoters, thus simultaneously activating transcription of several operons. The standard sigma factor in E. coli is σ70, a protein with a molecular weight of 70,000 daltons (see Chapter 13). Other sigma factors also exist. For example, σ32 helps RNA polymerase recognize promoters for the different operons that encode the heat-shock proteins. Because the amount of σ32 increases when cells are exposed to elevated temperatures, transcription of the genes for heat-shock proteins, which prevent protein denaturation at high temperatures, increases in response to heat.
E. Attenuation of Transcription
Some operons are regulated by a process that interrupts (attenuates) transcription after it has been initiated (Fig. 15.8). For example, high levels of tryptophan attenuate transcription of the E. coli trp operon, as well as repress its transcription. As mRNA is being transcribed from the trp operon, ribosomes bind and rapidly begin to translate the transcript. Near the 5′-end of the transcript are a number of codons for tryptophan. When tryptophan levels in the cell are high, levels of Trp-tRNAᵀʳᵖ are also high, and the transcript is translated rapidly. Rapid translation allows a hairpin loop to form in the mRNA that serves as a termination signal for RNA polymerase, and transcription terminates. Conversely, when tryptophan levels are low, levels of Trp-tRNAᵀʳᵖ are low, and ribosomes stall at the codons for tryptophan. A different hairpin loop then forms in the mRNA that does not terminate transcription, and the complete mRNA is transcribed. Attenuation requires coupled transcription and translation, so this mechanism is not applicable to eukaryotic systems, in which the two processes are separated.
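Attenuation can be viewed as a race between the ribosome translating the leader and RNA polymerase reaching the point where one of the two hairpins must form. The sketch below is a deliberately crude model, and several of its elements are hypothetical: the leader codons, the time units, the 1/level penalty for Trp codons (UGG), and the polymerase deadline. In reality, the outcome depends on which leader regions the ribosome covers when the hairpins form.

```python
TRP_CODON = "UGG"


def ribosome_time(leader_codons: list[str], trp_trna_level: float) -> float:
    """Time (arbitrary units) for a ribosome to translate the leader.

    Ordinary codons take 1 unit. Trp codons take 1 / trp_trna_level units,
    a hypothetical stand-in for waiting on scarce charged Trp-tRNA."""
    if not 0 < trp_trna_level <= 1:
        raise ValueError("trp_trna_level must be in (0, 1]")
    return sum(1.0 / trp_trna_level if codon == TRP_CODON else 1.0
               for codon in leader_codons)


def transcription_outcome(leader_codons: list[str], trp_trna_level: float,
                          polymerase_deadline: float) -> str:
    """Fast ribosome -> termination hairpin; stalled ribosome -> alternative hairpin."""
    if ribosome_time(leader_codons, trp_trna_level) <= polymerase_deadline:
        return "terminated: termination hairpin forms"
    return "full-length mRNA: alternative hairpin forms"


LEADER = ["AUG", "AAA", "UGG", "UGG", "CGU", "ACC"]  # hypothetical leader codons
print(transcription_outcome(LEADER, 1.0, polymerase_deadline=8.0))  # high Trp-tRNA
print(transcription_outcome(LEADER, 0.2, polymerase_deadline=8.0))  # low Trp-tRNA
```

With abundant charged tRNA, the ribosome clears the Trp codons in time and transcription terminates. When charged tRNA is scarce, the ribosome is delayed and the full-length mRNA is made.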
The tryptophan, histidine, leucine, phenylalanine, and threonine operons are regulated, in part, by attenuation. Repressors and activators also act on the promoters of some of these operons, allowing the levels of these amino acids to be very carefully and rapidly regulated.
III. Regulation of Gene Expression in Eukaryotes
Multicellular eukaryotes are much more complex than single-celled prokaryotes. As the human embryo develops into a multicellular organism, different sets of genes are turned on and different groups of proteins are produced, resulting in differentiation into morphologically distinct cell types that are able to perform different functions. Even beyond development, certain cells within the organism continue to differentiate, such as those that produce antibodies in response to an infection, renew the population of red blood cells, and replace digestive cells that have been sloughed into the intestinal lumen. All of these physiologic changes are dictated by complex alterations in gene expression.
A. Regulation at Multiple Levels
Differences between eukaryotic and prokaryotic cells result in different mechanisms for regulating gene expression. DNA in eukaryotes is organized into the nucleosomes of chromatin, and genes must be in an active structure to be expressed in a cell. Furthermore, operons are not present in eukaryotes, and the genes that encode proteins that function together are usually located on different chromosomes. For example, the gene for α-globin is on chromosome 16, whereas the gene for β-globin is on chromosome 11. Thus, each gene needs its own promoter. In addition, the processes of transcription and translation are separated in eukaryotes by intracellular compartmentation (nucleus and cytosol, or endoplasmic reticulum [ER]) and by time (eukaryotic heterogeneous nuclear RNA [hnRNA], also known as pre-mRNA, must be processed and translocated out of the nucleus before it is translated). Thus, regulation of eukaryotic gene expression occurs at multiple levels:
- DNA and the chromosome, including chromatin remodeling and gene rearrangement
- Transcription, primarily through transcription factors that affect binding of RNA polymerase to the promoter
- Processing of transcripts
- Initiation of translation and stability of mRNA
Once a gene is activated through chromatin remodeling, the major mechanism of regulating expression affects initiation of transcription at the promoter.
B. Regulation of Availability of Genes for Transcription
Once a haploid sperm and egg combine to form a diploid cell, the number of genes in human cells remains approximately the same. As cells differentiate, different genes are available for transcription. A typical nucleus contains chromatin that is condensed (heterochromatin) and chromatin that is diffuse (euchromatin) (see Chapter 11). The genes in heterochromatin are inactive, whereas those in euchromatin can be transcribed into mRNA. Long-term changes in the activity of genes occur during development as chromatin goes from a diffuse to a condensed state or vice versa.
The cellular genome is packaged together with histones into nucleosomes, and initiation of transcription is prevented if the promoter region is part of a nucleosome. Thus, activation of a gene for transcription requires changes in the state of the chromatin, called chromatin remodeling. The availability of genes for transcription also can be affected in certain cells, or under certain circumstances, by gene rearrangements, amplification, or deletion. For example, during lymphocyte maturation, genes are rearranged to produce a variety of different antibodies. The term epigenetics refers to changes in gene expression that occur without altering the sequence of the DNA. Chromatin remodeling and DNA methylation are examples of such changes; they can be inherited and contribute to the regulation of gene expression.
1. Chromatin Remodeling
The remodeling of chromatin generally refers to the displacement of the nucleosome from specific DNA sequences so that transcription of the genes in that sequence can be initiated. This occurs through two different mechanisms. The first mechanism is by an adenosine triphosphate (ATP)–driven chromatin remodeling complex, which uses energy from ATP hydrolysis to unwind certain sections of DNA from the nucleosome core. The second mechanism is by covalent modification of the histone tails through acetylation (Fig. 15.9). Histone acetyltransferases (HAT) transfer an acetyl group from acetyl coenzyme A (acetyl CoA) to lysine residues in the histone tails (the amino-terminal ends of histones H2A, H2B, H3, and H4) of the histone octamer. This reaction removes a positive charge from the ε-amino group of the lysine, thereby reducing the electrostatic interactions between the histones and the negatively charged DNA, making it easier for DNA to unwind from the histones. The acetyl groups can be removed by histone deacetylases (HDAC). Each histone has a number of lysine residues that may be acetylated and, through a complex mixing of acetylated and nonacetylated sites, different segments of DNA can be freed from the nucleosome. A number of transcription factors and coactivators contain histone acetyltransferase activity, which facilitates the binding of these factors to the DNA and simultaneous activation of the gene and initiation of its transcription.
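The charge argument can be checked by counting. The sketch below counts positively charged side chains (lysine K and arginine R) in a histone tail, first unmodified and then with chosen lysines acetylated. The sequence is assumed to be the first 20 residues of histone H3 and serves only as an illustration. Histidine and the free amino terminus are ignored.

```python
from collections.abc import Iterable

H3_TAIL_1_20 = "ARTKQTARKSTGGKAPRKQL"  # assumed: H3 N-terminal residues 1 to 20


def tail_positive_charge(seq: str, acetylated_lys: Iterable[int] = ()) -> int:
    """Count positively charged side chains (K and R) in a tail sequence.

    `acetylated_lys` holds 1-based positions of acetylated lysines; the
    acetyl group neutralizes the lysine epsilon-amino group."""
    acetylated = set(acetylated_lys)
    for pos in acetylated:
        if seq[pos - 1] != "K":
            raise ValueError(f"position {pos} is not a lysine")
    charge = 0
    for pos, residue in enumerate(seq, start=1):
        if residue == "R" or (residue == "K" and pos not in acetylated):
            charge += 1
    return charge


print(tail_positive_charge(H3_TAIL_1_20))                  # 7
print(tail_positive_charge(H3_TAIL_1_20, {9, 14}))         # 5
print(tail_positive_charge(H3_TAIL_1_20, {4, 9, 14, 18}))  # 3
```

Each acetylated lysine removes one positive charge, which weakens the tail's attraction to the negatively charged DNA backbone. Arginines are left untouched in this sketch.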
2. Methylation of DNA
Cytosine residues in DNA can be methylated to produce 5-methylcytosine. The methylated cytosines are located in CG-rich sequences (called CG or CpG islands), which are often near or in the promoter region of a gene. In certain instances, genes that are methylated are less readily transcribed than those that are not methylated. For example, globin genes are more extensively methylated in nonerythroid cells (cells that are not a part of the erythroid, or red blood cell, lineage) than in cells in which these genes are expressed (such as erythroblasts and reticulocytes). Methylation is a mechanism for regulating gene expression during differentiation, particularly in fetal development.
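A short sketch shows how "CG-rich" is usually quantified. The functions below compute the GC content and the observed/expected CpG ratio, defined as (number of CpG × length) / (number of C × number of G). They then apply the widely used Gardiner-Garden and Frommer thresholds: length of at least 200 bp, GC content above 50%, and a ratio above 0.6. The test sequences are synthetic, and real island finders also scan sliding windows along a genome.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).
    "CG" cannot overlap itself, so str.count finds every CpG."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply length, GC-content, and CpG-ratio thresholds to one sequence."""
    return (len(seq) >= min_len and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)


print(looks_like_cpg_island("CCGG" * 50))    # True: GC-rich with many CpGs
print(looks_like_cpg_island("GGCCAA" * 40))  # False: GC-rich but no CpG
```

A sequence can be GC-rich without containing CpG dinucleotides, as the second example shows. The observed/expected ratio is what separates such a sequence from a CpG island.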
3. Gene Rearrangement
Segments of DNA can move from one location to another in the genome, associating with each other in various ways so that different proteins are produced (Fig. 15.10). The most thoroughly studied example of gene rearrangement occurs in cells that produce antibodies. Antibodies contain two light chains and two heavy chains, each of which contains both a variable region and a constant region (see Chapter 7, Section VIII, Fig. 7.19). Cells called B-cells make antibodies. In the precursors of B-cells, hundreds of VH sequences, approximately 20 DH sequences, and approximately 6 JH sequences are located in clusters within a long region of the chromosome (see Fig. 15.10). During the production of the immature B-cells, a series of recombinational events joins one VH, one DH, and one JH sequence into a single exon. This exon now encodes the variable region of the heavy chain of the antibody. Given the large number of immature B-cells that are produced, virtually every recombinational possibility occurs, so all VDJ combinations are represented within this cell population. Later in development, during the differentiation of mature B-cells, recombinational events join a VDJ sequence to one of the nine heavy-chain constant-region elements. When the immune system encounters an antigen, the one immature B-cell that can bind to that antigen (because of its unique manner of forming the VDJ exon) is stimulated to proliferate (clonal expansion) and to produce antibodies against the antigen.
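The size of the heavy-chain repertoire follows from multiplying the segment counts. Because the text says only that there are "hundreds" of VH segments, the sketch tries a few hypothetical VH values alongside approximately 20 DH and 6 JH segments. It ignores light-chain pairing and the extra diversity created at the junctions, so it understates the real repertoire.

```python
def vdj_combinations(n_v: int, n_d: int, n_j: int) -> int:
    """Distinct heavy-chain variable exons if any one V, one D, and one J
    segment can be joined (one of each per recombined exon)."""
    return n_v * n_d * n_j


N_D, N_J = 20, 6  # approximate counts given in the text
for n_v in (100, 200, 300):  # hypothetical stand-ins for "hundreds" of VH
    print(f"{n_v} VH -> {vdj_combinations(n_v, N_D, N_J):,} VDJ combinations")
```

With 200 VH segments, for example, the product is 24,000 distinct VDJ exons. That is before any light-chain or junctional diversity is added.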
4. Gene Amplification
Gene amplification is not the usual physiologic means of regulating gene expression in normal cells, but it does occur in response to certain stimuli if the cell can obtain a growth advantage by producing large amounts of protein. In gene amplification, certain regions of a chromosome undergo repeated cycles of DNA replication. The newly synthesized DNA is excised and forms small, unstable chromosomes called double minutes. The double minutes integrate into other chromosomes throughout the genome, thereby amplifying the gene. Typically, gene amplification arises from errors during DNA replication and cell division, and, if the environmental conditions are appropriate, cells containing amplified genes may have a growth advantage over those without the amplification.