All genetic diseases involve defects at the level of the cell. For this reason, one must understand basic cell biology to understand genetic disease. Errors can occur in the replication of genetic material or in the translation of genes into proteins. Such errors commonly produce single-gene disorders. In addition, errors that occur during cell division can lead to disorders involving entire chromosomes. To provide the basis for understanding these errors and their consequences, this chapter focuses on the processes through which genes are replicated and translated into proteins, as well as the process of cell division.
In the 19th century, microscopic studies of cells led scientists to suspect that the nucleus of the cell (Fig. 2.1) contains the important mechanisms of inheritance. They found that chromatin, the substance that gives the nucleus a granular appearance, is observable in the nuclei of nondividing cells. Just before a cell undergoes division, the chromatin condenses to form microscopically observable, threadlike structures called chromosomes (from the Greek words for “colored bodies”). With the rediscovery of Mendel’s breeding experiments at the beginning of the 20th century, it soon became apparent that chromosomes contain genes. Genes are transmitted from parent to offspring and are considered the basic unit of inheritance. It is through the transmission of genes that physical traits such as eye color are inherited in families. Diseases can also be transmitted through genetic inheritance.
Physically, genes are composed of deoxyribonucleic acid (DNA). DNA provides the genetic blueprint for all proteins in the body. Thus genes ultimately influence all aspects of body structure and function. Humans are estimated to have approximately 21,000 genes (sequences of DNA that code for proteins). An error (or mutation) in one of these genes often leads to a recognizable genetic disease. In addition, there are thousands of genes that encode a ribonucleic acid (RNA) product but not a protein.
Genes, the basic unit of inheritance, are contained in chromosomes and consist of DNA.
Each human somatic cell (any cell other than the gametes, which are the sperm and egg cells) contains 23 pairs of different chromosomes, for a total of 46. One member of each pair is derived from the individual’s father, and the other member is derived from the mother. One of the chromosome pairs consists of the sex chromosomes. In normal males, the sex chromosomes are a Y chromosome inherited from the father and an X chromosome inherited from the mother. Two X chromosomes are found in normal females, one inherited from each parent. The other 22 pairs of chromosomes are autosomes. The members of each pair of autosomes are said to be homologs, or homologous, because their DNA is very similar. The X and Y chromosomes are not homologs of each other.
Somatic cells, having two of each chromosome, are diploid cells. Human gametes have the haploid number of chromosomes, 23. The diploid number of chromosomes is maintained in successive generations of somatic cells by the process of mitosis, whereas the haploid number is obtained through the process of meiosis. Both of these processes are discussed in detail later in this chapter.
Somatic cells are diploid, having 23 pairs of chromosomes (22 pairs of autosomes and one pair of sex chromosomes). Gametes are haploid and have a total of 23 chromosomes.
DNA, RNA, and Proteins: Heredity at the Molecular Level
DNA
Composition and Structure of DNA
The DNA molecule has three basic components: the pentose sugar, deoxyribose; a phosphate group; and four types of nitrogenous bases (so named because they can combine with hydrogen ions in acidic solutions). Two of the bases, cytosine and thymine, are single carbon–nitrogen rings called pyrimidines. The other two bases, adenine and guanine, are double carbon–nitrogen rings called purines (Fig. 2.2). The four bases are commonly represented by their first letters: C, T, A, and G.
One of the contributions of Watson and Crick in the mid-20th century was to demonstrate how these three components are physically assembled to form DNA. They proposed the now-famous double helix model, in which DNA can be envisioned as a twisted ladder with chemical bonds as its rungs (Fig. 2.3). The two sides of the ladder are composed of the sugar and phosphate components, held together by strong phosphodiester bonds. Projecting from each side of the ladder, at regular intervals, are the nitrogenous bases. The base projecting from one side is bound to the base projecting from the other side by relatively weak hydrogen bonds. The paired nitrogenous bases therefore form the rungs of the ladder.
Fig. 2.2 illustrates the chemical bonds between bases and shows that the ends of the ladder terminate in either 3′ or 5′. These labels are derived from the order in which the five carbon atoms composing deoxyribose are numbered. Each DNA subunit, consisting of one deoxyribose, one phosphate group, and one base, is called a nucleotide.
Different sequences of nucleotide bases (e.g., ACCAAGTGC) specify different proteins. Specification of the body’s many proteins must require a great deal of genetic information. Indeed, each haploid human cell contains approximately 3 billion nucleotide pairs, more than enough information to specify the composition of all human proteins.
The most important constituents of DNA are the four nucleotide bases: adenine, thymine, cytosine, and guanine. DNA has a double helix structure.
DNA Coiling
Textbook illustrations usually depict DNA as a double helix molecule that continues in a long, straight line. However, if the DNA in a cell were actually stretched out in this way, it would be about 2 meters long. To package all of this DNA into a tiny cell nucleus, it is coiled at several levels. First, the DNA is wound around a histone protein core to form a nucleosome (Fig. 2.4). About 140 to 150 DNA bases are wound around each histone core, and then 20 to 60 bases form a spacer element before the next nucleosome complex. The nucleosomes in turn form a helical solenoid; each turn of the solenoid includes about six nucleosomes. The solenoids themselves are organized into chromatin loops, which are attached to a protein scaffold. Each of these loops contains approximately 100,000 base pairs (bp), or 100 kilobases (kb), of DNA. The end result of this coiling and looping is that the DNA, at its maximum stage of condensation, is only about 1/10,000 as long as it would be if it were fully stretched out.
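The lengths quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only. It takes the genome size, loop size, and condensation factor from this chapter, and it assumes a spacing of roughly 0.34 nm between successive base pairs along the helix, a commonly cited value for the usual form of DNA that is not given in the text. The function names are hypothetical.

```python
# Rough check of the "about 2 meters" figure.
HAPLOID_BP = 3_000_000_000    # from the text: ~3 billion nucleotide pairs per haploid genome
DIPLOID_BP = 2 * HAPLOID_BP   # a somatic cell carries two copies of the genome
RISE_PER_BP_NM = 0.34         # ASSUMPTION (not from the text): ~0.34 nm per base pair
LOOP_BP = 100_000             # from the text: ~100 kb of DNA per chromatin loop
CONDENSATION = 10_000         # from the text: condensed DNA is ~1/10,000 as long


def stretched_length_m(base_pairs, rise_nm=RISE_PER_BP_NM):
    """Length in meters of a fully extended double helix."""
    return base_pairs * rise_nm * 1e-9


def loops_per_genome(base_pairs, loop_bp=LOOP_BP):
    """Number of 100-kb loops needed to hold the given amount of DNA."""
    return base_pairs // loop_bp


total_m = stretched_length_m(DIPLOID_BP)
print(f"stretched: {total_m:.2f} m")                             # stretched: 2.04 m
print(f"condensed: {total_m / CONDENSATION * 1e6:.0f} micrometers")  # condensed: 204 micrometers
print(f"loops per haploid genome: {loops_per_genome(HAPLOID_BP)}")   # loops per haploid genome: 30000
```

Under these assumptions the stretched length comes out close to 2 meters, and the fully condensed DNA of all 46 chromosomes together is on the order of a fifth of a millimeter.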
DNA is a tightly coiled structure. This coiling occurs at several levels: the nucleosome, the solenoid, and 100-kb loops.
Replication of DNA
As cells divide to make copies of themselves, identical copies of DNA must be made and incorporated into the new cells. This is essential if DNA is to serve as the fundamental genetic material. DNA replication begins as the weak hydrogen bonds between bases break, producing single DNA strands with unpaired bases. The consistent pairing of adenine with thymine and guanine with cytosine, known as complementary base pairing, is the key to accurate replication. The principle of complementary base pairing dictates that the unpaired base will attract a free nucleotide only if that nucleotide has the proper complementary base. For example, a portion of a single strand with the base sequence ATTGCT will bond with a series of free nucleotides with the bases TAACGA. The single strand is said to be a template upon which the complementary strand is built. When replication is complete, a new double-stranded molecule identical to the original is formed (Fig. 2.5).
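The pairing rule can be written as a simple lookup. The Python sketch below is a minimal illustration, not a model of the replication machinery, and its function names are hypothetical. It reproduces the ATTGCT example from the text. Because the two strands of the double helix run in opposite directions, it also shows the new strand rewritten in its own 5′ to 3′ order.

```python
# Complementary base pairing in DNA: A pairs with T, G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(template):
    """Return the base that pairs with each template base, position by position.

    The result is written in the same left-to-right order as the template,
    as in the ATTGCT / TAACGA example in the text.
    """
    return "".join(DNA_PAIRS[base] for base in template.upper())


def reverse_complement(template):
    """The complementary strand rewritten in its own 5'-to-3' order."""
    return complement(template)[::-1]


print(complement("ATTGCT"))              # TAACGA
print(reverse_complement("ATTGCT"))      # AGCAAT
print(complement(complement("ATTGCT")))  # ATTGCT
```

The last line shows why either strand can serve as a template: complementing the new strand regenerates the original sequence.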
Several different enzymes are involved in DNA replication. One enzyme unwinds the double helix and another holds the strands apart. Still another enzyme, DNA polymerase, travels along the single DNA strand, adding free nucleotides to the 3′ end of the new strand. Nucleotides can be added only to the 3′ end of the strand, so replication always proceeds from the 5′ to the 3′ end. When referring to the orientation of sequences along a gene, the 5′ direction is termed upstream, and the 3′ direction is termed downstream.
In addition to adding new nucleotides, DNA polymerase performs part of a proofreading procedure, in which a newly added nucleotide is checked to make certain that it is in fact complementary to the template base. If it is not, the nucleotide is excised and replaced with a correct complementary nucleotide base. This process substantially enhances the accuracy of DNA replication. When a DNA replication error is not successfully repaired, a mutation has occurred. As will be seen in Chapter 3, many such mutations cause genetic diseases.
DNA replication is critically dependent on the principle of complementary base pairing (A with T; C with G). This allows a single strand of the double-stranded DNA molecule to form a template for the synthesis of a new, complementary strand.
The rate of DNA replication in humans, about 40 to 50 nucleotides per second, is comparatively slow. In bacteria the rate is much higher, reaching 500 to 1000 nucleotides per second. Given that some human chromosomes have as many as 250 million nucleotides, replication would be an extraordinarily time-consuming process if it proceeded linearly from one end of the chromosome to the other: at 50 nucleotides per second, a single round of replication of a chromosome of this size would take about 5 million seconds, or almost 2 months. Instead, replication begins at many different points along the chromosome, termed replication origins. The resulting multiple separations of the DNA strands are called replication bubbles (Fig. 2.6). By occurring simultaneously at many different sites along the chromosome, the replication process can proceed much more quickly.
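The "almost 2 months" estimate, and the benefit of starting at many sites, can be reproduced with a short calculation. In the Python sketch below, the chromosome size and replication rates come from the text. The count of 1000 simultaneous replication sites is a purely hypothetical number chosen for illustration, and the model simplifies by treating each site as copying an equal share of the chromosome at the single-polymerase rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60
CHROMOSOME_NT = 250_000_000  # from the text: a large human chromosome


def replication_days(chromosome_nt, rate_nt_per_s, sites=1):
    """Days needed to copy a chromosome when `sites` locations replicate in parallel.

    Simplification: every site copies an equal share at the same rate.
    """
    return chromosome_nt / (rate_nt_per_s * sites) / SECONDS_PER_DAY


print(f"{replication_days(CHROMOSOME_NT, 40):.1f} days at 40 nt/s")  # 72.3 days at 40 nt/s
print(f"{replication_days(CHROMOSOME_NT, 50):.1f} days at 50 nt/s")  # 57.9 days at 50 nt/s

# Hypothetical: 1000 sites replicating at once (illustrative number, not from the text).
hours = replication_days(CHROMOSOME_NT, 50, sites=1000) * 24
print(f"{hours:.1f} hours with 1000 sites")  # 1.4 hours with 1000 sites
```

Under these assumptions, parallel replication reduces the time from roughly two months to a little over an hour, which is the point made by the replication bubble model.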
Replication bubbles allow DNA replication to take place at multiple locations on the chromosome, greatly speeding the replication process.
From Genes to Proteins
While DNA is formed and replicated in the cell nucleus, protein synthesis takes place in the cytoplasm. The information contained in DNA must be transported to the cytoplasm and then used to dictate the composition of proteins. This involves two processes, transcription and translation. Briefly, the DNA code is transcribed into messenger RNA, which then leaves the nucleus to be translated into proteins. These processes, summarized in Fig. 2.7, are discussed at length later in this chapter. Transcription and translation are both mediated by ribonucleic acid (RNA), a type of nucleic acid that is chemically similar to DNA. Like DNA, RNA is composed of sugars, phosphate groups, and nitrogenous bases. It differs from DNA in that the sugar is ribose instead of deoxyribose, and uracil rather than thymine is one of the four bases. Uracil is structurally similar to thymine, so like thymine, it can pair with adenine. Another difference between RNA and DNA is that whereas DNA usually occurs as a double strand, RNA usually occurs as a single strand.
DNA sequences encode proteins through the processes of transcription and translation. These processes both involve RNA, a single-stranded molecule that is similar to DNA except that it has a ribose sugar rather than deoxyribose and a uracil base rather than thymine.
Transcription
Transcription is the process by which an RNA sequence is formed from a DNA template (Fig. 2.8). The type of RNA produced by the transcription process is messenger RNA (mRNA). To initiate mRNA transcription, one of the RNA polymerase enzymes (RNA polymerase II) binds to a promoter site on the DNA (a promoter is a nucleotide sequence that lies just upstream of a gene). The RNA polymerase then pulls a portion of the DNA strands apart from each other, exposing unattached DNA bases. One of the two DNA strands provides the template for the sequence of mRNA nucleotides. Although either DNA strand could in principle serve as the template for mRNA synthesis, only one is chosen to do so in a given region of the chromosome. This choice is determined by the promoter sequence, which orients the RNA polymerase in a specific direction along the DNA sequence. Because the mRNA molecule can be synthesized only in the 5′ to 3′ direction, the promoter, by specifying directionality, determines which DNA strand serves as the template. This template DNA strand is also known as the antisense strand. RNA polymerase moves in the 3′ to 5′ direction along the DNA template strand, assembling the complementary mRNA strand from 5′ to 3′ (see Fig. 2.8). Because of complementary base pairing, the mRNA nucleotide sequence is identical to that of the DNA strand that does not serve as the template (the sense strand), except for the substitution of uracil for thymine.
Soon after RNA synthesis begins, the 5′ end of the growing RNA molecule is capped by the addition of a chemically modified guanine nucleotide. This 5′ cap appears to help prevent the RNA molecule from being degraded during synthesis, and later it helps to indicate the starting position for translation of the mRNA molecule into protein. Transcription continues until a group of bases called a termination sequence is reached. Near this point, a series of 100 to 200 adenine bases are added to the 3′ end of the RNA molecule. This structure, known as the poly-A tail, may be involved in stabilizing the mRNA molecule so that it is not degraded when it reaches the cytoplasm. RNA polymerase usually continues to transcribe DNA for several thousand additional bases, but the mRNA bases that are attached after the poly-A tail are eventually lost. Finally, the DNA strands and the RNA polymerase separate from the RNA strand, leaving a transcribed single mRNA strand. This mRNA molecule is termed the primary transcript.
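The relationship among the template (antisense) strand, the sense strand, and the mRNA can be illustrated with a short Python sketch. The nine-base sequence and the function names below are hypothetical and chosen only for illustration. The template is written in the 3′ to 5′ direction, the order in which RNA polymerase reads it, so that the output reads 5′ to 3′.

```python
# Pairing rules used when an RNA strand is built on a DNA template:
# template A pairs with U in RNA; otherwise the rules match DNA pairing.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5):
    """mRNA (5' to 3') assembled on a template strand read 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


def sense_strand(template_3to5):
    """The non-template (sense) DNA strand, written 5' to 3'."""
    return "".join(DNA_TO_DNA[base] for base in template_3to5)


template = "TACGGTCAT"          # hypothetical antisense sequence, 3' to 5'
mrna = transcribe(template)
sense = sense_strand(template)

print(mrna)   # AUGCCAGUA
print(sense)  # ATGCCAGTA
print(mrna == sense.replace("T", "U"))  # True
```

The final comparison restates the point made in the text: the mRNA carries the same sequence as the sense strand, with uracil in place of thymine.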
In some human genes, such as the one that can cause Duchenne muscular dystrophy, several different promoters exist and are located in different parts of the gene. Thus transcription of the gene can start in different places, resulting in the production of somewhat different proteins. This allows the same gene sequence to code for variations of a protein in different tissues (e.g., muscle tissue versus brain tissue).
In the process of transcription, RNA polymerase II recognizes a promoter site near the 5′ end of a gene on the sense strand and, through complementary base pairing, helps to produce an mRNA strand from the antisense DNA strand.
Transcription and the Regulation of Gene Expression
Some genes are transcribed in all cells of the body. These housekeeping genes encode products that are required for a cell’s maintenance and metabolism. Most genes, however, are transcribed only in specific tissues at specific points in time. Therefore in most cells only a small fraction of genes are actively transcribed. This specificity explains why there is a large variety of different cell types making different protein products, even though almost all cells have exactly the same DNA sequence. For example, the globin genes are transcribed in the progenitors of red blood cells (where they help to form hemoglobin), and the low-density lipoprotein receptor genes are transcribed in liver cells.
Many different proteins participate in the process of transcription. Some of these are required for the transcription of all genes, and these are termed general transcription factors. Others, labeled specific transcription factors, have more specialized roles, activating only certain genes at certain stages of development. A key transcriptional element is RNA polymerase II, which was described previously. Although this enzyme plays a vital role in initiating transcription by binding to the promoter region, it cannot locate the promoter region on its own. Furthermore, it is incapable of producing significant quantities of mRNA by itself. Effective transcription requires the interaction of a large complex of approximately 50 different proteins. These include general (basal) transcription factors, which bind to RNA polymerase and to specific DNA sequences in the promoter region (sequences such as TATA and others needed for initiating transcription). The general transcription factors allow RNA polymerase to bind to the promoter region so that it can function effectively in transcription (Fig. 2.9).
The transcriptional activity of specific genes can be greatly increased by interaction with sequences called enhancers, which may be located thousands of bases upstream or downstream of the gene. Enhancers do not interact directly with genes. Instead, they are bound by a class of specific transcription factors that are termed activators. Activators bind to a second class of specific transcription factors called coactivators, which in turn bind to the general transcription factor complex described previously (see Fig. 2.9). This chain of interactions, from enhancer to activator to coactivator to the general transcription complex and finally to the gene itself, increases the transcription of specific genes at specific points in time. Whereas enhancers help to increase the transcriptional activity of genes, other DNA sequences, known as silencers, help to repress the transcription of genes through a similar series of interactions.
Mutations in enhancer, silencer, or promoter sequences, as well as mutations in the genes that encode transcription factors, can lead to faulty expression of vital genes and consequently to genetic disease. Examples of such diseases are discussed in the following chapters.
Transcription factors are required for the transcription of DNA to mRNA. General transcription factors are used by all genes, and specific transcription factors help to initiate the transcription of genes in specific cell types at specific points in time. Transcription is also regulated by enhancer and silencer sequences, which may be located thousands of bases away from the transcribed gene.
The large number and complexity of transcription factors allow fine-tuned regulation of gene expression. But how do the transcription factors locate specific DNA sequences? This is achieved by DNA-binding motifs: configurations in the transcription-factor protein that allow it to fit snugly and stably into a unique portion of the DNA double helix. Several examples of these binding motifs are listed in Table 2.1, and Fig. 2.10 illustrates the binding of one such motif to DNA. Each major motif contains many variations that allow specificity in DNA binding.
Motif | Description | Human Disease Examples |
---|---|---|
Helix–turn–helix | Two α helices are connected by a short chain of amino acids, which constitute the turn. The carboxyl-terminal helix is a recognition helix that binds to the DNA major groove. | Homeodomain proteins (HOX): mutations in human HOXD13 and HOXA13 cause synpolydactyly and hand–foot–genital syndrome, respectively. |
Helix–loop–helix | Two α helices (one short and one long) are connected by a flexible loop. The loop allows the two helices to fold back and interact with one another. The helices can bind to DNA or to other helix–loop–helix structures. | Mutations in the TWIST gene cause Saethre–Chotzen syndrome (acrocephalosyndactyly type III) |
Zinc finger | Zinc ions are used to stabilize amino acid structures (e.g., α helices, β sheets), with binding of the α helix to the DNA major groove. | BRCA1 (breast cancer gene); WT1 (Wilms tumor gene); GLI3 (Greig syndrome gene); vitamin D receptor gene (mutations cause rickets) |
Leucine zipper | Two leucine-rich α helices are held together by amino acid side chains. The α helices form a Y-shaped structure whose side chains bind to the DNA major groove. | RB1 (retinoblastoma gene); JUN and FOS oncogenes |
β Sheets | Side chains extend from the two-stranded β sheet to form contacts with the DNA helix. | TBX family of genes: TBX5 (Holt–Oram syndrome); TBX3 (ulnar–mammary syndrome) |
An intriguing type of DNA-binding motif is contained in the high-mobility group (HMG) class of proteins. These proteins are capable of bending DNA and can facilitate interactions between distantly located enhancers and the appropriate basal factors and promoters (see Fig. 2.9).
Transcription factors contain DNA-binding motifs that allow them to interact with specific DNA sequences. In some cases, they bend DNA so that distant enhancer sequences can interact with target genes.
Gene activity can be related to patterns of chromatin coiling or condensation (chromatin is the combination of DNA and the histone proteins around which the DNA is wound). Decondensed or open chromatin regions, termed euchromatin, are typically characterized by histone acetylation, the attachment of acetyl groups to lysine residues in the histones. Acetylation of histones reduces their binding to DNA, helping to decondense the chromatin so that it is more accessible to transcription factors. Euchromatin is thus transcriptionally active. In contrast, heterochromatin is usually less acetylated, more condensed, and transcriptionally inactive.
Transcriptional silencing is also associated with methylation of promoter regions, in which methyl groups are attached to the DNA molecule. Methylation renders promoters less accessible to transcription factors. Factors such as methylation and histone modification, which alter the expression of genes but do not change the DNA sequence itself, are examples of epigenetic (“over” genetics) modification. Epigenetics, which plays key roles in development and cancer, is discussed further in Chapters 8 and 11.
Gene expression can also be influenced by microRNAs (miRNAs), which are small RNA molecules (17–27 nucleotides) that are not translated into proteins but can bind to and downregulate mRNA. MicroRNAs, which are considered another form of epigenetic modifier, have been found to play important roles in gene regulation and in a number of types of cancer (see Chapter 11).
Heterochromatin, which is highly condensed and hypoacetylated, tends to be transcriptionally inactive, whereas euchromatin, which is acetylated and less condensed, tends to be transcriptionally active. Transcription is also regulated by methylation and by the attachment of microRNAs to mRNA. These factors are all examples of epigenetic modification.