Humans display a substantial amount of genetic variation. This is reflected in traits such as height, blood pressure, and skin color. Included in the spectrum of genetic variation are disease states, such as cystic fibrosis or type 1 neurofibromatosis (see Chapter 4). This aspect of genetic variation is the focus of medical genetics.
All genetic variation originates from the process known as mutation, which is defined as a change in DNA sequence. Mutations can affect either germline cells (cells that produce gametes) or somatic cells (all cells other than germline cells). Mutations in somatic cells can lead to cancer and are thus of significant concern. However, this chapter is focused primarily on germline mutations, because they can be transmitted from one generation to the next.
As a result of mutations, a gene can differ among individuals in terms of its DNA sequence. The differing sequences are referred to as alleles. A gene’s location on a chromosome is termed a locus (from the Latin word for “place”). For example, it might be said that a person has a certain allele at the β-globin locus on chromosome 11. If a person has the same allele on both members of a chromosome pair, they are said to be a homozygote. If the alleles differ in DNA sequence, they are a heterozygote. The combination of alleles that is present at a given locus is termed the genotype.
In human genetics, the term mutation has often been reserved for DNA sequence changes that cause genetic diseases and are consequently relatively rare, with a population frequency less than 1%. DNA sequence variants that are more common in populations are conventionally described as polymorphisms (“many forms,” describing multiple alleles at a locus). Loci (plural of locus) that contain multiple alleles are termed polymorphic. Nowadays, however, alleles that have a frequency less than 1% are sometimes called polymorphisms as well. In addition, many common polymorphisms are now known to influence the risks for complex, common diseases such as diabetes and heart disease (see Chapter 12 ), so the traditional distinction between mutation (rare and disease-causing) and polymorphism (common and benign) has become increasingly blurred.
One of Gregor Mendel’s important contributions to genetics was to show that the effects of one allele at a locus can mask those of another allele at the same locus. He performed crosses (matings) between pea plants homozygous for a “tall” allele (i.e., having two identical copies of an allele that we will label H ) and plants homozygous for a “short” allele (having two copies of an allele labeled h ). This cross, which can produce only heterozygous (Hh) offspring, is illustrated in the Punnett square shown in Fig. 3.1 . Mendel found that the offspring of these crosses were all tall, even though they were heterozygotes. This is because the H allele is dominant, and the h allele is recessive. (It is conventional to label the dominant allele in upper case and the recessive allele in lower case.) The term recessive comes from a Latin root meaning “to hide.” This describes the behavior of recessive alleles well: in heterozygotes, the consequences of a recessive allele are hidden. A dominant allele exerts its effect in both the homozygote (HH) and the heterozygote (Hh), whereas the presence of the recessive allele is physically observable only when it occurs in homozygous form (hh). Thus short pea plants can be created only by crossing parent plants that each carry at least one h allele. An example is a heterozygote × heterozygote cross, shown in Fig. 3.2 .
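The two crosses described above can be reproduced with a few lines of code. The following is a minimal sketch, not part of Mendel's or the chapter's own method: the function names `punnett` and `phenotype` are hypothetical, and the model assumes a single locus with complete dominance of H over h.

```python
from collections import Counter
from itertools import product


def punnett(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for a one-locus cross.

    Each parent is a two-letter genotype such as "Hh". Every allele from
    one parent is paired with every allele from the other, which is what
    the four cells of a Punnett square represent.
    """
    offspring = Counter()
    for a, b in product(parent1, parent2):
        # Sort so the upper-case (dominant) allele is written first.
        offspring["".join(sorted(a + b))] += 1
    return offspring


def phenotype(genotype: str) -> str:
    """Tall if at least one dominant H allele is present, otherwise short."""
    return "tall" if "H" in genotype else "short"


# Homozygous tall x homozygous short: every cell of the square is Hh.
print(dict(punnett("HH", "hh")))

# Heterozygote x heterozygote: 1 HH : 2 Hh : 1 hh, so one cell in four is short.
for genotype, count in sorted(punnett("Hh", "Hh").items()):
    print(genotype, count, phenotype(genotype))
```

Only the hh cell is short, which matches the statement that the recessive allele is observable only in homozygous form.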
In this chapter we examine mutation as the source of genetic variation. We discuss the types of mutation, the causes and consequences of mutation, and the biochemical and molecular techniques that are now used to detect genetic variation in human populations.
Mutation: The Source of Genetic Variation
Types of Mutation
Some mutations consist of an alteration of the number or structure of chromosomes in a cell. These major chromosome abnormalities can be observed microscopically and are the subject of Chapter 6 . Here the focus is on mutations that affect only single genes and are not microscopically observable. Most of our discussion centers on mutations that take place in coding DNA or in regulatory sequences, because mutations that occur in other parts of the genome usually have no clinical consequences.
One important type of single-gene mutation is the base-pair substitution, in which one base pair is replaced by another.∗ This can result in a change in the amino acid sequence. However, because of the redundancy of the genetic code, many of these mutations do not change the amino acid sequence and thus typically have no effect. Such mutations are called silent substitutions. Base-pair substitutions that alter amino acids consist of two basic types: missense mutations, which produce a change in a single amino acid, and nonsense mutations, which produce one of the three stop codons (UAA, UAG, or UGA) in the messenger RNA (mRNA) (Fig. 3.3). Because stop codons terminate translation of the mRNA, a nonsense mutation, also termed a stop-gain, results in early termination of the polypeptide chain or in destruction of the transcript through a process known as nonsense-mediated mRNA decay. Conversely, if a stop codon is altered so that it encodes an amino acid (a stop-loss), an abnormally elongated polypeptide can be produced. Alterations of amino acid sequences can have profound consequences, and many of the serious genetic diseases discussed later are the result of such alterations.
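The categories of base-pair substitution can be made concrete by comparing the amino acid encoded before and after a single-codon change. The sketch below is illustrative only: `classify_substitution` is a hypothetical helper, codons are written as coding-strand DNA (T in place of U), and the standard genetic code is assumed.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}  # "*" marks the three stop codons (TAA, TAG, TGA)


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense (stop-gain)"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


print(classify_substitution("GAG", "GAA"))  # Glu -> Glu: silent
print(classify_substitution("GAG", "GTG"))  # Glu -> Val: missense
print(classify_substitution("GAG", "TAG"))  # Glu -> stop: nonsense (stop-gain)
print(classify_substitution("TAA", "CAA"))  # stop -> Gln: stop-loss
```

The classification looks only at the encoded amino acid; it says nothing about whether a given missense change is harmful.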
∗ In molecular genetics, base-pair substitutions are also termed point mutations. However, “point mutation” was used in classical genetics to denote any mutation small enough to be unobservable under a microscope.
A second major type of mutation consists of deletions or insertions of one or more base pairs. These mutations, which can result in extra or missing amino acids in a protein, are often detrimental. An example of such a mutation is the 3-bp deletion that is found in most persons with cystic fibrosis (see Chapter 4 ). Deletions and insertions tend to be especially harmful when the number of missing or extra base pairs is not a multiple of three. Because codons consist of groups of three base pairs, such insertions or deletions can alter the downstream codons. This is a frameshift mutation ( Fig. 3.4 ). For example, the insertion of a single base (an A in the second codon) converts a DNA sequence read as 5′-ACT GAT TGC GTT-3′ to 5′-ACT GAA TTG CGT-3′. This changes the amino acid sequence from Thr-Asp-Cys-Val to Thr-Glu-Leu-Arg. Often a frameshift mutation produces a stop codon downstream of the insertion or deletion, resulting in a truncated polypeptide.
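The single-base insertion in this example can be checked by translating both sequences. This is a small sketch under the assumption of the standard genetic code; the `translate` helper and the one-letter amino acid output are illustrative choices, not part of the text.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate coding-strand DNA codon by codon (one-letter amino acids).

    Any trailing bases that do not fill a complete codon are ignored.
    """
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


normal = "ACTGATTGCGTT"
# Insert a single A after the fifth base, i.e. inside the second codon.
mutant = normal[:5] + "A" + normal[5:]

print(translate(normal))  # TDCV = Thr-Asp-Cys-Val
print(translate(mutant))  # TELR = Thr-Glu-Leu-Arg
```

Every codon downstream of the insertion is read in a new frame, which is why a one-base change alters three of the four amino acids here.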
On a larger scale, duplications of whole genes can also lead to genetic disease. A good example is given by Charcot–Marie–Tooth disease. This disorder, named after the three physicians who described it more than a century ago, is a peripheral nervous system condition that leads to progressive atrophy of the distal limb muscles. It affects approximately 1 in 2500 persons and exists in several different forms. About 70% of patients who have the most common form (type 1A) display a 1.5 million-bp (base pair) duplication on one copy of chromosome 17. As a result, they have three, rather than two, copies of the genes in this region. One of these genes, PMP22, encodes a component of peripheral myelin. The increased dosage of the gene product contributes to the demyelination that is characteristic of this form of the disorder. Interestingly, a deletion of this same region produces a different disease, hereditary neuropathy with liability to pressure palsies (paralysis). Because either a reduction (to 50%) or an increase (to 150%) in the gene product produces disease, this gene is said to display dosage sensitivity. Base-pair substitutions in PMP22 itself can produce yet another disease, Dejerine–Sottas syndrome, which is characterized by distal muscle weakness, sensory alterations, muscular atrophy, and enlarged spinal nerve roots.
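The dosage figures quoted above follow directly from copy number. As a minimal sketch, assuming that gene product scales linearly with the number of gene copies (a simplification), the hypothetical helper below reproduces the 50% and 150% values.

```python
def relative_dosage(copies: int, normal_copies: int = 2) -> float:
    """Gene product as a percentage of normal, assuming it scales with copy number."""
    return 100.0 * copies / normal_copies


# Conditions associated with each copy number of the chromosome 17 region.
conditions = {
    1: "hereditary neuropathy with liability to pressure palsies (deletion)",
    2: "normal",
    3: "Charcot-Marie-Tooth disease type 1A (duplication)",
}

for copies, condition in conditions.items():
    print(f"{copies} copies: {relative_dosage(copies):.0f}% of normal -> {condition}")
```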
Other types of mutation can alter the regulation of transcription or translation. A promoter mutation can decrease the affinity of RNA polymerase for a promoter site, often resulting in reduced production of mRNA and thus decreased production of a protein. Mutations of transcription factor genes or enhancer sequences can have similar effects.
Mutations can also interfere with the splicing of introns as mature mRNA is formed from the primary mRNA transcript. Splice-site mutations, those that occur at intron–exon boundaries, alter the splicing signal that is necessary for proper excision of an intron. Splice-site mutations can occur at the GT sequence that defines the 5′ splice site (the donor site ) or at the AG sequence that defines the 3′ splice site (the acceptor site ). They can also take place in the sequences that lie near the donor and acceptor sites. When such mutations occur, the excision is often made within the next exon, at a splice site located in the exon. These splice sites, whose DNA sequences differ slightly from those of normal splice sites, are ordinarily unused and hidden within the exon. They are thus termed cryptic splice sites. The use of a cryptic site for splicing results in partial deletion of the exon, or in other cases, the deletion of an entire exon. As Fig. 3.5 shows, splice-site mutations can also result in the abnormal inclusion of part or all of an intron in the mature mRNA. Finally, a mutation can occur at a cryptic splice site, causing it to appear as a normal splice site and thus to compete with the normal splice site.
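The role of the GT and AG dinucleotides can be illustrated with a simple check on an intron sequence. This is a deliberately reduced sketch: the function name and the example intron are hypothetical, and real splice-site recognition also depends on the surrounding sequences mentioned above.

```python
def check_splice_sites(intron: str) -> dict:
    """Report whether an intron begins with GT (donor) and ends with AG (acceptor)."""
    intron = intron.upper()
    return {
        "donor_ok": intron.startswith("GT"),
        "acceptor_ok": intron.endswith("AG"),
    }


# Hypothetical intron sequence, used only for illustration.
normal_intron = "GTAAGTCTTTCCTTTCAG"
donor_mutant = "AT" + normal_intron[2:]      # G -> A at the first base of the donor site
acceptor_mutant = normal_intron[:-2] + "GG"  # A -> G at the acceptor site

for name, seq in [("normal", normal_intron),
                  ("donor mutant", donor_mutant),
                  ("acceptor mutant", acceptor_mutant)]:
    print(name, check_splice_sites(seq))
```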
Several types of DNA sequences are capable of propagating copies of themselves; these copies are then inserted in other locations on chromosomes (examples include the LINE and Alu repeats, discussed in Chapter 2 ). Such insertions can cause frameshift mutations. The insertion of mobile elements has been shown to cause isolated cases of type 1 neurofibromatosis, Duchenne muscular dystrophy, β-thalassemia, familial breast cancer, familial polyposis (colon cancer), and hemophilia A and B (clotting disorders) in humans.
The final type of mutation to be considered here affects tandem repeated DNA sequences (see Chapter 2) that occur within or near certain disease-related genes. The repeat units are usually 3 bp long, so a typical example would be CAGCAGCAG. A normal person has a relatively small number of these tandem repeats (e.g., 10 to 30 consecutive CAG units) at a specific chromosome location. Occasionally the number of repeats increases during meiosis or possibly during early fetal development, so that a newborn might have hundreds or even thousands of tandem repeats. When this occurs in certain regions of the genome, it causes genetic disease. Like other mutations, these expanded repeats can be transmitted to the patient’s offspring. More than 20 genetic diseases are now known to be caused by expanded repeats (see Chapters 4 and 5).
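Counting tandem repeats in a sequence is straightforward to sketch. The helper below is hypothetical and the example sequence is invented for illustration; it simply finds the longest uninterrupted run of a repeat unit such as CAG.

```python
import re


def longest_repeat_run(dna: str, unit: str = "CAG") -> int:
    """Return the number of units in the longest uninterrupted tandem run."""
    runs = re.findall(f"(?:{re.escape(unit)})+", dna.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Invented sequence: 12 CAG units, one interrupting CAA, then 3 more CAG units.
sequence = "TT" + "CAG" * 12 + "CAA" + "CAG" * 3 + "GG"
print(longest_repeat_run(sequence))  # 12
```

A count in the range of 10 to 30 would fall within the normal range given in the text, whereas an expanded allele would show a much longer run.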
Mutations are the ultimate source of genetic variation. Some mutations result in genetic disease, but most have no physical effects. The principal types of mutation are missense, nonsense, frameshift, promoter, and splice-site mutations. Mutations can also be caused by the random insertion of mobile elements, and some genetic diseases are known to be caused by expanded repeats.
Molecular Consequences of Mutation
It is useful to think of mutations in terms of their effects on the protein product. Broadly speaking, mutations can produce either a gain of function or a loss of function of the protein product (Fig. 3.6). Gain-of-function mutations occasionally result in a completely novel protein product. More commonly, they result in overexpression of the product or inappropriate expression (i.e., in the wrong tissue or in the wrong stage of development). Gain-of-function mutations produce dominant disorders. Charcot–Marie–Tooth disease can result from overexpression of the protein product, which is considered a gain-of-function mutation. Huntington disease, discussed in Chapter 4, is another example.
Loss-of-function mutations are often seen in recessive diseases, where the mutation results in the loss of 50% of the protein product (e.g., a metabolic enzyme), but the 50% that remains is sufficient for normal function. The heterozygote is thus unaffected, but the homozygote, having little or no protein product, is affected. In some cases, however, 50% of the gene’s protein product is not sufficient for normal function (haploinsufficiency), and a dominant disorder can result. Haploinsufficiency is seen, for example, in the autosomal dominant disorder known as familial hypercholesterolemia (see Chapter 12 ). In this disease, a single copy of a mutation (heterozygosity) reduces the number of low-density lipoprotein (LDL) receptors by 50%. Cholesterol levels in heterozygotes are approximately double those of normal homozygotes, resulting in a substantial increase in the risk of heart disease. As with most disorders involving haploinsufficiency, the disease is more serious in affected homozygotes (who have few or no functional LDL receptors) than in heterozygotes.
A dominant negative mutation results in a protein product that not only is abnormal but also inhibits the function of the protein produced by the normal allele in the heterozygote. Typically, dominant negative mutations are seen in genes that encode multimeric proteins (i.e., proteins composed of two or more subunits). Type I collagen (see Chapter 2 ), which is composed of three helical subunits, is an example of such a protein. An abnormal helix created by a single mutation can combine with the other helices, distorting them and producing a seriously compromised triple-helix protein.
Mutations can result in either a gain of function or a loss of function of the protein product. Gain-of-function mutations are sometimes seen in dominant diseases. Loss of function is seen in recessive diseases and in diseases involving haploinsufficiency, in which 50% of the gene product is insufficient for normal function. In dominant negative mutations, the abnormal protein product interferes with the normal protein product.
Clinical Consequences of Mutation: The Hemoglobin Disorders
Genetic disorders of human hemoglobin are the most common group of single-gene diseases; an estimated 7% of the world’s population carries one or more mutations of the genes involved in hemoglobin synthesis. Because almost all of the types of mutation described in this chapter have been observed in the hemoglobin disorders, these diseases serve as an important illustration of the clinical consequences of mutation.
The hemoglobin molecule is a tetramer composed of four polypeptide chains, two labeled α and two labeled β. The β chains are encoded by a gene on chromosome 11, and the α chains are encoded by two genes on chromosome 16 that are very similar to each other. Typically, an individual has two normal β genes and four normal α genes ( Fig. 3.7 ). Ordinarily tight regulation of these genes ensures that roughly equal numbers of α and β chains are produced. Each of these globin chains is associated with a heme group, which contains an iron atom and binds with oxygen. This property allows hemoglobin to perform the vital function of transporting oxygen in erythrocytes (red blood cells).
The hemoglobin disorders can be classified into two broad categories: structural abnormalities, in which the hemoglobin molecule is altered, and thalassemias, a group of conditions in which either the α- or the β-globin chain is structurally normal but reduced in quantity. Another condition, hereditary persistence of fetal hemoglobin (HPFH), occurs when fetal hemoglobin, encoded by the α-globin genes and by two β-globin–like genes called Aγ and Gγ (see Fig. 3.7), continues to be produced after birth (normally γ-chain production ceases and β-chain production begins at the time of birth). HPFH does not cause disease but instead can compensate for a lack of normal adult hemoglobin.
A large array of different hemoglobin disorders have been identified. The discussion that follows is a greatly simplified presentation of the major forms of these disorders. The hemoglobin disorders, the mutations that cause them, and their major features are summarized in Table 3.1 .
| Disease | Mutation Type | Major Disease Features |
|---|---|---|
| Sickle cell disease | β-globin missense mutation | Anemia, tissue infarctions, infections |
| HbH disease | Deletion or abnormality of three of the four α-globin genes | Moderately severe anemia, splenomegaly |
| Hydrops fetalis (Hb Barts) | Deletion or abnormality of all four α-globin genes | Severe anemia or hypoxemia, congestive heart failure; stillbirth or neonatal death |
| β⁰-Thalassemia | Usually nonsense, frameshift, or splice-site donor or acceptor mutations; no β-globin produced | Severe anemia, splenomegaly, skeletal abnormalities, infections; often fatal during first decade if untreated |
| β⁺-Thalassemia | Usually missense, regulatory, or splice-site consensus sequence or cryptic splice-site mutations; small amount of β-globin produced | Features similar to those of β⁰-thalassemia but often somewhat milder |
Sickle Cell Disease
Sickle cell disease, which results from an abnormality of hemoglobin structure, is seen in approximately 1 in 400 to 1 in 600 African American births. It is even more common in parts of Africa, where it can affect up to 1 in 50 births, and it is also seen in Mediterranean, Middle Eastern, and South Asian populations. Sickle cell disease is typically caused by a single missense mutation that substitutes valine for glutamic acid at position 6 of the β-globin polypeptide chain (Fig. 3.8). In homozygotes, this amino acid substitution alters the structure of hemoglobin molecules such that they form aggregates, causing erythrocytes to assume a characteristic sickle shape under conditions of low oxygen tension (see Fig. 3.8, A). These conditions are experienced in capillaries, the tiny vessels whose diameter is smaller than that of the erythrocyte. Normal erythrocytes (see Fig. 3.8, B) can squeeze through capillaries, but sickled erythrocytes are less flexible and are unable to do so. In addition, the abnormal erythrocytes tend to stick to the vascular endothelium (the innermost lining of blood vessels).
The resultant vascular obstruction produces localized hypoxemia (lack of oxygen), painful sickling crises, and infarctions of various tissues, including bone, spleen, kidneys, brain, and lungs (an infarction is tissue death due to hypoxemia). Premature destruction of the sickled erythrocytes decreases the number of circulating erythrocytes and the hemoglobin level, producing anemia. The spleen becomes enlarged (splenomegaly), but infarctions eventually destroy this organ, producing some loss of immune function. This contributes to the recurrent and sometimes fatal bacterial infections (especially pneumonia) that are commonly seen in persons with sickle cell disease. About 10% of persons with sickle cell disease experience a stroke before age 20 years. In North America, it is estimated that the life expectancy of persons with sickle cell disease is reduced by about 30 years.
Sickle cell disease, which causes anemia, tissue infarctions, and multiple infections, is the result of a single missense mutation that produces an amino acid substitution in the β-globin chain.
Thalassemia
The term thalassemia is derived from the Greek word thalassa (“sea”). Thalassemia was first described in populations living near the Mediterranean Sea, although it is also common in portions of Africa, the Middle East, India, and Southeast Asia. In contrast to sickle cell disease, in which a mutation alters the structure of the hemoglobin molecule, the mutations that cause thalassemia reduce the quantity of either α globin or β globin. Thalassemia can be divided into two major groups, α-thalassemia and β-thalassemia, depending on the globin chain that is reduced in quantity. When one type of chain is decreased in number, the other chain type, unable to participate in normal tetramer formation, tends to form molecules consisting of four chains of the excess type only. These are termed homotetramers, in contrast to the heterotetramers normally formed by α and β chains. In α-thalassemia, the α-globin chains are deficient, so the β chains (or γ chains in the fetus) are found in excess. They form homotetramers that have a greatly reduced oxygen-binding capacity, producing hypoxemia. In β-thalassemia, the excess α chains form homotetramers that precipitate and damage the cell membranes of red blood cell precursors (i.e., the cells that form erythrocytes). This leads to premature erythrocyte destruction and anemia.
Most cases of α-thalassemia are caused by deletions of the α-globin genes. The loss of one or two of these genes has no clinical effect. The loss or abnormality of three of the α genes produces moderately severe anemia and splenomegaly (hemoglobin H [HbH] disease). Loss of all four α genes, a condition seen primarily among Southeast Asians, produces hypoxemia in the fetus and hydrops fetalis (a condition in which there is a massive buildup of fluid). Severe hydrops fetalis often causes stillbirth or neonatal death.
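The relationship between the number of functional α-globin genes and the clinical outcome can be summarized as a simple lookup. This sketch only restates the paragraph above; the function name and the wording of the returned labels are illustrative assumptions.

```python
def alpha_thalassemia_category(functional_alpha_genes: int) -> str:
    """Map the number of functional alpha-globin genes (0 to 4) to the outcome in the text."""
    if not 0 <= functional_alpha_genes <= 4:
        raise ValueError("a person has between 0 and 4 functional alpha-globin genes")
    if functional_alpha_genes == 4:
        return "normal"
    if functional_alpha_genes >= 2:
        return "no clinical effect"
    if functional_alpha_genes == 1:
        return "HbH disease (moderately severe anemia, splenomegaly)"
    return "hydrops fetalis (often stillbirth or neonatal death)"


for n in range(4, -1, -1):
    print(n, "functional genes:", alpha_thalassemia_category(n))
```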
The α-thalassemia conditions are usually caused by deletions of α-globin genes. The loss of three of these genes leads to moderately severe anemia, and the loss of all four is fatal.
Persons with a β-globin mutation in one copy of chromosome 11 (heterozygotes) are said to have β-thalassemia minor, a condition that involves little or no anemia and does not ordinarily require clinical management. Those in whom both copies of the chromosome carry a β-globin mutation develop either β-thalassemia major (also called Cooley’s anemia) or a less serious condition, β-thalassemia intermedia. Beta-globin may be completely absent (β⁰-thalassemia), or it may be reduced to about 10% to 30% of the normal amount (β⁺-thalassemia). Typically, β⁰-thalassemia produces a more severe disease phenotype, but because disease features are caused by an excess of α-globin chains, patients with β⁰-thalassemia are less severely affected when they also have α-globin mutations that reduce the quantity of α-globin chains.
Beta-globin is not produced until after birth, so the effects of β-thalassemia major are not seen clinically until the age of 2 to 6 months. These patients develop severe anemia. If the condition is left untreated, substantial growth retardation can occur. The anemia causes bone marrow expansion, which in turn produces skeletal changes, including a protuberant upper jaw and cheekbones and thinning of the long bones (making them susceptible to fracture). Splenomegaly ( Fig. 3.9 ) and infections are common, and patients with untreated β-thalassemia major often die during the first decade of life. Beta-thalassemia can vary considerably in severity, depending on the precise nature of the responsible mutation.
In contrast to α-thalassemia, gene deletions are relatively rare in β-thalassemia. Instead, most cases are caused by single-base mutations. Nonsense mutations, which result in premature termination of translation of the β-globin chain, usually produce β⁰-thalassemia. Frameshift mutations also typically produce the β⁰ form. In addition to mutations in the β-globin gene itself, alterations in regulatory sequences can occur. Beta-globin transcription is regulated by a promoter, two enhancers, and an upstream region known as the locus control region (LCR) (see Fig. 3.7). Mutations in these regulatory regions usually result in reduced synthesis of mRNA and a reduction, but not complete absence, of β-globin (β⁺-thalassemia). Several types of splice-site mutations have also been observed. If a point mutation occurs at a donor or acceptor site, normal splicing is destroyed completely, producing β⁰-thalassemia. Mutations in the surrounding consensus sequences usually produce β⁺-thalassemia. Mutations also occur in the cryptic splice sites found in introns or exons of the β-globin gene, which cause these sites to be available to the splicing mechanism. These additional splice sites then compete with the normal splice sites, producing some normal and some abnormal β-globin chains. The result is usually β⁺-thalassemia.
Many different types of mutations can produce β-thalassemia. Nonsense, frameshift, and splice-site donor and acceptor mutations tend to produce more severe disease. Regulatory mutations and those involving splice-site consensus sequences and cryptic splice sites tend to produce less severe disease.
Hundreds of different β-globin mutations have been reported. Consequently most patients with β-thalassemia are not homozygotes in the strict sense; they usually have a different β-globin mutation on each copy of chromosome 11 and are termed compound heterozygotes ( Fig. 3.10 ). Even though the mutations differ, each of the two β-globin genes is altered, producing a disease state. It is common to apply the term homozygote loosely to compound heterozygotes.
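The distinction between a true homozygote and a compound heterozygote depends only on whether the two mutant alleles are identical. A minimal sketch follows; the allele labels (`"normal"`, `"mutA"`, `"mutB"`) and the function name are hypothetical placeholders, not real mutation names.

```python
def beta_globin_genotype(allele1: str, allele2: str, normal: str = "normal") -> str:
    """Classify a pair of beta-globin alleles using the terms in the text."""
    mutant_count = sum(allele != normal for allele in (allele1, allele2))
    if mutant_count == 0:
        return "normal homozygote"
    if mutant_count == 1:
        return "heterozygote"
    if allele1 == allele2:
        return "homozygote (same mutation on both chromosomes)"
    return "compound heterozygote (different mutation on each chromosome)"


print(beta_globin_genotype("normal", "normal"))
print(beta_globin_genotype("mutA", "normal"))
print(beta_globin_genotype("mutA", "mutA"))
print(beta_globin_genotype("mutA", "mutB"))
```

In both of the last two cases neither β-globin gene is normal, which is why the term homozygote is often applied loosely to compound heterozygotes.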
Patients with sickle cell disease or β-thalassemia major are sometimes treated with blood transfusions and with chelating agents that remove excess iron introduced by the transfusions. Prophylactic administration of antibiotics and antipneumococcal vaccine helps to prevent bacterial infections in patients with sickle cell disease, and analgesics are administered for pain relief during sickling crises. Bone marrow transplantation, which provides donor stem cells that produce genetically normal erythrocytes, has been performed on patients with severe β-thalassemia or sickle cell disease. However, it is often impossible to find a suitably matched donor, and the mortality rate from this procedure is still fairly high (approximately 5% to 30%, depending on the severity of disease and the age of the patient). A lack of normal adult β-globin can be compensated for by reactivating the genes that encode fetal β-globin (the γ-globin genes, discussed previously). Agents such as hydroxycarbamide and butyrate can reactivate these genes and are being investigated. In addition, the transcription factor encoded by BCL11A, which ordinarily silences γ-globin expression after birth, is being explored as a target of drug and/or gene therapy for sickle cell disease. Beta-thalassemia is also a strong candidate for gene therapy (see Chapter 13).
Causes of Mutation
A large number of agents are known to cause induced mutations. These mutations, which are attributed to known environmental causes, can be contrasted with spontaneous mutations, which arise naturally in cells, for example during DNA replication. Agents that cause induced mutations are known collectively as mutagens. Animal studies have shown that radiation is an important class of mutagen ( Clinical Commentary 3.1 ). Ionizing radiation, such as that produced by x-rays and nuclear fallout, can eject electrons from atoms, forming electrically charged ions. When these ions are situated within or near the DNA molecule, they can promote chemical reactions that change DNA bases. Ionizing radiation can also break the bonds of double-stranded DNA. This form of radiation can reach all cells of the body, including the germline cells.
Because mutation is a rare event, it is difficult to measure directly in humans. The relationship between radiation exposure and mutation is similarly difficult to assess. For a person living in a developed country, a typical lifetime exposure to ionizing radiation is about 6 to 7 rem. ∗ About one-third to one-half of this amount is thought to originate from medical and dental x-ray procedures.
∗ A rem is a traditional unit for measuring radiation exposure. It is roughly equal to 0.01 joule of absorbed energy per kilogram of tissue. Parental exposures have also been estimated in Gray (Gy), which is another measure of the deposition of energy in tissue. In the minisatellite mutation study, the average gonadal dosage among “exposed” parents was 1.47 Gy, which is many times higher than lifetime exposure in the general population.
Unfortunately, a few human populations have received much larger radiation doses. The most thoroughly studied such population is the survivors of the atomic bomb blasts that occurred in Hiroshima and Nagasaki, Japan, at the close of World War II. Many of those who were exposed to high doses of radiation died from radiation sickness. Others survived, and many of the survivors produced offspring.
To study the effects of radiation exposure in this population, a large team of Japanese and American scientists conducted medical and genetic investigations of some of the survivors. A significant number developed cancers and chromosome abnormalities in their somatic cells, probably as a consequence of radiation exposure. To assess the effects of radiation exposure on the subjects’ germlines, the scientists compared the offspring of those who suffered substantial radiation exposure with the offspring of those who did not. Although it is difficult to establish radiation doses with precision, there is no doubt that, in general, those who were situated closer to the blasts suffered much higher exposure levels. It is estimated that the exposed group received roughly 30 to 60 rem of radiation, many times the average lifetime radiation exposure.
In a sample of 77,000 offspring of these survivors, researchers assessed a large number of factors, including stillbirths, chromosome abnormalities, birth defects, cancer before 20 years of age, death before 26 years of age, and various measures of growth and development (e.g., intelligence quotient). There were no statistically significant differences between the offspring of persons who were exposed to radiation and the offspring of those who were not exposed. In addition, direct genetic studies of mutations have been carried out using minisatellite polymorphisms and protein electrophoresis, a technique that detects mutations that lead to amino acid changes (discussed elsewhere in this chapter). Parents and offspring were compared to determine whether germline mutations had occurred at various loci. The numbers of mutations detected in the exposed and unexposed groups were statistically equivalent.
More recently, studies of those who were exposed to radiation from the Chernobyl nuclear power plant accident have demonstrated a significant increase in thyroid cancers among children exposed to radiation. This reflects the effects of somatic mutations. The evidence for increased frequencies of germline mutations in protein-coding DNA, however, remains unclear. A number of other studies of the effects of radiation on humans have been reported, including investigations of those who live near nuclear power plants. The radiation doses received by these persons are substantially smaller than those of the populations discussed previously, and the results of these studies are equivocal.
It is remarkable that even though there was substantial evidence for radiation effects on somatic cells in the Hiroshima and Nagasaki studies, no detectable effect could be seen for germline cells. What could account for this? Because large doses of radiation are lethal, many of those who would have been most strongly affected would not be included in these studies. Furthermore, because germline mutation rates are very small, even relatively large samples of radiation-exposed persons may be insufficient to detect increases in mutation rates. It is also possible that DNA repair compensated for some radiation-induced germline damage.
These results argue that radiation exposure, which is clearly associated with somatic mutations, should not be taken lightly. Above-ground nuclear testing in the American Southwest has produced increased rates of leukemia and thyroid cancer in a segment of the population. Radon, a radioactive gas that is produced by the decay of naturally occurring uranium, can be found at dangerously high levels in some homes and poses a risk for lung cancer. Any unnecessary exposure to radiation, particularly to the gonads or to developing fetuses, should be avoided.
Nonionizing radiation does not form charged ions but can move electrons from inner to outer orbits within an atom. The atom becomes chemically unstable. Ultraviolet (UV) radiation, which occurs naturally in sunlight, is an example of nonionizing radiation. UV radiation causes the formation of covalent bonds between adjacent pyrimidine bases (i.e., cytosine or thymine). These pyrimidine dimers (a dimer is a molecule having two subunits) are unable to pair properly with purines during DNA replication; this results in a base-pair substitution ( Fig. 3.11 ). Because UV radiation is absorbed by the skin, it does not reach the germline, but it can cause skin cancer ( Clinical Commentary 3.2 ).
An inevitable consequence of exposure to UV radiation is the formation of potentially dangerous pyrimidine dimers in the DNA of skin cells. Fortunately the highly efficient nucleotide excision repair (NER) system removes these dimers in normal persons. Among those affected with the rare autosomal recessive disease xeroderma pigmentosum (XP) this system does not work properly, and the resulting DNA replication errors lead to base-pair substitutions in skin cells. XP varies substantially in severity, but early symptoms are usually seen in the first 1 to 2 years of life. Patients develop dry, scaly skin (xeroderma) along with extensive freckling and abnormal skin pigmentation (pigmentosum). Skin tumors, which can be numerous, typically appear by 10 years of age. It is estimated that the risk of skin tumors in persons with XP is elevated approximately 1000-fold. These cancers are concentrated primarily in sun-exposed parts of the body. Patients are advised to avoid sources of UV light (e.g., sunlight), and cancerous growths are removed surgically. Neurological abnormalities are seen in about 30% of persons with XP. Severe, potentially lethal malignancies can occur before 20 years of age.
The NER system is encoded by at least 28 different genes, and inherited mutations in any of 7 of these genes can give rise to XP. These genes encode helicases that unwind the double-stranded DNA helix; an endonuclease that cuts the DNA at the site of the dimer; an exonuclease that removes the dimer and nearby nucleotides; a polymerase that fills the gap with DNA bases (using the complementary DNA strand as a template); and a ligase that rejoins the corrected portion of DNA to the original strand.
It should be emphasized that the expression of XP requires inherited germline mutations of an NER gene as well as subsequent uncorrected somatic mutations of genes in skin cells. Some of these somatic mutations can affect genes that promote cancer (see Chapter 11 ), resulting in tumor formation. The skin-cell mutations themselves are somatic and thus are not transmitted to future generations.
NER is but one type of DNA repair. Table 3.2 provides examples of other diseases that result from defects in various types of DNA repair mechanisms ( Fig. 3.12 ).
A variety of chemicals can also induce mutations, sometimes because of their chemical similarity to DNA bases. Because of this similarity, these base analogs, such as 5-bromouracil, can be substituted for a true DNA base during replication. The analog is not exactly the same as the base it replaces, so it can cause pairing errors during subsequent replications. Other chemical mutagens, such as acridine dyes, can physically insert themselves between existing bases, distorting the DNA helix and causing frameshift mutations. Still other mutagens can directly alter DNA bases, causing replication errors. An example of the latter is nitrous acid, which removes an amino group from cytosine, converting it to uracil. Although uracil is normally found in RNA, it mimics the pairing action of thymine in DNA. Thus it pairs with adenine instead of guanine, as the original cytosine would have done. The end result is a base-pair substitution.
Hundreds of chemicals are now known to be mutagenic in laboratory animals. Among these are nitrogen mustard, vinyl chloride, alkylating agents, formaldehyde, sodium nitrite, and saccharin. Some of these chemicals are much more potent mutagens than others. Nitrogen mustard, for example, is a powerful mutagen, whereas saccharin is a relatively weak one. Although some mutagenic chemicals are produced by humans, many occur naturally in the environment (e.g., aflatoxin B₁, a common contaminant of foods).
Many substances in the environment are known to be mutagenic, including ionizing and nonionizing radiation and hundreds of different chemicals. These mutagens are capable of causing base substitutions, deletions, and frameshifts. Ionizing radiation can induce double-stranded DNA breaks. Some mutagens occur naturally, and others are generated by humans.
Considering that 3 billion DNA base pairs must be replicated in each cell division, and considering the large number of mutagens to which we are exposed, DNA replication is surprisingly accurate. A primary reason for this accuracy is the process of DNA repair, which takes place in all normal cells of higher organisms. Several dozen enzymes are involved in the repair of damaged DNA. They collectively recognize an altered base, excise it by cutting the DNA strand, replace it with the correct base (determined from the complementary strand), and reseal the DNA. These repair mechanisms are estimated to correct at least 99.9% of initial errors.
Because DNA repair is essential for the accurate replication of DNA, defects in DNA repair systems can lead to many types of disease. For example, inherited mutations in genes responsible for DNA mismatch repair result in the persistence of cells with replication errors (i.e., mismatches ) and can lead to some types of cancer (see Chapter 11 ). A diminished capacity to repair double-stranded DNA breaks can lead to ovarian and/or breast cancer. Nucleotide excision repair is necessary for the removal of larger changes in the DNA helix (e.g., pyrimidine dimers); defects in excision repair lead to a number of diseases, of which xeroderma pigmentosum is an important example (see Clinical Commentary 3.2 ).
DNA repair helps to ensure the accuracy of the DNA sequence by correcting replication errors (mismatches), repairing double-stranded DNA breaks, and excising damaged nucleotides.
How often do spontaneous mutations occur? At the nucleotide level, the mutation rate is estimated to be about 1.3 × 10⁻⁸ per base pair per generation (this figure represents mutations that have escaped the process of DNA repair). Thus each gamete contains approximately 35 new mutations, the great majority of which occur in noncoding DNA. At the level of the gene, the mutation rate is quite variable, ranging from 10⁻⁴ to 10⁻⁷ per locus per cell division. There are at least two reasons for this large range of variation: the size of the gene and the susceptibility of certain nucleotide sequences.
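The per-gamete figure follows from multiplying the per-base-pair rate by the length of the haploid genome. The sketch below is only a back-of-the-envelope check: it assumes the round figure of 3 billion base pairs used elsewhere in this chapter, and the function name is illustrative. The product is of the same order as the figure quoted above; the exact value depends on the genome length assumed.

```python
# Back-of-the-envelope estimate of new mutations per gamete.
MUTATION_RATE = 1.3e-8    # per base pair per generation (from the text)
HAPLOID_GENOME_BP = 3e9   # assumption: round figure of ~3 billion bp


def expected_new_mutations(rate_per_bp: float, genome_bp: float) -> float:
    """Expected number of new mutations = rate per bp x number of bp."""
    return rate_per_bp * genome_bp


estimate = expected_new_mutations(MUTATION_RATE, HAPLOID_GENOME_BP)
print(f"Expected new mutations per gamete: about {estimate:.0f}")
```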
First, genes vary tremendously in size. The somatostatin gene, for example, is quite small, containing 1480 bp. In contrast, the gene responsible for Duchenne muscular dystrophy (DMD) spans more than 2 million bp. As might be expected, larger genes present larger targets for mutation and usually experience mutation more often than do smaller genes. The DMD gene, as well as the genes responsible for hemophilia A and type 1 neurofibromatosis, are all very large and have high mutation rates.
Second, it is well established that certain nucleotide sequences are especially susceptible to mutation. These are termed mutation hot spots. The best-known example is the two-base (dinucleotide) sequence CG. In mammals, about 80% of CG dinucleotides are methylated; a methyl group is attached to the cytosine base (these dinucleotide sequences are also labeled CpG [cytosine-phosphate-guanine], to distinguish the two-base DNA sequence from a single pair of complementary bases, C and G). A methylated cytosine, 5-methylcytosine, easily loses an amino group, converting it to thymine. The end result is a mutation from cytosine to thymine ( Fig. 3.13 ). Surveys of mutations in human genetic diseases have shown that the mutation rate at CG dinucleotides is about 12 times higher than at other dinucleotide sequences. Mutation hot spots, in the form of CG dinucleotides, have been identified in a number of important human disease genes, including the procollagen genes responsible for osteogenesis imperfecta (see Chapter 2 ). Other disease examples are discussed in Chapters 4 and 5.
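The CG hot-spot idea can be illustrated with a short sketch that locates CG dinucleotides on one strand and models the cytosine-to-thymine change that follows deamination of 5-methylcytosine. The example sequence and both function names are hypothetical, and the sketch assumes every CG site is methylated, which is a simplification.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return 0-based indices of each C that is immediately followed by G."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def deaminate(seq: str, position: int) -> str:
    """Model deamination of 5-methylcytosine: the C at `position` becomes T."""
    if seq[position].upper() != "C":
        raise ValueError("position does not hold a cytosine")
    return seq[:position] + "T" + seq[position + 1:]


example = "ATCGGACGTTCA"                 # hypothetical sequence
sites = cpg_positions(example)           # candidate hot spots
mutated = deaminate(example, sites[0])   # C -> T at the first CG site
print(sites, mutated)
```

Note that the final C in the example is followed by A rather than G, so it is not reported as a hot spot.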
Parental age is strongly correlated with the probability of transmitting a mutation to one’s offspring. Some chromosome abnormalities increase dramatically with maternal age (see Chapter 6 ), and single-base mutations increase with paternal age. The latter is seen in several single-gene disorders, including Marfan syndrome and achondroplasia. As Fig. 3.14 shows, the risk of producing a child with Marfan syndrome is several times higher for a father older than 40 years than for a father in his 20s. This paternal age effect is usually attributed to the fact that the stem cells giving rise to sperm cells continue to divide throughout life, which allows a progressive buildup of DNA replication errors. Recent comparisons of whole genome sequences in parents and offspring estimate that approximately one to two additional mutations are transmitted with each additional year of paternal age. Most studies show a smaller but significant effect of maternal age on the single-base mutation rate, on the order of approximately 0.5 additional mutations per year of maternal age.
Large genes, because of their size, are generally more likely to experience mutations than are small genes. Mutation hot spots, particularly methylated CG dinucleotides, experience elevated mutation rates. For many single-gene disorders there is a substantial increase in mutation risk with advanced paternal age.
Detection and Measurement of Genetic Variation
For centuries humans have been intrigued by the differences that can be seen among individuals. Attention was long focused on observable differences such as skin color or body shape and size. Only in the 20th century did it become possible to examine variation in genes, the consequence of mutations accumulated through time. The evaluation and measurement of this variation in populations and families are important for mapping genes to specific locations on chromosomes, a key step in determining gene function (see Chapter 8 ). The evaluation of genetic variation also provides the basis for genetic diagnosis, and it is highly useful in forensics. In this section, several key approaches to detecting genetic variation in humans are discussed in historical sequence.
Several dozen blood group systems have been defined on the basis of antigens located on the surfaces of erythrocytes. Some are involved in determining whether a person can receive a blood transfusion from a specific donor. Because individuals differ extensively in terms of blood groups, these systems provided an important early means of assessing genetic variation.
Each of the blood group systems is determined by a different gene or set of genes. The various antigens that can be expressed within a system are the result of different DNA sequences in these genes. Two blood-group systems that have special medical significance—the ABO and Rh blood groups—are discussed here. The ABO and Rh systems are both important in determining the compatibility of blood transfusions and tissue grafts. Some combinations of these blood groups can produce maternal-fetal incompatibility, sometimes with serious results for the fetus. These issues are discussed in detail in Chapter 9 .
The ABO Blood Group
Human blood transfusions were carried out as early as 1818, but they were often unsuccessful. After transfusion, some recipients suffered a massive, sometimes fatal, hemolytic reaction. In 1900 Karl Landsteiner discovered that this reaction was caused by the ABO antigens located on erythrocyte surfaces. The ABO system consists of two major antigens, labeled A and B. A person can have one of four major blood types: people with blood type A carry the A antigen on their erythrocytes, those with type B carry the B antigen, those with type AB carry both A and B, and those with type O carry neither antigen. Each individual has antibodies that react against any antigens that are not found on their own red blood cell surfaces. For example, a person with type A blood has anti-B antibodies, and transfusing type B blood into this person provokes a severe antibody reaction. It is straightforward to determine ABO blood type in the laboratory by mixing a small sample of a person’s blood with solutions containing different antibodies and observing which combinations cause the observable clumping that is characteristic of an antibody–antigen interaction.
The ABO system, which is encoded by a single gene on chromosome 9, consists of three primary alleles, labeled Iᴬ, Iᴮ, and Iᴼ. (There are also subtypes of both the Iᴬ and Iᴮ alleles, but they are not addressed here.) Persons with the Iᴬ allele have the A antigen on their erythrocyte surfaces (blood type A), and those with Iᴮ have the B antigen on their cell surfaces (blood type B). Those with both alleles express both antigens (blood type AB), and those with two copies of the Iᴼ allele have neither antigen (type O blood). Because the Iᴼ allele produces no antigen, persons who are IᴬIᴼ or IᴮIᴼ heterozygotes have blood types A and B, respectively ( Table 3.3 ).
|Disease||Features||Type of Repair Defect|
|Xeroderma pigmentosum||Skin tumors, photosensitivity, cataracts, neurological abnormalities||Nucleotide excision repair defects, including mutations in helicase and endonuclease genes|
|Cockayne syndrome||Reduced stature, skeletal abnormalities, optic atrophy, deafness, photosensitivity, intellectual disability||Defective repair of UV-induced damage in transcriptionally active DNA; considerable etiological and symptomatic overlap with xeroderma pigmentosum and trichothiodystrophy|
|Fanconi anemia||Anemia; leukemia susceptibility; limb, kidney, and heart malformations; chromosome instability||As many as eight different genes may be involved, but their exact role in DNA repair is not yet known|
|Bloom syndrome||Growth deficiency, immunodeficiency, chromosome instability, increased cancer incidence||Mutations in the RecQ helicase family|
|Werner syndrome||Cataracts, osteoporosis, atherosclerosis, loss of skin elasticity, short stature, diabetes, increased cancer incidence; sometimes described as “premature aging”||Mutations in the RecQ helicase family|
|Ataxia-telangiectasia||Cerebellar ataxia, telangiectases, ∗ immune deficiency, increased cancer incidence, chromosome instability||Normal gene product is likely to be involved in halting the cell cycle after DNA damage occurs|
|Hereditary nonpolyposis colorectal cancer||Proximal bowel tumors, increased susceptibility to several other types of cancer||Mutations in any of six DNA mismatch-repair genes|
∗ Telangiectases are vascular lesions caused by the dilatation of small blood vessels. This typically produces discoloration of the skin.
|Genotype||Blood Type||Antibodies Present|
|IᴬIᴬ||A||Anti-B|
|IᴬIᴼ||A||Anti-B|
|IᴮIᴮ||B||Anti-A|
|IᴮIᴼ||B||Anti-A|
|IᴬIᴮ||AB||None|
|IᴼIᴼ||O||Anti-A and anti-B|
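The relationship between ABO genotype, blood type, and antibodies summarized in the table can be expressed as a small lookup. This is a minimal sketch: alleles are written simply as "A", "B", and "O", the function names are illustrative, and the transfusion check considers ABO antigens on donor red cells only (it ignores Rh and all other blood group systems).

```python
def abo_phenotype(genotype: tuple[str, str]) -> tuple[str, list[str]]:
    """Return (blood type, antibodies present) for a pair of ABO alleles."""
    alleles = set(genotype)
    if not alleles <= {"A", "B", "O"}:
        raise ValueError("alleles must be 'A', 'B', or 'O'")
    antigens = sorted(alleles - {"O"})       # the O allele produces no antigen
    blood_type = "".join(antigens) or "O"
    # Antibodies are made against whichever antigens the person lacks.
    antibodies = [f"anti-{x}" for x in ("A", "B") if x not in antigens]
    return blood_type, antibodies


def transfusion_reaction(recipient: tuple[str, str], donor: tuple[str, str]) -> bool:
    """True if the recipient has antibodies against a donor red-cell antigen."""
    _, antibodies = abo_phenotype(recipient)
    donor_type, _ = abo_phenotype(donor)
    return any(f"anti-{antigen}" in antibodies
               for antigen in donor_type if antigen != "O")


print(abo_phenotype(("A", "O")))
print(transfusion_reaction(("A", "A"), ("B", "O")))
```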
Because populations vary substantially in terms of the frequency with which the ABO alleles occur, the ABO locus was the first blood group system to be used extensively in studies of genetic variation among individuals and populations. For example, early studies showed that the A antigen is relatively common in western European populations, and the B antigen is especially common among Asians. Neither antigen is common among native South American populations, the great majority of whom have blood type O.
The Rh System
Like the ABO system, the Rh system is defined on the basis of antigens that are present on erythrocyte surfaces. This system is named after the rhesus monkey, the experimental animal in which it was first isolated by Landsteiner in the late 1930s. It is typed in the laboratory by a procedure similar to the one described for the ABO system. Rh alleles vary considerably among individuals and populations and thus have been another highly useful tool for assessing genetic variation. The molecular basis of variation in both the ABO and the Rh systems has been elucidated (for further details, see the suggested readings at the end of this chapter), and it is becoming increasingly common to type these systems by directly examining an individual’s DNA sequence rather than by assessing an antibody-antigen reaction.
The blood groups, of which the ABO and Rh systems are examples, have provided an important means of studying human genetic variation. Blood group variation is the result of antigens that occur on the surface of erythrocytes.
Protein electrophoresis, developed first in the 1930s and applied widely to humans in the 1950s and 1960s, increased the number of detectable polymorphic loci considerably. This technique makes use of the fact that a single amino acid difference in a protein (the result of a mutation in the corresponding DNA sequence) can cause a slight difference in the electrical charge of the protein.
An example is the common sickle cell disease mutation discussed earlier. The replacement of glutamic acid with valine in the β-globin chain produces a difference in electrical charge because glutamic acid has two carboxyl groups, whereas valine has only one carboxyl group. Electrophoresis can be used to determine whether a person has normal hemoglobin (HbA) or the mutation that causes sickle cell disease (HbS) ( Fig. 3.15 ). The hemoglobin is placed in an electrically charged gel composed of starch, agarose, or polyacrylamide (see Fig. 3.15, A ). The slight difference in charge resulting from the amino acid difference causes the HbA and HbS forms to migrate at different rates through the gel. The protein molecules are allowed to migrate for several hours and are then stained with chemical solutions so that their positions can be seen (see Fig. 3.15, B ). From the resulting pattern it can be determined whether the person is an HbA homozygote, an HbS homozygote, or a heterozygote having HbA on one chromosome copy and HbS on the other.
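The charge difference between HbA and HbS can be approximated by counting charged side chains. The sketch below uses a deliberately crude model: aspartate and glutamate count as −1, lysine and arginine as +1, and histidine and the chain termini are ignored. It is applied to the first eight residues of mature β-globin, where the sickle cell substitution falls at position 6. The function name is illustrative.

```python
ACIDIC = {"D", "E"}   # aspartate, glutamate: roughly -1 at neutral pH
BASIC = {"K", "R"}    # lysine, arginine: roughly +1 (histidine ignored)


def crude_net_charge(peptide: str) -> int:
    """Count basic minus acidic side chains; a rough proxy for net charge."""
    peptide = peptide.upper()
    return (sum(aa in BASIC for aa in peptide)
            - sum(aa in ACIDIC for aa in peptide))


HBA_N_TERM = "VHLTPEEK"   # first eight residues of normal beta-globin
HBS_N_TERM = "VHLTPVEK"   # same segment with Glu -> Val at position 6

difference = crude_net_charge(HBS_N_TERM) - crude_net_charge(HBA_N_TERM)
print(f"HbS segment is {difference:+d} charge unit relative to HbA")
```

Losing one negatively charged side chain is what makes the two forms migrate at different rates in the gel.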
Protein electrophoresis has been used to detect amino acid variation in hundreds of human proteins. However, silent substitutions, which do not alter amino acids, cannot be detected by this approach. In addition, some amino acid substitutions do not alter the electrical charge of the protein molecule. For these reasons, protein electrophoresis detects only about one-third of the mutations that occur in coding DNA. In addition, single-base substitutions in noncoding DNA are not usually detected by protein electrophoresis.
Protein electrophoresis detects variations in genes that encode certain serum proteins. These variations are observable because proteins with slight differences in their amino acid sequence migrate at different rates through electrically charged gels.
Detecting Variation at the DNA Level
Each human haploid DNA sequence (i.e., the sequence inherited from one parent) differs from any other human haploid sequence by approximately 3 to 4 million DNA base pairs, which amounts to roughly one single-base difference every 1000 base pairs. Because there are only a few hundred or so blood group and protein electrophoretic polymorphisms, these approaches have detected only a tiny fraction of human DNA variation. Yet the assessment of this variation is critical to gene identification and genetic diagnosis (see Chapters 8 and 13). Fortunately, molecular techniques developed since the 1980s have enabled the detection of millions of new polymorphisms at the DNA level. These types of variants, and the techniques used to detect them, have revolutionized both the practice and the potential of medical genetics.
Single Nucleotide Polymorphisms
The most numerous type of polymorphism in the human genome consists of variants at single nucleotide positions on a chromosome, or single nucleotide polymorphisms (SNPs). For example, one individual might have an A-T base pair at a given position, while another would have a G-C base pair. Increasingly the term single nucleotide variant (SNV) is used, reflecting the fact that many of these variants are rare (< 1% frequency) in populations. SNPs or SNVs, when they occur in functional DNA sequences, can cause inherited diseases, although most are harmless. Increasingly they are being detected by microarray and direct sequencing methods, which are discussed later in this chapter.
Tandem Repeat Polymorphisms
The great majority of SNPs have only two possible alleles, placing a limit on the amount of genetic diversity they can reveal. More diversity at a single locus could be observed if it had many alleles, rather than just two. This type of diversity is seen in microsatellite and minisatellite DNA. As discussed in Chapter 2 , microsatellites and minisatellites are regions in which the same DNA sequence is repeated over and over in tandem ( Fig. 3.16 ). Microsatellites are composed of units that are only 1 to 10 bp long, which are termed short tandem repeats (STRs) , whereas minisatellites contain longer repeat units. The number of repeat units in a given region varies substantially from individual to individual; a specific region could have as few as two or three repeats or as many as 20 or more. These polymorphisms can therefore reveal a high degree of genetic variation. STRs are easy to assay using polymerase chain reaction (PCR) (see below), and more than 1 million of them are distributed throughout the human genome. These properties make them useful for mapping genes by the process of linkage analysis, discussed in Chapter 8 . They are also useful in forensic applications, such as paternity testing and the identification of criminal suspects (see Box 3.1 ) ( Fig. 3.17 ).
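An STR allele is defined by how many times the repeat unit occurs in tandem. As a minimal sketch, the function below counts the longest uninterrupted run of a given motif in a sequence. The two example alleles, the CA motif, and the flanking bases are hypothetical.

```python
def longest_tandem_run(seq: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `seq`."""
    if not motif:
        raise ValueError("motif must not be empty")
    seq, motif = seq.upper(), motif.upper()
    best = 0
    for start in range(len(seq)):
        run, pos = 0, start
        while seq.startswith(motif, pos):   # extend the run one unit at a time
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Hypothetical alleles at one STR locus, differing only in repeat number.
allele_1 = "GG" + "CA" * 5 + "TT"
allele_2 = "GG" + "CA" * 8 + "TT"
genotype = (longest_tandem_run(allele_1, "CA"), longest_tandem_run(allele_2, "CA"))
print(genotype)
```

A person carrying these two alleles would be a heterozygote at this locus, and the two PCR products would differ in length by six base pairs.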
Single nucleotide polymorphisms (SNPs), or single nucleotide variants (SNVs), represent nucleotide positions whose DNA bases vary among individuals. STRs are a type of polymorphism that results from varying numbers of microsatellite repeats in a specific DNA region. Because they can have many different alleles in a population, they are highly useful in forensics and in gene mapping.
Because of the large number of polymorphisms observed in the human genome, it is virtually certain that each of us is genetically unique (with the exception of identical twins, whose DNA sequences are nearly always identical). It follows that genetic variation could be used to identify individuals, much as a conventional fingerprint does. Because DNA can be found in any tissue sample, including blood, semen, and hair, ∗ genetic variation has substantial potential in forensic applications (e.g., criminal cases, paternity suits, identification of accident victims). STRs, with their many alleles, are very useful in establishing a highly specific DNA profile.
The principle underlying a DNA profile is quite simple. If we examine enough polymorphisms in a given individual, the probability that any other individual in the population has the same allele at each examined locus becomes extremely small. DNA left at the scene of a crime in the form of blood or semen, for example, can be typed for a series of STRs. Because of the extreme sensitivity of the PCR approach, even a tiny sample several years old can yield enough DNA for laboratory analysis (although extreme care must be taken to avoid contamination when using PCR with such samples). The detected alleles are then compared with the alleles of a suspect. If the alleles in the two samples match, the suspect is implicated (see Fig. 3.17 ).
A key question is whether another person in the general population might have the same alleles as the suspect. Could the DNA profile then falsely implicate the wrong person? In criminal cases, the probability of obtaining an allele match with a random member of the population is calculated. Because of the high degree of allelic variation in STRs, this probability is usually very small. The use of 13 or more STRs, which is now common practice, yields random match probabilities in the neighborhood of 1 in 1 trillion. Provided that a large enough number of loci are used under well-controlled laboratory conditions, and provided that the data are collected and evaluated carefully, DNA profiles can furnish highly useful forensic evidence. DNA profiles are now used in many thousands of criminal court cases each year.
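The random match probability is commonly obtained by multiplying the population frequency of the observed genotype at each locus. The sketch below assumes the loci are statistically independent and uses a purely hypothetical genotype frequency of 0.12 at each of 13 loci. Real casework uses population-specific allele frequencies and additional corrections that are not modeled here.

```python
from math import prod


def random_match_probability(genotype_freqs: list[float]) -> float:
    """Multiply per-locus genotype frequencies (assumes independent loci)."""
    if any(not 0.0 < f <= 1.0 for f in genotype_freqs):
        raise ValueError("each frequency must be in (0, 1]")
    return prod(genotype_freqs)


# Hypothetical: the observed genotype has frequency 0.12 at each of 13 STR loci.
freqs = [0.12] * 13
rmp = random_match_probability(freqs)
print(f"Random match probability: about 1 in {1 / rmp:.2e}")
```

With these illustrative inputs the result is on the order of 1 in 1 trillion, which shows how quickly modest per-locus frequencies compound across many loci.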
Although we tend to think of such evidence in terms of identifying the guilty party, it should be kept in mind that when a match is not obtained, a suspect may be exonerated. In addition, postconviction DNA testing has resulted in the release of hundreds of persons who were wrongly imprisoned. Thus DNA profiles can also benefit the innocent.
In recent years, panels of SNPs have been developed that can predict an individual’s genetic ancestry, eye color, and hair color with a fair degree of accuracy. Methylation profiles can estimate age within several years. Some predictions about the identity of a perpetrator can be made on the basis of the DNA sample alone. In addition, a number of criminal suspects have been identified because their DNA profile, or that of one or more relatives, is already stored in a national database and matches the DNA profile from an evidentiary sample. Although these developments increase the potential for forensic identification, they also pose new legal and ethical questions.
∗ Even fingerprints left at a crime scene sometimes contain enough DNA for PCR amplification and DNA profiling.
Insertions, Deletions, and Copy Number Variants
Throughout the human genome, there are sections of DNA that vary in their number of copies from one individual to another. Variants whose size is less than 50 bp are caused by insertions or deletions and are termed indels (unlike STRs, indels do not occur in multiple tandem copies). On average, each human genome contains approximately 600,000 indels, most of which are less than 10 bp in size. Variants larger than 50 bp are termed structural variants and can be as large as entire chromosomes (see Chapter 6 ). An important class of structural variants is copy number variants (CNVs), typically defined as DNA sequences larger than 500–1000 base pairs (definitions of the sizes of indels, structural variants, and CNVs vary somewhat and have not yet been standardized). CNVs may be present in zero to more than a dozen copies in a haploid genome, and each human is heterozygous for at least 100 CNVs (i.e., they inherited a different number of copies from the mother than from the father). Although CNVs are much less numerous than SNPs, their large individual size means that they account for at least several million total base pair differences between any pair of haploid DNA sequences (roughly the same amount as SNPs). Some CNVs have been shown to be associated with inherited diseases, and some are associated with response to specific therapeutic drugs. Fig. 3.18 highlights the differences among SNPs, tandem repeats, indels, and CNVs.
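The size conventions described above can be written as a simple classifier. This is only a sketch of the thresholds, which the text notes are not standardized: treating exactly 50 bp as a structural variant and using 1000 bp as the CNV cutoff are both assumptions, and the function name is illustrative. Size alone does not establish that a segment varies in copy number, so the last label is "CNV-sized" rather than "CNV".

```python
def classify_by_size(length_bp: int, cnv_min_bp: int = 1000) -> str:
    """Label an inserted or deleted segment by its length in base pairs."""
    if length_bp < 1:
        raise ValueError("length must be at least 1 bp")
    if length_bp < 50:                 # smaller than 50 bp
        return "indel"
    if length_bp < cnv_min_bp:         # assumption: 50 bp itself counts here
        return "structural variant"
    return "structural variant (CNV-sized)"


for size in (3, 49, 50, 5000):
    print(size, classify_by_size(size))
```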
Indels are defined as insertions or deletions of segments of DNA smaller than 50 bp. CNVs are a type of structural variant that consists of differences in the number of repeated DNA sequences longer than 500–1000 bp.
Southern Blotting and Restriction Fragment Analysis
An early approach to the detection of genetic variation at the DNA level took advantage of the existence of bacterial enzymes known as restriction endonucleases, or restriction enzymes. These enzymes cleave human DNA at specific sequences, termed restriction sites. For example, the intestinal bacterium Escherichia coli produces a restriction enzyme called EcoRI, which recognizes the DNA sequence GAATTC. Each time this sequence is encountered, the enzyme cleaves the sequence between the G and the A ( Fig. 3.19 ). A restriction digest of human DNA using this enzyme will produce more than 1 million DNA fragments (restriction fragments). These fragments are then subjected to gel electrophoresis, in which the smaller ones migrate more quickly through the gel than do the larger ones ( Fig. 3.20 ). The DNA is denatured (i.e., converted from a double-stranded to a single-stranded form) by exposing it to alkaline chemical solutions. To fix their positions permanently, the DNA fragments are transferred from the gel to a solid membrane, such as nitrocellulose (this is a Southern transfer, named after the man who invented the process in the mid-1970s). At this point, the solid membrane, often called a Southern blot, contains many thousands of fragments arrayed according to their size. Because of their large number, the fragments are indistinguishable from one another.
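The digestion and size-sorting steps can be sketched in a few lines. The function below models a single linear strand, cuts after the G of every GAATTC site, and returns the resulting fragments. The input sequence and the function name are hypothetical, and real digests act on double-stranded DNA and leave staggered ends, which this simplification ignores.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `seq` at every occurrence of `site`, `cut_offset` bases into the site."""
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset          # EcoRI cuts between the G and the A
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])       # remainder after the last cut
    return fragments


dna = "AAGAATTCCCCCGAATTCTT"            # hypothetical sequence with two sites
fragments = digest(dna)
# Smaller fragments migrate faster, so sort by length to mimic gel order.
by_migration = sorted(fragments, key=len)
print([len(f) for f in by_migration])
```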