DNA Replication Errors
DNA replication (see Fig. 2-4) is typically highly accurate. Most replication errors (i.e., insertion of a base other than the complementary base that would restore the base pair at that position in the double helix) are rapidly removed from the DNA and corrected by a series of DNA repair enzymes. These enzymes first recognize which strand in the newly synthesized double helix contains the incorrect base and then replace it with the proper complementary base, a process termed DNA proofreading. DNA replication needs to be remarkably accurate; otherwise, the burden of mutation on the organism and the species would be intolerable. The enzyme DNA polymerase faithfully duplicates the two strands of the double helix based on strict base-pairing rules (A pairs with T, C with G) but introduces one error every 10 million bp. Additional proofreading then corrects more than 99.9% of these errors. Thus the overall mutation rate per base as a result of replication errors is a remarkably low 1 × 10⁻¹⁰ per cell division, or fewer than one mutation per genome per cell division.
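The arithmetic behind these figures can be checked with a short calculation. The sketch below is illustrative only: the variable names are hypothetical, "more than 99.9%" is treated as exactly 99.9%, and the diploid genome size of roughly 6 × 10⁹ bp is an assumption that is not stated in the passage above.

```python
# Back-of-the-envelope check of the replication error rate quoted in the text.
polymerase_error_rate = 1 / 10_000_000   # one error every 10 million bp
proofreading_efficiency = 0.999          # "more than 99.9%", taken as exactly 99.9%

# Errors that escape proofreading, per base per cell division
residual_error_rate = polymerase_error_rate * (1 - proofreading_efficiency)

# ASSUMPTION: diploid human genome of roughly 6e9 bp (not given in the text)
diploid_genome_bp = 6e9
mutations_per_division = residual_error_rate * diploid_genome_bp

print(f"{residual_error_rate:.1e} errors per base per cell division")
print(f"{mutations_per_division:.2f} mutations per genome per cell division")
```

Under these assumptions the residual rate comes out at about 1 × 10⁻¹⁰ per base, and the expected number of replication-derived mutations per cell division is below one, in line with the text.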
Repair of DNA Damage
It is estimated that, in addition to replication errors, between 10,000 and 1 million nucleotides are damaged per human cell per day by spontaneous chemical processes such as depurination, demethylation, or deamination; by reaction with chemical mutagens (natural or otherwise) in the environment; and by exposure to ultraviolet or ionizing radiation. Some but not all of this damage is repaired. Even if the damage is recognized and excised, the repair machinery may create mutations by introducing incorrect bases. Thus, in contrast to replication-related DNA changes, which are usually corrected through proofreading mechanisms, nucleotide changes introduced by DNA damage and repair often result in permanent mutations.
A particularly common spontaneous mutation is the substitution of T for C (or A for G on the other strand). The explanation for this observation comes from considering the major form of epigenetic modification in the human genome, DNA methylation, introduced in Chapter 3. Spontaneous deamination of 5-methylcytosine to thymine (compare the structures of cytosine and thymine in Fig. 2-2) in the CpG doublet gives rise to C to T or G to A mutations (depending on the strand on which the 5-methylcytosine is deaminated). Such spontaneous mutations may not be recognized by the DNA repair machinery and thus become established in the genome after the next round of DNA replication. More than 30% of all single nucleotide substitutions are of this type, and they occur at a rate 25 times greater than that of any other single nucleotide mutation. Thus the CpG doublet represents a true “hot spot” for mutation in the human genome.
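The strand dependence of the outcome can be made concrete with a small sketch. The functions `cpg_positions` and `deaminate_cpg` are hypothetical helpers written for illustration, and the model assumes, as a simplification, that the cytosine at the chosen CpG is methylated and that the change has already been fixed by replication.

```python
def cpg_positions(seq):
    """Return the 0-based start index of every CpG doublet in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def deaminate_cpg(seq, pos, strand="top"):
    """Return the top-strand sequence after deamination at the CpG at pos.

    strand="top":    the C on the given strand is deaminated, CG -> TG (C to T).
    strand="bottom": the C on the complementary strand is deaminated, so the
                     given strand reads CG -> CA after replication (G to A).
    """
    seq = seq.upper()
    if seq[pos:pos + 2] != "CG":
        raise ValueError(f"no CpG doublet at position {pos}")
    if strand not in ("top", "bottom"):
        raise ValueError("strand must be 'top' or 'bottom'")
    replacement = "TG" if strand == "top" else "CA"
    return seq[:pos] + replacement + seq[pos + 2:]


example = "ATCGGACGT"                 # hypothetical sequence with two CpG sites
sites = cpg_positions(example)
top_result = deaminate_cpg(example, sites[0], "top")
bottom_result = deaminate_cpg(example, sites[0], "bottom")
print(sites, top_result, bottom_result)
```

Reading the result from the top strand, deamination on that strand appears as a C to T change, whereas deamination on the complementary strand appears as a G to A change.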
Overall Rate of DNA Mutations
Although the rate of DNA mutations at specific loci has been estimated using a variety of approaches over the past 50 years, the overall impact of replication and repair errors on the occurrence of new mutations throughout the genome can now be determined directly by whole-genome sequencing of trios consisting of a child and both parents, looking for new mutations in the child that are not present in the genome sequence of either parent. The overall rate of new mutations averaged between maternal and paternal gametes is approximately 1.2 × 10⁻⁸ mutations per base pair per generation. Thus every person is likely to receive approximately 75 new mutations in his or her genome from one or the other parent. This rate, however, varies from gene to gene around the genome and perhaps from population to population or even individual to individual. Overall, this rate, combined with considerations of population growth and dynamics, predicts that there must be an enormous number of relatively new (and thus very rare) mutations in the current worldwide population of 7 billion individuals.
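The figure of approximately 75 new mutations per person follows from multiplying the per-base rate by the size of the genome. The sketch below is a rough check, not a derivation from the source: the diploid genome size of about 6.2 × 10⁹ bp is an assumption, and the variable names are hypothetical.

```python
# Expected number of new (de novo) mutations per person per generation.
mutation_rate_per_bp = 1.2e-8   # per base pair per generation, from the text

# ASSUMPTION: diploid human genome of roughly 6.2e9 bp (not given in the text)
diploid_genome_bp = 6.2e9

expected_new_mutations = mutation_rate_per_bp * diploid_genome_bp
print(f"about {expected_new_mutations:.0f} new mutations per person")
```

With these inputs the estimate falls in the low to mid 70s, consistent with the approximate value quoted in the text. A slightly different assumed genome size shifts the result by a few mutations.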
As might be predicted, the vast majority of these mutations will be single nucleotide changes in noncoding portions of the genome and will probably have little or no functional significance. Nonetheless, at the level of populations, the potential collective impact of these new mutations on genes of medical importance should not be overlooked. In the United States, for example, with over 4 million live births each year, approximately 6 million new mutations will occur in coding sequences; thus, even for a single protein-coding gene of average size, we can anticipate several hundred newborns each year with a new mutation in the coding sequence of that gene.
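The population-level estimate can be reproduced approximately as follows. This is a sketch under stated assumptions: the coding fraction of the genome (about 2%) and the number of protein-coding genes (about 20,000) are not given in the passage and are assumed here, and all variable names are hypothetical.

```python
# Rough reconstruction of the United States estimate in the text.
live_births_per_year = 4_000_000   # "over 4 million", from the text
new_mutations_per_child = 75       # approximate figure from the preceding discussion

# ASSUMPTIONS (not stated in the text):
coding_fraction = 0.02             # share of the genome that is coding sequence
protein_coding_genes = 20_000      # approximate number of protein-coding genes

total_new_mutations = live_births_per_year * new_mutations_per_child
coding_mutations = total_new_mutations * coding_fraction
newborns_per_average_gene = coding_mutations / protein_coding_genes

print(f"{coding_mutations:,.0f} new coding mutations per year")
print(f"{newborns_per_average_gene:,.0f} newborns per year per average gene")
```

Under these assumptions the total is about 6 million new coding mutations per year, or a few hundred newborns per gene of average size, which matches the order of magnitude given in the text.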
Conceptually similar studies have determined the rate of mutations in CNVs, where the generation of a new length variant depends on recombination, rather than on errors in DNA synthesis to generate a new base pair. The measured rate of formation of new CNVs (≈1.2 × 10⁻² per locus per generation) is orders of magnitude higher than that of base substitutions.
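The size of this gap can be illustrated with a simple ratio. The comparison is only indicative, because the CNV rate is expressed per locus and the base substitution rate per base pair; the variable names are hypothetical.

```python
import math

cnv_rate = 1.2e-2            # new CNVs per locus per generation, from the text
substitution_rate = 1.2e-8   # base substitutions per bp per generation, from the text

# Note: the two rates use different units (per locus vs. per base pair),
# so the ratio is a rough indication of scale rather than an exact comparison.
ratio = cnv_rate / substitution_rate
orders_of_magnitude = math.log10(ratio)

print(f"ratio of about {ratio:.0e}, roughly {orders_of_magnitude:.0f} orders of magnitude")
```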