DNA Replication: A Primer

One of the most exciting predictions of the Watson-Crick model for the structure of DNA was a mechanism for DNA replication. Because DNA strand pairing is determined by complementary base pairing, it was logical to propose the existence of DNA polymerases, enzymes that would move along a single strand of DNA, recognize each base in turn, and insert the proper complementary base at the end of the growing chain. Thus, one might have surmised that only a single enzyme was required for DNA synthesis. In fact, DNA replication in eukaryotic cells involves a complex macromolecular machine.

In the basic reaction of DNA replication, the 3′ hydroxyl at the end of the growing DNA strand makes a nucleophilic attack on the α-phosphate of the incoming nucleoside triphosphate to form a phosphodiester bond. This incorporates the nucleotide into the growing chain and releases pyrophosphate (Fig. 42-1). Subsequent hydrolysis of the pyrophosphate provides the driving force for the reaction. This reaction requires the presence of a template strand of DNA that specifies, through base pairing, which of the four nucleoside triphosphates is added to the growing complementary strand.
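The template-directed character of this reaction can be illustrated with a minimal sketch. It assumes nothing beyond the Watson-Crick pairing rules; the function name `extend_primer` and the example sequences are hypothetical, and the enzyme itself is not modeled.

```python
# Watson-Crick pairing rules for DNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5: str, primer_5to3: str, steps: int) -> str:
    """Extend a primer by up to `steps` nucleotides along a template.

    The template is written 3'->5' so that position i of the template
    pairs with position i of the growing strand written 5'->3'.
    """
    strand = primer_5to3
    for _ in range(steps):
        if len(strand) >= len(template_3to5):
            break  # the end of the template has been reached
        # The template base opposite the free 3' end selects the next nucleotide.
        next_base = PAIR[template_3to5[len(strand)]]
        strand += next_base  # the new nucleotide joins the 3' end
    return strand


print(extend_primer("TACGGATC", "AT", 3))  # prints ATGCC
```

The strand grows only at its 3′ end, which is why the index of the next template base always equals the current length of the new strand.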

Figure 42-1 MECHANISM OF DNA POLYMERIZATION. A 3′ OH group at the end of a growing DNA chain makes a nucleophilic attack on the α-phosphate of a triphosphate precursor in the active site of polymerase (enzyme not shown here). dNTP, deoxynucleoside triphosphate.

Before discussing DNA replication and its regulation, an introduction to some terminology describing the geometry of replicating DNA is required. The exact site on the chromosomal DNA where replication begins is termed the origin of bidirectional replication. As the term bidirectional implies, two sets of DNA replication machinery head off in opposite directions from the origin. Each set of replication machinery, together with the DNA that it is replicating, is called a replication fork because at the site of replication, one parental DNA molecule splits into two (Fig. 42-2). It is not known whether replication forks move along the DNA like trains along a track or whether the fork sits at a stationary site (referred to as a replication factory) through which the DNA is “reeled in” as it is replicated.

Figure 42-2 KEY COMPONENTS AND EVENTS AT THE REPLICATION FORK.

The bidirectional nature of DNA replication causes a fundamental problem, as DNA synthesis invariably proceeds in a 5′ to 3′ direction. Replication of the so-called leading strand poses no problems. This is the template strand along which the fork moves in a 3′ to 5′ direction, so the newly synthesized DNA is laid down smoothly in a 5′ to 3′ direction (Fig. 42-2). However, the other template strand faces in the opposite direction, apparently requiring DNA polymerase to synthesize DNA in the wrong direction as the replication fork progresses away from the origin (i.e., adding nucleotides in a 3′ to 5′ direction). No DNA polymerase with this polarity has been found. Instead, this lagging strand replicates in a series of short segments. Every time the DNA strands have been peeled apart (unwound) by 250 nucleotides or so, a polymerase/primase complex (see Fig. 42-11) initiates DNA synthesis on the lagging strand, with the polymerase running back toward the replication origin in a 5′ to 3′ direction. Locally, synthesis on the lagging strand proceeds in a direction opposite to the overall direction of fork movement. Synthesis of each lagging strand fragment stops when DNA polymerase runs into the 5′ end of the previous fragment. Thus, the lagging strand is copied in a highly discontinuous fashion into short fragments known as Okazaki fragments (named after their discoverer [Fig. 42-2]). Fig. 42-11 describes the enzymes and events at the replication fork in greater detail.
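The geometry of lagging strand synthesis can be sketched in a few lines. This is a simplified illustration, assuming that a new fragment is initiated each time exactly 250 nucleotides have been unwound and that positions are measured as distance from the origin; the function name `okazaki_fragments` is hypothetical.

```python
def okazaki_fragments(unwound_nt: int, fragment_nt: int = 250):
    """List lagging strand fragments in their order of synthesis.

    Each fragment is a (start_5prime, end_3prime) pair of positions,
    measured in nucleotides from the origin. Synthesis starts near the
    fork and runs back toward the origin, so start > end.
    """
    fragments = []
    previous_5prime = 0  # the first fragment runs back to the origin
    while previous_5prime + fragment_nt <= unwound_nt:
        start = previous_5prime + fragment_nt  # initiation site near the fork
        end = previous_5prime  # stops at the 5' end of the previous fragment
        fragments.append((start, end))
        previous_5prime = start
    return fragments


print(okazaki_fragments(1000))
# prints [(250, 0), (500, 250), (750, 500), (1000, 750)]
```

Within each pair the position decreases (synthesis runs toward the origin), whereas from one pair to the next the positions increase (the fork moves away from the origin). This is the sense in which local synthesis opposes the overall direction of fork movement.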

Figure 42-11 THE MAIN EVENTS OF DNA REPLICATION. For a more detailed description, see the text.

(PDB file for Fen1: 1A76. PDB file for RFC/PCNA: 1SXJ. PDB file for Cdt1: 1WLQ.)

Origins of Replication

Bacteria such as Escherichia coli replicate their circular chromosomes using two replication forks starting from a single origin of replication (Fig. 42-3A), but eukaryotes must use multiple origins of replication to duplicate their large genomes during a relatively short S phase, which can be limited to as little as a few minutes in some early embryos. These numerous origins are distributed along the chromosome: up to 400 in budding yeast and about 60,000 in human cells. These origins are positioned so that all of the DNA is replicated in the available time, and to be on the safe side, more origins are prepared than are actually needed.

Figure 42-3 A, The E. coli chromosome is a simple replicon with a single origin of replication. In cells, this chromosome has a complex, highly supercoiled structure. B, Eukaryotic chromosomes have multiple origins of replication.

The existence of multiple origins creates a potential hazard: If any origin were used more or less than once per cell cycle, genes would be duplicated or lost. How is the “firing” of all of these origins orchestrated so that each is used once and only once per S phase? Cells manage this problem by a mechanism termed licensing: each origin is licensed to replicate exactly once per cell cycle. Replication of the origin removes the license, which cannot normally be renewed until the cell has completely traversed the cycle and has passed through mitosis.
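The logic of licensing can be captured in a toy model. The sketch below is not a description of the molecular mechanism; the class `Origin`, its methods, and the helper `run_cell_cycle` are hypothetical names chosen only to show how a license that is consumed by replication limits each origin to one round per cycle.

```python
class Origin:
    """Toy model of a licensed replication origin (illustrative only)."""

    def __init__(self):
        self.licensed = False
        self.copies_made = 0

    def license(self):
        """Grant the license; here this is assumed to follow mitosis."""
        self.licensed = True

    def fire(self):
        """Attempt to initiate replication; succeeds only if licensed."""
        if not self.licensed:
            return False
        self.licensed = False  # replication of the origin removes the license
        self.copies_made += 1
        return True


def run_cell_cycle(origin, firing_attempts=3):
    """License the origin once, then try to fire it repeatedly in S phase."""
    origin.license()
    return [origin.fire() for _ in range(firing_attempts)]


print(run_cell_cycle(Origin()))  # prints [True, False, False]
```

However many firing attempts are made within one cycle, only the first succeeds, and a second round of replication becomes possible only after the license is renewed in the next cycle.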

A unit of chromosomal DNA whose replication is initiated at a single origin is termed a replicon. The origin is defined genetically as a replicator element. The classic replicon is the E. coli chromosome (which is 4 × 10⁶ base pairs [bp] in size); this has a single replicator site called oriC (Fig. 42-3). An initiator protein (DnaA, the product of the E. coli dnaA gene [Fig. 42-12]) binds to this origin and either directly or indirectly promotes melting of the DNA duplex, giving the replication machinery access to two single strands of DNA. Other factors bind to the initiator, and their concerted action produces a wave of DNA replication proceeding outward in both directions along the DNA (a replication “bubble”) at about 750 to 1250 bases per second.
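These figures imply a replication time for the whole E. coli chromosome that can be estimated with simple arithmetic. The sketch below assumes that both forks move at a constant rate and meet halfway around the circular chromosome; the function name `replication_minutes` is hypothetical.

```python
GENOME_BP = 4_000_000  # E. coli chromosome size quoted in the text


def replication_minutes(genome_bp, fork_speed, forks=2):
    """Minutes needed when `forks` forks share the genome equally.

    fork_speed is in bases per second and is assumed to be constant.
    """
    return genome_bp / forks / fork_speed / 60


fast = replication_minutes(GENOME_BP, 1250)
slow = replication_minutes(GENOME_BP, 750)
print(f"{fast:.0f} to {slow:.0f} minutes")  # prints 27 to 44 minutes
```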

Figure 42-12 Diagram showing factors involved in the initiation of DNA replication in E. coli. A, DNA sequences at oriC. B, Unwinding of the origin. C, Binding of helicase. D, The template, now ready for binding of DNA polymerase.

(Adapted from Baker TA, Wickner SH: Genetics and enzymology of DNA replication in Escherichia coli. Annu Rev Genet 26:447–477, 1992.)

An average human chromosome contains about 150 × 10⁶ bp of DNA. Because the replication machinery in mammals moves only about 20 to 100 bases per second (probably reflecting the fact that the DNA is packaged into chromatin [see Chapter 13]), it would take up to 2000 hours to replicate this length of DNA from a single origin. In most human cells, the duration of the S phase is about eight hours. This means that at least 25 to 125 origins of replication would be required to replicate an average chromosome in the allotted time. In fact, origins of replication are much more closely spaced than this. It has been estimated that mammalian origins of replication are spaced about 100,000 to 150,000 bp apart. Thus, approximately 60,000 origins of replication participate in replication of the entire human genome.
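The arithmetic behind these estimates can be reproduced as follows. This is a rough sketch, assuming constant fork rates, a single fork for the single-origin case, and two forks per origin (all firing at the start of the S phase) for the multi-origin case; the function names are hypothetical. Under these assumptions, the result is about 27 to 131 origins, close to the rounded range quoted above.

```python
import math

CHROMOSOME_BP = 150_000_000  # average human chromosome (from the text)
S_PHASE_HOURS = 8            # typical S phase duration (from the text)


def hours_single_fork(length_bp, fork_speed):
    """Hours for one fork to copy length_bp at fork_speed bases per second."""
    return length_bp / fork_speed / 3600


def min_origins(length_bp, fork_speed, s_phase_hours, forks_per_origin=2):
    """Smallest number of origins that finishes within the S phase."""
    bp_per_origin = forks_per_origin * fork_speed * s_phase_hours * 3600
    return math.ceil(length_bp / bp_per_origin)


for speed in (20, 100):
    hours = hours_single_fork(CHROMOSOME_BP, speed)
    origins = min_origins(CHROMOSOME_BP, speed, S_PHASE_HOURS)
    print(f"{speed} bases/s: {hours:.0f} h from one fork, {origins} origins")
# prints:
# 20 bases/s: 2083 h from one fork, 131 origins
# 100 bases/s: 417 h from one fork, 27 origins
```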

To explain the events at origins of replication, the budding yeast Saccharomyces cerevisiae serves as a good example. Its DNA replication is better understood than that of any other eukaryote.

Replication Origins in S. cerevisiae

About 400 origins of replication participate in replicating the budding yeast genome. A major breakthrough in understanding DNA replication in S. cerevisiae was the identification of short (100 to 150 bp) segments of DNA that act as replication origins in vivo when cloned into a yeast plasmid (circular DNA molecule). These autonomously replicating sequences (or ARS elements) allow yeast plasmids to replicate in parallel with the cellular chromosomes (Fig. 42-4). ARS elements are often, although not always, bona fide replication origins in their native chromosomal context. Replication always initiates within ARS elements, but not all ARS elements act as origins of DNA replication in every cell cycle.

Figure 42-4 THE PLASMID ASSAY FOR IDENTIFICATION OF AN AUTONOMOUSLY REPLICATING SEQUENCE ELEMENT (ORIGIN OF DNA REPLICATION) IN BUDDING YEAST. The plasmid at left has a selectable marker gene (e.g., a gene required for the synthesis of an essential amino acid) plus (in panel B) an ARS element. This plasmid is transferred into growing yeast cells that are defective in the marker gene carried by the plasmid, and these cells are then plated out on agar medium that lacks the essential amino acid. Only cells containing a form of the plasmid that can be replicated will grow to make colonies. A, A plasmid lacking an ARS fails to replicate and is lost from the cells. These cells cannot grow into colonies on plates that lack the essential amino acid. B, If the plasmid contains an ARS element, it replicates along with the chromosomal DNA and is maintained in the population. These cells grow into colonies in the absence of the essential amino acid.

Yeast replication origins are spaced about every 30,000 bp, with a maximum separation of about 130,000 bp. Even this longest interval should replicate easily within the 30 minutes available during the S phase. Because the number of origins exceeds the number required to replicate the genome within the allotted time, some origins need not “fire” every cell cycle. The probability that any given origin will be used in a given cell cycle ranges from less than 0.2 to more than 0.9. It is important to note that replication of an origin by a fork coming from an adjacent origin inactivates it, thereby preventing excess replication during the cell cycle.
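The claim that even the longest interval replicates comfortably can be checked by asking how fast a fork would have to move. The sketch below makes simplifying assumptions: forks move at a constant rate, and the origins flanking the interval fire at the very start of the S phase. The function name `min_fork_speed` is hypothetical.

```python
LONGEST_GAP_BP = 130_000  # maximum origin separation (from the text)
S_PHASE_MINUTES = 30      # time available during the S phase (from the text)


def min_fork_speed(gap_bp, s_phase_minutes, forks=2):
    """Slowest fork rate (bases per second) that still finishes in time.

    forks=2: both flanking origins fire and the forks meet in the middle.
    forks=1: only one flanking origin fires and one fork crosses the gap.
    """
    return gap_bp / forks / (s_phase_minutes * 60)


print(round(min_fork_speed(LONGEST_GAP_BP, S_PHASE_MINUTES), 1))           # prints 36.1
print(round(min_fork_speed(LONGEST_GAP_BP, S_PHASE_MINUTES, forks=1), 1))  # prints 72.2
```

The second case corresponds to the situation described above in which one of the two flanking origins does not fire and is instead replicated passively by the fork arriving from its neighbor.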

The ARS element does two things to establish an origin of replication. First, it has conserved sequences that act as binding sites for a protein complex that marks it as a potential origin. Second, it has nearby sequences that can readily be induced to unwind (become un-base-paired).

Budding yeast ARS elements share a common DNA sequence motif called the ARS core consensus sequence: 5′-(A/T)TTTAT(A/G)TTT(A/T)-3′ (Fig. 42-5). Single base mutations at several locations within this sequence completely inactivate ARS activity. Other, less well-conserved DNA sequences also contribute to the activity of the ARS as a replication origin. One of these, termed B1, together with the ARS core, forms the binding site for a complex of six proteins (five of which are AAA ATPases) termed the origin recognition complex (ORC [see later section]). The DNA unwinding element is thought to be another short sequence (B2) located a bit further along the DNA. DNA synthesis begins at an origin of bidirectional replication midway between the ORC binding site and the DNA unwinding element.
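The ARS core consensus sequence translates directly into a pattern that can be searched for on both strands of a DNA sequence. The following is a minimal sketch using a regular expression; the function names and test sequences are hypothetical, and a match indicates only a candidate site, because the other elements described above also contribute to origin activity.

```python
import re

# 5'-(A/T)TTTAT(A/G)TTT(A/T)-3' written as a character-class pattern.
# The lookahead allows overlapping matches to be reported.
ACS = re.compile(r"(?=([AT]TTTAT[AG]TTT[AT]))")
ACS_LENGTH = 11


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_acs(seq: str):
    """Return sorted (position, strand, match) tuples for both strands.

    Positions are 0-based offsets of the leftmost base of the site on
    the sequence as given.
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in ACS.finditer(seq)]
    rc = reverse_complement(seq)
    for m in ACS.finditer(rc):
        # Convert a reverse-strand offset to forward-strand coordinates.
        hits.append((len(seq) - m.start() - ACS_LENGTH, "-", m.group(1)))
    return sorted(hits)


print(find_acs("GGCATTTATGTTTACCG"))  # prints [(3, '+', 'ATTTATGTTTA')]
print(find_acs("GGCATTTACGTTTACCG"))  # single base change; prints []
```

The second call shows, at the level of pattern matching only, how a single base substitution within the core abolishes the match.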

Figure 42-5 THE ORGANIZATION OF THE ARS1 ELEMENT. ORC binds to the ARS core sequence plus element B1. B2 is a sequence that can readily be induced to unwind. The OBR (origin of bidirectional replication) is the site where DNA synthesis actually begins. B3 is a binding site for an auxiliary factor called ABF-1 that is both a transcriptional activator and an activator of the ARS element.

ORC was identified by its ability to bind the 11-bp ARS core sequence (Fig. 42-5). This binding has two noteworthy features. First, it requires adenosine triphosphate (ATP), which remains associated with ORC. Second, in yeast, ORC remains bound to the origins of replication across the entire cell cycle. Thus, something other than the presence of ORC must be responsible for regulating the periodic activation of origins in the S phase (see Fig. 42-14). In metazoans, ORC behavior is more complex; the largest subunit, Orc1, cycles on and off the DNA in a cell-cycle-regulated manner.

Figure 42-14 Measurement of the time of replication of particular chromosomal regions in Saccharomyces cerevisiae. A–C, This protocol is based on a classic density shift experiment of Meselson and Stahl that proved that DNA replication is semiconservative. S. cerevisiae cells are grown for several generations in a medium containing ¹³C and ¹⁵N heavy isotopes. As a result, their DNA is fully substituted with heavy isotopes. At the beginning of the experiment, the cells are synchronized so that they enter the S phase in a single wave. At the same time, the heavy (H) isotope medium is removed and replaced with “light medium” (L) containing ¹²C and ¹⁴N. At various times after the initiation of the S phase, aliquots of cells are removed, and the DNA is isolated. The DNA is then cleaved with restriction enzymes so that the chromosomes are cut into many fragments. DNA from each time point is then subjected to CsCl density gradient centrifugation. When any local region of DNA is replicated, its density alters from heavy/heavy to heavy/light. After very short incubations with light isotopes, only DNA near the origin of replication will be heavy/light; all other DNA will be heavy/heavy. These two populations of molecules are separated from one another by the density gradient centrifugation. To examine the timing of replication of a specific gene, a cloned segment of DNA corresponding to the region of interest is used to probe (by DNA hybridization) the heavy/heavy and heavy/light peaks from each gradient. The time of replication of each locus is the time at which the restriction fragment being detected by DNA hybridization moves from the heavy/heavy peak to the heavy/light peak. The numbers in panels B and C refer to the numbered regions of the chromosomes shown in A. D, Data from a replication timing experiment show that in budding yeast, centromeres replicate early in the S phase and telomeres replicate late. To generate curve a, fractions from a gradient like that shown in panel B were hybridized to a cloned centromere region probe. To generate curve b, fractions from the same gradient were hybridized to a cloned telomere region probe. Note that in mammalian cells, centromeres replicate late and telomeres replicate earlier. (This figure is based on the work of the laboratory of B. J. Brewer and W. L. Fangman.)

ARS elements typically contain binding sites for other sequence-specific DNA binding proteins, such as transcription factors. For example, a transcription factor called ARS-binding factor 1 (ABF-1) binds to the B3 sequence within the ARS1 element (Fig. 42-5). Deletion of the ABF-1 binding site only slightly reduces the ability of ARS1 to act as a replication origin in vivo. Furthermore, substitution of DNA binding sequences for other transcription factors within the B3 sequence has little effect on replication efficiency.

In addition to their role in DNA replication, several ORC components also seem to regulate heterochromatin formation and transcription (see Chapters 13 and 15). This cross talk between the machinery used for transcription and DNA replication may explain why regions of chromosomes with actively transcribed genes typically replicate early in the S phase (see the discussion that follows). The Orc6 subunit also functions in mitosis at kinetochores and during cytokinesis. Its detailed role in those processes is not known.