The rapid development of techniques in molecular biology is revolutionizing the practice of medicine. The potential uses of these techniques for the diagnosis and treatment of disease are vast.
Clinical Applications. Polymorphisms, inherited differences in DNA base sequences, are abundant in the human population, and many alterations in DNA sequences are associated with diseases. Tests for DNA sequence variations are more sensitive than many other techniques (such as enzyme assays) and permit recognition of diseases at earlier and, therefore, potentially more treatable stages. These tests can also identify carriers of inherited diseases so they can receive appropriate counseling. Because genetic variations are so distinctive, DNA fingerprinting (analysis of DNA sequence differences) can be used to determine family relationships or to help identify the perpetrators of a crime.
Techniques of molecular biology are used in the prevention and treatment of disease. For example, recombinant DNA techniques provide human insulin for the treatment of diabetes, factor VIII for the treatment of hemophilia, and vaccines for the prevention of hepatitis. Enzyme replacement therapy has been successful for a number of diseases, in part owing to the ability to produce large amounts of enzyme through recombinant DNA technology. Although treatment of disease by gene therapy is in the experimental phase of development, the possibilities are limited only by the human imagination and, of course, by ethical considerations. The ability to rapidly analyze the genome and proteome (all expressed proteins) of a cell enables different variants of a particular disease to be identified and treated appropriately.
Techniques. To recognize normal or pathologic genetic variations, DNA must be isolated from the appropriate source and adequate amounts must be available for study. Techniques for isolating and amplifying genes and studying and manipulating DNA sequences involve the use of restriction enzymes, cloning vectors, polymerase chain reaction (PCR), gel electrophoresis, blotting onto nitrocellulose paper, and the preparation of labeled probes that hybridize to the appropriate target DNA sequences. Gene chip assays, which analyze all expressed genes within a cell, can generate a genetic profile of normal versus diseased cells. Gene therapy involves isolating normal genes and inserting them into diseased cells so that the normal genes are expressed, permitting the diseased cells to return to a normal state. Ablation of gene expression is possible using techniques based on silencing RNA (small interfering RNA [siRNA]) and the clustered regularly interspaced small palindromic repeats (CRISPR)/CRISPR-associated (Cas) DNA editing system. CRISPR/Cas can also be used either to repair existing mutations in the human genome or to replace genes in the human genome. Students should have a general understanding of recombinant DNA techniques to appreciate their current use and the promise they hold for the future. Rapid sequencing of DNA and complementary DNA (cDNA; next-generation sequencing) allows for rapid determination of mutations in the genome and changes in gene expression.
THE WAITING ROOM
Edna R., a third-year medical student, has started working in the hospital blood bank two nights a week (see Chapter 14 for an introduction to Edna R. and her daughter, Beverly R.). Because she will be handling human blood products, she must have a series of hepatitis B vaccinations. She has reservations about having these vaccinations and inquires about the efficacy and safety of the vaccines currently in use.
Susan F. is a 3-year-old Caucasian girl who has been diagnosed with cystic fibrosis (CF). Her growth rate has been in the 30th percentile over the last year. Since birth, she has had occasional episodes of spontaneously reversible and minor small-bowel obstruction. These episodes are superimposed on gastrointestinal symptoms that suggest a degree of dietary fat malabsorption, such as bulky, glistening, foul-smelling stools two or three times per day. She has experienced recurrent flare-ups of bacterial bronchitis/bronchiolitis in the last 10 months, each time caused by Pseudomonas aeruginosa. A quantitative sweat test was unequivocally positive (excessive sodium and chloride were found in her sweat on two occasions). Based on these findings, the pediatrician informed Susan F.’s parents that Susan F. probably has CF. A sample of her blood was sent to a DNA testing laboratory to confirm the diagnosis and to determine specifically which one of the many potential genetic mutations known to cause CF was present in her cells.
Victoria T. was a 21-year-old woman who was the victim of a rape and murder. Her parents told police she had left her home and driven to the local convenience store. When she had not returned home an hour later, her father drove to the store, looking for Victoria T. He found her car still parked in front of the store and called the police. They searched the area around the store and found Victoria T.’s body in a wooded area behind the building. She had been sexually assaulted and strangled. Medical technologists from the police laboratory collected a semen sample from vaginal fluid and took samples of dried blood from under the victim’s fingernails. Witnesses identified three men who spoke to Victoria T. while she was at the convenience store. DNA samples were obtained from these suspects to determine whether any of them was the perpetrator of the crime.
Isabel S.’s cough is slightly improved on a multidrug regimen for pulmonary tuberculosis, but she continues to have night sweats. She is tolerating her current human immunodeficiency virus (HIV) therapy well but complains of weakness and fatigue. The man with whom she has shared needles to inject drugs accompanies Isabel S. to the clinic and requests that he be tested for the presence of HIV.
I. Recombinant DNA Techniques
Techniques for joining DNA sequences into new combinations (recombinant DNA) were originally developed as research tools to explore and manipulate genes and to produce the gene products (proteins). Now they are also being used to identify mutated genes associated with disease and to correct genetic defects, and they are increasingly replacing many conventional clinical testing procedures. A basic appreciation of recombinant DNA techniques is required to understand how genetic variations among individuals are determined and how these differences can be used to diagnose disease. The first steps in determining individual variations in genes involve isolating the genes (or fragments of DNA) that contain variable sequences and obtaining adequate quantities for study. The Human Genome Project has sequenced the approximately 3 billion bases of the human genome, and this reference sequence can now be used as a template to discover and understand the molecular basis of disease.
A. Strategies for Obtaining Fragments of DNA and Copies of Genes
1. Restriction Fragments
Enzymes called restriction endonucleases enable molecular biologists to cleave segments of DNA from the genome of various types of cells or to fragment DNA obtained from other sources. A class II restriction enzyme is an endonuclease that specifically recognizes a short sequence of DNA, usually 4 to 6 base pairs (bp) in length, and cleaves a phosphodiester bond in both DNA strands within this sequence (Fig. 16.1). A key feature of class II restriction enzymes is their specificity. A restriction enzyme always cleaves at the same DNA sequence and cleaves only at that particular sequence. Most of the DNA sequences recognized by restriction enzymes are palindromes—that is, both strands of DNA have the same base sequence when read in a 5′-to-3′ direction. The cuts made by these enzymes are usually “sticky” (i.e., the products are single stranded at the ends, with one strand overhanging the other, so they anneal with complementary sequences to the overhang). However, sometimes they are blunt (the products are double stranded at the ends, with no overhangs). Hundreds of restriction enzymes with different specificities have been isolated (Table e-16.1).
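The specificity and palindromic recognition sites described above can be illustrated with a short string-processing sketch. It assumes the enzyme EcoRI, which recognizes 5′-GAATTC-3′ and cuts between G and A (G^AATTC); the DNA sequence is a hypothetical toy example, and only the top strand is modeled.

```python
# Sketch of a restriction digest on the top strand only.
# Assumption: EcoRI recognizes GAATTC and cuts after the first base (G^AATTC).

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def is_palindromic_site(site: str) -> bool:
    """A site is palindromic if both strands read the same 5'->3'."""
    return site == reverse_complement(site)


def find_sites(seq: str, site: str) -> list[int]:
    """Return every 0-based start position of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut the top strand `cut_offset` bases into every occurrence of `site`."""
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


dna = "TTGAATTCAAACCCGAATTCGG"  # hypothetical sequence with two EcoRI sites
print(is_palindromic_site("GAATTC"))   # True
print(find_sites(dna, "GAATTC"))       # [2, 14]
print(digest(dna, "GAATTC", 1))        # ['TTG', 'AATTCAAACCCG', 'AATTCGG']
```

Because the enzyme cuts only at its exact recognition sequence, the same input always yields the same set of fragments, which is what makes restriction fragments reproducible markers.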
Restriction endonucleases were discovered in bacteria in the late 1960s and 1970s. These enzymes were named for the fact that bacteria use them to “restrict” the growth of viruses (bacteriophage) that infect the bacterial cells. They cleave the phage DNA into smaller pieces so the phage cannot reproduce in the bacterial cells. However, they do not cleave the bacterial DNA because its bases are methylated at the restriction sites by DNA methylases. Restriction enzymes also restrict the uptake of DNA from the environment, and they restrict mating with nonhomologous species.
Restriction fragments of DNA can be used to identify variations in base sequence in a gene. They can also be used to synthesize a recombinant DNA (also called chimeric DNA), which is composed of molecules of DNA from different sources that have been recombined in vitro (outside the organism; e.g., in a test tube). Two unrelated DNA fragments can be joined to each other if their sticky ends are complementary. Complementary ends are obtained by cleaving the unrelated DNAs with the same restriction enzyme (Fig. 16.2). After the sticky ends of the fragments base-pair with each other, the fragments can be attached covalently by the action of DNA ligase.
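Whether two sticky ends can base-pair comes down to antiparallel complementarity of their single-stranded overhangs. The sketch below assumes 5′ overhangs written 5′→3′ and uses the overhangs left by EcoRI (AATT) and BamHI (GATC) as examples; it checks pairing only and does not model ligation itself.

```python
# Sketch: can two 5' sticky-end overhangs anneal? Each overhang is written 5'->3'.
# Assumed example overhangs: EcoRI (G^AATTC) leaves AATT; BamHI (G^GATCC) leaves GATC.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def overhangs_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Strands pair antiparallel, so one overhang must equal the
    reverse complement of the other."""
    return (len(overhang_a) == len(overhang_b)
            and overhang_a == reverse_complement(overhang_b))


OVERHANGS = {"EcoRI": "AATT", "BamHI": "GATC"}

# Vector and foreign DNA both cut with EcoRI: ends anneal, then ligase seals them.
print(overhangs_anneal(OVERHANGS["EcoRI"], OVERHANGS["EcoRI"]))  # True
# An EcoRI end and a BamHI end cannot base-pair.
print(overhangs_anneal(OVERHANGS["EcoRI"], OVERHANGS["BamHI"]))  # False
```

Because palindromic recognition sites produce self-complementary overhangs, any two fragments cut by the same enzyme carry ends that fit each other, regardless of where the DNA came from.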
2. DNA Produced by Reverse Transcriptase
If messenger RNA (mRNA) transcribed from a gene is isolated, this mRNA can be used as a template by the enzyme reverse transcriptase (see Chapter 11, Biochemical Comments), which produces a DNA copy (cDNA) of the RNA. In contrast to DNA fragments cleaved from the genome by restriction enzymes, DNA produced by reverse transcriptase does not contain introns because mRNA, which has no introns, is used as a template. cDNA also lacks the regulatory regions of a gene as those sequences (promoter, promoter-proximal elements, and enhancers) are not transcribed into mRNA.
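The reason cDNA lacks introns can be made concrete with a toy model: splice the exons of a gene into mRNA, then write the DNA strand complementary to that mRNA. The gene, its exon coordinates, and the intron are hypothetical values chosen for illustration; the model starts at the first exon, so promoter sequences are absent by construction.

```python
# Sketch: cDNA is copied from spliced mRNA, so intron sequences never appear.
# The gene, exon coordinates, and intron below are hypothetical toy values.

def splice_to_mrna(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join exon regions (0-based, end-exclusive) and write the result as RNA (T -> U)."""
    return "".join(gene[start:end] for start, end in exons).replace("T", "U")


def reverse_transcribe(mrna: str) -> str:
    """First-strand cDNA: the DNA complementary to the mRNA, written 5'->3'."""
    pairs = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(mrna))


exon1, intron, exon2 = "ATGGCC", "GTAAGTCTTCAG", "GCATGA"
gene = exon1 + intron + exon2                     # genomic DNA: exon-intron-exon
mrna = splice_to_mrna(gene, [(0, 6), (18, 24)])   # intron removed by splicing
cdna = reverse_transcribe(mrna)

print(mrna)                  # AUGGCCGCAUGA
print(cdna)                  # TCATGCGGCCAT
print(len(gene), len(cdna))  # 24 12
```

The 12-nucleotide difference in length is exactly the intron, which is present in the genomic copy but absent from every molecule derived from the mature mRNA.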
3. Chemical Synthesis of DNA
Automated machines can synthesize oligonucleotides (short molecules of single-stranded DNA) up to 150 nucleotides in length. These machines can be programmed to produce oligonucleotides with a specified base sequence. Although entire genes cannot yet be synthesized in one piece, appropriate overlapping pieces of genes can be made and then ligated together to produce a fully synthetic gene. Additionally, oligonucleotides can be prepared that will base-pair with segments of genes. These oligonucleotides can be used in the process of identifying, isolating, and amplifying genes.
B. Techniques for Identifying DNA Sequences
1. Probes
A probe is a single-stranded polynucleotide of DNA or RNA that is used to identify a complementary sequence on a larger single-stranded DNA or RNA molecule (Fig. 16.3). Formation of base pairs with a complementary strand is called annealing or hybridization. Probes can be composed of cDNA (produced from mRNA by reverse transcriptase), fragments of genomic DNA (cleaved by restriction enzymes from the genome), chemically synthesized oligonucleotides, or, occasionally, RNA.
The conditions of hybridization can be manipulated to provide different degrees of stringency. Stringency refers to how exact a match the probe must have to the DNA to which it is hybridizing for significant hybridization to occur. Low-stringency conditions allow a number of mismatches between the two nucleic acid strands to be tolerated (nonstandard base pairs); high stringency requires an exact match of the complementary sequences before hybridization can take place. Stringency can be manipulated by raising or lowering the temperature (increased temperature increases stringency) and raising or lowering the salt concentration in the hybridization reaction (high salt concentration reduces the stringency because it negates the electrostatic repulsion between the phosphates in the DNA backbone of two mismatched DNA strands). Thus, a high-stringency hybridization (looking for an exact match) will be performed at high temperature and low salt concentrations.
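The effect of stringency can be approximated in a sketch that scans a target strand for sites where a probe could anneal, tolerating up to a chosen number of mismatched bases. This is an abstraction: in practice stringency is set by temperature and salt, not by a mismatch count, so the `max_mismatches` parameter is a hypothetical stand-in for "low" versus "high" stringency. The melting-temperature estimate uses the Wallace rule (2 °C per A·T pair, 4 °C per G·C pair), a rough approximation valid only for short oligonucleotides under standard salt conditions. Sequences are hypothetical.

```python
# Sketch: probe hybridization at different stringencies.
# Assumptions: mismatch tolerance stands in for temperature/salt conditions;
# the Wallace rule is only a rough Tm estimate for short oligonucleotides.

def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature (deg C): 2 per A/T plus 4 per G/C."""
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def hybridization_sites(target: str, probe: str, max_mismatches: int) -> list[tuple[int, int]]:
    """Return (position, mismatches) for each target window that could pair with
    the probe. The probe anneals antiparallel, so the target is compared with
    the probe's reverse complement."""
    site = reverse_complement(probe)
    hits = []
    for i in range(len(target) - len(site) + 1):
        window = target[i:i + len(site)]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


probe = "GCATGCAA"                                   # hypothetical probe
target = "AC" + "TTGCATGC" + "GGA" + "TTGCTTGC" + "CA"  # exact site, then a 1-mismatch site

print(wallace_tm(probe))                        # 24
print(hybridization_sites(target, probe, 0))    # [(2, 0)]          high stringency
print(hybridization_sites(target, probe, 1))    # [(2, 0), (13, 1)] lower stringency
```

Relaxing the tolerance from zero to one mismatch picks up a second, imperfectly matched site, which mirrors how lowering the temperature or raising the salt concentration allows mismatched duplexes to form.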
To identify the target sequence, the probe must carry a label (see Fig. 16.3). If the probe has a radioactive label such as 32P, it can be detected by autoradiography. An autoradiogram is produced by covering the material containing the probe with a sheet of X-ray film. Electrons (β-particles) emitted by disintegration of the radioactive atoms expose the film in the region directly over the probe. Several techniques can be used to introduce labels into these probes. Not all probes are radioactive. Some are chemical adducts (compounds that bind covalently to DNA) that can be identified by, for example, fluorescence microscopy.
2. Gel Electrophoresis
Gel electrophoresis is a technique that uses an electrical field to separate molecules on the basis of size. Because DNA contains negatively charged phosphate groups, it will migrate in an electrical field toward the positive electrode (Fig. 16.4). Shorter molecules migrate more rapidly through the pores of a gel than do longer molecules, so separation is based on length. Gels composed of polyacrylamide, which can separate DNA molecules that differ in length by only one nucleotide, are used to determine the base sequence of DNA. Agarose gels are used to separate longer DNA fragments that have larger size differences.
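Because shorter molecules migrate farther, the size of an unknown fragment is commonly estimated by running a ladder of known sizes alongside it and fitting log10(size) against migration distance. The sketch below does a least-squares fit of that semi-log relationship; the ladder values are hypothetical and idealized, and the linear relationship holds only approximately, within a gel's resolving range.

```python
import math

# Sketch: estimating fragment length from migration distance with a size ladder.
# Assumption: log10(size) is approximately linear in distance over the gel's range.

def fit_standard_curve(ladder: list[tuple[float, int]]) -> tuple[float, float]:
    """Least-squares fit of log10(size_bp) = slope * distance + intercept."""
    xs = [distance for distance, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance: float, slope: float, intercept: float) -> float:
    """Convert a migration distance back to an estimated length in bp."""
    return 10 ** (slope * distance + intercept)


# Hypothetical ladder: (distance migrated in mm, fragment size in bp)
ladder = [(5.0, 10_000), (15.0, 1_000), (25.0, 100)]
slope, intercept = fit_standard_curve(ladder)
print(round(estimate_size(20.0, slope, intercept)))  # 316
```

The negative slope of the fitted line is the quantitative form of the statement that longer molecules move more slowly through the gel.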
The bands of DNA in the gel can be visualized by various techniques. Staining with dyes such as ethidium bromide allows direct visualization of DNA bands under ultraviolet light. Specific sequences are generally detected using a labeled probe.
3. Detection of Specific DNA Sequences
To detect specific sequences, DNA is usually transferred to a solid support, such as a sheet of nitrocellulose paper. For example, if bacteria are growing on an agar plate, cells from each colony will adhere to a nitrocellulose sheet pressed against the agar, and an exact replica of the bacterial colonies can be transferred to the nitrocellulose paper (Fig. 16.5). A similar technique is used to transfer bands of DNA from electrophoretic gels to nitrocellulose sheets. After bacterial colonies or bands of DNA are transferred to nitrocellulose paper, the paper is treated with an alkaline solution and then heated. Alkaline solutions denature DNA (i.e., separate the two strands of each double helix), and the heating fixes the DNA on the filter paper, such that it will not move from its position during the rest of the blotting procedure. The single-stranded DNA is then hybridized with a probe, and the regions on the nitrocellulose blot containing DNA that base-pairs with the probe are identified.
E. M. Southern developed the technique, which bears his name, for identifying DNA sequences on gels. Southern blots are produced when DNA on a nitrocellulose blot of an electrophoretic gel is hybridized with a DNA probe. Molecular biologists decided to continue with this geographic theme as they named two additional techniques. Northern blots are produced when RNA on a nitrocellulose blot is hybridized with a DNA probe. A slightly different but related technique, known as a Western blot, involves separating proteins by gel electrophoresis and probing with labeled antibodies for specific proteins (Fig. 16.6).
4. DNA Sequencing
The most common procedure for determining the sequence of nucleotides in a DNA strand was developed by Frederick Sanger and involves the use of dideoxynucleotides. Dideoxynucleotides (see Chapter 12) lack a 3′-hydroxyl group (in addition to lacking the 2′-hydroxyl group that is normally absent from DNA deoxynucleotides). Thus, once they are incorporated into a replicating DNA chain, the next nucleotide cannot add, and polymerization is terminated. In this procedure, only one of the four dideoxynucleotides (dideoxyadenosine triphosphate [ddATP], dideoxythymidine triphosphate [ddTTP], dideoxyguanosine triphosphate [ddGTP], or dideoxycytidine triphosphate [ddCTP]) is added to a tube containing all four normal deoxynucleotides, DNA polymerase, a primer, and the template strand for the DNA that is being sequenced (Fig. 16.7). As DNA polymerase catalyzes the sequential addition of complementary bases to the 3′-end, the dideoxynucleotide competes with its corresponding normal nucleotide for insertion. Whenever the dideoxynucleotide is incorporated, further polymerization of the strand cannot occur, and synthesis is terminated. Some of the chains will terminate at each of the locations in the template strand that is complementary to the dideoxynucleotide. Consider, for example, a growing polynucleotide strand in which adenine (A) should add at positions 10, 15, and 16. Competition between ddATP and dATP for each position results in some chains terminating at position 10, some at 15, and some at 16. Thus, DNA strands of varying lengths are produced from a template. The shortest strands are closest to the 5′-end of the growing DNA strand because the strand is synthesized in a 5′-to-3′ direction.
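The outcome of a single chain-termination reaction can be listed directly: a chain stops at every position where the newly synthesized strand requires the base carried by the dideoxynucleotide. In this sketch the template is written 3′→5′ so that its complement reads 5′→3′ position by position; the sequence is hypothetical, and lengths count only the nucleotides added after the primer.

```python
# Sketch: which chain lengths one ddNTP reaction produces.
# Assumptions: hypothetical template written 3'->5'; lengths exclude the primer.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3_to_5: str) -> str:
    """New strand (5'->3') complementary to a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def termination_lengths(new_strand: str, dd_base: str) -> list[int]:
    """Lengths of chains that stop when the dideoxy form of `dd_base` is
    incorporated; competition with the normal dNTP lets every such position
    be represented in the population of chains."""
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]


template = "TACGGATCCT"          # hypothetical template, 3'->5'
new = synthesized_strand(template)
print(new)                                # ATGCCTAGGA
print(termination_lengths(new, "A"))      # [1, 7, 10]  (ddATP tube)
print(termination_lengths(new, "G"))      # [3, 8, 9]   (ddGTP tube)
```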
Four separate reactions are performed, each with only one of the dideoxynucleotides present (ddATP, ddTTP, ddGTP, ddCTP), plus a complete mixture of normal nucleotides (see Fig. 16.7B). In each tube, some strands are terminated whenever the complementary base for that dideoxynucleotide is encountered. If these strands are subjected to gel electrophoresis, the sequence 5′ → 3′ of the DNA strand complementary to the template can be determined by “reading” from the bottom to the top of the gel, that is, by noting the lanes (A, G, C, or T) in which bands appear, starting at the bottom of the gel and moving sequentially toward the top.
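Reading the gel from bottom to top amounts to sorting all bands by length and reporting the lane each band came from. The sketch below assumes hypothetical band lengths (primer excluded) and an ideal gel with exactly one band at every length.

```python
# Sketch: reading a four-lane Sanger gel from the bottom (shortest) to the top.
# Assumption: an ideal gel with exactly one band at each length 1..n.

def read_gel(lanes: dict[str, list[int]]) -> str:
    """Return the 5'->3' sequence of the strand complementary to the template."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    lengths = [length for length, _ in bands]
    if lengths != list(range(1, len(lengths) + 1)):
        raise ValueError("expected exactly one band at every length 1..n")
    return "".join(base for _, base in bands)


# Hypothetical band lengths observed in each dideoxynucleotide lane
lanes = {"A": [1, 7, 10], "T": [2, 6], "G": [3, 8, 9], "C": [4, 5]}
print(read_gel(lanes))  # ATGCCTAGGA
```

The sanity check reflects why all four reactions are needed: only together do they account for a chain ending at every position.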
5. Next-Generation DNA Sequencing
The main limitation of the traditional Sanger method of DNA sequencing was speed; it took an extended period of time to generate significant amounts of sequence data. Improvements in sequencing speed have led to next-generation sequencing, which allows an entire genome to be sequenced in less than 1 day.
This technique involves mechanical fractionation of the genome, followed by the addition of known sequences of DNA to the ends of the unknown DNA (Fig. 16.8). The fractionated DNA, with known ends, is added to a glass slide that contains bound DNA complementary to the added ends of the unknown DNA. The DNA samples are amplified (see the section on PCR in this chapter) and then sequenced using a primer that is complementary to the known sequence of DNA on the ends of the unknown DNA. Many thousands of pieces of DNA are sequenced simultaneously on the slide. In this case, the unknown fragments are sequenced one nucleotide at a time. Each dNTP in the reaction mixture is linked to a different fluorophore as well as to a chemical blocking group on the 3′-hydroxyl group of the deoxyribose. After the first nucleotide has been added to the primer, a computer analyzes the fluorescence of all the DNA sequences on the slide and stores the data. Chemicals are then added to remove both the blocking groups from the 3′-hydroxyl groups and the fluorophores from the nucleotides already incorporated into the DNA. DNA synthesis is then initiated to add the next base to the primer, and the process is repeated. This continues for up to 100 bases, such that 100-base-long sequences are stored in the computer for each piece of unknown DNA on the slide. The computer then analyzes these sequences, looks for overlaps in sequence, and can generate an entire sequence of the DNA being analyzed. Multiple variations of this procedure have been developed and are revolutionizing clinical testing.
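The final step, in which the computer looks for overlaps between short reads, can be illustrated with a toy greedy assembler that repeatedly merges the pair of reads sharing the longest suffix-prefix overlap. This is a deliberately simplified stand-in, not the method used by real sequencing software, which must handle sequencing errors, both DNA strands, repeats, and reads contained within other reads. The reads and the `min_len` threshold are hypothetical.

```python
# Toy greedy overlap assembly of short reads.
# Assumptions: error-free reads from one strand, no read contained in another;
# real assemblers use far more elaborate methods.

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlap remains; return contigs."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]   # hypothetical overlapping reads
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```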
C. Techniques for Amplifying DNA Sequences
To study genes or other DNA sequences, adequate quantities of material must be obtained. It is often difficult to isolate significant quantities of DNA from the original source. For example, an individual cannot usually afford to part with enough tissue to provide the amount of DNA required for clinical testing. Therefore, the available quantity of DNA has to be amplified.
1. Cloning of DNA
The first technique developed for amplifying the quantity of DNA is known as cloning (Fig. 16.9). The DNA to be amplified (the “foreign” DNA) is attached to a vector (a carrier DNA), which is introduced into a host cell that makes multiple copies of the DNA. The foreign DNA and the vector DNA are usually cleaved with the same restriction enzyme, which produces complementary sticky ends in both DNAs. The foreign DNA is then added to the vector. Base pairs form between the complementary single-stranded regions, and DNA ligase joins the molecules to produce a chimera or recombinant DNA. As the host cells divide, they replicate their own DNA, and they also replicate the DNA of the vector, which includes the foreign DNA.
If the host cells are bacteria, commonly used vectors are bacteriophage (viruses that infect bacteria), plasmids (extrachromosomal pieces of circular DNA that are taken up by bacteria), or cosmids (plasmids that contain DNA sequences from the lambda bacteriophage). When eukaryotic cells are used as the host, the vectors are often retroviruses, adenoviruses, free DNA, or DNA coated with a lipid layer (liposomes). The foreign DNA sometimes integrates into the host-cell genome, or it exists as episomes (extrachromosomal fragments of DNA).
Host cells that contain recombinant DNA are called transformed cells if they are bacteria; if they are eukaryotes, they are called transfected cells (or transduced cells if the vector is a virus). Markers in the vector DNA are used to identify cells that have been transformed, and probes for the foreign DNA can be used to determine that the host cells actually contain the foreign DNA. If the host cells containing the foreign DNA are incubated under conditions in which they replicate rapidly, large quantities of the foreign DNA can be isolated from the cells. With the appropriate vector and growth conditions that permit the expression of the foreign DNA, large quantities of the protein produced from this DNA can be isolated.
2. Libraries
Specific collections of DNA fragments are known as libraries. A genomic library is a set of host cells (or phage) that collectively contain all the DNA sequences from the genome of another organism. Thus, a genomic library contains promoter and intron sequences of every gene. A cDNA library is a set of host cells that collectively contain all the DNA sequences produced by reverse transcriptase from the mRNA obtained from cells (or tissue) of a particular type. Therefore, a cDNA library contains cDNA for all the genes expressed in that cell type, corresponding to the particular stage of differentiation of the cell when the mRNA was isolated. Because cDNA libraries are generated by reverse transcription of mRNA, promoter and intron sequences of genes are not present in those libraries.
The DNA fragments that are utilized to construct genomic libraries are much larger than those needed to construct cDNA libraries (for humans, a genomic library would need to represent all 3 billion bp in the haploid genome; the average mRNA is about 2,500 bases in size). Thus, different vectors are utilized to construct genomic libraries compared to cDNA libraries. Bacteriophage (which can handle up to 20 kilobase pairs [kb] of foreign DNA), bacterial artificial chromosomes (BACs; which can handle up to 150 kb of foreign DNA), and yeast artificial chromosomes (YACs; which can handle up to 1,000 kb of foreign DNA) are often used in the construction of genomic libraries. For cDNA libraries, plasmids (which can accept up to about 10 kb of foreign DNA) are usually the vector of choice.
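The practical consequence of insert size is the number of clones a genomic library needs. A standard estimate is the Clarke–Carbon formula, N = ln(1 − P) / ln(1 − f), where f is the fraction of the genome carried by one insert and P is the desired probability that any given sequence is represented. The sketch below applies it to the insert sizes quoted above; the 99% probability is an assumed target, and the formula assumes randomly and uniformly distributed fragments.

```python
import math

# Sketch: clones needed for a genomic library (Clarke-Carbon formula).
# Assumptions: random, uniform fragments; 99% target probability chosen for illustration.

GENOME_BP = 3_000_000_000  # haploid human genome size given in the text


def clones_needed(insert_bp: int, genome_bp: int = GENOME_BP, probability: float = 0.99) -> int:
    """N = ln(1 - P) / ln(1 - f), with f = insert size / genome size."""
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log1p(-fraction))


for vector, insert_bp in [("bacteriophage", 20_000), ("BAC", 150_000), ("YAC", 1_000_000)]:
    print(vector, clones_needed(insert_bp))
# bacteriophage: roughly 690,000 clones; BAC: roughly 92,000; YAC: roughly 14,000
```

The roughly 50-fold drop from bacteriophage to YAC vectors shows why large-insert vectors are favored for genomic libraries, whereas a cDNA of about 2,500 bases fits comfortably in a plasmid.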
To clone a gene, a suitable probe must be developed (derived from either an amino acid sequence within a protein or from a similar DNA sequence obtained from another species); the library is then screened with the probe (utilizing techniques described previously) to find host cells that harbor DNA sequences complementary to the probe. Obtaining sufficient clones enables the complete cDNA, or gene, to be obtained and sequenced.
3. Polymerase Chain Reaction
The PCR is an in vitro method that can be used for the rapid production of very large amounts of specific segments of DNA. It is particularly suited for amplifying regions of DNA for clinical or forensic testing procedures because only a very small sample of DNA is required as the starting material. Regions of DNA can be amplified by PCR from a single strand of hair or a single drop of blood or semen.
First, a sample of DNA containing the segment to be amplified must be isolated. Large quantities of primers, the four deoxyribonucleoside triphosphates, and a heat-stable DNA polymerase are added to a solution in which the DNA is heated to separate the strands (Fig. 16.10). The primers are two synthetic oligonucleotides: one oligonucleotide is complementary to a short sequence in one strand of the DNA to be amplified and the other is complementary to a sequence in the other DNA strand. As the solution is cooled, the oligonucleotides form base pairs with the DNA and serve as primers for the synthesis of DNA strands by a heat-stable DNA polymerase (this polymerase is isolated from Thermus aquaticus, a bacterium that grows in hot springs). The process of heating, cooling, and new DNA synthesis is repeated many times until a large number of copies of the DNA is obtained. The process is automated, so each round of replication takes only a few minutes, and in 20 heating and cooling cycles, the DNA is amplified over a million-fold.
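The "over a million-fold in 20 cycles" figure follows from doubling: each cycle can at most double the number of target copies, so n cycles give up to 2ⁿ copies per starting molecule. The sketch below computes this; the `efficiency` parameter (fraction of templates copied per cycle) is an illustrative assumption, since real reactions fall short of perfect doubling and eventually plateau.

```python
import math

# Sketch: ideal PCR amplification. Each cycle multiplies copies by (1 + efficiency);
# efficiency = 1.0 means perfect doubling, an idealization.

def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of target copies after the given number of cycles."""
    return initial_copies * (1 + efficiency) ** cycles


def cycles_for_fold(fold: float, efficiency: float = 1.0) -> int:
    """Minimum number of cycles needed to reach at least `fold` amplification."""
    return math.ceil(math.log(fold) / math.log(1 + efficiency))


print(pcr_copies(1, 20))      # 1048576.0  (2**20, just over a million-fold)
print(cycles_for_fold(1e6))   # 20
```

With an assumed efficiency below 1.0, the same 20 cycles give noticeably less than a million-fold amplification, which is one reason practical protocols often run more cycles than the ideal calculation suggests.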