The Human Genome

Chapter 7 The Human Genome


Although the clockwork of life is similar in prokaryotes and eukaryotes, eukaryotes are more complex. Prokaryotes must be lean and mean to ensure fast reproduction. Therefore, they keep their genomes small and regulate gene expression in simple yet efficient ways. Eukaryotes, however, require genetic complexity to control their complex cellular and organismal structures and sophisticated life cycles.


Humans, for example, have 700 times more DNA than Escherichia coli, but they have only six times more genes (Table 7.1). This disparity comes from the fact that 90% of E. coli DNA, but only 1.3% of human DNA, codes for proteins.
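The arithmetic behind this comparison can be checked with a short calculation. The sketch below uses only the rounded figures quoted above (700 times more DNA, 90% and 1.3% coding); the function name is illustrative.

```python
def coding_dna_ratio(total_dna_ratio, coding_fraction_a, coding_fraction_b):
    """Ratio of protein-coding DNA in organism A versus organism B.

    total_dna_ratio: how many times more total DNA A has than B.
    coding_fraction_a, coding_fraction_b: fraction of each genome
    that codes for proteins.
    """
    return total_dna_ratio * coding_fraction_a / coding_fraction_b


# Figures quoted in the text: human (A) versus E. coli (B)
ratio = coding_dna_ratio(700, 0.013, 0.90)
print(f"Humans have about {ratio:.1f} times more protein-coding DNA than E. coli")
```

With these rounded inputs, the 700-fold difference in total DNA shrinks to roughly a 10-fold difference in protein-coding DNA, which is of the same order as the six-fold difference in gene number.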



Gene regulation is more important than gene number. Multicellular eukaryotes have (almost) the same genes in every cell of the body, but different genes are expressed in different cell types and at different stages during the development of the organism. This requires control mechanisms of extraordinary complexity.



Chromatin consists of DNA and histones


The prokaryotic chromosome is a naked circular DNA double helix with a length of about 1 mm. Each eukaryotic chromosome consists of a single linear DNA double helix with a length of several centimeters, which is tightly packaged with a set of histone proteins.


The histones are small basic proteins, with numerous positive charges on the side chains of lysine and arginine residues. These positive charges bind to the negatively charged phosphate groups of the DNA, and they neutralize at least 60% of the negative charges on DNA.


Eukaryotic cells have five major types of histones (Table 7.2). With the exception of histone H1, whose structure varies in different species and even in different tissues of the same organism, the histones are well conserved throughout the phylogenetic tree. For example, histones H3 and H4 from pea seedlings and calf thymus differ in only four and two amino acid positions, respectively. Presumably, the histones were invented by the very first eukaryotes, perhaps as early as 2 billion years ago, and have served the same essential functions ever since.


Table 7.2 Five Types of Histones

Type    Size (Amino Acids)    Location
H1      215                   Linker
H2A     129                   Nucleosome core
H2B     125                   Nucleosome core
H3      135                   Nucleosome core
H4      102                   Nucleosome core

Chromatin, named for its affinity for basic dyes such as hematoxylin and fuchsin, contains roughly equal amounts of DNA and histones. Euchromatin has a loose structure, whereas heterochromatin is more tightly condensed and stains more deeply. Genes are actively transcribed in euchromatin but are repressed in heterochromatin.




Covalent histone modifications regulate DNA replication and transcription


Transcription can take place only when the 30-nm fiber has disintegrated into the loose structure of euchromatin, and histones have been displaced from the DNA. Whereas prokaryotic genes are transcribed unless transcription is prevented by a repressor, eukaryotic genes are silent unless the histones are removed from the DNA. The association between histones and DNA is regulated by covalent modifications of the histones:






These histone modifications are controlled by sequence-specific DNA-binding proteins that regulate transcription (“transcription factors”) and by DNA methylation.



DNA methylation silences genes


About 3% of the cytosine in human DNA is methylated:



Methylcytosine is found in palindromic 5′-CG-3′ sequences, which carry the methyl mark on the cytosines of both strands. 5-Methylcytosine causes chromatin condensation and gene silencing, most likely by recruiting histone deacetylases. The methyl groups are introduced by two types of DNA methyltransferase: de novo DNA methyltransferases, which attach methyl groups to previously unmethylated CG sequences; and maintenance DNA methyltransferases, which methylate the new strand after DNA replication to complement a methyl mark on the old strand. Because of the maintenance DNA methyltransferases, DNA methylation is heritable through the cell generations. The term epigenetic inheritance is used to describe the transmission of DNA methylation patterns and histone modifications.
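The logic of maintenance methylation can be sketched in a few lines of code. This is a toy model under stated assumptions: the sequence is made up, each CG site is identified by the position of its cytosine on one strand (sufficient because the site is palindromic), and the function names are illustrative rather than taken from any library.

```python
def cpg_sites(top_strand):
    """Return the start index of every 5'-CG-3' dinucleotide on the strand."""
    return [i for i in range(len(top_strand) - 1) if top_strand[i:i + 2] == "CG"]


def maintenance_methylate(top_strand, old_marks):
    """Copy methyl marks from the old strand to the newly made strand.

    old_marks: set of CG start indices that are methylated on the old strand.
    Returns the set of CG sites that become methylated on the new strand.
    Sites without a mark on the old strand stay unmethylated, because this
    models maintenance activity only, not de novo methylation.
    """
    return {i for i in cpg_sites(top_strand) if i in old_marks}


seq = "ATCGGACGTTCGA"                 # made-up sequence with three CG sites
sites = cpg_sites(seq)                # positions of the CG sites
old_marks = {sites[0], sites[2]}      # first and third sites methylated on the old strand
new_marks = maintenance_methylate(seq, old_marks)
print("CG sites:", sites)
print("Methylated on new strand:", sorted(new_marks))
```

After one round of replication and maintenance methylation, the new strand carries the same pattern as the old strand, which is the sense in which the methylation pattern is inherited.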


DNA methylation has several functions:









Telomerase is required (but not sufficient) for immortality


Replication of linear DNA in eukaryotic chromosomes poses a special problem. At the end of the chromosome, the leading strand can be extended to the very end of the template. The lagging strand, however, is synthesized in the opposite direction from small RNA primers. Even in the unlikely case that the last primer is at the very end of the template strand, its removal would leave a gap that cannot be filled by DNA polymerase (Fig. 7.3, A).



The enzyme telomerase solves this problem by adding the telomeric TTAGGG sequence to the overhanging 3′ end. No DNA template is available for this reaction; therefore, telomerase contains an RNA template. One section of this 150-nucleotide RNA is complementary to the telomeric repeat sequence. By base pairing with the DNA, it serves as a template for the elongation of the overhanging 3′ terminus. This extended 3′ end is then used as a template for the extension of the opposite strand (see Fig. 7.3, B and C). Telomerase qualifies as a reverse transcriptase, which is an enzyme that uses an RNA template for the synthesis of a complementary DNA.
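The templated extension can be illustrated with a minimal sketch. It assumes a hypothetical RNA template covering exactly one repeat; this is a simplification and not the actual sequence of the telomerase RNA. All names are illustrative.

```python
# Base-pairing rules for copying an RNA template into DNA
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_template):
    """Return the DNA (written 5'->3') complementary to an RNA template (written 5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template))


def extend_telomere(overhang, rna_template, cycles):
    """Append one templated repeat to the 3' end of the overhang per cycle (simplified)."""
    repeat = reverse_transcribe(rna_template)
    return overhang + repeat * cycles


template = "CCCUAA"  # hypothetical one-repeat template, complementary to TTAGGG
extended = extend_telomere("TTAGGG", template, 3)
print(reverse_transcribe(template))
print(extended)
```

Because the product is DNA copied from an RNA template, the same function captures what makes telomerase a reverse transcriptase.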


Telomerase is required for immortality. The Olympic gods were immortal, so presumably they expressed telomerase in all their cells. However, in the human body, only the cells of the germ line are immortal. They have telomerase; therefore, egg and sperm have long telomeres. The expression of telomerase tapers off during fetal development, and from that time on, the cells lose 50 to 100 base pairs of DNA from the telomeres with every round of DNA replication.


For example, fibroblasts can be grown in cell culture but eventually die after a few dozen mitotic divisions. Fibroblasts taken from an infant survive longer in cell culture than those taken from a senior citizen. However, the best predictor of fibroblast lifespan is not the chronological age of the donor but the length of the telomeric DNA. Fibroblasts with long telomeres live long, and those with short telomeres die fast.
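The relationship between telomere length and replicative lifespan can be illustrated with a back-of-the-envelope calculation. The loss rates are those quoted above (50 to 100 base pairs per division); the starting lengths and the critical length at which growth arrest sets in are hypothetical values chosen only for illustration.

```python
def divisions_before_arrest(start_bp, critical_bp, loss_per_division_bp):
    """Number of divisions completed while telomere length stays at or above critical_bp."""
    if start_bp < critical_bp:
        return 0
    return (start_bp - critical_bp) // loss_per_division_bp


CRITICAL_BP = 4000  # hypothetical length that triggers growth arrest

# Hypothetical starting lengths for two fibroblast cultures
for label, start_bp in [("long telomeres", 10000), ("short telomeres", 6000)]:
    fewest = divisions_before_arrest(start_bp, CRITICAL_BP, 100)  # fastest loss
    most = divisions_before_arrest(start_bp, CRITICAL_BP, 50)     # slowest loss
    print(f"{label}: {fewest} to {most} divisions")
```

Whatever values are assumed, the number of remaining divisions scales with the telomere length above the critical threshold, which is why telomere length predicts lifespan in culture better than donor age does.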


The telomeres bind protective proteins that hide the ends of the DNA. Without telomeres, the chromosome ends are recognized as broken DNA in need of repair, and misguided DNA repair systems produce haphazard chromosomal rearrangements. Usually, however, aged cells respond to undersized telomeres with growth arrest and programmed cell death long before the telomeres have disappeared altogether.


Cancer cells express telomerase and are immortal. In order to become malignant, a somatic cell not only has to escape the controls that normally limit its growth but also has to find ways to derepress its telomerase. This suggests that the lack of telomerase in somatic cells is not only a curse that seals humans’ earthly fate but also a protective mechanism to reduce the cancer risk.



Eukaryotic DNA replication requires three DNA polymerases


The mechanism of eukaryotic DNA replication is incompletely understood. Although eukaryotes use the same types of protein as E. coli, the details are different. For example, eukaryotes have a far greater number of DNA polymerases. The human genome encodes at least 14 DNA-dependent DNA polymerases. At least three of them participate routinely in DNA replication. The others are concerned with DNA repair or with DNA replication across sites of DNA damage.


In lagging strand synthesis, the primase is associated with DNA polymerase α. This composite enzyme synthesizes about 10 nucleotides of RNA primer followed by about 20 nucleotides of DNA, before relinquishing its product to DNA polymerase δ. Like its bacterial counterpart polymerase III, polymerase δ owes its high processivity to its association with a clamp protein that holds it on the DNA template (Fig. 7.4). The eukaryotic clamp protein is known as proliferating cell nuclear antigen (PCNA) because it was first identified in proliferating but not quiescent cells.



Eukaryotic Okazaki fragments are only 100 to 200 nucleotides long. When polymerase δ runs into the RNA primer of the preceding Okazaki fragment, the RNA primer is displaced from the DNA and removed by a nuclease, most commonly the flap endonuclease FEN1. Polymerase α has no proofreading 3′-exonuclease activity (Table 7.3), and its errors are most likely corrected by polymerase δ.



Synthesis of the leading strand is most likely performed by DNA polymerase ε, although polymerase δ can synthesize the leading strand and appears to be involved in leading strand synthesis in some situations.



Most human DNA does not code for proteins


Only 1.3% of the human genome codes for proteins. Genes are separated by vast expanses of noncoding DNA, including gene deserts extending over more than one million base pairs. Noncoding DNA is present even within the genes. Human genes are patchworks of exons, whose transcripts are processed into the mature messenger RNA (mRNA), and introns. Introns are transcribed along with the exons but are excised from the transcript before the mRNA leaves the nucleus.
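The excision of introns can be pictured as joining the exon segments of the primary transcript. The following is a minimal sketch with a made-up toy transcript and hypothetical exon coordinates; real splice sites are recognized by the splicing machinery, not supplied as index pairs.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a primary transcript; introns are discarded.

    exons: list of (start, end) half-open index pairs in transcript order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript: exon 1 + intron 1 + exon 2 + intron 2 + exon 3
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUAAA" + "GUGAGUCCCCAG" + "GGGUGA"
exons = [(0, 6), (18, 24), (36, 42)]  # hypothetical exon coordinates

mature = splice(pre_mrna, exons)
intron_fraction = 1 - len(mature) / len(pre_mrna)
print(mature)
print(f"Intron share of the toy transcript: {intron_fraction:.0%}")
```

In this toy example the introns make up a little more than half of the transcript; in real human genes the proportion is far higher.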


Human genes have between 1 and 178 exons, with an average of 8.8 exons and 7.8 introns. The average exon is about 145 base pairs long and codes for 48 amino acids, and the average polypeptide has a length of 440 amino acids. Introns are generally far longer than exons, and more than 90% of the DNA within genes belongs to introns (see Fig. 7.12 for an example).



Why human genes have introns, why they have so many of them, and why the introns are so long are not known. Except for some intronic sequences that contribute to the regulation of gene expression by binding regulatory proteins, introns appear to be useless junk DNA.


However, the intron-exon structure of human genes is important for evolution. Different structural and functional domains of a polypeptide are often encoded by separate exons. For example, the immunoglobulin chains consist of several globular domains with similar amino acid sequence and tertiary structure, each encoded by its own exon (see Chapter 15). Immunoglobulin genes most likely arose by repeated exon duplication from a single-exon gene.


In other cases, exons from different genes appear to have combined to form a new functional gene. This is called exon shuffling. The exons are the building blocks from which the multitude of eukaryotic genes has been assembled in the course of evolution.


Figure 7.5 shows an overview of the composition of the human genome. One commentator wrote about the human genome: “In some ways it may resemble your garage/bedroom/refrigerator/life: highly individualistic, but unkempt; little evidence of organization; much accumulated clutter (referred to by the uninitiated as ‘junk’); virtually nothing ever discarded; and the few patently valuable items indiscriminately, apparently carelessly, scattered throughout.”




Gene families originate by gene duplication


Most protein-coding genes are present in only one copy in the haploid genome, but duplicated genes, with two identical or near-identical copies close together on the same chromosome, are seen occasionally. Some genes that code for very abundant RNAs or proteins are present in multiple copies, including the ribosomal RNA (rRNA) genes (≈200 copies), the 5S rRNA gene (≈2000 copies), the histone genes (≈20 copies), and most of the transfer RNA (tRNA) genes. In most cases, identical or near-identical copies of the gene are arranged in tandem, head to tail over long stretches of DNA, separated by untranscribed spacers.


Gene families consist of two or more similar but not identical genes that, in most cases, are positioned close together on the chromosome. They arise during evolution by repeated gene duplications, mostly during crossing over in prophase of meiosis I when homologous chromosomes align in parallel and exchange DNA by homologous recombination. Normal crossing over is a strictly reciprocal process in which the chromosome neither gains nor loses genes. However, if the chromosomes are mispaired during crossing over, one chromosome acquires a deletion and the other a duplication (Fig. 7.6). Through new mutations, a duplicated gene can acquire new biological properties and functions.
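The difference between normal and mispaired crossing over can be sketched by treating each chromosome as an ordered list of genes. The gene names and breakpoints below are hypothetical, and the function is an illustration rather than a model of the recombination machinery.

```python
def crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange the segments to the right of the breakpoints.

    chrom_a, chrom_b: lists of gene names on two homologous chromosomes.
    break_a, break_b: index at which each chromosome is cut.
    Equal breakpoints model normal (reciprocal) crossing over;
    unequal breakpoints model mispairing.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


homolog = ["A", "B", "C", "D"]

normal = crossover(homolog, homolog, 2, 2)   # chromosomes aligned correctly
unequal = crossover(homolog, homolog, 3, 2)  # chromosomes mispaired by one gene

print("Normal: ", normal)
print("Unequal:", unequal)
```

With equal breakpoints both products keep all four genes. With mispaired breakpoints one product carries gene C twice (a duplication) and the other lacks it (a deletion).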



In many cases, however, one of the duplication products acquires crippling mutations that prevent its transcription or translation. The result is called a pseudogene. Pseudogenes still have the intron-exon structure of the functional gene from which they were derived, and they are located close to their functional counterpart on the chromosome.





Many repetitive DNA sequences are (or were) mobile


About 45% of the human genome consists of repetitive sequences with lengths of a few hundred to several thousand base pairs. They are not aligned in tandem but are scattered throughout the genome as interspersed elements (Table 7.5). These elements are repetitive because they can insert copies of themselves into new genomic locations. These mobile elements can be understood as molecular parasites that infest the human genome.



DNA transposons contain a gene for a transposase enzyme that is flanked by inverted repeats. The transposase catalyzes the duplication of the transposon and the insertion of a copy in a new genomic location (see Chapter 10). DNA transposons were active in the genomes of early primates, but in the human lineage they mutated into nonfunctionality approximately 30 million years ago. Only their molecular fossils can still be inspected.


Jun 18, 2016 | Posted in BIOCHEMISTRY
