Gene sequences occupy only about 2% of the human genome. Since the Human Genome Project established the sequence of the approximately 3 billion bases in the human genome, it has been estimated that the genome contains only about 20 000–25 000 genes. The term genotype describes the genes present in a genome. However, not all of these genes are expressed in every cell or tissue. The term phenotype describes the observable characteristics that result from which genes are expressed in a particular cell or tissue. Gene expression refers to the processes by which the information in a gene in DNA is used to produce the corresponding protein in a cell. It involves transcription of the DNA sequence into RNA, followed by translation of the RNA into protein. Gene expression is carefully regulated to control not only how much of a protein is made but also when particular proteins are synthesised, e.g. during development or differentiation of cell types. Although cells can control both transcription and translation, the major control of gene expression is at the level of transcription.
Transcription (Figure 16.1)
Cells contain three main types of RNA molecule: mRNA, rRNA and tRNA (Chapter 15). All three types are transcribed from genes present in DNA. Only mRNA, however, is used as a template for new protein synthesis; rRNA and tRNA form part of the machinery of translation. In eukaryotic cells, transcription is carried out by three different DNA-dependent RNA polymerases:
- pol I transcribes rRNA genes and is located in the nucleolus;
- pol II transcribes mRNA genes;
- pol III transcribes tRNA genes and other small RNA molecules.
Pol II and pol III are found in the nucleoplasm, outside the nucleolus. The eukaryotic RNA polymerases are multisubunit enzymes, each containing roughly 12–17 subunits depending on the polymerase, and are more complex than the single RNA polymerase found in bacteria.
To understand the series of events that produce an mRNA molecule from a gene, it is necessary to understand eukaryotic gene structure. Eukaryotic genes are made up of exons, regions of DNA whose sequence is retained in the mature mRNA, and introns, regions of DNA that are transcribed into RNA but later removed during post-transcriptional processing. RNA polymerase II transcribes the full length of the gene, exons and introns alike. Only one DNA strand, the template strand, is copied, so the RNA sequence is the same as that of the DNA coding strand except that U replaces T. This initial product is called heterogeneous nuclear RNA (hnRNA). Three important post-transcriptional processing events occur while the hnRNA is still in the nucleus (see Figure 16.1): addition of a cap to the 5′ end, addition of a poly(A) tail to the 3′ end, and splicing to remove the introns.
Following completion of these events the product is mRNA. It then moves out of the nucleus through the nuclear pores into the cytosol, where translation occurs.
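The path from template strand to mature mRNA can be illustrated with a minimal Python sketch. The sequence, the exon coordinates, the `m7G-` cap marker and the tail length are all invented for illustration and do not describe any real gene; the functions `transcribe`, `coding_strand`, `splice` and `process` are hypothetical helpers, not part of any library.

```python
# Toy model of transcription and post-transcriptional processing.
# All sequences and coordinates below are made up for illustration.

RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> RNA base
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}  # template base -> coding base


def transcribe(template_3_to_5: str) -> str:
    """Return the hnRNA (5'->3') copied from a template strand written 3'->5'."""
    return "".join(RNA_PAIR[base] for base in template_3_to_5)


def coding_strand(template_3_to_5: str) -> str:
    """Return the DNA coding strand (5'->3') that pairs with the template."""
    return "".join(DNA_PAIR[base] for base in template_3_to_5)


def splice(hnrna: str, exon_spans: list[tuple[int, int]]) -> str:
    """Join the exons, given as half-open (start, end) index pairs, dropping introns."""
    return "".join(hnrna[start:end] for start, end in exon_spans)


def process(hnrna: str, exon_spans: list[tuple[int, int]], tail_length: int = 5) -> str:
    """Apply the three processing events: 5' cap, splicing, 3' poly(A) tail.

    The cap is represented by the text marker 'm7G-' (an assumption of this sketch).
    """
    return "m7G-" + splice(hnrna, exon_spans) + "-" + "A" * tail_length


# Hypothetical gene: exon 1 (6 bases), intron (10 bases), exon 2 (6 bases).
TEMPLATE = "TACCGGCATTCAGATCAAGATT"   # written 3'->5'
EXON_SPANS = [(0, 6), (16, 22)]

hnrna = transcribe(TEMPLATE)
mrna = process(hnrna, EXON_SPANS)

# The hnRNA matches the coding strand with U in place of T.
print(hnrna == coding_strand(TEMPLATE).replace("T", "U"))
print(hnrna)
print(mrna)
```

Writing the template strand 3′→5′ lets the RNA be built left to right in the 5′→3′ direction, which mirrors the direction of synthesis by the polymerase.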
Control of transcription
As mentioned above, transcription is tightly regulated, particularly at the initiation step. This regulation is exerted through a combination of specific DNA sequences (control elements) and proteins known as transcription factors, which recognise and bind to these control elements. One major DNA control element is the promoter, situated at the 5′ end of the gene. Eukaryotic promoters have several features in common. The promoter sequence allows the polymerase to bind and to start transcription at the correct base. In many promoters a conserved sequence is found approximately 25–35 bases upstream of the transcription start site. This sequence is rich in A and T bases and is known as the TATA box. Other control sequences lie up to 200 bases upstream of the transcription start site and also help to regulate the level of transcription, and still others can be many hundreds or thousands of bases away from the gene. The latter are known as enhancers or silencers because they respectively increase or decrease the frequency of transcription.
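The positional rule for the TATA box can be sketched as a simple search of the window 25–35 bases upstream of the transcription start site. The motif `TATA[AT]A` is a deliberately simplified assumption, the sequence is invented, and `find_tata_boxes` is a hypothetical helper; real promoter prediction is considerably more involved.

```python
import re

# Simplified A/T-rich motif; an assumption for this sketch, not a full consensus.
TATA_PATTERN = re.compile(r"TATA[AT]A")


def find_tata_boxes(coding_strand: str, tss_index: int,
                    min_up: int = 25, max_up: int = 35) -> list[int]:
    """Return positions of motif starts relative to the start site (negative = upstream).

    Only motif starts lying between min_up and max_up bases upstream of
    tss_index (the index of the first transcribed base) are reported.
    """
    hits = []
    for distance in range(min_up, max_up + 1):
        start = tss_index - distance
        if start < 0:
            continue  # window runs off the 5' end of the supplied sequence
        if TATA_PATTERN.match(coding_strand, start):
            hits.append(-distance)
    return hits


# Invented promoter: 20 filler bases, a TATA-like motif, 24 filler bases, then the gene.
upstream = "G" * 20 + "TATAAA" + "C" * 24
gene_body = "ATGGCC"
sequence = upstream + gene_body
tss = len(upstream)  # index of the first transcribed base in this toy example

print(find_tata_boxes(sequence, tss))
```

In this toy sequence the motif begins 30 bases upstream of the start site, so it falls inside the search window; moving it closer than 25 bases would make the search return an empty list.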
The DNA control elements at the 5′ end of the gene are bound by transcription factors. The initiation of transcription can be controlled by many of these proteins assembling into a multisubunit transcription complex at the promoter. Such complexes form in some cell types but not in others, or only at particular times during development, and hence control when and where a gene is transcribed. Transcription factors have several distinctive protein motifs that enable them to bind to DNA and to other proteins in the complex. These motifs include zinc fingers, helix–loop–helix motifs and leucine zippers.
A more recent discovery is alternative splicing. During the removal of introns from an hnRNA, the exons may be joined in different combinations in different tissues. This increases the diversity of proteins that can be made from the relatively small number of genes present in the human genome. An example of this phenomenon is tropomyosin, which is used in different contractile systems in different cells. The specific form required in each cell type is produced by alternative splicing of a single hnRNA, rather than by having a separate tropomyosin gene for each cell type.
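The idea that one hnRNA can yield different mRNAs may be easier to see in a small Python sketch. The exon names, exon sequences and tissue labels below are entirely hypothetical and are not taken from the tropomyosin gene; they only show how choosing between alternative exons changes the product.

```python
# Toy model of alternative splicing: one set of exons, two tissue-specific mRNAs.
# Exon names, sequences and tissue labels are invented for illustration.

EXONS = {
    "E1": "AUGGCU",
    "E2a": "GAAGAG",   # alternative exon used in tissue A
    "E2b": "CUGAAA",   # alternative exon used in tissue B
    "E3": "GCUUAA",
}

SPLICE_PATTERNS = {
    "tissue_A": ["E1", "E2a", "E3"],
    "tissue_B": ["E1", "E2b", "E3"],
}


def splice_variant(exons: dict[str, str], pattern: list[str]) -> str:
    """Join the named exons in order to give the mature mRNA for one tissue."""
    return "".join(exons[name] for name in pattern)


variants = {tissue: splice_variant(EXONS, pattern)
            for tissue, pattern in SPLICE_PATTERNS.items()}

for tissue, mrna in variants.items():
    print(tissue, mrna)
```

Both variants share the first and last exons and differ only in the middle one, so a single gene gives two related but distinct mRNAs and hence two protein forms.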