Disease-Gene Identification

The identification of mutations that cause disease is a central focus of medical genetics. With the completion of the Human Genome Project (see Box 8.2 later in this chapter), the locations of virtually all human genes in the genome are now known. The availability of these data, along with dramatic advances in molecular genetic technology and important developments in the statistical analysis of genetic data, has greatly expedited the discovery of disease-causing mutations. However, the specific genetic alterations responsible for many inherited disease phenotypes remain unknown (i.e., which gene or genes contribute to which disease). Much research is now devoted to discovering these genetic mutations and their consequences. As this work progresses, our understanding of the biological basis of genetic disease will continue to increase.

Identifying disease-causing genes is an important step in the understanding, diagnosis, and eventual treatment of a genetic disease. When a disease-causing gene’s location has been pinpointed, it is often possible to provide a more accurate prognosis for persons at risk for a genetic disease. The DNA sequence that makes up the gene can be analyzed, and its RNA and/or protein product can be studied. This can contribute to our understanding of the cause of the disease. Furthermore, discovery of a disease-associated gene can open the way to modification of the abnormal gene product (e.g., preventing its expression) or the manufacture of a normal gene product through recombinant DNA techniques, permitting more effective treatment of a genetic disease. For example, recombinant clotting factor VIII is used to treat hemophilia A, as discussed in Chapter 5, and recombinant insulin is used in the treatment of type 1 diabetes. Gene therapy (modifying the genes of persons with a genetic disease) also becomes a possibility. Thus disease-gene discovery contributes directly to many of the primary goals of medical genetics.

This chapter discusses the approaches commonly used in gene mapping and identification. Frequently this process begins by mapping mutations that occur in affected persons to specific locations on chromosomes. Two major types of gene mapping can be distinguished. In genetic mapping, the frequency of meiotic crossovers between loci is used to estimate distances between loci. Physical mapping involves using cytogenetic, molecular, and computational methods to determine the physical locations of disease-causing mutations on chromosomes.

Genetic Mapping

Linkage Analysis

One of Gregor Mendel’s laws, the principle of independent assortment, states that an individual’s genes will be transmitted to the next generation independently of one another (see Chapter 4 ). Mendel was not aware that genes are located on chromosomes and that genes located near one another on the same chromosome are transmitted together rather than independently. The principle of independent assortment holds true for most pairs of loci, but not for those that occupy the same region of a chromosome. Such loci are said to be linked.

Fig. 8.1 depicts two loci, A and B, which are located close together on the same chromosome. A third locus, C, is located on another chromosome. In the individual in our example, each of these loci has two alleles, designated 1 and 2. A and B are linked, so A1 and B1 are usually inherited together. Because A and C are on different chromosomes and thus unlinked, their alleles do follow the principle of independent assortment. Hence, if the process of meiosis places A1 in a gamete, the probability that C1 will be found in the same gamete is 50%.

Fig. 8.1

Loci A and B are linked on the same chromosome, so alleles A1 and B1 are usually inherited together. Locus C is on a different chromosome, so it is not linked to A and B, and its alleles are transmitted independently of the alleles of A and B.

Recall from Chapter 2 that homologous chromosomes sometimes exchange portions of their DNA during prophase I (this is known as crossing over or crossover). The average chromosome experiences one to three crossover events during meiosis. As a result of crossover, new combinations of alleles can be formed on a chromosome. Consider again the linked loci, A and B, in Fig. 8.1. Alleles A1 and B1 are located close together on one chromosome, and alleles A2 and B2 are located on the homologous chromosome. The combination of alleles on each chromosome is a haplotype (from “haploid genotype”). The two haplotypes of this individual are denoted A1B1/A2B2. As Fig. 8.2A shows, in the absence of crossover A1B1 will be found in one gamete and A2B2 in the other. But when there is a crossover, new allele combinations, A1B2 and A2B1, will be found in the two gametes (Fig. 8.2B). The process of forming such new arrangements of alleles is called recombination. Crossover does not necessarily lead to recombination, however, because a double crossover can occur between two loci, resulting in no recombination (Fig. 8.2C).

Fig. 8.2

The genetic results of crossover. A, No crossover: A1 and B1 remain together after meiosis. B, A crossover between A and B results in a recombination: A1 and B2 are inherited together on one chromosome, and A2 and B1 are inherited together on another chromosome. C, A double crossover between A and B results in no recombination of alleles.

Modified from McCance KL, Huether SE. Pathophysiology. The Biologic Basis for Disease in Adults and Children. 5th ed. St Louis: Mosby; 2005.

As Fig. 8.3 shows, crossovers are more likely to occur between loci that are situated far apart on a chromosome than between loci that are situated close together. Thus the distance between two loci can be inferred by estimating how frequently recombinations occur in families (this is called the recombination frequency). In a large series of meioses studied in families, if the alleles of A and B undergo recombination 5% of the time, then the recombination frequency for A and B is 5%.

Fig. 8.3

Crossover is more likely between loci that are far apart on chromosomes (left) than between those that are close together (right).

The genetic distance between two loci is measured in centimorgans (cM), in honor of T. H. Morgan, who discovered the process of crossing over in 1910. One cM is approximately equal to a recombination frequency of 1%. The relationship between recombination frequency and genetic distance is approximate, because double crossovers produce no recombination. The recombination frequency thus underestimates map distance, especially as the recombination frequency increases above about 10%. Mathematical formulae have been devised to correct for this underestimate.
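The text does not name the correction formulae. Two widely used ones are the Haldane and Kosambi map functions, and the following is a minimal sketch of both, assuming θ is a recombination fraction between 0 and 0.5 (exclusive). The function names are illustrative and do not come from the text.

```python
import math

def haldane_cM(theta):
    """Haldane map distance (cM); assumes crossovers occur independently."""
    return -50.0 * math.log(1.0 - 2.0 * theta)

def kosambi_cM(theta):
    """Kosambi map distance (cM); allows for crossover interference."""
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

# Compare the naive "1% recombination = 1 cM" rule with both corrections.
for theta in (0.01, 0.05, 0.10, 0.20, 0.30):
    print(f"theta={theta:.2f}  naive={100 * theta:5.1f} cM  "
          f"Haldane={haldane_cM(theta):5.1f} cM  "
          f"Kosambi={kosambi_cM(theta):5.1f} cM")
```

Both functions agree closely with the naive rule at small θ and return progressively larger distances as θ grows, which is the underestimate described above.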

Loci that are on the same chromosome are said to be syntenic (meaning “same thread”). If two syntenic loci are 50 cM apart, they are considered to be unlinked. This is because their recombination frequency of 50% is equivalent to independent transmission, as in the case of alleles of loci that are on different chromosomes. (To understand this, think of the chromosomes shown in Fig. 8.1: If a person transmits allele A1, the probability that he or she also transmits allele C1, which is on another chromosome, is 50%, and the probability that he or she transmits allele C2 is also 50%.)

Crossovers between loci on the same chromosome can produce recombination. Loci on the same chromosome that experience recombination less than 50% of the time are said to be linked. The distance between loci can be expressed in centimorgans (cM); 1 cM represents a recombination frequency of approximately 1%.

Recombination frequencies can be estimated by observing the transmission of genes in pedigrees. Fig. 8.4 is an example of a pedigree in which neurofibromatosis type 1 (NF1) is being transmitted. The members of this pedigree have also been typed for a two-allele single nucleotide polymorphism (SNP), which like the NF1 gene is located on chromosome 17. The SNP genotypes are shown below each individual’s number in the pedigree. Examination of generations I and II allows us to determine that under the hypothesis of linkage between NF1 and the SNP, the disease-causing mutation in the NF1 gene must be on the same copy of chromosome 17 as allele 1 of the SNP in this family. Individual I-2, who is homozygous for allele 2, is unaffected with the disease. Only the affected father (I-1), who is a heterozygote for the SNP, could have transmitted a copy of chromosome 17 that contains both the NF1 disease allele and SNP allele 1 to the daughter (II-2).

Fig. 8.4

A neurofibromatosis type 1 pedigree in which each member has been typed for a two-allele single nucleotide polymorphism (SNP). Genotypes for this two-allele marker locus are shown below each individual in the pedigree. Affected pedigree members are indicated by a shaded symbol.

The arrangement of these alleles on each chromosome is referred to as linkage phase. With the linkage phase known, individual II-2's haplotypes would then be N1/n2, where N indicates the mutated allele causing NF1, n indicates the normal allele, and 1 and 2 are the two SNP alleles (in other words, individual II-2 has one copy of chromosome 17 that contains both the disease-causing mutation N and SNP allele 1, and her other copy of chromosome 17 contains the normal allele n and SNP allele 2). This woman’s husband (individual II-1) is not affected with the disease and is a homozygote for SNP allele 2. He must have the haplotypes n2/n2. If the NF1 locus and the SNP are linked, the children of this union who are affected with NF1 should usually inherit SNP allele 1 from their mother, and those who are unaffected should usually inherit her allele 2. In seven of eight children in generation III, we find this to be true. In one case, a recombination occurred (individual III-6). This gives a recombination frequency of 1/8, or 12.5%, supporting the hypothesis of linkage between NF1 and the SNP. A recombination frequency of 50% would support the hypothesis that the two loci are not linked. Note that the pedigree allows us to determine linkage phase in individual II-2, but we cannot determine whether a recombination took place in the gamete transmitted to II-2 by her father. Thus the recombination frequency is estimated only in the descendants of II-2. In actual practice, a much larger sample of families would be studied to ensure the statistical accuracy of this result.

Estimates of recombination frequencies are obtained by observing the transmission of alleles in families. Determination of the linkage phase (i.e., the chromosome copy on which each allele is located) is an important part of this procedure.

Polymorphisms that are used to follow a disease-causing allele through a family are termed markers (i.e., they can mark the chromosome on which a disease-causing allele is located). Because linked markers can be typed in an individual of any age (even in a fetus), they are useful for the early diagnosis of genetic disease (see Chapter 13 ). It is important to emphasize that a marker locus simply helps us to determine which member of a chromosome pair is being transmitted by a parent; the actual cause of the genetic disease is usually a nearby mutation, which may be identified in subsequent DNA sequencing analysis.

In general, 1 cM corresponds to approximately 1 million base pairs (1 Mb) of DNA. However, this is only an approximate relationship, because several factors are known to influence crossover rates. First, crossovers are roughly 1.5 times more common during female meiosis (oogenesis) than during male meiosis (spermatogenesis). Also, crossovers tend to be especially common near the telomeres of chromosomes. Finally, some small chromosome regions (one to a few kb in size) exhibit crossover rates that are at least 10-fold higher than elsewhere in the genome. The human genome contains thousands of these recombination hot spots, and they account for more than half of all recombination events.

Although there is a correlation between centimorgans and actual physical distances between loci, this relationship is complicated by sex differences in recombination, higher recombination frequencies near telomeres, and the existence of recombination hot spots.

LOD Scores: Determining the Significance of Linkage Results

As in any statistical study, we must be careful to ensure that the results obtained in a linkage analysis are not due simply to chance. For example, consider a two-allele marker locus that has been typed in a pedigree. It is possible by chance for all affected offspring to inherit one allele and for all unaffected offspring to inherit the other allele even if the marker is not linked to the disease-causing gene. This misleading result becomes less likely as we increase the number of subjects in our linkage study (just as the chance of a strong deviation from a 50/50 heads-to-tails ratio becomes smaller when we toss a coin many times).

How do we determine whether a linkage result is likely to be due to chance alone? We begin by comparing the likelihood (a likelihood is similar in concept to a probability) that two loci are linked at a given recombination frequency (denoted θ) versus the likelihood that the two loci are not linked (recombination frequency = 50%, or θ = 0.5). Suppose we wish to test the hypothesis that two loci are linked at a recombination frequency of θ = 0.1 versus the hypothesis that they are not linked. We use our pedigree data to form a likelihood ratio:

$$\frac{\text{likelihood of observing pedigree data if } \theta = 0.1}{\text{likelihood of observing pedigree data if } \theta = 0.5}$$

If our pedigree data indicate that θ is more likely to be 0.1 than 0.5, then the likelihood ratio (or odds) will be greater than 1.0. If, however, the pedigree data argue against linkage of the two loci, then the denominator will be greater than the numerator, and the ratio will be less than 1.0. For convenience, the common logarithm of the ratio is usually taken; this logarithm of the odds is termed an LOD score. Conventionally, an LOD score of 3.0 or more is accepted as evidence of linkage; a score of 3.0 indicates that the likelihood in favor of linkage is 1000 times greater than the likelihood against linkage. Conversely, an LOD score lower than −2.0 (odds of 100 to 1 against linkage) is considered to be evidence that two loci are not linked. Box 8.1 provides details on the calculation of LOD scores.

* Recall that the common logarithm (log10) of a number is the power to which 10 is raised to obtain the number. The common logarithm of 100 is 2, the common logarithm of 1000 is 3, and so on.

The statistical odds that two loci are a given number of centimorgans apart can be calculated by measuring the ratio of two likelihoods: the likelihood of linkage at a given recombination frequency divided by the likelihood of no linkage. The logarithm of this odds ratio is an LOD score. LOD scores of 3.0 or higher are taken as evidence of linkage, and LOD scores lower than −2.0 are taken as evidence that the two loci are not linked.
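The definition and the two conventional thresholds can be written compactly. The symbols Z(θ) for the LOD score and L(θ) for the likelihood of the pedigree data at recombination frequency θ are notational assumptions introduced here, not symbols used in the text.

```latex
Z(\theta) = \log_{10} \frac{L(\theta)}{L(\theta = 0.5)}, \qquad
Z(\theta) \ge 3 \iff \frac{L(\theta)}{L(0.5)} \ge 10^{3}, \qquad
Z(\theta) < -2 \iff \frac{L(\theta)}{L(0.5)} < 10^{-2}
```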

Box 8.1

Estimating LOD Scores in Linkage Analysis

A simple example will help to illustrate the concepts of likelihood ratios and LOD scores. Consider the pedigree diagram in the figure below, which illustrates another family in which NF1 is being transmitted. The family has been typed for a two-allele SNP, as in Fig. 8.4. The male in generation II must have received SNP allele 1 from his mother, because she can transmit only this marker allele. Therefore his copy of allele 2 had to come from his father, on the same chromosome copy as the NF1 disease gene (under the hypothesis of linkage). This allows us to establish linkage phase in this pedigree; the affected male in generation II must have the haplotypes N2/n1. He marries an unaffected woman who is a homozygote for allele 2. The hypothesis of close linkage (θ = 0.0) predicts that each child in generation III who receives allele 2 from their father must also receive the NF1 disease allele. Under the hypothesis of linkage, the father can transmit only two possible combinations: either the chromosome copy that carries both the disease-causing gene and allele 2 (N2 haplotype), or the other chromosome copy, which has the normal gene and SNP allele 1 (haplotype n1). The probability of each of these events is 1/2. Therefore if θ = 0.0, the probability of observing five children with the genotypes shown in the figure below is (1/2)^5, or 1/32 (i.e., the multiplication rule is applied to obtain the probability that all five of these events will occur together). This is the numerator of the likelihood ratio.

An NF1 pedigree in which each member has been typed for a 2-allele single nucleotide polymorphism (SNP). The SNP marker genotypes are shown below each individual in the pedigree.

Now consider the likelihood of observing these genotypes if the SNP and NF1 are not linked (θ = 0.5). Under this hypothesis, there is independent assortment of the two SNP alleles and NF1. The father could transmit any of four combinations (N1, N2, n1, and n2) with equal probability (1/4). The probability of observing five children with the observed genotypes would then be (1/4)^5 = 1/1024. This likelihood is the denominator of the likelihood ratio. The likelihood ratio is then 1/32 divided by 1/1024, or 32. Thus the data in this pedigree tell us that linkage, at θ = 0.0, is 32 times more likely than nonlinkage.

If we take the common logarithm of 32, we find that the LOD score is 1.5, which is still far short of the value of 3.0 usually accepted as evidence of linkage. To prove linkage, we would need to examine data from additional families. LOD scores obtained from individual families can be added together to obtain an overall score. (Note that mathematically, adding LOD scores is the same as multiplying the odds of linkage in each family together and then taking the logarithm of the result. This is another example of using the multiplication rule to assess the probability of co-occurrence.)
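The additivity of LOD scores can be checked numerically. The sketch below assumes a hypothetical second family that yields the same odds of 32; that second family is an illustration, not data from the text, and the function name is likewise illustrative.

```python
import math

def combined_lod(family_odds):
    """Sum per-family LOD scores, given each family's odds in favor of linkage."""
    return sum(math.log10(odds) for odds in family_odds)

# One family with odds of 32 (the pedigree in this box).
single_family = combined_lod([32])

# Hypothetical: a second, independent family that also gives odds of 32.
two_families = combined_lod([32, 32])

# Adding LOD scores equals taking the log of the multiplied odds.
print(round(single_family, 2), round(two_families, 2),
      round(math.log10(32 * 32), 2))
```

Under this assumption, two such families together would just exceed the conventional threshold of 3.0, whereas either family alone falls short of it.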

Suppose that a recombination had occurred in the meiosis producing III-5, the fifth child in generation III (i.e., she would retain the same marker genotype but would be affected with the disease rather than unaffected). This event is impossible under the hypothesis that θ = 0.0, so the numerator of the likelihood ratio becomes zero, and the LOD score for θ = 0.0 is −∞. It is possible, however, that the marker and disease loci are still linked, but at a recombination frequency greater than zero. Let us test, for example, the hypothesis that θ = 0.1. This hypothesis predicts that the disease allele, N, will be transmitted along with SNP allele 2 90% of the time and with SNP allele 1 10% of the time (i.e., when a recombination has occurred). By the same reasoning, the normal allele n will be transmitted with SNP allele 1 90% of the time and with SNP allele 2 10% of the time. As in the previous example, the father can transmit either the normal allele or the disease allele with equal probability (0.5) to each child. Thus the probability of inheriting the disease allele with SNP allele 2 (haplotype N2) is 0.5 × 0.90 = 0.45, and the probability of inheriting the disease allele with SNP allele 1 (haplotype N1) is 0.5 × 0.1 = 0.05. The probability of inheriting the normal allele with SNP allele 1 (n1) is 0.45, and the probability of inheriting the normal allele with SNP allele 2 (n2) is 0.05. In either case then, the probability of receiving a particular nonrecombinant haplotype (N2 or n1) is 0.45, and the probability of receiving a particular recombinant haplotype (N1 or n2) is 0.05. We know that four of the children in generation III are nonrecombinants, and each of these events has a probability of 0.45. We know that one individual is a recombinant, and the probability of this event is 0.05. The probability of four nonrecombinations and one recombination occurring together in generation III is obtained by applying the multiplication rule: 0.45^4 × 0.05. This becomes the numerator for our LOD score calculation. As before, the denominator (the likelihood that θ = 0.5) is (1/4)^5. The LOD score for θ = 0.1 is then given by log10[(0.45^4 × 0.05)/(1/4)^5] = 0.32.

To test the hypothesis that θ = 0.2 the approach just outlined is used again, with θ = 0.2 instead of θ = 0.1. This yields an LOD score of 0.42. It makes sense that the LOD score for θ = 0.2 is higher than that for θ = 0.1, because we know that one of five children (0.2) in generation III is a recombinant. Applying this formula to a series of possible θ values (0, 0.1, 0.2, 0.3, 0.4, and 0.5) shows that 0.2 yields the highest LOD score, as we would expect:

θ      0     0.1    0.2    0.3    0.4    0.5
LOD    −∞    0.32   0.42   0.36   0.22   0.0
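The phase-known LOD scores for four nonrecombinants and one recombinant can be reproduced with a short calculation. This is a sketch of the arithmetic described in this box, not of any linkage software package; the function and parameter names are illustrative.

```python
import math

def lod_phase_known(theta, n_nonrec, n_rec):
    """LOD score for a phase-known parent with the given offspring counts."""
    # Each specific nonrecombinant haplotype has probability (1 - theta)/2,
    # and each specific recombinant haplotype has probability theta/2.
    numerator = ((1 - theta) / 2) ** n_nonrec * (theta / 2) ** n_rec
    # With no linkage (theta = 0.5), every haplotype has probability 1/4.
    denominator = 0.25 ** (n_nonrec + n_rec)
    if numerator == 0:
        return -math.inf  # the observed data are impossible at this theta
    return math.log10(numerator / denominator)

for theta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(theta, round(lod_phase_known(theta, n_nonrec=4, n_rec=1), 2))
```

Calling the same function with five nonrecombinants and no recombinants at θ = 0 gives log10(32), the score of about 1.5 from the first example in this box.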

Sometimes the linkage phase in a pedigree is not known. For example, if the grandparents in the figure above had not been typed, we would not know the linkage phase of the father in generation II. It is equally likely that his haplotypes are N2/n1 or N1/n2 (i.e., each combination has a probability of 1/2). Thus we need to take both possibilities into account. If he has the N2/n1 haplotypes, then the first four children are nonrecombinants, each with a probability of (1 − θ)/2, and the fifth child is a recombinant, with a probability of θ/2 (using the reasoning outlined previously). Under the hypothesis that θ = 0.1, the overall probability that the father has haplotypes N2/n1 and that the five children have the observed genotypes is 1/2(0.45^4 × 0.05) = 0.001. We now need to take the alternative phase into account (i.e., that the father has haplotypes N1/n2). Here the first four children would each be recombinants, with probability θ/2, and only the fifth child would be a nonrecombinant, with probability (1 − θ)/2. The probability that the father has the N1/n2 haplotypes and that the children have the observed genotypes is 1/2(0.45 × 0.05^4) = 0.000001. This probability is considerably smaller than the probability of the previous phase, which makes sense when we consider that under the hypothesis of linkage at θ = 0.1, four of five recombinants is an unlikely outcome. We can now consider the probability of either linkage phase in the father by adding the two probabilities together: 1/2(0.45^4 × 0.05) + 1/2(0.45 × 0.05^4). This becomes the numerator for the LOD score calculation. As before, the denominator (i.e., the probability that θ = 0.5) is simply (1/4)^5 = 1/1024. Then the total LOD score for unknown linkage phase at θ = 0.1 is log10[(1/2[0.45^4 × 0.05] + 1/2[0.45 × 0.05^4])/(1/1024)] = 0.02. As before, we can estimate LOD scores for each recombination frequency:

θ      0     0.1    0.2    0.3    0.4    0.5
LOD    −∞    0.02   0.12   0.09   0.03   0.0

Notice that each LOD score is lower than the corresponding score when linkage phase is known. This follows from the fact that a known linkage phase contributes useful information to allow more accurate estimation of the actual genotypes of the offspring.
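The phase-unknown calculation averages the likelihood over the two equally probable phases of the father. The sketch below follows that reasoning; the names are illustrative, and `n_a` and `n_b` simply count the children who fall into each of the two groups (four and one in this pedigree).

```python
import math

def lod_phase_unknown(theta, n_a, n_b):
    """LOD score when the parent's linkage phase is unknown.

    n_a: children who are nonrecombinant under phase 1 (recombinant under phase 2)
    n_b: children who are recombinant under phase 1 (nonrecombinant under phase 2)
    """
    nonrec, rec = (1 - theta) / 2, theta / 2
    phase1 = nonrec ** n_a * rec ** n_b
    phase2 = rec ** n_a * nonrec ** n_b
    numerator = 0.5 * phase1 + 0.5 * phase2  # each phase has prior probability 1/2
    denominator = 0.25 ** (n_a + n_b)        # likelihood at theta = 0.5
    if numerator == 0:
        return -math.inf
    return math.log10(numerator / denominator)

for theta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(theta, round(lod_phase_unknown(theta, n_a=4, n_b=1), 2))
```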

LOD scores are often graphed against the θ values, as shown in the figure below. The value of θ at which the LOD score is highest is the maximum likelihood estimate of θ. That is, it is the most likely recombination frequency (and thus genetic distance) between the two loci being analyzed.

The logarithm of the odds (LOD) score (y-axis) is plotted against the recombination frequency (x-axis) to demonstrate the most likely recombination frequency for a pair of loci.
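The peak of such a curve can be located with a simple grid search. The sketch below uses the phase-known case with four nonrecombinants and one recombinant and a grid step of 0.001; both the step size and the names are assumptions chosen for illustration.

```python
import math

def lod_phase_known(theta, n_nonrec, n_rec):
    """Phase-known LOD score; the per-haplotype factors of 1/2 cancel in the ratio."""
    likelihood = (1 - theta) ** n_nonrec * theta ** n_rec
    if likelihood == 0:
        return -math.inf
    return math.log10(likelihood / 0.5 ** (n_nonrec + n_rec))

# Evaluate the LOD score on a fine grid of theta values from 0.001 to 0.5.
grid = [i / 1000 for i in range(1, 501)]
theta_hat = max(grid, key=lambda t: lod_phase_known(t, n_nonrec=4, n_rec=1))

print(theta_hat, round(lod_phase_known(theta_hat, 4, 1), 2))
```

For phase-known data the maximum coincides with the observed proportion of recombinants (one of five here), so the grid search mainly becomes useful in less tidy situations such as unknown phase.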

In practice, the analysis of human linkage data is not as simple as in these examples. Penetrance of the disease-causing gene may be incomplete, recombination frequencies differ between the sexes, and the mode of inheritance of the disease may be unclear. Consequently, linkage data are analyzed using one of several available computer software packages. Many of these packages also allow one to carry out multipoint mapping, an approach in which the map locations of several markers are estimated simultaneously.

Linkage Analysis and the Human Gene Map

Suppose that we are studying a disease-causing gene in a series of pedigrees, and we wish to map it to a specific chromosome location. Typically we would type the members of our pedigree for marker loci whose locations on each chromosome have been established. Using the techniques just described, we test for linkage between the disease-causing gene and each marker. Most of these tests would yield negative LOD scores, indicating no linkage between the marker and the disease-causing gene. Eventually this exercise will reveal linkage between the disease-causing gene and a marker or group of markers. Because of the large size of the human genome, thousands of markers are typically evaluated to find one or several that are linked to the disease-causing gene. Many important hereditary diseases have been localized using this approach, including cystic fibrosis, Huntington disease, Marfan syndrome, and neurofibromatosis type 1 (NF1).

Until the 1980s linkage analyses had little chance of success because there were only a few dozen useful polymorphic markers in the entire human genome. Thus it was unlikely that a disease-causing gene would be located near enough to a marker to yield a significant linkage result. This situation changed dramatically after thousands of new polymorphic markers were discovered throughout the genome (see Chapter 3 ). With efficient genotyping techniques and large numbers of markers, it is now commonplace to map a disease-causing gene very quickly.

To be useful for gene mapping, marker loci should be highly polymorphic (i.e., the locus has many different alleles in the population). A high degree of polymorphism ensures that most parents will be heterozygous for the marker locus, making it easier to establish linkage phase in families. Short tandem repeats (STRs) typically have many alleles and are easy to assay; they are therefore especially well-suited to gene mapping. In addition, marker loci should be numerous, so that close linkage to the disease-causing gene is likely. Hundreds of thousands of STRs and millions of SNPs have now been identified throughout the genome, so this requirement has been fulfilled. Each chromosome is now saturated with markers ( Fig. 8.5 ).

Fig. 8.5

A genetic map of chromosome 9, showing the locations of a large number of polymorphic markers. Because recombination rates are usually higher in female meiosis, the distances between markers (in centimorgans) are larger for females than for males.

From Attwood J, Chiano M, Collins A, et al. CEPH consortium map of chromosome 9. Genomics. 1994;19:203–214.

An example illustrates the importance of numerous polymorphic markers. Consider the pedigree in Fig. 8.6A . The affected man is a homozygote for a two-allele marker locus that is closely linked to the disease locus. The man’s wife is a heterozygote for the marker locus. Their affected daughter is homozygous for the marker locus. Based on these genotypes, it is impossible to determine linkage phase in this generation, so we cannot predict which children will be affected with the disorder and which will not. The mating in generation I is called an uninformative mating. In contrast, a marker locus with six alleles has been typed in the same family ( Fig. 8.6B ). Because the mother in generation I has two alleles that differ from those of the affected father, it is now possible to determine that the affected daughter in generation II has inherited the disease-causing allele on the same copy of the chromosome that contains marker allele 1. Because she married a man who has alleles 4 and 5, we can predict that each offspring who receives allele 1 from her will be affected, and each one who receives allele 2 will be unaffected. Exceptions will be due to recombination. This example demonstrates the value of highly polymorphic markers for both linkage analysis and diagnosis of genetic disease (see Chapter 13 ).

To be useful in gene mapping, linked markers should be numerous and highly polymorphic. A high degree of polymorphism in the marker locus increases the probability that matings will be informative for linkage analysis.

Fig. 8.6

An autosomal dominant disease-causing gene is segregating in this family. A, A closely linked two-allele marker polymorphism has been typed for each member of the family, but linkage phase cannot be determined (uninformative mating). B, A closely linked six-allele short tandem repeat (STR) polymorphism has been typed for each family member, and linkage phase can now be determined (informative mating).

The availability of many highly polymorphic markers throughout the genome helps researchers to narrow the location of a gene by direct observation of recombinations within families. Suppose that a series of marker polymorphisms, labeled A, B, C, D, and E, are all known to be closely linked to a disease-causing gene. The family shown in Fig. 8.7 has been typed for each marker, and we observe that individual II-2 carries marker alleles A2, B2, C2, D2, and E2 on the same copy of the chromosome that contains the disease-causing mutation (linkage phase). The other (normal) copy of this chromosome in individual II-2 carries marker alleles A1, B1, C1, D1, and E1. Among the affected offspring in generation III, we see evidence of two recombinations. Individual III-2 clearly inherited marker allele A1 from her affected mother (II-2), but she also inherited the disease-causing mutation from her mother. This tells us that there has been a recombination (crossover) between marker A and the disease-causing gene. Thus we now know that the region of the chromosome between marker A and the telomere cannot contain the disease-causing gene.

Fig. 8.7

A family in which markers A, B, C, D, and E have been typed and assessed for linkage with an autosomal dominant disease–causing mutation. As explained in the text, recombination is seen between the disease locus and marker A in individual III-2 and between the disease locus and marker D in individual III-5.

We observe another recombination in the gamete transmitted to individual III-5. In this case, the individual inherited markers D1 and E1 but also inherited the disease-causing mutation from II-2. This indicates that a crossover occurred between marker locus D and the disease-causing locus. We now know that the region between marker D and the centromere (including marker E) cannot contain the disease-causing locus. These two key recombinations have thus allowed us to substantially narrow the region that contains the disease-causing locus. Additional analyses in other families could narrow the location even further, provided that additional recombinations can be observed. In this way it is often possible to narrow the location of a disease-causing locus to a region that is several centimorgans or so in size.
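The narrowing logic can be sketched as a small function. It assumes the markers are listed in chromosomal order from the telomere (A) to the centromere (E), that each affected child's transmitted haplotype contains a single crossover, and that the alleles not mentioned explicitly in the text (for example, B through E in III-2) come from the disease-carrying chromosome. The function name and data layout are illustrative.

```python
def flanking_markers(markers, disease_phase, transmitted):
    """Return the (left, right) markers that bound the disease locus.

    markers:       marker names in chromosomal order
    disease_phase: allele at each marker on the parent's disease-carrying copy
    transmitted:   haplotypes the affected parent passed to affected children
    None in the result means the region is not bounded on that side.
    """
    left, right = -1, len(markers)
    for haplotype in transmitted:
        # Markers at which the affected child carries the disease-phase allele.
        matching = [i for i, (allele, disease) in
                    enumerate(zip(haplotype, disease_phase)) if allele == disease]
        # The disease locus must lie within this run of matching markers, so
        # the nearest non-matching marker on each side bounds the region.
        left = max(left, matching[0] - 1)
        right = min(right, matching[-1] + 1)
    return (markers[left] if left >= 0 else None,
            markers[right] if right < len(markers) else None)

markers = ["A", "B", "C", "D", "E"]
disease_phase = [2, 2, 2, 2, 2]        # II-2's disease-carrying chromosome
affected_children = [
    [1, 2, 2, 2, 2],                   # III-2: recombination at marker A
    [2, 2, 2, 1, 1],                   # III-5: recombination at markers D and E
]
print(flanking_markers(markers, disease_phase, affected_children))
```

With these two recombinants the function reports markers A and D as the flanking markers, matching the conclusion that the disease-causing locus lies between them.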

Sometimes a linkage analysis produces a total LOD score close to zero. This could mean simply that the pedigrees are uninformative (an LOD score of zero indicates that the likelihoods of linkage and nonlinkage are approximately equal, because 10^0 = 1). However, a total LOD score of zero can also result when one subset of families has positive LOD scores (indicating linkage), and another subset has negative LOD scores (arguing against linkage). This result would provide evidence of locus heterogeneity for the disease in question (see Chapter 4). For example, osteogenesis imperfecta type I may be caused by mutations on either chromosome 7 or chromosome 17 (see Chapter 4). A study of families with this disease could show linkage to markers on chromosome 17 in some families and linkage to chromosome 7 in others. Linkage analysis has helped to define locus heterogeneity in a large number of diseases, including retinitis pigmentosa, a major cause of blindness (Clinical Commentary 8.1).

The direct observation of recombinations between marker loci and the disease-causing locus can help to narrow the size of the region that contains the disease-causing locus. In addition, linkage analysis sometimes shows that some affected families demonstrate linkage to markers in a given chromosome region and others do not. This is an indication of locus heterogeneity.

Clinical Commentary 8.1

Retinitis Pigmentosa: A Genetic Disorder Characterized by Locus Heterogeneity

Courtesy Dr. Donnell J. Creel, University of Utah Health Sciences Center.

Retinitis pigmentosa (RP) describes a collection of inherited retinal defects that together are the most common inherited cause of human blindness, affecting 1 in 3000 persons. The first clinical signs of RP are seen as the rod photoreceptor cells begin to die, causing night blindness. Rod electroretinogram (ERG) amplitudes are reduced or absent. With the death of rod cells, other tissue begins to degenerate as well. Cone cells die, and the vessels that supply blood to the retinal membranes begin to attenuate. This leads to a reduction in daytime vision. Patients develop tunnel vision, and most are legally blind by 40 years of age. The name retinitis pigmentosa comes from the pigments that are deposited on the retinal surface as pathological changes accumulate (Fig. 8.8). RP is neither preventable nor curable, but there is evidence that its progress can be slowed somewhat by dietary supplementation with vitamin A.

Fig. 8.8

A fundus photograph illustrating clumps of pigment deposits and retinal blood vessel attenuation in retinitis pigmentosa.

RP is known to be inherited in different families in an autosomal dominant, autosomal recessive, or X-linked recessive fashion. These modes of inheritance account for approximately 15% to 25%, 5% to 20%, and 5% to 15% of RP cases, respectively (the mode of inheritance is unknown for approximately half of RP cases). In addition, a small number of cases are caused by mitochondrial mutations, and one form of RP is caused by the co-occurrence of mutations at two different loci (PRPH2 and ROM1, both of which encode structural components of photoreceptor outer segment disc membranes). This mode of inheritance is termed digenic. Genetic studies have demonstrated that mutations in any of 70 different genes can cause nonsyndromic RP (i.e., RP that does not occur as part of an identified syndrome), making this disease an example of extensive locus heterogeneity.

An early linkage analysis mapped an autosomal dominant form of RP to the long arm of chromosome 3. This was a significant finding, because the gene RHO that encodes rhodopsin had also been mapped to this region. Rhodopsin is the light-absorbing molecule that initiates the process of signal transduction in rod photoreceptor cells. Thus RHO was a reasonable candidate gene (see text) for RP. Linkage analysis was performed using a polymorphism located within RHO, and a LOD score of 20 was obtained for a recombination frequency of zero in a large Irish pedigree. Subsequently, more than 200 different mutations in RHO have been shown to cause RP, confirming the role of this locus in causing the disease. RHO mutations are estimated to account for 25% of autosomal dominant RP cases and roughly 5% of all RP cases.
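The figures quoted here can be checked with a short calculation. The sketch below uses only numbers stated in this commentary (autosomal dominant RP as 15% to 25% of all cases, RHO as 25% of autosomal dominant cases, and a LOD score of 20); the variable names are illustrative.

```python
# Share of all RP cases that are autosomal dominant (range given in the text).
ad_share_of_all_rp = (0.15, 0.25)
# Share of autosomal dominant RP attributed to RHO mutations.
rho_share_of_ad_rp = 0.25

# Implied share of all RP cases attributable to RHO.
rho_share_of_all_rp = tuple(rho_share_of_ad_rp * s for s in ad_share_of_all_rp)
low, high = rho_share_of_all_rp
print(f"RHO share of all RP cases: {low:.2%} to {high:.2%}")

# A LOD score is a base-10 logarithm of an odds ratio, so a LOD of 20
# corresponds to odds of 10^20 to 1 in favor of linkage.
lod = 20
odds_in_favor_of_linkage = 10 ** lod
print(f"Odds in favor of linkage: {odds_in_favor_of_linkage:.0e} to 1")
```

The implied range brackets the "roughly 5%" figure, so the two percentages in the paragraph are consistent with each other.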

Additional studies have identified mutations in genes involved in many different aspects of retinal degeneration. Some of these genes encode proteins involved, for example, in visual transduction (e.g., rhodopsin, the α subunit of the rod cyclic guanosine monophosphate [cGMP] cation-gated channel protein, and the α and β subunits of rod cGMP phosphodiesterase); photoreceptor structure (e.g., PRPH2 and ROM1); and retinal protein transport (e.g., ABCR). Additional genes have been implicated in syndromes that include RP as one feature. For example, RP is seen in Leber congenital amaurosis (LCA), the most common hereditary visual disorder of children. About 10% to 20% of persons with RP have Usher syndrome, which has a number of subtypes and typically also involves vestibular dysfunction and sensorineural deafness. Another 5% of RP cases occur as part of the Bardet–Biedl syndrome, in which intellectual disability and obesity are also observed.

Collectively, the 70 genes identified thus far as causes of RP account for 50% to 60% of all disease cases. Genetic testing (Chapter 13) for RP is typically carried out using gene panels in which the genes most commonly associated with the disease are sequenced. Some forms of RP, such as Leber congenital amaurosis, have now been treated successfully with gene therapy (Chapter 13).


Dec 29, 2019 | Posted by in HUMAN BIOLOGY & GENETICS | Comments Off on Disease-Gene Identification
