The most extensive current inventory of the amount and type of variation to be expected in any given genome comes from the direct analysis of individual diploid human genomes. The first such genome sequence, that of a male individual, was reported in 2007. Now, tens of thousands of individual genomes have been sequenced, some as part of large international research consortia exploring human genetic diversity in health and disease, and others in the context of clinical sequencing to determine the underlying basis of a disorder in particular patients.
What degree of genome variation does one detect in such studies? Individual human genomes typically carry 5 to 10 million SNPs, of which, depending in part on the population, as many as a quarter to a third are novel (see Box). This suggests that the catalog of SNPs described for our species is still incomplete, although presumably the fraction of such novel SNPs will decrease as more genomes from more populations are sequenced. Within this variation lie variants with known, likely, or suspected clinical impact. Based on studies to date, each genome carries 50 to 100 variants that have previously been implicated in known inherited conditions. In addition, each genome carries thousands of nonsynonymous SNPs in protein-coding genes around the genome, some of which would be predicted to alter protein function. Each genome also carries approximately 200 to 300 likely loss-of-function mutations, some of which are present at both alleles of genes in that individual. Within the clinical setting, the realization that every genome carries such variants has important implications for the interpretation of genome sequence data from patients, particularly when trying to predict the impact of mutations in genes of currently unknown function (see Chapter 16).
An interesting and unanticipated aspect of individual genome sequencing is that the reference human genome assembly still lacks considerable amounts of undocumented and unannotated DNA that are discovered in literally every individual genome being sequenced. These “new” sequences are revealed only as additional genomes are sequenced. Thus the complete collection of all human genome sequences to be found in our current population of 7 billion individuals, estimated to be 20 to 40 Mb larger than the extant reference assembly, still remains to be fully elucidated. As impressive as the current inventory of human genetic diversity is, it is clear that we are still in a mode of discovery; no doubt millions of additional SNPs and other variants remain to be uncovered, as does the degree to which any of them might affect an individual’s clinical status in the context of wellness and health care.
Variation Detected in a Typical Human Genome
Individuals vary greatly in a wide range of biological functions, determined in part by variation among their genomes. Any individual genome will contain the following:
• ≈5-10 million SNPs (varies by population)
• 25,000-50,000 rare variants (private mutations or seen previously in < 0.5% of individuals tested)
• ≈75 new base pair mutations not detected in parental genomes
• 3-7 new CNVs involving ≈500 kb of DNA
• 200,000-500,000 indels (1-50 bp) (varies by population)
• 500-1000 deletions 1-45 kb, overlapping ≈200 genes
• ≈200-250 shifts in reading frame
• 10,000-12,000 synonymous SNPs
• 8,000-11,000 nonsynonymous SNPs in 4,000-5,000 genes
• 175-500 rare nonsynonymous variants
• 1 new nonsynonymous mutation
• 40-50 splice site-disrupting variants
• 250-300 genes with likely loss-of-function variants
Variation in Individual Genomes