With completion of the reference human genome sequence, much attention has turned to the discovery and cataloguing of variation in sequence among different individuals (including both healthy individuals and those with various diseases) and among different populations around the globe. As we will explore in much more detail in Chapter 4, there are many tens of millions of common sequence variants that are seen at significant frequency in one or more populations; any given individual carries at least 5 million of these sequence variants. In addition, there are countless very rare variants, many of which probably exist in only one or a few individuals. In fact, given the number of individuals in our species, essentially every base pair in the human genome is expected to vary in someone, somewhere around the globe. It is for this reason that the original human genome sequence is considered a “reference” sequence for our species, but one that is actually identical to no individual’s genome.

Early estimates were that any two randomly selected individuals would have sequences that are 99.9% identical or, put another way, that an individual genome would carry two different versions (alleles) of the human genome sequence at some 3 to 5 million positions, with different bases (e.g., a T or a G) at the maternally and paternally inherited copies of that particular sequence position (see Fig. 2-6). Although many of these allelic differences involve only a single nucleotide, much of the variation consists of insertions or deletions of (usually) short sequence stretches, variation in the number of copies of repeated elements (including genes), or inversions in the order of sequences at a particular position (locus) in the genome (see Chapter 4). The total amount of the genome involved in such variation is now known to be substantially more than originally estimated and approaches 0.5% between any two randomly selected individuals. As will be addressed in future chapters, any and all of these types of variation can influence biological function and thus must be accounted for in any attempt to understand the contribution of genetics to human health.
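The percentages quoted above can be turned into rough base-pair counts with a short calculation. The sketch below is illustrative only: the haploid genome length of about 3.1 billion base pairs is an assumption that is not stated in this passage, and the function name `bases_involved` is hypothetical. Under that assumption, a 0.1% difference (99.9% identity) corresponds to roughly 3 million bases, which is consistent with the lower end of the 3 to 5 million positions cited in the early estimates, whereas 0.5% corresponds to roughly 15 million bases.

```python
# Assumption (not from the passage): approximate length of one haploid
# copy of the human genome, in base pairs.
HAPLOID_GENOME_BP = 3_100_000_000


def bases_involved(fraction_different: float,
                   genome_bp: int = HAPLOID_GENOME_BP) -> int:
    """Approximate number of base pairs covered by a given fraction of the genome."""
    return round(fraction_different * genome_bp)


# Early estimate: two genomes are 99.9% identical, so 0.1% differs.
early_estimate = bases_involved(0.001)

# Current estimate: the genome involved in variation approaches 0.5%.
current_estimate = bases_involved(0.005)

print(f"0.1% of the genome: about {early_estimate:,} bp")
print(f"0.5% of the genome: about {current_estimate:,} bp")
```

The second figure counts bases involved in variation of any kind, including insertions, deletions, and copy number differences, so it should not be read as a count of single-nucleotide differences.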
Variation in the Human Genome