Race, ethnicity, and ancestry are often used in medicine and genetics, but their usefulness remains controversial. In this brief review, recent genetic findings relevant to these concepts are summarized. Individual ancestry is a more useful descriptor of one’s genetic constitution than are race or ethnicity, and ancestry can now be estimated genetically. In some cases, direct assessment of genetic variation in individual patients can offer more accurate diagnosis and treatment than self-reported race, ethnicity, or ancestry.
The Distribution of Human Genetic Diversity
On average, humans are heterozygous at about one in every 1000 DNA base pairs (bp). Thus, in terms of single-nucleotide polymorphisms (SNPs), two haploid human sequences are 99.9% identical but, across the roughly 3 billion bp of the haploid genome, differ by about 3 million bp. In addition, each human is heterozygous for approximately 100 copy number variants (CNVs), which consist of contiguous segments of DNA sequence that are present in varying numbers of copies. Each CNV segment is at least 1000 bp in size, so CNVs account for an additional several million base-pair differences between haploid human sequences. Overall, the level of genetic diversity in humans is about half that of gorillas and central African chimpanzees, reflecting a relatively recent origin of our species, as well as the influence of bottlenecks in the size of human populations during our prehistory.
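The arithmetic behind these figures can be checked with a short sketch. It assumes a haploid genome of about 3 billion bp; the function names and constants are illustrative only and do not come from any genetics library.

```python
# Illustrative arithmetic only; names and constants are hypothetical.
HAPLOID_GENOME_BP = 3_000_000_000  # assumed haploid genome size (~3 billion bp)
BP_PER_HET_SITE = 1000             # about one heterozygous site per 1000 bp


def expected_snp_differences(genome_bp=HAPLOID_GENOME_BP,
                             bp_per_site=BP_PER_HET_SITE):
    """Expected number of single-base differences between two haploid sequences."""
    return genome_bp // bp_per_site


def percent_identity(bp_per_site=BP_PER_HET_SITE):
    """Percentage of base pairs that are identical, rounded to one decimal."""
    return round(100 * (1 - 1 / bp_per_site), 1)


def cnv_minimum_bp(n_cnvs=100, min_size_bp=1000):
    """Lower bound on bp covered by CNVs if every segment had the minimum size."""
    return n_cnvs * min_size_bp


print(expected_snp_differences())  # 3000000
print(percent_identity())          # 99.9
print(cnv_minimum_bp())            # 100000
```

The CNV line gives only a lower bound (100 segments of at least 1000 bp each). The several-million figure quoted above presumably reflects that many CNV segments are far larger than the 1000-bp minimum.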
Because anatomically modern humans originated relatively recently, not more than 200,000 years ago, and because our ancestors were confined to a single continent, Africa, until 50,000 to 70,000 years ago, human populations are quite similar to one another. Most of our genetic variation, in fact, can be found within any major human population. About 85% to 90% of all human genetic variation would be observed, for example, in a sample of individuals from Great Britain or China. Only an additional 10% to 15% of variation is found if all human populations are assayed.
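One common way to express this partition of variation is the fixation index (FST), which compares the average heterozygosity within populations with the heterozygosity of the pooled population. The sketch below is a minimal illustration for a single biallelic SNP, assuming equally sized populations; the allele frequencies are hypothetical values chosen for the example, not data from this review.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2 * p * (1 - p)


def fst(allele_freqs):
    """FST for one biallelic SNP, assuming equally sized populations."""
    # Mean within-population heterozygosity (H_S).
    h_s = sum(heterozygosity(p) for p in allele_freqs) / len(allele_freqs)
    # Heterozygosity of the pooled population (H_T).
    p_bar = sum(allele_freqs) / len(allele_freqs)
    h_t = heterozygosity(p_bar)
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in three populations.
example_freqs = [0.3, 0.5, 0.7]
between = fst(example_freqs)
within = 1 - between
print(f"between populations: {between:.3f}, within populations: {within:.3f}")
```

With these illustrative frequencies, about 11% of the variation lies between populations and about 89% within them, which falls in the range described above.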
Population similarities can also be assessed by estimating the extent to which common SNPs (ie, those in which the frequency of the less common allele is >5%) are shared among major continental populations. These SNP alleles, which account for most human genetic variation, have attained high frequency because they originated a long time ago, before anatomically modern humans migrated out of Africa. Thus, the great majority of these SNPs are shared among individuals of African, Asian, or European origin and are not exclusive to a specific population. They differ only in frequency among populations. In contrast, most SNPs in which the less common allele is rarer (frequency <5%) have arisen since the major migration out of Africa, and insufficient time has elapsed for the allele to spread among all populations. These rarer alleles are much more likely to be specific to just one or a few human populations.
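The distinction between common, widely shared SNPs and rarer, population-specific SNPs can be sketched as a simple classification. The example below assumes that allele frequencies are already available for each population and pools them as an unweighted mean; the population labels, frequencies, and function names are hypothetical and serve only to illustrate the 5% threshold.

```python
def classify_snp(freqs_by_pop, threshold=0.05):
    """Classify a SNP by pooled minor allele frequency and by where it occurs.

    freqs_by_pop maps a population label to the frequency of one tracked allele.
    Returns (frequency class, sorted list of populations carrying the minor allele).
    """
    # Assumption: pool populations as an unweighted mean of their frequencies.
    pooled = sum(freqs_by_pop.values()) / len(freqs_by_pop)
    tracked_is_minor = pooled <= 0.5
    pooled_maf = pooled if tracked_is_minor else 1 - pooled
    frequency_class = "common" if pooled_maf > threshold else "rare"

    # Populations in which the minor allele is actually observed.
    carriers = sorted(
        pop for pop, f in freqs_by_pop.items()
        if (f if tracked_is_minor else 1 - f) > 0
    )
    return frequency_class, carriers


# Hypothetical SNPs: one old and widely shared, one recent and localized.
shared_snp = {"African": 0.40, "Asian": 0.25, "European": 0.10}
local_snp = {"African": 0.03, "Asian": 0.0, "European": 0.0}

print(classify_snp(shared_snp))  # ('common', ['African', 'Asian', 'European'])
print(classify_snp(local_snp))   # ('rare', ['African'])
```

The first SNP is present in every population and differs only in frequency, whereas the second is confined to a single population, mirroring the pattern described above.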
The African origin of humans predicts that more genetic variation should be found in Africa than anywhere else in the world. This is indeed the case, and genetic variation in human populations generally declines with greater distance from Africa (eg, Native Americans have less genetic diversity than do Asians or Europeans, who in turn have less diversity than Africans). As expected, among SNPs that are not shared among major continental populations, more are found in African populations than in other continental populations. To better assess this diversity in genome-wide association studies (GWASs), SNP microarrays that include a larger number of African-specific SNPs have been developed.