During the past 9 years, copy number variation (CNV) has emerged as a highly prevalent form of genomic variation. CNVs are smaller than the chromosomal aberrations observed microscopically by cytogeneticists, but larger than single-nucleotide polymorphisms (SNPs) and insertion or deletion (indel) mutations. Many CNVs are simply normal genetic variants that do not contribute to a clinically recognizable phenotype. Other CNVs predispose to, or are significantly associated with, conditions of medical consequence.
In clinical genetics, the challenge has been to understand (and ultimately to predict accurately) the relationship between genomic changes and clinical phenotypes and outcomes. This challenge is compounded by the complexity of the data generated by genome-wide genetic tests.
Definition of Copy Number Variants
When the human genome was completed on April 14, 2003, 50 years after the discovery of the structure of the double helix, scientists concluded that they had deciphered the complete DNA sequence of essentially every human. This was based on the notion that the genomes of healthy individuals were 99.9% identical and that the major genetic differences between any two individuals were in the form of scattered single base changes (SNPs). In 2004, two papers showed that the genomes of healthy individuals differed from one another considerably more than previously thought. Using genome-wide array comparative genomic hybridization (CGH) platforms, these studies found hundreds of genomic regions that vary between individuals, not significantly with respect to the actual DNA sequence, but rather with respect to the number of copies an individual has of each DNA segment. A CNV is now operationally defined as a DNA segment, longer than 1 kb, with a variable copy number compared to a reference genome.
CNVs are scattered throughout the human genome, and it has been estimated that as many as 1500 CNVs can be found in a single individual’s genome. Among all the CNVs in a given individual, small CNVs clearly outnumber large ones. While some CNVs in healthy individuals can be greater than 1 Mb in size, the median size of CNVs is approximately 2.9 kb. Taken together, when the genomes of two individuals are compared, approximately 0.8% of their genomes differ with respect to CNVs.
Simple CNVs can take the form of genomic losses (deletions) or copy number gains (duplications or amplifications). Duplications can occur in tandem, elsewhere on the same chromosome, or even on a different chromosome (ie, insertional duplication). CNVs may involve whole genes, portions of genes, multiple contiguous genes, regulatory elements, or none of the above. The nature and extent of the genomic material that is gained or lost are undoubtedly important for the phenotypic consequences.
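The operational definition and the loss/gain distinction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Segment` type, the `classify_segment` function, and the diploid reference copy number of 2 are hypothetical choices made for this example and do not correspond to any particular CNV-calling tool.

```python
from dataclasses import dataclass

# Operational size threshold taken from the text: a CNV is longer than 1 kb.
MIN_CNV_LENGTH_BP = 1_000


@dataclass(frozen=True)
class Segment:
    """A genomic segment with an observed copy number (hypothetical type)."""
    chrom: str
    start: int                 # 0-based, inclusive
    end: int                   # 0-based, exclusive
    copy_number: int           # copies observed in the sample
    reference_copies: int = 2  # assumption: diploid autosomal reference

    @property
    def length(self) -> int:
        return self.end - self.start


def classify_segment(seg: Segment) -> str:
    """Return 'loss', 'gain', or 'not_cnv' under the operational definition."""
    if seg.length <= MIN_CNV_LENGTH_BP:
        return "not_cnv"  # at or below the 1 kb size threshold
    if seg.copy_number < seg.reference_copies:
        return "loss"     # deletion relative to the reference
    if seg.copy_number > seg.reference_copies:
        return "gain"     # duplication or amplification
    return "not_cnv"      # same copy number as the reference
```

For example, `classify_segment(Segment("chr1", 0, 5_000, 1))` would report a loss, whereas a 500 bp segment would not qualify as a CNV regardless of its copy number. Segments on sex chromosomes would need a different `reference_copies` value.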
The DNA sequences at the edges of CNVs are important as they often yield clues as to how the CNV was generated. For example, if a given CNV is flanked by nearly identical blocks of sequences (also known as segmental duplications or low copy repeats) or by Alu or LINE repetitive elements, misalignment of DNA strands during meiotic recombination can lead to a process called nonallelic homologous recombination (NAHR). This process was first suggested as the basis for genomic duplications that led to Charcot-Marie-Tooth disease type 1A (CMT1A). Such recurrent changes have now been associated with many other genomic disorders.
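As a rough illustration of how flanking sequence can hint at mechanism, the sketch below flags a CNV as a possible NAHR candidate when its two breakpoints fall in or near the two copies of a highly similar repeat pair. The type names, the `is_nahr_candidate` function, the 1 kb search window, and the 95% identity cutoff are all assumptions made for this example, not published criteria, and all coordinates are assumed to lie on the same chromosome.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


class RepeatPair(NamedTuple):
    """Two copies of a repeat (eg, a segmental duplication) and their similarity."""
    left: Interval
    right: Interval
    identity: float  # fraction of identical bases between the two copies


def near(position: int, interval: Interval, window: int) -> bool:
    """True if the position lies within `window` bp of the interval."""
    return interval.start - window <= position <= interval.end + window


def is_nahr_candidate(cnv: Interval, pairs: list[RepeatPair],
                      window: int = 1_000,          # assumed search window
                      min_identity: float = 0.95    # assumed identity cutoff
                      ) -> bool:
    """Flag a CNV whose breakpoints sit at the two copies of a similar repeat pair."""
    return any(
        pair.identity >= min_identity
        and near(cnv.start, pair.left, window)
        and near(cnv.end, pair.right, window)
        for pair in pairs
    )
```

A positive result here is only suggestive; confirming NAHR would require breakpoint-level sequence evidence.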
Other mechanisms for generating CNVs have also come to be appreciated. With the advent of next-generation DNA sequencing, large-scale projects to sequence whole genomes (eg, the 1000 Genomes Project) have provided nucleotide-resolution breakpoint information for over 10,000 common CNVs. Based on this dataset, the majority of common deletions (~65%) are generated via nonhomologous end joining (NHEJ) mechanisms. For these CNVs, two-base microhomologies were found at the CNV breakpoints. For more complex CNVs, microhomology-mediated break-induced replication is one mechanism that has been proposed.
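To make the idea of breakpoint microhomology concrete, the following sketch counts the bases that are shared between the two edges of a deletion in a reference sequence. It is a simplified illustration, not the method used by the 1000 Genomes Project; the function name `microhomology_length` and the coordinate convention (0-based, end-exclusive) are assumptions for this example.

```python
def microhomology_length(reference: str, del_start: int, del_end: int) -> int:
    """Count bases of microhomology for a deletion of reference[del_start:del_end]."""
    ref = reference.upper()
    if not 0 <= del_start < del_end <= len(ref):
        raise ValueError("deletion coordinates fall outside the reference")
    deletion_len = del_end - del_start

    # Rightward: bases at the start of the deleted segment that match
    # the bases immediately after the deletion.
    right = 0
    while (right < deletion_len
           and del_end + right < len(ref)
           and ref[del_start + right] == ref[del_end + right]):
        right += 1

    # Leftward: bases immediately before the deletion that match
    # the bases at the end of the deleted segment.
    left = 0
    while (left < deletion_len
           and del_start - 1 - left >= 0
           and ref[del_start - 1 - left] == ref[del_end - 1 - left]):
        left += 1

    # The breakpoint can be placed anywhere within the shared bases,
    # so both directions count; cap at the deletion length.
    return min(left + right, deletion_len)
```

For instance, with a left flank `CCTTT`, a deleted segment `GATTTTTTAC`, and a right flank `GACCA`, the deleted segment and the right flank both begin with `GA`, so the function reports a microhomology of two bases.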