Genome-wide mapping and functional analysis of copy number variation in the human genome

Abstract

Copy number variation (CNV) has emerged as a major source of human genomic variation, comprising both benign and pathological variants. These deletions and duplications of genomic regions form a size continuum from small indels to whole-chromosome aneuploidies, and have been mapped by dozens of studies employing diverse methods. Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, for genome-wide CNV association studies, for cytogenetics research and diagnostics, and for independent validation of CNVs identified by sequencing-based technologies. Several microarray-based technologies are available for mapping CNVs, including array Comparative Genomic Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping, and combination platforms. We developed methods to refine the mapping of CNVs in the human genome using high-resolution, high-throughput microarray technologies. We developed several aCGH platforms targeted to specific genomic regions in order to map CNVs in these regions of interest with high breakpoint accuracy. We also attempted to develop a multiplexed aCGH protocol that allows four different samples to be hybridized to the same array, thereby increasing the efficiency of array-based CNV mapping. Alongside our efforts, several commercial array-based CNV detection platforms became available. We quantitatively assessed the ability of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878 (Haraksingh et al. 2011). We found significant differences in performance: sensitivity, total number, size range, and breakpoint resolution of CNV calls were highest for CNV-focused arrays. Despite the rapidly growing appreciation for the extent of CNV in the human genome, evidence for the functional consequences of CNVs remains limited.
It is clear that CNVs are theoretically capable of reorganizing functional elements of the genome by altering gene dosage, coding segments, and regulatory regions. Recently, several association studies have suggested that CNVs significantly impact certain disease phenotypes. Performing CNV-phenotype association studies requires cost-effective, unbiased, genome-wide, high-resolution mapping of common and rare CNVs. We used some of the best-performing array-based technologies from our comparison to investigate the association of CNVs with various phenotypes: the NimbleGen 2.1M CNV array for hereditary hearing loss, our custom NimbleGen Functional Elements and Variable Regions (FEVR) array for melanoma, our custom NimbleGen lexinome array for dyslexia, and the NimbleGen 2.1M WG array for basal cell carcinoma. We found a relatively strong association between a deletion on chromosome 16 and hearing loss (Odds Ratio = 3.41). In addition, we investigated whether certain pathways were enriched for CNVs in cases versus controls, and whether the cases had a higher CNV load than the controls. Neither analysis showed a difference between the cases and controls. We found several other CNVs in genes already known to be associated with hearing loss, indicating the existence of multiple causative alleles in this sample set. We also found weaker CNV associations with melanoma and basal cell carcinoma. Finally, we attempted to measure the direct effects of CNVs on transcription using RNA-seq on 42 lymphoblastoid cell lines, each containing one of three large CNVs known to be associated with schizophrenia. We found that copy number within these large CNVs is generally not predictive of transcriptional activity, indicating that complex dosage compensation mechanisms may exist. This work highlights the importance of high-resolution mapping of CNVs for understanding their role in human genomic variation and their biological relevance.
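For readers unfamiliar with the odds ratio quoted above, the following is a minimal sketch of how such a value is conventionally computed from a 2x2 table of CNV carrier status against case/control status. The function name `odds_ratio` and all counts are hypothetical and chosen only for illustration; they are not data from this study, and the abstract does not state how its odds ratio was estimated.

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio from a 2x2 table of CNV carrier status versus phenotype.

    OR = (cases with CNV / cases without) / (controls with CNV / controls without)
    """
    cells = (case_with, case_without, control_with, control_without)
    if min(cells) <= 0:
        # A zero cell makes the ratio undefined or infinite; real analyses
        # often apply a continuity correction, which is omitted here.
        raise ValueError("all four cells must be positive for a finite odds ratio")
    return (case_with * control_without) / (case_without * control_with)


# Hypothetical counts (NOT from this study): 12 of 100 cases and
# 4 of 100 controls carry the deletion.
example = odds_ratio(case_with=12, case_without=88,
                     control_with=4, control_without=96)
print(round(example, 2))
```

An odds ratio above 1 means the odds of carrying the deletion are higher among cases than among controls. In practice it would be reported with a confidence interval or a significance test.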
