Population Stratification of a Common APOBEC Gene Deletion Polymorphism
The APOBEC3 gene family plays a role in innate cellular immunity, inhibiting retroviral infection, hepatitis B virus propagation, and the retrotransposition of endogenous elements. We present a detailed sequence and population genetic analysis of a 29.5-kb common human deletion polymorphism that removes the APOBEC3B gene. We developed a PCR-based genotyping assay, characterized 1,277 human diversity samples, and found that the frequency of the deletion allele varies significantly among major continental groups (global F_ST = 0.2843). The deletion is rare in Africans and Europeans (frequencies of 0.9% and 6%, respectively), more common in East Asians and Amerindians (36.9% and 57.7%), and almost fixed in Oceanic populations (92.9%). Despite a worldwide frequency of 22.5%, analysis of data from the International HapMap Project reveals that no single existing tag single nucleotide polymorphism may serve as a surrogate for the deletion variant, emphasizing that without careful analysis its phenotypic impact may be overlooked in association studies. Application of haplotype-based tests for selection revealed potential pitfalls in the direct application of existing methods to the analysis of genomic structural variation. These data emphasize the importance of directly genotyping structural variation in association studies and of accurately resolving variant breakpoints before proceeding with more detailed population-genetic analysis.
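As an illustration of how a fixation index summarizes frequency differences like those above, the following minimal sketch computes Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus. It assumes equal weighting of populations and uses the five continental frequencies quoted in the abstract as input. The function name `wright_fst` is hypothetical, and this is not the estimator used in the paper, so it is not expected to reproduce the reported global value of 0.2843, which presumably reflects the paper's own sample sizes and estimator.

```python
def wright_fst(freqs):
    """Wright's F_ST for one biallelic locus.

    freqs: per-population frequencies of one allele (here, the deletion).
    Assumption: every population gets equal weight; sample sizes are ignored.
    """
    if not freqs:
        raise ValueError("need at least one population")
    k = len(freqs)
    p_bar = sum(freqs) / k
    # Expected heterozygosity if all populations were pooled.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # Allele is fixed or absent everywhere: no differentiation to measure.
        return 0.0
    # Mean expected heterozygosity within populations.
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / k
    return (h_t - h_s) / h_t


# Deletion-allele frequencies quoted in the abstract:
# Africans, Europeans, East Asians, Amerindians, Oceanic populations.
abstract_freqs = [0.009, 0.06, 0.369, 0.577, 0.929]
print(f"unweighted F_ST = {wright_fst(abstract_freqs):.3f}")
```

Weighting by sample size or using a variance-component estimator would change the result, which is why the printed value should be read only as a demonstration of the calculation.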
A genome-wide survey of structural variation between human and chimpanzee
Structural changes (deletions, insertions, and inversions) between human and chimpanzee genomes have likely had a significant impact on lineage-specific evolution because of their potential for dramatic and irreversible mutation. The low-quality nature of the current chimpanzee genome assembly precludes the reliable identification of many of these differences. To circumvent this, we applied a method to optimally map chimpanzee fosmid paired-end sequences against the human genome to systematically identify sites of structural variation ≥12 kb between the two species. Our analysis yielded a total of 651 putative sites of chimpanzee deletions (n = 293), insertions (n = 184), and rearrangements consistent with local inversions between the two genomes (n = 174). We validated a subset (19/23) of insertions and deletions using PCR and Southern blot assays, confirming the accuracy of our method. The events are distributed throughout the genome on all chromosomes but are highly correlated with sites of segmental duplication in human and chimpanzee. These structural variants encompass at least 24 Mb of DNA and overlap with >245 genes. Seventeen of these genes contain exons missing in the chimpanzee genomic sequence and also show a significant reduction in gene expression in chimpanzee. Compared with the pioneering work of Yunis, Prakash, Dutrillaux, and Lejeune, this analysis expands the number of potential rearrangements between chimpanzees and humans 50-fold. Furthermore, this work prioritizes regions for further finishing in the chimpanzee genome and provides a resource for interrogating functional differences between humans and chimpanzees.
High-throughput genotyping of intermediate-size structural variation
The contribution of large-scale and intermediate-size structural variation (ISV) to human genetic disease and disease susceptibility is only beginning to be understood. The development of high-throughput genotyping technologies is one of the most critical aspects for future studies of linkage disequilibrium (LD) and disease association. Using a simple PCR-based method designed to assay the junctions of the breakpoints, we genotyped seven simple insertion and deletion polymorphisms ranging in size from 6.3 to 24.7 kb among 90 CEPH individuals. We then extended this analysis to a larger collection of samples (n = 460) by application of an oligonucleotide extension–ligation genotyping assay. The analysis showed a high level of concordance (99%) when compared with PCR/sequence-validated genotypes. Using the available HapMap data, we observed significant LD (r² = 0.74–0.95) between each ISV and flanking single nucleotide polymorphisms, but this observation is likely to hold only for similar simple insertion/deletion events. The approach we describe may be used to characterize a large number of individuals in a cost-effective manner once the sequence organization of ISVs is known.
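For readers unfamiliar with the r² statistic quoted above, here is a minimal sketch of how it is computed between two biallelic markers, for example an insertion/deletion variant and a flanking SNP. It assumes phased haplotype counts are available, which the abstract does not state. The function name `ld_r2` and the counts in the usage line are hypothetical illustrations, not data from the study.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """Squared correlation (r^2) between two biallelic loci.

    Arguments are counts of the four phased haplotypes, where A/a are the
    alleles at the first locus (e.g. deletion present/absent) and B/b the
    alleles at the second locus (e.g. the two SNP alleles).
    Assumption: haplotype phase is known; unphased genotypes would need
    haplotype frequencies to be estimated first.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_A = (n_AB + n_Ab) / total   # frequency of allele A
    p_B = (n_AB + n_aB) / total   # frequency of allele B
    p_AB = n_AB / total           # frequency of the A-B haplotype
    d = p_AB - p_A * p_B          # linkage disequilibrium coefficient D
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return (d * d) / denom


# Hypothetical counts, for illustration only.
print(f"r^2 = {ld_r2(40, 5, 3, 132):.2f}")
```

An r² near 1 means the SNP allele almost perfectly predicts the structural variant, which is what makes a SNP usable as a tag for it.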
Linkage Disequilibrium and Heritability of Copy-Number Polymorphisms within Duplicated Regions of the Human Genome
Studies of copy-number variation and linkage disequilibrium (LD) have typically excluded complex regions of the genome that are rich in duplications and prone to rearrangement. In an attempt to assess the heritability and LD of copy-number polymorphisms (CNPs) in duplication-rich regions of the genome, we profiled copy-number variation in 130 putative “rearrangement hotspot regions” among 269 individuals of European, Yoruba, Chinese, and Japanese ancestry analyzed by the International HapMap Consortium. Eighty-four hotspot regions, corresponding to 257 bacterial artificial chromosome (BAC) probes, showed evidence of copy-number differences. Despite a predisposing genetic architecture, no polymorphism was observed in the remaining 46 “rearrangement hotspots,” and we suggest these represent excellent candidate sites for pathogenic rearrangements. We used a combination of BAC-based and high-density customized oligonucleotide arrays to resolve the molecular basis of structural rearrangements. For common variants (frequency >10%), we observed a distinct bias against copy-number losses, suggesting that deletions are subject to purifying selection. Heritability estimates did not differ significantly from 1.0 among the majority (30 of 34) of loci analyzed, consistent with normal Mendelian inheritance. Some of the CNPs in duplication-rich regions showed strong LD with nearby single-nucleotide polymorphisms (SNPs) and were observed to segregate on ancestral SNP haplotypes. However, LD with the best available SNP markers was weaker than has been reported for deletion polymorphisms in less complex regions of the genome. These observations may be accounted for by a low density of SNP data in duplicated regions, challenges in mapping and typing the CNPs, and the possibility that CNPs in these regions have rearranged on multiple haplotype backgrounds. Our results underscore the need for complete maps of genetic variation in duplication-rich regions of the genome.