
    A robust clustering algorithm for identifying problematic samples in genome-wide association studies

    Summary: High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections.

    Computational analysis of the LRRK2 interactome.

    LRRK2 was identified in 2004 as the causative protein product of the Parkinson's disease locus designated PARK8. In the decade since then, genetic studies have revealed at least 6 dominant mutations in LRRK2 linked to Parkinson's disease, alongside one associated with cancer. It is now well established that coding changes in LRRK2 are one of the most common causes of Parkinson's. Genome-wide association studies (GWAS) have, more recently, reported single nucleotide polymorphisms (SNPs) around the LRRK2 locus to be associated with risk of developing sporadic Parkinson's disease and inflammatory bowel disorder. The functional research that has followed these genetic breakthroughs has generated an extensive literature regarding LRRK2 pathophysiology; however, there is still no consensus as to the biological function of LRRK2. To provide insight into the aspects of cell biology that are consistently related to LRRK2 activity, we analysed the plethora of candidate LRRK2 interactors available through the BioGRID and IntAct data repositories. We then performed GO term enrichment analysis for the LRRK2 interactome. We found that, in two different enrichment portals, the LRRK2 interactome was associated with terms referring to transport, cellular organization, vesicles and the cytoskeleton. We also verified that 21 of the LRRK2 interactors are genetically linked to risk for Parkinson's disease or inflammatory bowel disorder. The implications of these findings are discussed, with particular regard to potential novel areas of investigation.

    The geography of recent genetic ancestry across Europe

    The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of recent genealogical ancestry over the past three thousand years at a continental scale. We detected 1.9 million shared genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 10-50 genetic common ancestors from the last 1500 years, and upwards of 500 genetic ancestors from the previous 1000 years. These numbers drop off exponentially with geographic distance, but since genetic ancestry is rare, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1000 years. There is substantial regional variation in the number of shared genetic ancestors: especially high numbers of common ancestors between many eastern populations likely date to the Slavic and/or Hunnic expansions, while much lower levels of common ancestry in the Italian and Iberian peninsulas may indicate weaker demographic effects of Germanic expansions into these areas and/or more stably structured populations. Recent shared ancestry in modern Europeans is ubiquitous, and clearly shows the impact of both small-scale migration and large historical events. 
Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world. Comment: Full size figures available from http://www.eve.ucdavis.edu/~plralph/research.html; or html version at http://ralphlab.usc.edu/ibd/ibd-paper/ibd-writeup.xhtm

    Exome sequencing and genotyping identify a rare variant in NLRP7 gene associated with ulcerative colitis.

    Background and Aims: Although genome-wide association studies [GWAS] in inflammatory bowel disease [IBD] have identified a large number of common disease susceptibility alleles for both Crohn’s disease [CD] and ulcerative colitis [UC], a substantial fraction of IBD heritability remains unexplained, suggesting that rare coding genetic variants may also have a role in pathogenesis. We used high-throughput sequencing in families with multiple cases of IBD, followed by genotyping of cases and controls, to investigate whether rare protein-altering genetic variants are associated with susceptibility to IBD. Methods: Whole-exome sequencing was carried out in 10 families in whom three or more individuals were affected with IBD. A stepwise filtering approach was applied to exome variants, to identify potential causal variants. Follow-up genotyping was performed in 6025 IBD cases [2948 CD; 3077 UC] and 7238 controls. Results: Our exome variant analysis revealed coding variants in the NLRP7 gene that were present in affected individuals in two distinct families. Genotyping of the two variants, p.S361L and p.R801H, in IBD cases and controls showed that the p.S361L variant was significantly associated with an increased risk of ulcerative colitis [odds ratio 4.79, p = 0.0039] and IBD [odds ratio 3.17, p = 0.037]. A combined analysis of both variants showed suggestive association with an increased risk of IBD [odds ratio 2.77, p = 0.018]. Conclusions: The results suggest that NLRP7 signalling and inflammasome formation may be a significant component in the pathogenesis of IBD.

    The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.

    Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.