A robust clustering algorithm for identifying problematic samples in genome-wide association studies
Summary: High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections
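The idea of flagging samples whose genome-wide summaries differ from the bulk can be illustrated with a minimal sketch. This is not the algorithm described in the abstract, whose details are not given here; it swaps in a simple median/MAD rule as a stand-in. The function name, the threshold of 3.5, and the heterozygosity values are all hypothetical.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices of samples whose robust z-score exceeds `threshold`.

    Assumption: one summary statistic per sample (e.g. heterozygosity).
    The median and the median absolute deviation (MAD) are used instead of
    the mean and standard deviation so that the outliers themselves do not
    distort the estimate of what a typical sample looks like.
    """
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the samples are identical; flag anything that differs.
        return [i for i, v in enumerate(values) if v != med]
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [i for i, v in enumerate(values) if abs(v - med) / scale > threshold]


# Hypothetical per-sample heterozygosity values; the last sample is atypical.
heterozygosity = [0.31, 0.30, 0.32, 0.29, 0.31, 0.30, 0.45]
flagged = flag_outliers(heterozygosity)
```

In practice a rule like this would be applied separately to each summary statistic, with flagged samples reviewed before removal rather than dropped automatically.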
Computational analysis of the LRRK2 interactome.
LRRK2 was identified in 2004 as the causative protein product of the Parkinson's disease locus designated PARK8. In the decade since then, genetic studies have revealed at least 6 dominant mutations in LRRK2 linked to Parkinson's disease, alongside one associated with cancer. It is now well established that coding changes in LRRK2 are one of the most common causes of Parkinson's. Genome-wide association studies (GWAS) have, more recently, reported single nucleotide polymorphisms (SNPs) around the LRRK2 locus to be associated with risk of developing sporadic Parkinson's disease and inflammatory bowel disorder. The functional research that has followed these genetic breakthroughs has generated an extensive literature regarding LRRK2 pathophysiology; however, there is still no consensus as to the biological function of LRRK2. To provide insight into the aspects of cell biology that are consistently related to LRRK2 activity, we analysed the plethora of candidate LRRK2 interactors available through the BioGRID and IntAct data repositories. We then performed GO term enrichment analysis for the LRRK2 interactome. We found that, in two different enrichment portals, the LRRK2 interactome was associated with terms referring to transport, cellular organization, vesicles and the cytoskeleton. We also verified that 21 of the LRRK2 interactors are genetically linked to risk for Parkinson's disease or inflammatory bowel disorder. The implications of these findings are discussed, with particular regard to potential novel areas of investigation.
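GO term enrichment is commonly framed as an over-representation test: given a list of interactors, is a term annotated to more of them than chance would predict? The sketch below computes a one-sided hypergeometric p-value for a single term. It is an illustration only; the abstract does not say which statistic the two enrichment portals use, and the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(overlap, list_size, annotated, background):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background: number of genes in the reference set
    annotated:  genes in the reference set carrying the GO term
    list_size:  genes in the interactor list
    overlap:    interactors carrying the GO term
    """
    total = comb(background, list_size)
    upper = min(list_size, annotated)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 12 of 60 interactors carry a term that annotates
# 400 of 18,000 background genes.
p = enrichment_p_value(overlap=12, list_size=60, annotated=400, background=18000)
```

A real analysis tests many terms at once, so the resulting p-values would need a multiple-testing correction before interpretation.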
The impact of different DNA extraction kits and laboratories upon the assessment of human gut microbiota composition by 16S rRNA gene sequencing
A Meta-Analysis of Genome-Wide Association Scans Identifies IL18RAP, PTPN2, TAGAP, and PUS10 As Shared Risk Loci for Crohn's Disease and Celiac Disease
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
An integrated clinical program and crowdsourcing strategy for genomic sequencing and Mendelian disease gene discovery.
Despite major progress in defining the genetic basis of Mendelian disorders, the molecular etiology of many cases remains unknown. Patients with these undiagnosed disorders often have complex presentations and require treatment by multiple health care specialists. Here, we describe an integrated clinical diagnostic and research program using whole-exome and whole-genome sequencing (WES/WGS) for Mendelian disease gene discovery. This program employs specific case ascertainment parameters, a WES/WGS computational analysis pipeline that is optimized for Mendelian disease gene discovery with variant callers tuned to specific inheritance modes, an interdisciplinary crowdsourcing strategy for genomic sequence analysis, matchmaking for additional cases, and integration of the findings regarding gene causality with the clinical management plan. The interdisciplinary gene discovery team includes clinical, computational, and experimental biomedical specialists who interact to identify the genetic etiology of the disease, and when so warranted, to devise improved or novel treatments for affected patients. This program effectively integrates the clinical and research missions of an academic medical center and affords both diagnostic and therapeutic options for patients suffering from genetic disease. It may therefore be germane to other academic medical institutions engaged in implementing genomic medicine programs
The geography of recent genetic ancestry across Europe
The recent genealogical history of human populations is a complex mosaic
formed by individual migration, large-scale population movements, and other
demographic events. Population genomics datasets can provide a window into this
recent history, as rare traces of recent shared genetic ancestry are detectable
due to long segments of shared genomic material. We make use of genomic data
for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of
recent genealogical ancestry over the past three thousand years at a
continental scale. We detected 1.9 million shared genomic segments, and used
the lengths of these to infer the distribution of shared ancestors across time
and geography. We find that a pair of modern Europeans living in neighboring
populations share around 10-50 genetic common ancestors from the last 1500
years, and upwards of 500 genetic ancestors from the previous 1000 years. These
numbers drop off exponentially with geographic distance, but since genetic
ancestry is rare, individuals from opposite ends of Europe are still expected
to share millions of common genealogical ancestors over the last 1000 years.
There is substantial regional variation in the number of shared genetic
ancestors: especially high numbers of common ancestors between many eastern
populations likely date to the Slavic and/or Hunnic expansions, while much
lower levels of common ancestry in the Italian and Iberian peninsulas may
indicate weaker demographic effects of Germanic expansions into these areas
and/or more stably structured populations. Recent shared ancestry in modern
Europeans is ubiquitous, and clearly shows the impact of both small-scale
migration and large historical events. Population genomic datasets have
considerable power to uncover recent demographic history, and will allow a much
fuller picture of the close genealogical kinship of individuals across the
world.
Comment: Full size figures available from
http://www.eve.ucdavis.edu/~plralph/research.html; or html version at
http://ralphlab.usc.edu/ibd/ibd-paper/ibd-writeup.xhtm
Exome sequencing and genotyping identify a rare variant in NLRP7 gene associated with ulcerative colitis.
Background and Aims
Although genome-wide association studies [GWAS] in inflammatory bowel disease [IBD] have identified a large number of common disease susceptibility alleles for both Crohn’s disease [CD] and ulcerative colitis [UC], a substantial fraction of IBD heritability remains unexplained, suggesting that rare coding genetic variants may also have a role in pathogenesis. We used high-throughput sequencing in families with multiple cases of IBD, followed by genotyping of cases and controls, to investigate whether rare protein-altering genetic variants are associated with susceptibility to IBD.
Methods
Whole-exome sequencing was carried out in 10 families in whom three or more individuals were affected with IBD. A stepwise filtering approach was applied to exome variants, to identify potential causal variants. Follow-up genotyping was performed in 6025 IBD cases [2948 CD; 3077 UC] and 7238 controls.
Results
Our exome variant analysis revealed coding variants in the NLRP7 gene that were present in affected individuals in two distinct families. Genotyping of the two variants, p.S361L and p.R801H, in IBD cases and controls showed that the p.S361L variant was significantly associated with an increased risk of ulcerative colitis [odds ratio 4.79, p = 0.0039] and IBD [odds ratio 3.17, p = 0.037]. A combined analysis of both variants showed suggestive association with an increased risk of IBD [odds ratio 2.77, p = 0.018].
Conclusions
The results suggest that NLRP7 signalling and inflammasome formation may be a significant component in the pathogenesis of IBD.
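For readers unfamiliar with how an odds ratio such as those quoted in the Results is obtained from case-control genotype counts, here is a minimal sketch. The abstract does not report carrier counts or name the test used, so the counts below are hypothetical and the Wald confidence interval is an assumption chosen for simplicity; for rare variants with small carrier counts an exact test is usually preferred.

```python
from math import exp, log, sqrt


def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Return (odds ratio, lower, upper) with an approximate 95% Wald interval.

    All four counts must be positive; zero cells need a continuity
    correction or an exact method, which this sketch does not implement.
    """
    estimate = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )
    # Standard error of the log odds ratio.
    se = sqrt(
        1 / case_carriers
        + 1 / case_noncarriers
        + 1 / control_carriers
        + 1 / control_noncarriers
    )
    lower = exp(log(estimate) - 1.96 * se)
    upper = exp(log(estimate) + 1.96 * se)
    return estimate, lower, upper


# Hypothetical counts, not taken from the study.
estimate, lower, upper = odds_ratio(10, 90, 5, 95)
```

An interval whose lower bound sits above 1 indicates that carriers are over-represented among cases at roughly the 5% level under this approximation.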
The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.
Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets