20,778 research outputs found

Bioinformatics challenges for genome-wide association studies

Author: Ahmed
Altshuler
Amundadottir
Askland
Bureau
Bush
Calle
Chang
Chanock
Cook
Culverhouse
Donnelly
Easton
Eiberg
Elbers
Emily
F. W. Asselbergs
Greene
Hahn
Hahn
Hirschhorn
Holmans
Infante
J. H. Moore
Jakobsdottir
Kooperberg
Kraft
Lewontin
Lou
Lunetta
Manolio
Manolio
Marchini
McKinney
McKinney
Mei
Millstein
Moore
Moore
Moore
Moore
Moore
Moore
Moore
Moore
Moore
Moore
Motsinger
Namkung
Nelson
Pan
Pattin
Reich
Reif
Ripperger
Ritchie
Ritchie
Ritchie
S. M. Williams
Schork
Sinnott-Armstrong
Spencer
Thornton-Wells
Torkamani
Velez
Wang
Wilke
Williams
Wongseree
Yu
Yu
Zhang
Publication venue: Oxford University Press
Publication date: 15/02/2010

Motivation: The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is by design agnostic or unbiased in that it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework that is employed in GWAS often considers only one SNP at a time thus ignoring their genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach that recognizes the complexity of the genotype–phenotype relationship that is characterized by significant heterogeneity and gene–gene and gene–environment interaction. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods.
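
The abstract above mentions multiple testing as an analysis issue when roughly one million SNPs are tested one at a time. As a rough illustration only, the sketch below applies a Bonferroni correction, which is one common way to handle this and is an assumption here, since the abstract does not name a specific method. The function names and the example p-values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Indices of p-values that pass the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]


# Assumed setting: a family-wise alpha of 0.05 spread over one million
# single-SNP tests (the ">1 million SNPs" scale quoted in the abstract).
genome_wide = bonferroni_threshold(0.05, 1_000_000)

# Toy example with three made-up p-values.
hits = significant([1e-9, 0.04, 0.5])
```

With one million tests the per-test threshold is on the order of 5e-8, so a single SNP needs a very small p-value to stand out, which is consistent with the abstract's point that small-effect SNPs are hard to exploit.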

CiteSeerX

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

PubMed Central

UCL Discovery

Dissertations of the University of Groningen

Mining Pure, Strict Epistatic Interactions from High-Dimensional Datasets: Ameliorating the Curse of Dimensionality

Author: Richard E. Neapolitan
Xia Jiang
Xiaofeng Wang
Publication venue: Public Library of Science (PLoS)
Publication date: 12/10/2012

Background: The interaction between loci to affect phenotype is called epistasis. It is strict epistasis if no proper subset of the interacting loci exhibits a marginal effect. For many diseases, it is likely that unknown epistatic interactions affect disease susceptibility. A difficulty when mining epistatic interactions from high-dimensional datasets concerns the curse of dimensionality. There are too many combinations of SNPs to perform an exhaustive search. A method that could locate strict epistasis without an exhaustive search can be considered the brass ring of methods for analyzing high-dimensional datasets. Methodology/Findings: A SNP pattern is a Bayesian network representing SNP-disease relationships. The Bayesian score for a SNP pattern is the probability of the data given the pattern, and has been used to learn SNP patterns. We identified a bound for the score of a SNP pattern. The bound provides an upper limit on the Bayesian score of any pattern that could be obtained by expanding a given pattern. We felt that the bound might enable the data to say something about the promise of expanding a 1-SNP pattern even when there are no marginal effects. We tested the bound using simulated datasets and semi-synthetic high-dimensional datasets obtained from GWAS datasets. We found that the bound was able to dramatically reduce the search time for strict epistasis. Using an Alzheimer's dataset, we showed that it is possible to discover an interaction involving the APOE gene based on its score because of its large marginal effect, but that the bound is most effective at discovering interactions without marginal effects. Conclusions/Significance: We conclude that the bound appears to ameliorate the curse of dimensionality in high-dimensional datasets. This is a very consequential result and could be pivotal in our efforts to reveal the dark matter of genetic disease risk from high-dimensional datasets. © 2012 Jiang, Neapolitan
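
To make the definition of strict epistasis in the abstract above concrete, here is a minimal sketch. It assumes a hypothetical two-locus XOR disease model with equally likely binary genotypes. This toy model is not taken from the paper, and the code does not implement the paper's Bayesian score or its bound. It only shows that each locus alone can carry no marginal signal while the pair fully determines the phenotype, and how quickly the number of SNP pairs grows.

```python
import math
from itertools import product


def disease(a: int, b: int) -> int:
    """Hypothetical XOR model: affected iff exactly one locus carries the variant."""
    return a ^ b


# All two-locus genotype combinations, assumed equally likely.
genotypes = list(product([0, 1], repeat=2))


def marginal_risk(locus: int, value: int) -> float:
    """Disease risk given the genotype at one locus, averaging over the other."""
    rows = [g for g in genotypes if g[locus] == value]
    return sum(disease(*g) for g in rows) / len(rows)


# Risk conditioned on a single locus: identical for every genotype value,
# so neither locus shows a marginal effect on its own.
marginals = {(locus, v): marginal_risk(locus, v) for locus in (0, 1) for v in (0, 1)}

# Risk conditioned on both loci: fully determined by the pair.
joint = {g: float(disease(*g)) for g in genotypes}

# Number of SNP pairs an exhaustive two-locus scan would visit for 1e6 SNPs.
n_pairs = math.comb(1_000_000, 2)
```

Every entry of `marginals` is 0.5 in this toy model, while `joint` takes only the values 0.0 and 1.0. A single-locus scan would therefore find nothing to follow up, and the pair count of about 5 x 10^11 shows why an exhaustive search is impractical at higher orders.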

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

D-Scholarship@Pitt

Discover hidden splicing variations by mapping personal transcriptomes to personal genomes.

Author: Bahrami-Samani Emad
Lu Zhi-Xiang
Park Juw Won
Stein Shayna
Xing Yi
Publication venue: eScholarship, University of California
Publication date: 17/11/2015

RNA-seq has become a popular technology for studying genetic variation of pre-mRNA alternative splicing. Commonly used RNA-seq aligners rely on the consensus splice site dinucleotide motifs to map reads across splice junctions. Consequently, genomic variants that create novel splice site dinucleotides may produce splice junction RNA-seq reads that cannot be mapped to the reference genome. We developed and evaluated an approach to identify 'hidden' splicing variations in personal transcriptomes, by mapping personal RNA-seq data to personal genomes. Computational analysis and experimental validation indicate that this approach identifies personal specific splice junctions at a low false positive rate. Applying this approach to an RNA-seq data set of 75 individuals, we identified 506 personal specific splice junctions, among which 437 were novel splice junctions not documented in current human transcript annotations. 94 splice junctions had splice site SNPs associated with GWAS signals of human traits and diseases. These involve genes whose splicing variations have been implicated in diseases (such as OAS1), as well as novel associations between alternative splicing and diseases (such as ICA1). Collectively, our work demonstrates that the personal genome approach to RNA-seq read alignment enables the discovery of a large but previously unknown catalog of splicing variations in human populations.
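
The idea in the abstract above, that a variant can create a splice site dinucleotide absent from the reference, can be sketched with a toy check for the canonical GT...AG intron motif. The sequences, coordinates, and function names below are made up for illustration. Real aligners use more elaborate junction models, and this is not the authors' pipeline.

```python
def is_canonical_intron(genome: str, start: int, end: int) -> bool:
    """True if genome[start:end] begins with GT and ends with AG."""
    intron = genome[start:end]
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def apply_snp(genome: str, pos: int, alt: str) -> str:
    """Return a copy of the genome with a single-base substitution at pos."""
    return genome[:pos] + alt + genome[pos + 1:]


# Hypothetical reference: the candidate intron spans positions 6..17 and
# starts with "GA", so it lacks the canonical donor dinucleotide.
reference = "ATGGCC" + "GAAAGCTTTCAG" + "GTCTAA"

# Hypothetical personal genome: an A>T SNP at position 7 turns "GA" into "GT".
personal = apply_snp(reference, 7, "T")

junction_in_reference = is_canonical_intron(reference, 6, 18)
junction_in_personal = is_canonical_intron(personal, 6, 18)
```

In this toy case the junction passes the motif check only against the personal genome, which mirrors why reads spanning such a junction might go unmapped when aligned to the reference alone.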

PubMed Central

eScholarship - University of California

Population Genetics in the Genomic Era

Author: Shuhua Xu
Wenfei Jin
Publication venue: IntechOpen
Publication date: 22/08/2012

IntechOpen

Immunochip analysis identifies multiple susceptibility loci for systemic sclerosis

Author: Airó Paolo
Alarcón-Riquelme Marta
AlKassab Firas
Arnett Frank C.
Assassi Shervin
Baron Murray
Beretta Lorenzo
Bossini-Castillo Lara
Broen Jasper
Brown Matthew
Carmona Francisco David
Carreira Patricia
Castellví Iván
Chen Wei V.
Denton Christopher
Distler Jörg H.W.
Docherty Peter
Fessler Barri J.
Fonseca Carmen
Frech Tracy M.
Furst Daniel E.
González-Gay Miguel Ángel
Gorlova Olga
Gregersen Peter K.
Guerra Sandra
Herrick Ariane
Hesselstrand Roger
Hinchcliff Monique E.
Hudson Marie
Hummers Laura K.
Hunzelmann Nicolas
Jones Henry Niall
Kaminska Elzbieta
Khalidi Nader
Khanna Dinesh
Koeleman Bobby P.
Kreuter Alexander
Lafyatis Robert A.
Lee Annette T.
Lunardi Claudio
López-Isac Elena
Markland Janet
Martin Javier
Martin José Ezequiel
Mayes Maureen D.
Molitor Jerry A.
Nordin Annika
Ochoa Eguzkine
Padyukov Leonid
Phillips Kristin
Pope Janet E.
Radstake Timothy R.D.J.
Reveille John D.
Riemekasten Gabriela
Robinson David
Schiopu Elena
Schuerwegh Annemie J.
Segal Barbara M.
Shiels Paul
Silver Richard M.
Simeón Carmen Pilar
Simms Robert W.
Steen Virginia D.
Tan Filemon K.
Teruel María
van Laar Jacob M.
Varga John
Voskuyl Alexandre E.
Wigley Fredrick M.
Wijmenga Cisca
Witte Torsten
Worthington Jane
Ying Jun
Zhernakova Alexandra
Zhou Xiaodong
Publication venue: Elsevier BV
Publication date: 01/01/2014

In this study, 1,833 systemic sclerosis (SSc) cases and 3,466 controls were genotyped with the Immunochip array. Classical alleles, amino acid residues, and SNPs across the human leukocyte antigen (HLA) region were imputed and tested. These analyses resulted in a model composed of six polymorphic amino acid positions and seven SNPs that explained the observed significant associations in the region. In addition, a replication step comprising 4,017 SSc cases and 5,935 controls was carried out for several selected non-HLA variants, reaching a total of 5,850 cases and 9,401 controls of European ancestry. Following this strategy, we identified and validated three SSc risk loci, including DNASE1L3 at 3p14, the SCHIP1-IL12A locus at 3q25, and ATG5 at 6q21, as well as a suggested association of the TREH-DDX6 locus at 11q23. The associations of several previously reported SSc risk loci were validated and further refined, and the observed peak of association in PXK was related to DNASE1L3. Our study has increased the number of known genetic associations with SSc, provided further insight into the pleiotropic effects of shared autoimmune risk factors, and highlighted the power of dense mapping for detecting previously overlooked susceptibility loci.

Lund University Publications

Kölner UniversitätsPublikationsServer

PubMed Central

Enlighten

The University of Manchester - Institutional Repository