137 research outputs found

    Performance and Competitiveness of Tree-Based Pipeline Optimization Tool

    Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. Automated machine learning (AutoML) is the process of automating the entire machine learning workflow when applied to real-world problems. AutoML can increase data science productivity while maintaining performance and accuracy, allowing non-experts to use complex machine learning methods. The Tree-based Pipeline Optimization Tool (TPOT) was one of the first AutoML methods created by data scientists and is designed to optimize machine learning pipelines using genetic programming. While still under active development, TPOT is a very promising AutoML tool. This thesis explores the algorithm and analyses its performance on real-world data. Results show that evolution-based optimization is at least as accurate as TPOT's initialization; the effectiveness of the genetic operators, however, depends on the nature of the test case.
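
    As a concrete illustration of the genetic-programming-based pipeline optimization described above, the following minimal sketch shows how TPOT is typically invoked through its scikit-learn-style interface. The dataset, generation count and population size are illustrative placeholders, not the settings evaluated in the thesis.

```python
# Minimal sketch of a typical TPOT run; the dataset, generation count and
# population size below are illustrative placeholders, not the settings
# evaluated in the thesis.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Genetic programming evolves a population of candidate pipelines:
# `population_size` pipelines are created at initialization and refined
# over `generations` rounds of crossover and mutation.
tpot = TPOTClassifier(generations=5, population_size=20,
                      cv=5, random_state=42, verbosity=2)
tpot.fit(X_train, y_train)

print(tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")  # best evolved pipeline as plain scikit-learn code
```

    The exported script contains the winning pipeline as ordinary scikit-learn code, which makes the evolved result easy to inspect and compare against the randomly initialized pipelines.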

    Data- and expert-driven feature selection for predictive models in healthcare: towards increased interpretability in underdetermined machine learning problems

    Modern data acquisition techniques in healthcare generate large collections of data from multiple sources, such as novel diagnosis and treatment methodologies. Some concrete examples are electronic healthcare record systems, genomics, and medical images. This leads to situations with often unstructured, high-dimensional, heterogeneous patient cohort data where classical statistical methods may not be sufficient for optimal utilization of the data and informed decision-making. Instead, investigating such data structures with modern machine learning techniques promises to improve the understanding of patient health issues and may provide a better platform for informed decision-making by clinicians. Key requirements for this purpose include (a) sufficiently accurate predictions and (b) model interpretability. Achieving both aspects in parallel is difficult, particularly for datasets with few patients, which are common in the healthcare domain. In such cases, machine learning models encounter mathematically underdetermined systems and may overfit easily on the training data. An important approach to overcome this issue is feature selection, i.e., determining a subset of informative features from the original set of features with respect to the target variable. While potentially raising the predictive performance, feature selection fosters model interpretability by identifying a low number of relevant model parameters, helping to better understand the underlying biological processes that lead to health issues. Interpretability requires that feature selection is stable, i.e., small changes in the dataset do not lead to changes in the selected feature set. A concept to address instability is ensemble feature selection, i.e., the process of repeating the feature selection multiple times on subsets of samples of the original dataset and aggregating the results in a meta-model. This thesis presents two approaches for ensemble feature selection, which are tailored towards high-dimensional data in healthcare: the Repeated Elastic Net Technique for feature selection (RENT) and the User-Guided Bayesian Framework for feature selection (UBayFS). While RENT is purely data-driven and builds upon elastic net regularized models, UBayFS is a general framework for ensembles with the capability to include expert knowledge in the feature selection process via prior weights and side constraints. A case study modeling the overall survival of cancer patients compares these novel feature selectors and demonstrates their potential in clinical practice. Beyond the selection of single features, UBayFS also allows for selecting whole feature groups (feature blocks) acquired from multiple data sources, such as those mentioned above. Importance quantification of such feature blocks plays a key role in tracing information about the target variable back to the acquisition modalities. If systematically integrated into the planning of patient treatment, such information on feature block importance can save human, technical, and financial resources by excluding the acquisition of non-informative features. Since a generalization of feature importance measures to block importance is not trivial, this thesis also investigates and compares approaches for feature block importance rankings. This thesis demonstrates that high-dimensional datasets from multiple data sources in the medical domain can be successfully tackled by the presented approaches for feature selection.
Experimental evaluations demonstrate favorable properties in terms of predictive performance, stability, and interpretability of the results, which carries high potential for better data-driven decision support in clinical practice.
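
    To make the ensemble feature selection idea concrete, the sketch below approximates RENT-style repeated elastic-net selection using only scikit-learn: the selector is refit on repeated subsamples and a feature is kept if it is selected in at least a fraction tau of the repeats. This is a simplified illustration of the underlying idea, not the API of the released RENT or UBayFS packages, and all parameter values are placeholders.

```python
# Simplified sketch of RENT-style ensemble feature selection using only
# scikit-learn; the released RENT package applies additional selection
# criteria, so this illustrates the repeated elastic-net idea rather than
# the published implementation. All parameter values are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler

def ensemble_elastic_net_selection(X, y, n_repeats=50, subsample=0.8,
                                   tau=0.8, C=1.0, l1_ratio=0.5, seed=0):
    """Return indices of features selected in at least a fraction `tau` of repeats."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    counts = np.zeros(X.shape[1])
    splitter = StratifiedShuffleSplit(n_splits=n_repeats,
                                      train_size=subsample, random_state=seed)
    for idx, _ in splitter.split(X, y):
        # Standardize each subsample and fit an elastic-net regularized model.
        Xs = StandardScaler().fit_transform(X[idx])
        model = LogisticRegression(penalty="elasticnet", solver="saga",
                                   C=C, l1_ratio=l1_ratio, max_iter=5000)
        model.fit(Xs, y[idx])
        # A feature counts as selected if any of its coefficients survived
        # the regularization in this repeat.
        counts += (np.abs(model.coef_).max(axis=0) > 1e-8)
    return np.where(counts / n_repeats >= tau)[0]
```

    Aggregating selection frequencies across repeats is what stabilizes the selected feature set against small changes in the data, which is the property the thesis identifies as a prerequisite for interpretability.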

    Relief-based Feature Selection Algorithms

    This master thesis describes how relief algorithms work and how to use them to extract relevant attributes (feature selection) when preprocessing data. We built an experimental application with several implementations of relief algorithms to verify their functionality and to carry out experiments, with visual output in the form of clustering of the processed data. In conclusion, we describe practical examples and uses of relief algorithms in data preprocessing.
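
    For reference, a didactic sketch of the original Relief weighting scheme (binary classes, numeric features) is given below; ReliefF-style extensions such as k nearest neighbours, multi-class handling and missing values, which an application like the one described would also cover, are omitted here.

```python
# Didactic sketch of the original Relief weighting scheme for binary
# classification with numeric features; ReliefF-style extensions (k nearest
# neighbours, multi-class handling, missing values) are omitted.
import numpy as np

def relief(X, y, n_iterations=100, random_state=0):
    rng = np.random.default_rng(random_state)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    # Scale feature differences by each feature's value range so that the
    # resulting weights are comparable across features.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    weights = np.zeros(X.shape[1])
    for _ in range(n_iterations):
        i = rng.integers(len(X))
        same = np.where(y == y[i])[0]
        same = same[same != i]
        other = np.where(y != y[i])[0]
        dist = lambda idx: np.abs(X[idx] - X[i]).sum(axis=1)
        hit = same[np.argmin(dist(same))]     # nearest neighbour with the same label
        miss = other[np.argmin(dist(other))]  # nearest neighbour with a different label
        # Reward features that differ on the miss and agree on the hit.
        weights += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / span
    return weights / n_iterations
```

    Features with the highest weights are those that separate the classes locally, so thresholding or ranking these weights yields the relevant attribute subset used for preprocessing.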

    Immersive analytics for oncology patient cohorts

    This thesis proposes a novel interactive immersive analytics tool and methods to interrogate a cancer patient cohort in an immersive virtual environment, namely Virtual Reality to Observe Oncology data Models (VROOM). The overall objective is to develop an immersive analytics platform, which includes a data analytics pipeline from raw gene expression data to immersive visualisation on virtual and augmented reality platforms utilising a game engine. Unity3D has been used to implement the visualisation. Work in this thesis could provide oncologists and clinicians with an interactive visualisation and visual analytics platform that helps them drive their analysis of treatment efficacy and achieve the goal of evidence-based personalised medicine. The thesis integrates the latest discoveries and developments in cancer patient prognosis, immersive technologies, machine learning, decision support systems and interactive visualisation to form an immersive analytics platform for complex genomic data. The experimental paradigm followed in this thesis is understanding transcriptomics in cancer samples. It specifically investigates gene expression data to determine the biological similarity revealed by the transcriptomic profiles of patients' tumour samples, which indicate the genes active in different patients. In summary, the thesis contributes: i) a novel immersive analytics platform for patient cohort data interrogation in a similarity space based on the patients' biological and genomic similarity; ii) an effective immersive environment optimisation design based on a usability study of exocentric and egocentric visualisation, audio and sound design optimisation; iii) an integration of trusted and familiar 2D biomedical visual analytics methods into the immersive environment; iv) a novel use of game theory as the decision-making engine to support the analytics process, and an application of optimal transport theory to missing-data imputation to ensure preservation of the data distribution; and v) case studies showcasing the real-world application of the visualisation and its effectiveness.

    Radiogenomics Framework for Associating Medical Image Features with Tumour Genetic Characteristics

    Significant progress has been made in the understanding of human cancers at the molecular genetics level, and it is providing new insights into their underlying pathophysiology. This progress has enabled the subclassification of the disease and the development of targeted therapies that address specific biological pathways. However, obtaining genetic information remains invasive and costly. Medical imaging is a non-invasive technique that captures important visual characteristics (i.e. image features) of abnormalities and plays an important role in routine clinical practice. Advancements in computerised medical image analysis have enabled quantitative approaches to extract image features that can reflect tumour genetic characteristics, leading to the emergence of ‘radiogenomics’. Radiogenomics investigates the relationships between medical imaging features and tumour molecular characteristics, and enables the derivation of imaging surrogates (radiogenomics features) for genetic biomarkers that can provide alternative approaches to non-invasive and accurate cancer diagnosis. This thesis presents a new framework that combines several novel methods for radiogenomics analysis, associating medical image features with tumour genetic characteristics, with three main objectives: i) a comprehensive characterisation of tumour image features that reflect underlying genetic information; ii) a method that identifies radiogenomics features encoding common pathophysiological information across different diseases, overcoming the dependence on large annotated datasets; and iii) a method that quantifies radiogenomics features from multi-modal imaging data and accounts for the unique information encoded in tumour heterogeneity sub-regions. The presented methods advance radiogenomics analysis and contribute to improving research in computerised medical image analysis.

    Joint learning from multiple information sources for biological problems

    Thanks to technological advancements, more and more biological data have been generated in recent years. Data availability offers unprecedented opportunities to look at the same problem from multiple aspects. It also unveils a more global view of the problem that takes into account the intricate interplay between the involved molecules and entities. Nevertheless, biological datasets are biased, limited in quantity, and contain many false-positive samples. Such challenges often drastically downgrade the performance of a predictive model on unseen data and, thus, limit its applicability in real biological studies. Human learning is a multi-stage process in which we usually start with simple things. Through the knowledge accumulated over time, our cognitive ability extends to more complex concepts. Children learn to speak simple words before being able to formulate sentences. Similarly, being able to speak correct sentences supports our learning to speak correct and meaningful paragraphs, and so on. Generally, knowledge acquired from related learning tasks helps boost our learning capability in the current task. Motivated by this phenomenon, in this thesis we study supervised machine learning models for bioinformatics problems that can improve their performance by exploiting multiple related knowledge sources. More specifically, we are concerned with ways to enrich the supervised models' knowledge base with publicly available related data to enhance the computational models' prediction performance. Our work shares commonality with existing work in multimodal learning, multi-task learning, and transfer learning, although there are certain differences in some cases. Besides the proposed architectures, we present large-scale experimental setups with consensus evaluation metrics, along with the creation and release of large datasets to showcase our approaches' superiority. Moreover, we add case studies with detailed analyses in which we make no simplifying assumptions, to demonstrate the systems' utility in realistic application scenarios. Finally, we develop and make available an easy-to-use website for non-expert users to query the models' prediction results, facilitating field experts' assessment and adaptation. We believe that our work serves as one of the first steps in bridging the gap between "Computer Science" and "Biology", opening a new era of fruitful collaboration between computer scientists and biological field experts.

    Phenotyping the histopathological subtypes of non-small-cell lung carcinoma: how beneficial is radiomics?

    The aim of this study was to investigate the usefulness of radiomics in the absence of well-defined standard guidelines. Specifically, we extracted radiomics features from multicenter computed tomography (CT) images to differentiate between the four histopathological subtypes of non-small-cell lung carcinoma (NSCLC), and we compared how the results varied with the radiomics model. We investigated the presence of batch effects and the impact of feature harmonization on the models' performance, as well as how the composition of the training dataset influenced the selected feature subsets and, consequently, the models' performance. By combining data from two publicly available datasets, the study includes a total of 152 squamous cell carcinoma (SCC), 106 large cell carcinoma (LCC), 150 adenocarcinoma (ADC), and 58 not otherwise specified (NOS) cases. Using the matRadiomics tool, an example of Image Biomarker Standardization Initiative (IBSI) compliant software, 1781 radiomics features were extracted from each malignant lesion identified in the CT images. After batch analysis and feature harmonization, based on the ComBat tool integrated in matRadiomics, the harmonized and non-harmonized datasets were given as input to a machine learning modeling pipeline with the following steps: (i) training-set/test-set splitting (80/20); (ii) Kruskal-Wallis analysis and LASSO linear regression for feature selection; (iii) model training; (iv) model validation and hyperparameter optimization; and (v) model testing. Model optimization consisted of a 5-fold cross-validated Bayesian optimization repeated ten times (inner loop), and the whole pipeline was repeated 10 times (outer loop) with six different machine learning classification algorithms. The stability of the feature selection was also evaluated. Results showed that batch effects were present even when the voxels were resampled to an isotropic form, and that feature harmonization correctly removed them, although the models' performance decreased. A low accuracy (61.41%) was reached when differentiating between the four subtypes, even though a high average area under the curve (AUC) of 0.831 was obtained; the NOS subtype was classified almost completely correctly (true positive rate of about 90%). The accuracy increased to 77.25%, with a high AUC (0.821), when only the SCC and ADC subtypes were considered, although harmonization decreased the accuracy to 58%. The features that contributed most to the models' performance were those extracted from wavelet-decomposed and Laplacian of Gaussian (LoG) filtered images, and they belonged to the texture feature class. In conclusion, we showed that our multicenter data were affected by batch effects, that these could significantly alter the models' performance, and that feature harmonization correctly removed them. Although wavelet features seemed to be the most informative, an absolute subset could not be identified since it changed depending on the training/testing splitting. Moreover, performance was influenced by the chosen dataset and by the machine learning methods, which could reach high accuracy in binary classification tasks but could underperform in multiclass problems. It is, therefore, essential that the scientific community propose a more systematic radiomics approach, focusing on multicenter studies, with clear and solid guidelines to facilitate the translation of radiomics to clinical practice.
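
    A rough sketch of such a nested evaluation loop is shown below: an outer loop over random 80/20 splits, Kruskal-Wallis screening plus LASSO for feature selection, and a 5-fold cross-validated Bayesian hyperparameter search, here approximated with scikit-optimize's BayesSearchCV. The classifier, search space and thresholds are placeholders rather than the six algorithms and settings used in the study, and ComBat harmonization is assumed to have been applied beforehand.

```python
# Rough sketch of the nested evaluation loop described above; all specific
# choices (classifier, search space, thresholds) are placeholders.
import numpy as np
from scipy.stats import kruskal
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from skopt import BayesSearchCV  # requires scikit-optimize

def select_features(X, y, p_threshold=0.05, lasso_alpha=0.01):
    # Kruskal-Wallis screening: keep features whose distributions differ
    # across the (numerically encoded) subtypes.
    keep = [j for j in range(X.shape[1])
            if kruskal(*[X[y == c, j] for c in np.unique(y)]).pvalue < p_threshold]
    # LASSO shrinks the screened set to a sparse subset.
    lasso = Lasso(alpha=lasso_alpha).fit(X[:, keep], y)
    return [keep[j] for j in np.flatnonzero(lasso.coef_)]

def run_once(X, y, seed):
    # (i) 80/20 training/test split, stratified by subtype.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    cols = select_features(X_tr, y_tr)                     # (ii) feature selection
    # (iii)-(iv) training with a 5-fold cross-validated Bayesian search.
    search = BayesSearchCV(RandomForestClassifier(random_state=seed),
                           {"n_estimators": (50, 500), "max_depth": (2, 20)},
                           n_iter=25, cv=5, random_state=seed)
    search.fit(X_tr[:, cols], y_tr)
    return search.score(X_te[:, cols], y_te)               # (v) held-out testing

# Outer loop: repeat the whole pipeline on different splits.
# scores = [run_once(X, y, seed) for seed in range(10)]
```

    Because feature selection is refit inside every outer repeat, the selected subsets can differ between splits, which is exactly the stability issue the study reports for the wavelet and LoG features.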

    Modelling, Monitoring, Control and Optimization for Complex Industrial Processes

    This reprint includes 22 research papers and an editorial, collected from the Special Issue "Modelling, Monitoring, Control and Optimization for Complex Industrial Processes", highlighting recent research advances and emerging research directions in complex industrial processes. The reprint aims to promote the research field and to benefit readers from both the academic community and industrial sectors.

    Developing Novel Computer Aided Diagnosis Schemes for Improved Classification of Mammography Detected Masses

    Mammography imaging is a population-based breast cancer screening tool that has greatly aided the decrease in breast cancer mortality over time. Although mammography is the most frequently employed breast imaging modality, its performance is often unsatisfactory, with low sensitivity and high false positive rates. This is because reading and interpreting mammography images remains difficult due to the heterogeneity of breast tumors and dense overlapping fibroglandular tissue. To help overcome these clinical challenges, researchers have made great efforts to develop computer-aided detection and/or diagnosis (CAD) schemes to provide radiologists with decision-making support tools. In this dissertation, I investigate several novel methods for improving the performance of a CAD system in distinguishing between malignant and benign masses. In the first study, we test the hypothesis that handcrafted radiomics features and deep learning features contain complementary information, so that the fusion of these two types of features will increase the feature representation of each mass and improve the performance of the CAD system in distinguishing malignant and benign masses. Regions of interest (ROI) surrounding suspicious masses are extracted and two types of features are computed. The first set consists of 40 radiomics features and the second set includes deep learning (DL) features computed from a pretrained VGG16 network. DL features are extracted from two pseudo-color image sets, producing a total of three feature vectors after feature extraction, namely: handcrafted, DL-stacked, and DL-pseudo. Linear support vector machines (SVM) are trained using each feature set alone and in combinations. Results show that the fusion CAD system significantly outperforms the systems using either feature type alone (AUC=0.756±0.042, p<0.05). This study demonstrates that handcrafted and DL features contain useful complementary information and that fusing these two types of features increases the CAD classification performance. In the second study, we expand upon the first study and develop a novel CAD framework that fuses information extracted from ipsilateral views of bilateral mammograms using both DL and radiomics feature extraction methods. Each case in this study is represented by four images: the craniocaudal (CC) and mediolateral oblique (MLO) views of the left and right breast. First, we extract matching ROIs from each of the four views using an ipsilateral matching and bilateral registration scheme to ensure masses are appropriately matched. Next, the handcrafted radiomics features and VGG16 model-generated features are extracted from each ROI, resulting in eight feature vectors. Then, after reducing feature dimensionality and quantifying the bilateral asymmetry, we test four fusion methods. Results show that multi-view CAD systems significantly outperform single-view systems (AUC = 0.876±0.031 vs AUC = 0.817±0.026 for the CC view and 0.792±0.026 for the MLO view, p<0.001). The study demonstrates that the shift from single-view CAD to four-view CAD and the inclusion of both deep transfer learning and radiomics features increase the feature representation of the mass and thus improve CAD performance in distinguishing between malignant and benign breast lesions. In the third study, we build upon the first and second studies and investigate the effects of pseudo-color image generation on classifying suspicious mammography-detected breast lesions as malignant or benign using deep transfer learning in a multi-view CAD scheme. Seven pseudo-color image sets are created through combinations of the original grayscale image, a histogram-equalized image, a bilaterally filtered image, and a segmented mass image. Using the multi-view CAD framework developed in the previous study, we observe that the two pseudo-color sets created using a segmented mass in one of the three image channels perform significantly better than all other pseudo-color sets (AUC=0.882 and AUC=0.889, p<0.05 for all comparisons). These results support our hypothesis that pseudo-color images generated with a segmented mass optimize the mammogram image feature representation by providing additional complementary information to the CADx scheme, which improves performance in classifying suspicious mammography-detected breast lesions as malignant or benign. In summary, each of the studies presented in this dissertation aims to increase the accuracy of a CAD system in classifying suspicious mammography-detected masses, and each takes a novel approach to increasing the feature representation of the mass to be classified. The results of each study demonstrate the potential utility of these CAD schemes as an aid to radiologists in the clinical workflow.
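
    To make the fusion step of the first study concrete, the sketch below concatenates ImageNet-pretrained VGG16 features with handcrafted radiomics features and trains a linear SVM. ROI extraction, pseudo-color construction and the specific 40 radiomics features are not reproduced, so the arrays `rois` and `handcrafted` are placeholders under stated assumptions rather than the dissertation's actual pipeline.

```python
# Illustrative sketch of the feature-fusion idea from the first study: deep
# features from an ImageNet-pretrained VGG16 are concatenated with handcrafted
# radiomics features and fed to a linear SVM. `rois` and `handcrafted` are
# placeholder arrays standing in for the dissertation's preprocessed data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

def deep_features(rois):
    """rois: array of shape (n, 224, 224, 3) holding pseudo-color ROI images."""
    backbone = VGG16(weights="imagenet", include_top=False, pooling="avg")
    return backbone.predict(preprocess_input(rois.astype("float32")))

def fused_classifier(rois, handcrafted, labels):
    """handcrafted: (n, 40) radiomics features; labels: 0 = benign, 1 = malignant."""
    fused = np.hstack([handcrafted, deep_features(rois)])  # concatenate feature types
    clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
    return clf.fit(fused, labels)
```

    The same fused-feature idea extends to the multi-view setting of the second study by building one such feature vector per view and combining them before or after classification.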

    Machine Learning Methods with Noisy, Incomplete or Small Datasets

    In many machine learning applications, the available datasets are sometimes incomplete, noisy, or affected by artifacts. In supervised scenarios, label information may be of low quality, which can include unbalanced training sets, noisy labels, and other problems. Moreover, in practice, it is very common that the available data samples are not enough to derive useful supervised or unsupervised classifiers. All these issues are commonly referred to as the low-quality data problem. This book collects novel contributions on machine learning methods for low-quality datasets, to contribute to the dissemination of new ideas for solving this challenging problem and to provide clear examples of application in real scenarios.