5,260 research outputs found

Detecting high-order interactions of single nucleotide polymorphisms using genetic programming

Author: Bernholt Thorsten
Ickstadt Katja
Nunkesser Robin
Schwender Holger
Wegener Ing

Motivation: Not individual single nucleotide polymorphisms (SNPs), but high-order interactions of SNPs are assumed to be responsible for complex diseases such as cancer. Therefore, one of the major goals of genetic association studies concerned with such genotype data is the identification of these high-order interactions. This search is further impeded by the fact that these interactions are often explanatory for only a relatively small subgroup of patients. Unfortunately, most of the feature selection methods proposed in the literature fail at this task, since they can identify only individual variables or low-order interactions, or try to find rules that are explanatory for a high percentage of the observations. In this paper, we present a procedure based on genetic programming and multi-valued logic that enables the identification of high-order interactions of categorical variables such as SNPs. This method, called GPAS (Genetic Programming for Association Studies), can be used not only for feature selection but also for discrimination. Results: In an application to the genotype data from the GENICA study, an association study concerned with sporadic breast cancer, GPAS is able to identify high-order interactions of SNPs leading to a considerably increased breast cancer risk for different subsets of patients that are not found by other feature selection methods. As an application to a subset of the HapMap data shows, GPAS is not restricted to association studies comprising several tens of SNPs, but can also be employed to analyze whole-genome data.

Research Papers in Economics

Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases

Author: Kelemen Arpad
Liang Yulan
Publication venue: Institute of Mathematical Statistics
Publication date: 28/03/2008

Recent advances in information technology in the biomedical sciences and other applied areas have created numerous large, diverse data sets with a high dimensional feature space, which provide a tremendous amount of information and new opportunities for improving the quality of human life. At the same time, the continuous arrival of new data creates great challenges, since researchers must convert these raw data into scientific knowledge in order to benefit from them. Association studies of complex diseases using SNP data have become more and more popular in biomedical research in recent years. In this paper, we present a review of recent statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic association studies for complex diseases. The review includes both general feature reduction approaches for high dimensional correlated data and more specific approaches for SNP data, which include unsupervised haplotype mapping, tag SNP selection, and supervised SNP selection using statistical testing/scoring, statistical modeling and machine learning methods, with an emphasis on how to identify interacting loci. Comment: Published at http://dx.doi.org/10.1214/07-SS026 in Statistics Surveys (http://www.i-journals.org/ss/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

Bioinformatics challenges for genome-wide association studies

Author: Ahmed
Altshuler
Amundadottir
Askland
Bureau
Bush
Calle
Chang
Chanock
Cook
Culverhouse
Donnelly
Easton
Eiberg
Elbers
Emily
F. W. Asselbergs
Greene
Hahn
Hirschhorn
Holmans
Infante
J. H. Moore
Jakobsdottir
Kooperberg
Kraft
Lewontin
Lou
Lunetta
Manolio
Marchini
McKinney
Mei
Millstein
Moore
Motsinger
Namkung
Nelson
Pan
Pattin
Reich
Reif
Ripperger
Ritchie
S. M. Williams
Schork
Sinnott-Armstrong
Spencer
Thornton-Wells
Torkamani
Velez
Wang
Wilke
Williams
Wongseree
Yu
Zhang
Publication venue: Oxford University Press
Publication date: 15/02/2010

Motivation: The sequencing of the human genome has made it possible to identify an informative set of >1 million single nucleotide polymorphisms (SNPs) across the genome that can be used to carry out genome-wide association studies (GWASs). The availability of massive amounts of GWAS data has necessitated the development of new biostatistical methods for quality control, imputation and analysis issues including multiple testing. This work has been successful and has enabled the discovery of new associations that have been replicated in multiple studies. However, it is now recognized that most SNPs discovered via GWAS have small effects on disease susceptibility and thus may not be suitable for improving health care through genetic testing. One likely explanation for the mixed results of GWAS is that the current biostatistical analysis paradigm is by design agnostic or unbiased in that it ignores all prior knowledge about disease pathobiology. Further, the linear modeling framework that is employed in GWAS often considers only one SNP at a time thus ignoring their genomic and environmental context. There is now a shift away from the biostatistical approach toward a more holistic approach that recognizes the complexity of the genotype–phenotype relationship that is characterized by significant heterogeneity and gene–gene and gene–environment interaction. We argue here that bioinformatics has an important role to play in addressing the complexity of the underlying genetic basis of common human diseases. The goal of this review is to identify and discuss those GWAS challenges that will require computational methods

CiteSeerX

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

PubMed Central

UCL Discovery

Dissertations of the University of Groningen

RFreak-An R-package for evolutionary computation

Author: Nunkesser Robin

RFreak is an R package providing a framework for evolutionary computation. By wrapping the functionality of an evolutionary algorithm kit written in Java, it offers an easy way to do evolutionary computation in R. In addition, this paper includes and describes application examples from computational statistics in which an evolutionary approach is promising. The package thus further supports the use of evolutionary computation in computational statistics. Keywords: R, evolutionary algorithms, evolutionary computation, association study, robust regression

Research Papers in Economics

Variants within the MMP3 gene are associated with achilles tendinopathy: possible interaction with the COL5A1 gene

Author: Collins M
Raleigh Stuart M
Ribbans William J
Schwellnus M P
Smith R K
van der Merwe L
Publication venue: BMJ
Publication date: 01/07/2009

Objectives: Sequence variation within the COL5A1 and TNC genes is known to associate with Achilles tendinopathy. The primary aim of this case-control genetic association study was to investigate whether variants within the matrix metalloproteinase 3 (MMP3) gene also contribute to both Achilles tendinopathy and Achilles tendon rupture in a Caucasian population. A secondary aim was to establish whether variants within the MMP3 gene interact with the COL5A1 rs12722 variant to raise the risk of these pathologies. Methods: 114 subjects with symptoms of Achilles tendon pathology and 98 healthy controls were genotyped for MMP3 variants rs679620, rs591058 and rs650108. Results: As single markers, significant associations were found between the GG genotype of rs679620 (OR = 2.5, 95% CI 1.2 to 4.90, p = 0.010), the CC genotype of rs591058 (OR = 2.3, 95% CI 1.1 to 4.50, p = 0.023) and the AA genotype of rs650108 (OR = 4.9, 95% CI 1.0 to 24.1, p = 0.043) and risk of Achilles tendinopathy. The ATG haplotype (rs679620, rs591058, and rs650108) was under-represented in the tendinopathy group when compared to the control group (41% vs 53%, p = 0.038). Finally, the G allele of rs679620 and the T allele of COL5A1 rs12722 significantly interacted to raise the risk of Achilles tendinopathy (p = 0.006). No associations were found between any of the MMP3 markers and Achilles tendon rupture. Conclusion: Variants within the MMP3 gene are associated with Achilles tendinopathy. Furthermore, the MMP3 gene variant rs679620 and the COL5A1 marker rs12722 interact to modify the risk of tendinopathy. These data further support a genetic contribution to a common sports related injur

University of Northampton's Research Explorer

Coventry University Pure Portal

NECTAR

GPNN: Power Studies and Applications of a Neural Network Method for Detecting Gene-Gene Interactions in Studies of Human Disease

Author: Lee Stephen L
Mellick George
Motsinger Alison A
Ritchie Marylyn D
Publication venue: Dartmouth Digital Commons
Publication date: 25/01/2006

The identification and characterization of genes that influence the risk of common, complex multifactorial disease primarily through interactions with other genes and environmental factors remains a statistical and computational challenge in genetic epidemiology. We have previously introduced a genetic programming optimized neural network (GPNN) as a method for optimizing the architecture of a neural network to improve the identification of gene combinations associated with disease risk. The goal of this study was to evaluate the power of GPNN for identifying high-order gene-gene interactions. We were also interested in applying GPNN to a real data analysis in Parkinson's disease

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

Statistical methods of SNP data analysis with applications

Author: Bulinski Alexander
Butkovsky Oleg
Shashkin Alexey
Yaskov Pavel
Publication date: 14/06/2011

Various statistical methods important for genetic analysis are considered and developed. Specifically, we concentrate on multifactor dimensionality reduction (MDR), logic regression, random forests and stochastic gradient boosting. These methods and their new modifications, e.g., the MDR method with the "independent rule", are used to study the risk of complex diseases such as cardiovascular ones. The roles of certain combinations of single nucleotide polymorphisms and external risk factors are examined. The data analysis concerning ischemic heart disease and myocardial infarction was performed on the SKIF "Chebyshev" supercomputer of Lomonosov Moscow State University.

arXiv.org e-Print Archive

Hal-Diderot

Bioinformatics: Strategies, Trends, and Perspectives

Author: Adriane Beatriz de Souza Serapião
Carlos Norberto Fischer
Publication venue: IntechOpen
Publication date: 01/03/2010

IntechOpen

Recommended from our members

GenEpi: gene-based epistasis discovery using machine learning.

Author: Alzheimer’s Disease Neuroimaging Initiative
Chang Yu-Chuan
Chen Chien-Yu
Giacomini Kathleen M
Hong Ming-Yi
Hsieh Ping-Han
Oyang Yen-Jen
Tung Yi-An
Wu June-Tai
Yee Sook Wah
Publication venue: eScholarship, University of California
Publication date: 01/02/2020

Background: Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to uncovering complex pathogenesis, which is especially important for complex diseases such as Alzheimer's disease (AD). Results: This study presents GenEpi, a computational package that uncovers epistasis associated with phenotypes using the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. On simulated data, GenEpi outperforms other widely used methods at detecting the ground-truth epistasis. For real data, this study uses AD as an example to demonstrate the capability of GenEpi to find disease-related variants and variant interactions that show both biological meaning and predictive power. Conclusions: The results on simulated data and AD demonstrate that GenEpi can detect the epistasis associated with phenotypes effectively and efficiently. The released package can be generalized to facilitate studies of many complex diseases in the near future

eScholarship - University of California