Search CORE

730 research outputs found

Haplotype Estimation from Fuzzy Genotypes Using Penalized Likelihood

Author: A Dempster
AM Mehta
D Clayton
DV Zaykin
H Akaike
H Kang
Hae-Won Uh
HW Uh
HW Uh
J Marchini
JC Long
KL Ayers
L Excoffer
M Stephens
ME Hawley
Paul H. C. Eilers
R Thompson
R van Berloo
S Lin
SL Slager
T Niu
Thomas Mailund
TJ Hastie
ZS Qin
Publication venue: Public Library of Science
Publication date: 01/01/2010
Field of study

The Composite Link Model is a generalization of the generalized linear model in which expected values of observed counts are constructed as a sum of generalized linear components. When combined with penalized likelihood, it provides a powerful and elegant way to estimate haplotype probabilities from observed genotypes. Uncertain (“fuzzy”) genotypes, like those resulting from AFLP scores, can be handled by adding an extra layer to the model. We describe the model and the estimation algorithm. We apply it to a data set of accurate human single nucleotide polymorphism (SNP) and to a data set of fuzzy tomato AFLP scores

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

EUR Research Repository

Leiden University Scholary Publications

Erasmus University Digital Repository

Learning the optimal scale for GWAS through hierarchical SNP aggregation

Author: Ambroise Christophe
Guinot Florent
Samson Franck
Szafranski Marie
Publication venue
Publication date: 01/01/2018
Field of study

Motivation: Genome-Wide Association Studies (GWAS) seek to identify causal genomic variants associated with rare human diseases. The classical statistical approach for detecting these variants is based on univariate hypothesis testing, with healthy individuals being tested against affected individuals at each locus. Given that an individual's genotype is characterized by up to one million SNPs, this approach lacks precision, since it may yield a large number of false positives that can lead to erroneous conclusions about genetic associations with the disease. One way to improve the detection of true genetic associations is to reduce the number of hypotheses to be tested by grouping SNPs. Results: We propose a dimension-reduction approach which can be applied in the context of GWAS by making use of the haplotype structure of the human genome. We compare our method with standard univariate and multivariate approaches on both synthetic and real GWAS data, and we show that reducing the dimension of the predictor matrix by aggregating SNPs gives a greater precision in the detection of associations between the phenotype and genomic regions

arXiv.org e-Print Archive

HAL Evry

Directory of Open Access Journals

HAL Descartes

Hal-Diderot

Methods for analysing genetic association studies : application to cardiovascular disease

Author: Souverein O.W.
Publication venue
Publication date: 01/01/2007
Field of study

International Migration, Integration and Social Cohesion online publications

Bayesian Statistical Methods for Genetic Association Studies with Case-Control and Cohort Design

Author: Tachmazidou Ioanna
Tachmazidou Ioanna
Publication venue: Epidemiology and Public Health, Imperial College London
Publication date: 01/03/2009
Field of study

Large-scale genetic association studies are carried out with the hope of discovering single nucleotide polymorphisms involved in the etiology of complex diseases. We propose a coalescent-based model for association mapping which potentially increases the power to detect disease-susceptibility variants in genetic association studies with case-control and cohort design. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions and we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium (LD) therein assuming a perfect phylogeny. The haplotype space is then partitioned into disjoint clusters within which the phenotype-haplotype association is assumed to be the same. The novelty of our approach consists in the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common mutation. Our approach is fully Bayesian and we develop Markov Chain Monte Carlo algorithms to sample efficiently over the space of possible partitions. We have also developed a Bayesian survival regression model for high-dimension and small sample size settings. We provide a Bayesian variable selection procedure and shrinkage tool by imposing shrinkage priors on the regression coefficients. We have developed a computationally efficient optimization algorithm to explore the posterior surface and find the maximum a posteriori estimates of the regression coefficients. We compare the performance of the proposed methods in simulation studies and using real datasets to both single-marker analyses and recently proposed multi-marker methods and show that our methods perform similarly in localizing the causal allele while yielding lower false positive rates. Moreover, our methods offer computational advantages over other multi-marker approaches

Spiral - Imperial College Digital Repository

The EM Algorithm and the Rise of Computational Biology

Author: Citable Link
Jun S. Liu
Xiaodan Fan
Yuan Yuan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010
Field of study

In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma"; of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis.Comment: Published in at http://dx.doi.org/10.1214/09-STS312 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Crossref

Mapping Haplotype-haplotype Interactions with Adaptive LASSO

Author: Cui Yuehua
Fu Wenjiang J
Li Ming
Romero Roberto
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background The genetic etiology of complex diseases in human has been commonly viewed as a complex process involving both genetic and environmental factors functioning in a complicated manner. Quite often the interactions among genetic variants play major roles in determining the susceptibility of an individual to a particular disease. Statistical methods for modeling interactions underlying complex diseases between single genetic variants (e.g. single nucleotide polymorphisms or SNPs) have been extensively studied. Recently, haplotype-based analysis has gained its popularity among genetic association studies. When multiple sequence or haplotype interactions are involved in determining an individual's susceptibility to a disease, it presents daunting challenges in statistical modeling and testing of the interaction effects, largely due to the complicated higher order epistatic complexity. Results In this article, we propose a new strategy in modeling haplotype-haplotype interactions under the penalized logistic regression framework with adaptive <it>L</it>1-penalty. We consider interactions of sequence variants between haplotype blocks. The adaptive <it>L</it>1-penalty allows simultaneous effect estimation and variable selection in a single model. We propose a new parameter estimation method which estimates and selects parameters by the modified Gauss-Seidel method nested within the EM algorithm. Simulation studies show that it has low false positive rate and reasonable power in detecting haplotype interactions. The method is applied to test haplotype interactions involved in mother and offspring genome in a small for gestational age (SGA) neonates data set, and significant interactions between different genomes are detected. Conclusions As demonstrated by the simulation studies and real data analysis, the approach developed provides an efficient tool for the modeling and testing of haplotype interactions. The implementation of the method in R codes can be freely downloaded from <url>http://www.stt.msu.edu/~cui/software.html</url>.</p

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Commons@Wayne State University