
1,572 research outputs found

A methodology for multivariate phenotype-based genome-wide association studies to mine pleiotropic genes

Author: Kim Sangsoo
Lee Ji Young
Park Sung Hee
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes

Author: Giovanni Montana
Limsoon Wong
Wilson Goh
Yue Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

DOI: 10.1186/1471-2105-14-S16-S6, BMC Bioinformatics 14 (Suppl 16)

Crossref

Springer - Publisher Connector

PubMed Central

ScholarBank@NUS

Mining for genotype-phenotype relations in Saccharomyces using partial least squares

Background: Multivariate approaches are important because of their versatility and their applications in many fields, as they offer decisive advantages over univariate analysis. Genome-wide association studies are rapidly emerging, but current approaches pay little attention to the multivariate relation between genotype and phenotype. We introduce a methodology based on a BLAST approach for extracting information from genomic sequences and Soft-Thresholding Partial Least Squares (ST-PLS) for mapping genotype-phenotype relations. Results: Applying this methodology to an extensive data set for the model yeast Saccharomyces cerevisiae, we found that the genotype-phenotype relationship involves surprisingly few genes, in the sense that an overwhelmingly large fraction of the phenotypic variation can be explained by variation in less than 1% of the full gene reference set containing 5791 genes. These phenotype-influencing genes were evolving 20% faster than non-influential genes and were unevenly distributed over cellular functions, with strong enrichments in functions such as cellular respiration and transposition. These genes were also enriched with known paralogs, stop codon variations and copy number variations, suggesting that such molecular adjustments have had a disproportionate influence on the recent adaptation of Saccharomyces yeasts to environmental changes in their ecological niche. Conclusions: The BLAST- and PLS-based multivariate approach derived results that adhere to the known yeast phylogeny and gene ontology, and thus verify that the methodology extracts a set of fast-evolving genes that capture the phylogeny of the yeast strains. The approach is worth pursuing, and future investigations should improve the computation of genotype signals as well as the variable selection procedure within the PLS framework.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Towards knowledge-based gene expression data mining

Author: Bellazzi Riccardo
Zupan Blaz
Publication date: 01/01/2007

The field of gene expression data analysis has grown in the past few years from being purely data-centric to integrative, aiming at complementing microarray analysis with data and knowledge from diverse available sources. In this review, we report on the plethora of gene expression data mining techniques and focus on their evolution toward knowledge-based data analysis approaches. In particular, we discuss recent developments in gene expression-based analysis methods used in association and classification studies, phenotyping and reverse engineering of gene networks.

Elsevier - Publisher Connector

ePrints.FRI

Sparse reduced-rank regression for imaging genetics studies: models and applications

Author: Vounou Maria
Publication venue: Mathematics, Imperial College London
Publication date: 01/02/2012

We present a novel statistical technique, the sparse reduced-rank regression (sRRR) model, which is a strategy for multivariate modelling of high-dimensional imaging responses and genetic predictors. By adopting penalisation techniques, the model is able to enforce sparsity in the regression coefficients, identifying subsets of genetic markers that best explain the variability observed in subsets of the phenotypes. To properly exploit the rich structure present in each of the imaging and genetics domains, we additionally propose the use of several structured penalties within the sRRR model. Using simulation procedures that accurately reflect realistic imaging genetics data, we present detailed evaluations of the sRRR method in comparison with the more traditional univariate linear modelling approach. In all settings considered, we show that sRRR possesses better power to detect the deleterious genetic variants. Moreover, using a simple genetic model, we demonstrate the potential benefits, in terms of statistical power, of carrying out voxel-wise searches as opposed to extracting averages over regions of interest in the brain. Since this entails the use of phenotypic vectors of enormous dimensionality, we suggest the use of a sparse classification model as a de-noising step prior to the imaging genetics study. We also present the application of a data re-sampling technique within the sRRR model for model selection. Using this approach we are able to rank the genetic markers in order of importance of association to the phenotypes, and similarly rank the phenotypes in order of importance to the genetic markers. Finally, we illustrate the practical application of the proposed statistical models on three real imaging genetics datasets and highlight some potential associations.

Spiral - Imperial College Digital Repository

Efficient Computational Techniques for Tag SNP Selection, Epistasis Analysis, and Genome-wide Association Study

Author: Wang Yue
Publication date: 30/11/2012

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS

Efficient inference for genetic association studies with multiple outcomes

Author: Davison Anthony C.
Hager Jörg
Irincheeva Irina
Ruffieux Hélène
Publication venue: Oxford University Press (OUP)
Publication date: 20/03/2017

Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modelling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson et al. (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Complex biomarker discovery in neuroimaging data: Finding a needle in a haystack

Author: Atluri Gowtham
Doraiswamy P. Murali
Fang Gang
Kumar Vipin
Lim Kelvin
MacDonald Angus
Padmanabhan Kanchana
Petrella Jeffrey R.
Samatova Nagiza F.
Steinbach Michael
Publication venue: The Authors. Published by Elsevier Inc.
Publication date: 07/08/2013

Neuropsychiatric disorders such as schizophrenia, bipolar disorder and Alzheimer's disease are major public health problems. However, despite decades of research, we currently have no validated prognostic or diagnostic tests that can be applied at an individual patient level. Many neuropsychiatric diseases are due to a combination of alterations that occur in a human brain rather than the result of localized lesions. While there is hope that newer imaging technologies such as functional and anatomic connectivity MRI or molecular imaging may offer breakthroughs, the single biomarkers that are discovered using these datasets are limited by their inability to capture the heterogeneity and complexity of most multifactorial brain disorders. Recently, complex biomarkers have been explored to address this limitation using neuroimaging data. In this manuscript we consider the nature of complex biomarkers being investigated in the recent literature and present techniques to find such biomarkers that have been developed in related areas of data mining, statistics, machine learning and bioinformatics.

Elsevier - Publisher Connector

PubMed Central