125 research outputs found

Statistical Integration of Heterogeneous Data with PO2PLS

Author: Bouhaddani Said el
Houwing-Duistermaat Jeanine
Jongbloed Geurt
Uh Hae-Won
Publication date: 24/03/2021

The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high-dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), which addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we implement a fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for testing the relationship between two datasets is proposed, and its asymptotic distribution is derived. Notably, several existing omics integration methods are special cases of PO2PLS. Via extensive simulations, we show that PO2PLS performs better than alternatives in feature selection and prediction performance. In addition, the asymptotic distribution appears to hold when the sample size is sufficiently large. We illustrate PO2PLS with two examples from commonly used study designs: a large population cohort and a small case-control study. Besides recovering known relationships, PO2PLS also identified novel findings. The methods are implemented in our R-package PO2PLS. Supplementary materials for this article are available online.
Comment: 36 pages, 4 figures, Submitted to Journal of the American Statistical Association

arXiv.org e-Print Archive

TU Delft Repository

Discussion on the paper ‘Statistical contributions to bioinformatics: Design, modelling, structure learning and integration’ by Jeffrey S. Morris and Veerabhadran Baladandayuthapani

Author: Aitchison J
Arief Gusnanto
Benjamini Y
Hae Won Uh
Jeanine J Houwing-Duistermaat
Walt D
Publication venue: 'SAGE Publications'
Publication date: 01/08/2017

Bioinformatics is an important research area for statisticians. This discussion adds some topics to the paper, namely statistical contributions to the detection of differentially expressed genes, to protein structure prediction, and to the analysis of highly correlated features in glycomics datasets.

Crossref

Leiden University Scholarly Publications

White Rose Research Online

Statistical integration of multi-omics and drug screening data from cell lines

Author: Bickle Marc
el Bouhaddani Said
Hoellerhage Matthias
Houwing-Duistermaat Jeanine
Höglinger Günter
Moebius Claudia
Uh Hae-Won
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2024

Data integration methods are used to obtain a unified summary of multiple datasets. For multi-modal data, we propose a computational workflow to jointly analyze datasets from cell lines. The workflow comprises a novel probabilistic data integration method, named POPLS-DA, for multi-omics data. The workflow is motivated by a study on synucleinopathies where transcriptomics, proteomics, and drug screening data are measured in affected LUHMES cell lines and controls. The aim is to highlight potentially druggable pathways and genes involved in synucleinopathies. First, POPLS-DA is used to prioritize genes and proteins that best distinguish cases and controls. For these genes, an integrated interaction network is constructed where the drug screen data is incorporated to highlight druggable genes and pathways in the network. Finally, functional enrichment analyses are performed to identify clusters of synaptic and lysosome-related genes and proteins targeted by the protective drugs. POPLS-DA is compared to other single- and multi-omics approaches. We found that HSPA5, a member of the heat shock protein 70 family, was one of the most targeted genes by the validated drugs, in particular by AT1-blockers. HSPA5 and AT1-blockers have been previously linked to alpha-synuclein pathology and Parkinson's disease, showing the relevance of our findings. Our computational workflow identified new directions for therapeutic targets for synucleinopathies. POPLS-DA provided a larger interpretable gene set than other single- and multi-omics approaches. An implementation based on R and markdown is freely available online.

We present a computational workflow that combines the analysis of different types of data measured in cell line studies with non-overlapping samples. We apply the workflow to measurements of gene expression, protein abundances, and a screening of a wide range of FDA-approved drugs. These different types of data are obtained from LUHMES brain cells and jointly analyzed to discover new treatment options in synucleinopathies, such as Parkinson's disease. Our workflow includes a new probabilistic method, named POPLS-DA. POPLS-DA combines the analysis of the genes and proteins to pinpoint a set of relevant genes and proteins that can distinguish affected and non-affected cells. Compared to other approaches, POPLS-DA found a larger set of genes relevant to the disease. Further, we constructed a network that connects the relevant genes and proteins that interact with each other. We incorporate the drug screening data to highlight which part of the network is relevant to the disease and druggable. Through additional analysis of the functionality, we discovered that the genes and proteins that are targeted by protective drugs share relevant properties, namely they are synaptic and lysosome-related genes. Notably, we found that specific types of drugs, namely AT1-blockers such as Telmisartan, are protective and target the network of relevant genes and proteins. These drugs are approved by the FDA and readily available to further investigate their potential in treating synucleinopathies. We further found that a gene named HSPA5, a member of the heat shock protein 70 family, is highly targeted by the protective drugs. This gene has been linked to Parkinson's disease in previous scientific literature. Our computational workflow and the implementation in R and markdown are freely available online.

Open Access LMU

Gene analysis for longitudinal family data using random-effects models

Author: Bruna Balliu
Erik van den Akker
Hae-Won Uh
Jeanine J Houwing-Duistermaat
Quinta Helmer
Roula Tsonaka
Publication venue: Springer Nature
Publication date: 01/01/2014

We have extended our recently developed 2-step approach for gene-based analysis to the family design and to the analysis of rare variants. The goal of this approach is to study the joint effect of multiple single-nucleotide polymorphisms that belong to a gene. First, the information in a gene is summarized by 2 variables, namely the empirical Bayes estimate capturing common variation and the number of rare variants. By using random effects for the common variants, our approach acknowledges the within-gene correlations. In the second step, the 2 summaries were included as covariates in linear mixed models. To test the null hypothesis of no association, a multivariate Wald test was applied. We analyzed the simulated data sets to assess the performance of the method. Then we applied the method to the real data set and identified a significant association between FRMD4B and diastolic blood pressure (p-value = 8.3 × 10⁻¹²).
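As a rough illustration of the final testing step described in this abstract, the sketch below computes a Wald statistic for two coefficients and the matching p-value. It assumes that the two gene summaries are tested jointly, so the statistic is referred to a chi-square distribution with 2 degrees of freedom. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math


def wald_statistic_2d(beta, cov):
    """Return beta^T V^{-1} beta for a 2-vector beta and a 2x2 covariance V.

    Hypothetical helper: beta holds the two estimated coefficients
    (one per gene summary), cov their estimated covariance matrix.
    """
    b1, b2 = beta
    (v11, v12), (v21, v22) = cov
    det = v11 * v22 - v12 * v21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form inverse of a 2x2 matrix, folded into the quadratic form.
    return (v22 * b1 * b1 - (v12 + v21) * b1 * b2 + v11 * b2 * b2) / det


def chi2_sf_2df(w):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    For 2 degrees of freedom the survival function is exactly exp(-w / 2).
    """
    if w < 0:
        raise ValueError("test statistic must be non-negative")
    return math.exp(-w / 2.0)


# Illustrative numbers only (not from the paper).
w = wald_statistic_2d((0.8, 0.3), ((0.04, 0.01), (0.01, 0.09)))
p_value = chi2_sf_2df(w)
print(w, p_value)
```

The closed form for 2 degrees of freedom keeps the sketch free of external libraries; for a different number of tested coefficients a general chi-square routine would be needed instead.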

Springer - Publisher Connector

PubMed Central

Pathway analysis for family data using nested random-effects models

Author: Hae-Won Uh
Jeanine J Houwing-Duistermaat
K Wang
L Almasy
LS Chen
P Holmans
RM Cantor
Roula Tsonaka
Y Zheng
Publication venue: BioMed Central
Publication date: 01/01/2011

Recently we proposed a novel two-step approach to test for pathway effects in disease progression. The goal of this approach is to study the joint effect of multiple single-nucleotide polymorphisms that belong to certain genes. By using random effects, our approach acknowledges the correlations within and between genes when testing for pathway effects. Gene-gene and gene-environment interactions can be included in the model. The method can be implemented with standard software, and the distribution of the test statistics under the null hypothesis can be approximated by using standard chi-square distributions. Hence no extensive permutations are needed for computations of the p-value. In this paper we adapt and apply the method to family data, and we study its performance for sequence data from Genetic Analysis Workshop 17. For the set of unrelated subjects, the performance of the new test was disappointing. We found a power of 6% for the binary outcome and of 18% for the quantitative trait Q1. For family data the new approach appears to perform well, especially for the quantitative outcome. We found a power of 39% for the binary outcome and a power of 89% for the quantitative trait Q1

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Locally weighted transmission/disequilibrium test for genetic association analysis

Author: El Galta Rachid
Houwing-Duistermaat Jeanine J
Hsu Li
Lebrec Jeremie JP
Tang Hua
Uh Hae-Won
Yu Xuesong
Publication venue: BioMed Central
Publication date: 01/12/2005

The transmission/disequilibrium test statistic has been used for assessing genetic association in affected-parent trios. In the presence of multiple tightly linked marker loci where local dependency may exist, haplotypes are reconstructed statistically to estimate the joint effects of these markers. In this manuscript, we propose an alternative to the haplotype approach by taking a weighted average of multiple loci, where the weight is proportional to the product of (1 - 2 × recombination fraction) and the linkage disequilibrium between markers. As an illustration, we applied the method to the simulated Aipotu data.
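The weighting scheme in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions: each neighbouring locus contributes one per-locus quantity (for example a transmission score), the recombination fraction and the linkage disequilibrium are measured relative to the locus being tested, and the weights are normalized to sum to one. The function names and inputs are hypothetical and do not reproduce the authors' implementation.

```python
def locus_weights(recomb_fractions, ld_values):
    """Weights proportional to (1 - 2 * theta) * LD, normalized to sum to 1.

    recomb_fractions: recombination fraction theta for each locus (0 to 0.5).
    ld_values: linkage disequilibrium between each locus and the tested locus.
    """
    if len(recomb_fractions) != len(ld_values):
        raise ValueError("inputs must have the same length")
    raw = [(1.0 - 2.0 * theta) * ld
           for theta, ld in zip(recomb_fractions, ld_values)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [r / total for r in raw]


def weighted_average(values, weights):
    """Weighted average of per-locus quantities (weights already sum to 1)."""
    return sum(v * w for v, w in zip(values, weights))


# Illustrative inputs only: three loci at increasing distance from the tested locus.
weights = locus_weights([0.0, 0.01, 0.05], [1.0, 0.8, 0.4])
score = weighted_average([1.5, 1.2, 0.3], weights)
print(weights, score)
```

A locus at a recombination fraction of 0.5 gets a raw weight of zero, so unlinked markers drop out of the average regardless of their linkage disequilibrium value.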

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Haplotype Estimation from Fuzzy Genotypes Using Penalized Likelihood

Author: A Dempster
AM Mehta
D Clayton
DV Zaykin
H Akaike
H Kang
Hae-Won Uh
J Marchini
JC Long
KL Ayers
L Excoffier
M Stephens
ME Hawley
Paul H. C. Eilers
R Thompson
R van Berloo
S Lin
SL Slager
T Niu
Thomas Mailund
TJ Hastie
ZS Qin
Publication venue: Public Library of Science
Publication date: 01/01/2010

The Composite Link Model is a generalization of the generalized linear model in which expected values of observed counts are constructed as a sum of generalized linear components. When combined with penalized likelihood, it provides a powerful and elegant way to estimate haplotype probabilities from observed genotypes. Uncertain (“fuzzy”) genotypes, like those resulting from AFLP scores, can be handled by adding an extra layer to the model. We describe the model and the estimation algorithm. We apply it to a data set of accurate human single nucleotide polymorphism (SNP) genotypes and to a data set of fuzzy tomato AFLP scores.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

EUR Research Repository

Leiden University Scholarly Publications

Erasmus University Digital Repository