Search CORE

125 research outputs found

Challenges in the Multivariate Analysis of Mass Cytometry Data: The Effect of Randomization

Author: Cabrero D-G
Lagani V
Papoutsoglou G
Schmidt A
Tegner J
Tsamardinos I
Tsirlis K
Publication venue: 'Royal College of Obstetricians & Gynaecologists (RCOG)'
Publication date: 06/11/2019

Cytometry by time-of-flight (CyTOF) has emerged as a high-throughput single-cell technology able to provide large samples of protein readouts. There already exists a large pool of advanced high-dimensional analysis algorithms that explore the observed heterogeneous distributions and draw intriguing biological inferences. A fact largely overlooked by these methods, however, is the effect of the established data preprocessing pipeline on the distributions of the measured quantities. In this article, we focus on randomization, a transformation used to improve data visualization, which can negatively affect multivariate data analysis methods such as dimensionality reduction, clustering, and network reconstruction algorithms. Our results indicate that randomization should be used only for visualization purposes, not in conjunction with high-dimensional analytical tools.

UCL Discovery
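The abstract does not spell out what randomization does to the data. A minimal sketch, assuming the common CyTOF convention of subtracting a uniform random value in [0, 1) from each integer ion count and then applying an arcsinh transform with cofactor 5 (both assumptions, as are the function names), shows one reason it can matter for distance-based methods: cells with identical raw readouts stop being identical.

```python
import numpy as np

def randomize_counts(counts, rng):
    """Subtract uniform noise in [0, 1) from each ion count (assumed convention)."""
    return counts - rng.uniform(0.0, 1.0, size=counts.shape)

def arcsinh_transform(x, cofactor=5.0):
    """Arcsinh transform; a cofactor of 5 is a common (assumed) CyTOF choice."""
    return np.arcsinh(x / cofactor)

rng = np.random.default_rng(42)
# Hypothetical sparse data: 1,000 cells x 2 markers with mostly zero counts.
counts = rng.poisson(0.3, size=(1000, 2)).astype(float)

raw = arcsinh_transform(counts)
randomized = arcsinh_transform(randomize_counts(counts, rng))

# Cells whose raw readouts are all zero are identical before randomization...
zero_cells = np.all(counts == 0, axis=1)
print("all-zero cells:", int(zero_cells.sum()))
# ...but are spread over a small continuous range afterwards.
print("distinct raw values (marker 1):", len(np.unique(raw[:, 0])))
print("distinct randomized values (marker 1):", len(np.unique(randomized[:, 0])))
```

A clustering or dimensionality-reduction step applied to `randomized` would see artificial variation among the all-zero cells that is not present in the measurements.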

Feature selection and prediction with a Markov blanket structure learning algorithm

Author: B Malone
I Tsamardinos
JR Quinlan
N Friedman
Yuan Tan
Z Liu
Zhifa Liu
Publication venue: 'Springer Science and Business Media LLC'

Crossref

An Experimental Comparison of Hybrid Algorithms for Bayesian Network Structure Learning

Author: A. Aussem
B. Ellis
C.F. Aliferis
D. Heckerman
D.M. Chickering
E. Perrier
G.E. Schwarz
I. Tsamardinos
J. Cheng
J. Pearl
J. Peña
J.M. Peña
K. Kojima
M. Koivisto
M. Scutari
S. Rodrigues de Morais
S.R. de Morais
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

We present a novel hybrid algorithm for Bayesian network structure learning, called Hybrid HPC (H2PC). It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. It is based on a subroutine called HPC, which combines ideas from incremental and divide-and-conquer constraint-based methods to learn the parents and children of a target variable. We conduct an experimental comparison of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning, on several benchmarks with various data sizes. Our extensive experiments show that H2PC outperforms MMHC both in terms of goodness of fit to new data and in terms of the quality of the network structure itself, which is closer to the true dependence structure of the data. The source code (in R) of H2PC, as well as all data sets used for the empirical tests, is publicly available.

Crossref

HAL

Hal-Diderot
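H2PC works in two phases: learn a skeleton, then orient its edges with a greedy hill-climbing search. A minimal sketch of the second phase only, assuming the skeleton is already known (HPC itself is not implemented) and swapping in a Gaussian BIC score for the Bayesian score the abstract mentions; all function names are hypothetical. Each skeleton pair can be absent or oriented either way, and the search takes the best-improving move until none remains.

```python
import math
import numpy as np

def local_bic(data, child, parents):
    """Gaussian BIC of one node given its parents (linear regression with intercept)."""
    n = data.shape[0]
    y = data[:, child]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in sorted(parents)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return -0.5 * n * math.log(rss / n) - 0.5 * X.shape[1] * math.log(n)

def has_cycle(parents):
    """Return True if the parent map (node -> set of parents) contains a directed cycle."""
    visiting, done = set(), set()

    def visit(v):
        if v in done:
            return False
        if v in visiting:
            return True
        visiting.add(v)
        if any(visit(p) for p in parents[v]):
            return True
        visiting.remove(v)
        done.add(v)
        return False

    return any(visit(v) for v in parents)

def restricted_hill_climb(data, skeleton):
    """Greedy add/remove/reverse search over DAGs whose edges lie in `skeleton`."""
    parents = {v: set() for v in range(data.shape[1])}
    while True:
        best_gain, best = 1e-9, None
        for a, b in skeleton:
            # Each skeleton pair can be absent, a -> b, or b -> a.
            for state in ("absent", "a->b", "b->a"):
                cand = {v: set(ps) for v, ps in parents.items()}
                cand[a].discard(b)
                cand[b].discard(a)
                if state == "a->b":
                    cand[b].add(a)
                elif state == "b->a":
                    cand[a].add(b)
                if cand == parents or has_cycle(cand):
                    continue
                # Decomposable score: only the local scores of a and b change.
                gain = sum(local_bic(data, v, cand[v]) - local_bic(data, v, parents[v])
                           for v in (a, b))
                if gain > best_gain:
                    best_gain, best = gain, cand
        if best is None:
            return parents
        parents = best

rng = np.random.default_rng(0)
n = 2000
x0 = rng.standard_normal(n)
x1 = x0 + 0.5 * rng.standard_normal(n)
x2 = x1 + 0.5 * rng.standard_normal(n)
data = np.column_stack([x0, x1, x2])

# In H2PC the skeleton would come from HPC; here it is supplied by hand.
dag = restricted_hill_climb(data, skeleton=[(0, 1), (1, 2)])
print({child: sorted(ps) for child, ps in dag.items()})
```

Restricting moves to the skeleton is what makes hybrid methods cheaper than an unrestricted score-based search. The orientation returned depends on the data and on the greedy path, so only the adjacencies should be read off with confidence.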

Learning biological network using mutual information and conditional independence

Author: C Chow
Chin-Rang Yang
D Heckerman
DM Chickering
Dong-Chul Kim
GF Cooper
I Tsamardinos
IA Beinlich
J Pearl
J Rissanen
Jean Gao
LM de Campos
N Friedman
SL Lauritzen
T Verma
W Lam
Xiaoyu Wang
XW Chen
YB Kim
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment

Author: Aasmets O
Berland M
Carrillo de Santa Pau E
Claesson MJ
Gruca A
Hasic J
Hron K
Karaduzovic-Hadziabdic K
Klammsteiner T
Kolev M
Lahti L
Loncar Turukalo T
Lopes MB
Marcos-Zambrano LJ
Moreno V
Moreno-Indias I
Naskinova I
Org E
Paciência I
Papoutsoglou G
Przymus P
Shigdel R
Stres B
Trajkovik V
Truu J
Tsamardinos I
Vilne B
Yousef M
Zdravevski E
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2021

The growing number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide essential material for exploring host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all the information in these biological datasets while taking into account the peculiarities of microbiome data, i.e., their compositional, heterogeneous and sparse nature. Predicting host phenotypes from taxonomy-informed feature selection, thereby linking the microbiome to disease states, is beneficial for personalized medicine. In this regard, machine learning (ML) supports the development of models for classification and prediction in microbiology, for inferring host phenotypes to predict diseases, and for using microbial communities to stratify patients by their state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here relate mostly to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review, covering this broad topic, follows the scoping review methodology.
The manual identification of data sources was complemented with: (1) an automated publication search through the digital libraries of the three major publishers using a natural language processing (NLP) toolkit, and (2) automated identification of relevant software repositories on GitHub and ranking of the related research papers using a learning-to-rank approach. This study was supported by COST Action CA18131 “Statistical and machine learning techniques in human microbiome studies”. Estonian Research Council grant PRG548 (JT). Spanish State Research Agency Juan de la Cierva Grant IJC2019-042188-I (LM-Z). EO was funded and OA was supported by Estonian Research Council grant PUT 1371 and EMBO Installation grant 3573. AG was supported by Statutory Research project of the Department of Computer Networks and Systems

Repositório Aberto da Universidade do Porto
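The review stresses that microbiome data are compositional: each sample carries only relative abundances, because its total count mostly reflects sequencing depth. One widely used remedy, not named in the abstract and assumed here, is the centered log-ratio (CLR) transform, which subtracts each sample's mean log count from its log counts. The pseudocount of 0.5 used to handle the many zeros in sparse tables is an arbitrary illustrative choice, and `clr` is a hypothetical helper.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of each sample (row) of a taxa count table."""
    x = np.asarray(counts, dtype=float) + pseudocount  # pseudocount avoids log(0)
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)

# Hypothetical taxa counts for two samples with very different sequencing depths.
counts = np.array([[10, 0, 30, 60],
                   [100, 5, 300, 595]])
print(np.round(clr(counts), 3))
```

Without a pseudocount the transform is invariant to rescaling a sample, so samples sequenced at different depths become comparable; adding a pseudocount trades a little of that invariance for the ability to handle zeros.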

Expanding the Understanding of Biases in Development of Clinical-Grade Molecular Signatures: A Case Study in Acute Respiratory Viral Infections

Author: A Rangarajan
A Statnikov
AK Zaas
Alexander Statnikov
AM Glas
C Ambroise
CF Aliferis
Constantin F. Aliferis
EE Ntzani
ER DeLong
F Azuaje
FJ Gonzalez
GG Jackson
I Guyon
I Tsamardinos
J Pearl
JA Sparano
JT Leek
Jörn-Hendrik Weitkamp
KA Baggerly
Lauren McVoy
LM Cope
Nikita I. Lytkin
O Ramilo
R Kohavi
R Simon
RA Irizarry
RL Somorjai
TW Anderson
UM Braga-Neto
Vladimir Brusic
VN Vapnik
WE Johnson
Y Benjamini
Z Liu
Publication venue: Public Library of Science
Publication date: 01/06/2011

The promise of modern personalized medicine is to use molecular and clinical information to better diagnose, manage, and treat disease on an individual patient basis. These functions are predominantly enabled by molecular signatures, which are computational models for predicting phenotypes and other responses of interest from high-throughput assay data. Data analytics is a central component of molecular signature development and can jeopardize the entire process if conducted incorrectly. While exploratory data analysis may tolerate suboptimal protocols, clinical-grade molecular signatures are subject to far stricter requirements. Closing the gap between standards for exploratory and clinically successful molecular signatures requires a thorough understanding of the possible biases in the data analysis phase and the development of strategies to avoid them.

Using a recently introduced data-analytic protocol as a case study, we provide an in-depth examination of poorly studied biases of data-analytic protocols related to signature multiplicity, biomarker redundancy, data preprocessing, and validation of signature reproducibility. The methodology and results presented in this work aim to expand the understanding of these data-analytic biases that affect the development of clinically robust molecular signatures.

Several recommendations follow from the current study. First, all molecular signatures of a phenotype should be extracted to the extent possible, in order to provide comprehensive and accurate grounds for understanding disease pathogenesis. Second, redundant genes should generally be removed from final signatures to facilitate reproducibility and decrease manufacturing costs. Third, data preprocessing procedures should be designed so as not to bias biomarker selection. Finally, molecular signatures developed and applied on different phenotypes and populations of patients should be treated with great caution.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central
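The recommendation that preprocessing must not bias biomarker selection can be made concrete with a well-known instance of such bias, which is not necessarily the one examined in the paper: selecting features on the full dataset before cross-validation. A minimal sketch on pure-noise data, assuming mean-difference ranking and a nearest-centroid classifier for brevity (all names hypothetical), compares leave-one-out accuracy with selection outside versus inside the loop.

```python
import numpy as np

def top_k_features(X, y, k):
    """Indices of the k features with the largest absolute difference in class means."""
    diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(diff)[::-1][:k]

def nearest_centroid(X_train, y_train, x):
    """Predict the class whose training centroid is closest to x."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    return int(np.linalg.norm(x - c1) < np.linalg.norm(x - c0))

def loocv_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy with feature selection inside or outside the loop."""
    n = len(y)
    leaked = top_k_features(X, y, k)  # uses every sample, including held-out ones
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        feats = top_k_features(X[train], y[train], k) if select_inside_fold else leaked
        pred = nearest_centroid(X[train][:, feats], y[train], X[i, feats])
        correct += int(pred == y[i])
    return correct / n

rng = np.random.default_rng(0)
n, p, k = 30, 5000, 20
X = rng.standard_normal((n, p))  # pure noise: labels carry no real signal
y = np.array([0, 1] * (n // 2))
rng.shuffle(y)

leaky = loocv_accuracy(X, y, k, select_inside_fold=False)
proper = loocv_accuracy(X, y, k, select_inside_fold=True)
print(f"selection outside CV: {leaky:.2f}, inside CV: {proper:.2f}")
```

Because the labels are random, the honest estimate should sit around (or even below) chance, while the leaky estimate is typically far higher: the held-out sample helped choose the very features used to classify it.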

Constraint-based probabilistic learning of metabolic pathways from tomato volatiles

Author: Anand K. Gavai
Arnaud Bovy
C Meek
E Baldwin
E Yilmaz
Fred van Eeuwijk
G Suizdak
Harm Nijveen
I Tsamardinos
J Kopka
Jack A. M. Leunissen
K Morgenthal
M Kalisch
M Zou
MI Jordan
MJ Beal
N Friedman
N Schauer
Peter J. F. Lucas
R Gohlke
R Opgen-Rhein
R Ursem
Remco Ursem
S Moco
W Weckwerth
Y Tikunov
Yury Tikunov
Publication venue: Springer US
Publication date: 01/01/2009

Clustering and correlation analysis techniques have become popular tools for the analysis of data produced by metabolomics experiments. The results obtained from these approaches provide an overview of the interactions between objects of interest. Often in these experiments, one is more interested in information about the nature of these relationships, e.g., cause-effect relationships, than in the actual strength of the interactions. Finding such relationships is of crucial importance, as most biological processes can only be understood in this way. Bayesian networks allow these cause-effect relationships among variables of interest to be represented in terms of whether and how they influence each other, given that a third, possibly empty, group of variables is known. This technique also allows the incorporation of prior knowledge established from the literature or from biologists. The representation of these relationships as a directed graph is highly intuitive and helps in understanding these processes. This paper describes how constraint-based Bayesian networks can be applied to metabolomics data and used to uncover the important pathways that play a significant role in the ripening of fresh tomatoes. We also show how this method of reconstructing pathways is intuitive and performs better than classical techniques. Methods for learning Bayesian network models are powerful tools for analyzing data of the magnitude generated by metabolomics experiments. They allow one to model cause-effect relationships and help in understanding the underlying processes.

Crossref

Springer - Publisher Connector

PubMed Central

Radboud Repository
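Constraint-based structure learning rests on conditional-independence tests: is X independent of Y given a third, possibly empty, set of variables? The abstract does not name the test used, so a minimal sketch, assuming continuous, roughly Gaussian measurements, uses the Fisher z test of partial correlation, a common choice in PC-style algorithms. The chain `a -> b -> c` and the function names are hypothetical.

```python
import math
import numpy as np

def partial_corr(data, i, j, cond=()):
    """Partial correlation of columns i and j given the columns in `cond`."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])

def fisher_z_test(data, i, j, cond=()):
    """Return (partial correlation, two-sided p-value) for H0: i independent of j given cond."""
    n = data.shape[0]
    r = partial_corr(data, i, j, cond)
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    return r, math.erfc(abs(z) / math.sqrt(2))

rng = np.random.default_rng(1)
n = 5000
a = rng.standard_normal(n)
b = a + rng.standard_normal(n)  # hypothetical chain a -> b -> c
c = b + rng.standard_normal(n)
data = np.column_stack([a, b, c])

r_marg, p_marg = fisher_z_test(data, 0, 2)            # empty conditioning set
r_cond, p_cond = fisher_z_test(data, 0, 2, cond=[1])  # condition on b
print(f"a vs c: r={r_marg:.3f}, p={p_marg:.2g}")
print(f"a vs c | b: r={r_cond:.3f}, p={p_cond:.2g}")
```

Here `a` and `c` are strongly correlated but become independent once `b` is known, which is exactly the pattern a constraint-based learner uses to drop the edge between `a` and `c`.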

Understanding human functioning using graphical models

Author: A Cieza
Bernd AG Fellinghauer
CL Tsai
Eva Grill
F Biering-Sorensen
G Stucki
Gerold Stucki
I Tsamardinos
IK Zola
J Pearl
JE Bickenbach
JE Ware
JG Ibrahim
JZ J Ramsey
M Kalisch
MA Hernan
Markus Kalisch
Marloes H Maathuis
MH Maathuis
P Spirtes
Peter Bühlmann
R Strobl
S van Buuren
SL Lauritzen
T Hastie
Ulrich Mansmann
World Health Organization
WR Frontera
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Functioning and disability are universal human experiences. However, our current understanding of functioning from a comprehensive perspective is limited. The development of the International Classification of Functioning, Disability and Health (ICF) on the one hand, and recent developments in graphical modeling on the other, might be combined to open the door to a more comprehensive understanding of human functioning. The objective of our paper is therefore to explore how graphical models can be used in the study of ICF data for a range of applications.

Methods: We show the applicability of graphical models on ICF data for different tasks: visualization of the dependence structure of the data set, dimension reduction, and comparison of subpopulations. Moreover, we further developed and applied recent findings in causal inference using graphical models to estimate bounds on intervention effects in an observational study with many variables and without knowing the underlying causal structure.

Results: In each field, graphical models could be applied, giving results of high face validity. In particular, graphical models could be used to visualize functioning in patients with spinal cord injury. The resulting graph consisted of several connected components, which can be used for dimension reduction. Moreover, we found that the differences in the dependence structures between subpopulations were relevant and could be systematically analyzed using graphical models. Finally, when estimating bounds on causal effects of ICF categories on general health perceptions among patients with chronic health conditions, we found that the five ICF categories that showed the strongest effect were plausible.

Conclusions: Graphical models are a flexible tool and lend themselves to a wide range of applications. In particular, studies involving ICF data seem suited to analysis using graphical models.

Repository for Publications and Research Data

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Access LMU
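The abstract describes estimating bounds on intervention effects without knowing the causal structure. A minimal sketch of one such idea, assuming linear Gaussian relations and similar in spirit to the IDA approach (not claimed to be the paper's exact procedure): when the data cannot reveal which variables are parents of the cause, adjust for each plausible parent set in turn, collect the resulting effect estimates, and report the smallest absolute value as a conservative lower bound. Here the candidate parent sets are supplied by hand rather than enumerated from a learned graph, and all names are hypothetical.

```python
import numpy as np

def adjusted_effect(data, cause, outcome, adjust):
    """OLS coefficient of `cause` when regressing `outcome` on it plus the `adjust` columns."""
    n = data.shape[0]
    design = np.column_stack([np.ones(n), data[:, cause]] + [data[:, a] for a in adjust])
    beta, *_ = np.linalg.lstsq(design, data[:, outcome], rcond=None)
    return float(beta[1])

def effect_bounds(data, cause, outcome, candidate_parent_sets):
    """One effect estimate per candidate parent set of `cause`, plus a conservative summary."""
    effects = [adjusted_effect(data, cause, outcome, ps) for ps in candidate_parent_sets]
    return effects, min(abs(e) for e in effects)

rng = np.random.default_rng(7)
n = 10_000
z = rng.standard_normal(n)
x = 0.8 * z + rng.standard_normal(n)
y = 0.5 * x + 0.7 * z + rng.standard_normal(n)  # true effect of x on y is 0.5
data = np.column_stack([x, y, z])

# Suppose the structure search cannot tell whether z is a parent of x.
effects, lower = effect_bounds(data, cause=0, outcome=1, candidate_parent_sets=[[], [2]])
print([round(e, 3) for e in effects], round(lower, 3))
```

Adjusting for the true parent `z` recovers the effect of about 0.5, while omitting it inflates the estimate through confounding; reporting the whole set of estimates makes that uncertainty about the structure explicit.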