Inferring causal relations from multivariate time series: a fast method for large-scale gene expression data
Various multivariate time series analysis techniques have been developed with the aim of inferring causal relations between time series. These techniques have previously proved effective on economic and neurophysiological data, which normally consist of hundreds of samples. However, when they are applied to gene regulatory inference, the small sample size of gene expression time series poses an obstacle. In this paper, we describe some of the most commonly used multivariate inference techniques and show the challenges they face in gene expression analysis. In response, we propose a directed partial correlation (DPC) algorithm as an efficient and effective solution for inferring causal/regulatory relations from small-sample gene expression data. Comparative evaluations of the existing techniques and the proposed method are presented. To draw reliable conclusions, comprehensive benchmarking on data sets of various setups is essential. Three experiments are designed to assess these methods in a coherent manner. Detailed analysis of the experimental results not only reveals good accuracy of the proposed DPC method in large-scale prediction, but also gives much insight into all methods under evaluation.
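As a rough illustration of the general idea behind a directed (lagged) partial correlation, the sketch below scores the link from series i to series j as the partial correlation between i at time t-1 and j at time t, after regressing out the other series at time t-1. This is not the DPC algorithm from the paper, whose details the abstract does not give; the function name `lagged_partial_correlation`, the single fixed lag, and the least-squares residual approach are all assumptions made for illustration. The sketch also needs more time points than series, which is the small-sample obstacle the abstract describes.

```python
import numpy as np


def lagged_partial_correlation(X, lag=1):
    """Illustrative lagged partial correlation (hypothetical helper, not the paper's DPC).

    X is an array of shape (T, n): T time points, n series.
    Returns an (n, n) matrix P where P[i, j] is the partial correlation
    between series i at time t-lag and series j at time t, controlling
    for every other series at time t-lag.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    past = X[:-lag]      # values at time t-lag
    present = X[lag:]    # values at time t
    P = np.zeros((n, n))
    for j in range(n):
        y = present[:, j]
        for i in range(n):
            # Conditioning set: intercept plus all lagged series except i.
            others = np.delete(past, i, axis=1)
            Z = np.column_stack([np.ones(len(y)), others])
            # Residuals of the candidate cause and the target after
            # regressing out the conditioning set.
            rx = past[:, i] - Z @ np.linalg.lstsq(Z, past[:, i], rcond=None)[0]
            ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
            denom = np.linalg.norm(rx) * np.linalg.norm(ry)
            P[i, j] = (rx @ ry) / denom if denom > 0 else 0.0
    return P
```

A large absolute value of `P[i, j]` together with a small `P[j, i]` would suggest, under these assumptions, a directed influence from series i to series j.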
Model-based clustering with data correction for removing artifacts in gene expression data
The NIH Library of Integrated Network-based Cellular Signatures (LINCS)
contains gene expression data from over a million experiments, using Luminex
Bead technology. Only 500 colors are used to measure the expression levels of
the 1,000 landmark genes measured, and the data for the resulting pairs of
genes are deconvolved. The raw data are sometimes inadequate for reliable
deconvolution leading to artifacts in the final processed data. These include
the expression levels of paired genes being flipped or given the same value,
and clusters of values that are not at the true expression level. We propose a
new method called model-based clustering with data correction (MCDC) that is
able to identify and correct these three kinds of artifacts simultaneously. We
show that MCDC improves the resulting gene expression data in terms of
agreement with external baselines, as well as improving results from subsequent
analysis.
Reconstruction of biological networks by supervised machine learning approaches
We review a recent trend in computational systems biology which aims at using
pattern recognition algorithms to infer the structure of large-scale biological
networks from heterogeneous genomic data. We present several strategies that
have been proposed and that lead to different pattern recognition problems and
algorithms. The strength of these approaches is illustrated on the
reconstruction of metabolic, protein-protein and regulatory networks of model
organisms. In all cases, state-of-the-art performance is reported.
Reconstructing directed and weighted topologies of phase-locked oscillator networks
The formalism of complex networks is extensively employed to describe the
dynamics of interacting agents in several applications. The features of the
connections among the nodes in a network are not always provided beforehand,
hence the problem of appropriately inferring them often arises. Here, we
present a method to reconstruct directed and weighted topologies (REDRAW) of
networks of heterogeneous phase-locked nonlinear oscillators. We ultimately
plan on using REDRAW to infer the interaction structure in human ensembles
engaged in coordination tasks, and give insights into the overall behavior.