Information-Theoretic Inference of Large Transcriptional Regulatory Networks
The paper presents MRNET, an original method for inferring genetic networks from microarray data. The method is based on maximum relevance/minimum redundancy (MRMR), an effective information-theoretic technique for feature selection in supervised learning. The MRMR principle consists in selecting among the least redundant variables the ones that have the highest mutual information with the target. MRNET extends this feature selection principle to networks in order to infer gene-dependence relationships from microarray data. The paper assesses MRNET by benchmarking it against RELNET, CLR, and ARACNE, three state-of-the-art information-theoretic methods for large (up to several thousands of genes) network inference. Experimental results on thirty synthetically generated microarray datasets show that MRNET is competitive with these methods.
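The MRMR selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes mutual information is estimated from Pearson correlation under a Gaussian model, and the function names `gaussian_mi_matrix` and `mrnet_sketch` are hypothetical. For each target gene, candidates are added greedily by relevance minus mean redundancy, and an edge takes the larger of its two directed scores.

```python
import numpy as np

def gaussian_mi_matrix(data):
    """Pairwise mutual information for a samples x genes array.

    Assumption: variables are jointly Gaussian, so I = -0.5 * log(1 - rho^2).
    """
    rho = np.corrcoef(data, rowvar=False)
    np.fill_diagonal(rho, 0.0)
    rho2 = np.clip(rho ** 2, 0.0, 1.0 - 1e-12)  # avoid log(0)
    return -0.5 * np.log(1.0 - rho2)

def mrnet_sketch(mi):
    """Greedy MRMR selection per target gene, then symmetrised edge weights."""
    n = mi.shape[0]
    scores = np.zeros((n, n))
    for target in range(n):
        candidates = [g for g in range(n) if g != target]
        selected = []
        while candidates:
            # Relevance: mutual information with the target.
            relevance = np.array([mi[c, target] for c in candidates])
            # Redundancy: mean mutual information with already selected genes.
            if selected:
                redundancy = np.array([mi[c, selected].mean() for c in candidates])
            else:
                redundancy = np.zeros(len(candidates))
            s = relevance - redundancy
            best = int(np.argmax(s))
            if s[best] <= 0:
                break  # no remaining candidate adds non-redundant information
            gene = candidates.pop(best)
            scores[gene, target] = s[best]
            selected.append(gene)
    # An edge keeps the larger of its two directed MRMR scores.
    return np.maximum(scores, scores.T)
```

Calling `mrnet_sketch(gaussian_mi_matrix(expr))` on a samples-by-genes array `expr` returns a symmetric weight matrix that can then be thresholded to obtain a network.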
Study of meta-analysis strategies for network inference using information-theoretic approaches
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Reverse engineering of gene regulatory networks (GRNs) from gene expression data is a classical challenge in systems biology. Thanks to high-throughput technologies, a massive amount of gene-expression data has been accumulated in the public repositories. Modelling GRNs from multiple experiments (also called integrative analysis) has therefore naturally become a standard procedure in modern computational biology. Indeed, such analysis is usually more robust than the traditional approaches focused on individual datasets, which typically suffer from some experimental bias and a small number of samples.
To date, there are mainly two strategies for the problem of interest: the first one ("data merging") merges all datasets together and then infers a GRN, whereas the other ("networks ensemble") infers GRNs from every dataset separately and then aggregates them using some ensemble rules (such as ranksum or weightsum). Unfortunately, a thorough comparison of these two approaches is lacking.
In this paper, we evaluate the performance of the meta-analysis approaches mentioned above with a systematic set of experiments based on in silico benchmarks. Furthermore, we present a new meta-analysis approach for inferring GRNs from multiple studies. Our proposed approach, adapted to methods based on pairwise measures such as correlation or mutual information, consists of two steps: aggregating matrices of the pairwise measures from every dataset followed by extracting the network from the meta-matrix.
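The three strategies named in this abstract (data merging, networks ensemble, and aggregation into a meta-matrix) can be contrasted in a short Python sketch. The abstract does not specify the pairwise measure, the aggregation rule, or the extraction step, so the choices here are assumptions made for illustration: absolute Pearson correlation as the measure, a plain mean for the meta-matrix, and a simple rank sum for the ensemble. All function names are hypothetical.

```python
import numpy as np

def abs_corr(data):
    """Absolute Pearson correlation between genes (columns), zero diagonal."""
    c = np.abs(np.corrcoef(data, rowvar=False))
    np.fill_diagonal(c, 0.0)
    return c

def data_merging(datasets):
    """Stack all samples into one dataset, then score edges once."""
    return abs_corr(np.vstack(datasets))

def networks_ensemble_ranksum(datasets):
    """Score edges per dataset, then sum the per-dataset edge ranks."""
    n = datasets[0].shape[1]
    iu = np.triu_indices(n, k=1)
    total = np.zeros(len(iu[0]))
    for d in datasets:
        w = abs_corr(d)[iu]
        total += w.argsort().argsort()  # rank 0 is the weakest edge
    out = np.zeros((n, n))
    out[iu] = total
    return out + out.T

def meta_matrix(datasets):
    """Average the pairwise-measure matrices (assumed aggregation rule)."""
    return np.mean([abs_corr(d) for d in datasets], axis=0)
```

Each function takes a list of samples-by-genes arrays over the same genes and returns a symmetric score matrix. A network would then be extracted from it, for example by thresholding or by passing the matrix to a pairwise-measure-based inference method.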
Inference of the genetic network regulating lateral root initiation in Arabidopsis thaliana
Regulation of gene expression is crucial for organism growth, and it is one of the challenges in Systems Biology to reconstruct the underlying regulatory biological networks from transcriptomic data. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based on different approaches, for which ready-to-use software is available. We show that their performance improves with the network size and the inclusion of mutants. We then analyse two sets of genes, whose activity is likely to be relevant to lateral root initiation in Arabidopsis, by integrating sequence analysis with the intersection of the results of the best performing methods on time series and mutants to infer their regulatory network. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale-free, small-world and hierarchical properties, and the nodes with a high out-degree may warrant further investigation.
Reverse Engineering Gene Networks with ANN: Variability in Network Inference Algorithms
Motivation: Reconstructing the topology of a gene regulatory network is one
of the key tasks in systems biology. Despite the wide variety of proposed
methods, very little work has been dedicated to the assessment of their
stability properties. Here we present a methodical comparison of the
performance of a novel method (RegnANN) for gene network inference based on
multilayer perceptrons with three reference algorithms (ARACNE, CLR, KELLER),
focussing our analysis on the prediction variability induced by both the
network intrinsic structure and the available data.
Results: The extensive evaluation on both synthetic data and a selection of
gene modules of "Escherichia coli" indicates that all the algorithms suffer from
instability and variability issues with regard to the reconstruction of the
topology of the network. This instability makes it objectively very hard to
establish which method performs best. Nevertheless, RegnANN shows MCC
scores that compare very favorably with all the other inference methods tested.
Availability: The software for the RegnANN inference algorithm is distributed
under GPL3 and it is available at the corresponding author's home page
(http://mpba.fbk.eu/grimaldi/regnann-supmat
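The MCC (Matthews correlation coefficient) mentioned in this abstract compares a predicted edge set with a reference one. The Python sketch below is illustrative only. It assumes undirected networks given as binary adjacency matrices, counts each gene pair once, and returns 0.0 when the denominator vanishes, which is a common convention rather than something stated in the abstract. The function name `edge_mcc` is hypothetical.

```python
import numpy as np

def edge_mcc(true_adj, pred_adj):
    """MCC between two undirected binary adjacency matrices."""
    true_adj = np.asarray(true_adj)
    pred_adj = np.asarray(pred_adj)
    # Count each unordered gene pair once (upper triangle, no diagonal).
    iu = np.triu_indices(true_adj.shape[0], k=1)
    t = true_adj[iu].astype(bool)
    p = pred_adj[iu].astype(bool)
    tp = float(np.sum(t & p))
    tn = float(np.sum(~t & ~p))
    fp = float(np.sum(~t & p))
    fn = float(np.sum(t & ~p))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom
```

The score ranges from -1 (every pair misclassified) to 1 (perfect reconstruction), with 0 corresponding to chance-level prediction.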
Validating module network learning algorithms using simulated data
In recent years, several authors have used probabilistic graphical models to
learn expression modules and their regulatory programs from gene expression
data. Here, we demonstrate the use of the synthetic data generator SynTReN for
the purpose of testing and comparing module network learning algorithms. We
introduce a software package for learning module networks, called LeMoNe, which
incorporates a novel strategy for learning regulatory programs. Novelties
include the use of a bottom-up Bayesian hierarchical clustering to construct
the regulatory programs, and the use of a conditional entropy measure to assign
regulators to the regulation program nodes. Using SynTReN data, we test the
performance of LeMoNe in a completely controlled situation and assess the
effect of the methodological changes we made with respect to an existing
software package, namely Genomica. Additionally, we assess the effect of
various parameters, such as the size of the data set and the amount of noise,
on the inference performance. Overall, application of Genomica and LeMoNe to
simulated data sets gave comparable results. However, LeMoNe offers some
advantages, one of them being that the learning process is considerably faster
for larger data sets. Additionally, we show that the location of the regulators
in the LeMoNe regulation programs and their conditional entropy may be used to
prioritize regulators for functional validation, and that the combination of
the bottom-up clustering strategy with the conditional entropy-based assignment
of regulators improves the handling of missing or hidden regulators.
Comment: 13 pages, 6 figures + 2 pages, 2 figures supplementary information
Elucidation of Directionality for Co-Expressed Genes: Predicting Intra-Operon Termination Sites
We present a novel framework for inferring regulatory and sequence-level
information from gene co-expression networks. The key idea of our methodology
is the systematic integration of network inference and network topological
analysis approaches for uncovering biological insights. We determine the gene
co-expression network of Bacillus subtilis using Affymetrix GeneChip time
series data and show how the inferred network topology can be linked to
sequence-level information hard-wired in the organism's genome. We propose a
systematic way for determining the correlation threshold at which two genes are
assessed to be co-expressed by using the clustering coefficient and we expand
the scope of the gene co-expression network by proposing the slope ratio metric
as a means for incorporating directionality on the edges. We show through
specific examples for B. subtilis that by incorporating expression level
information in addition to the temporal expression patterns, we can uncover
sequence-level biological insights. In particular, we are able to identify a
number of cases where (i) the co-expressed genes are part of a single
transcriptional unit or operon and (ii) the inferred directionality arises due
to the presence of intra-operon transcription termination sites.
Comment: 7 pages, 8 figures, accepted in Bioinformatics