Search CORE

17 research outputs found

A Posterior Probability Approach for Gene Regulatory Network Inference in Genetic Perturbation Data

Author: Raftery Adrian E.
Yeung Ka Yee
Young William Chad
Publication date: 15/03/2016

Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown experiments in the NIH LINCS dataset by calculating posterior probabilities, incorporating prior information. We show that the method is able to find previously identified edges from TRANSFAC and JASPAR and discuss the merits and limitations of this approach.
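The abstract does not spell out the model, so the following is only a sketch of the general idea of combining a prior with knockdown evidence through Bayes' rule. It assumes, hypothetically, that a target gene's knockdown z-score is N(0, 1) when the knocked-down gene does not regulate it and N(mu1, 1) when it does. `edge_posterior`, `mu1` and the prior values are illustrative and are not taken from the paper.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def edge_posterior(z, prior, mu1=-2.0):
    """Posterior probability of a regulatory edge given a knockdown z-score.

    Hypothetical two-hypothesis model (not the paper's):
      no edge -> z ~ N(0, 1)
      edge    -> z ~ N(mu1, 1), with mu1 < 0 meaning the knockdown lowers the target
    `prior` is the prior edge probability, e.g. raised for pairs supported by
    external knowledge.
    """
    weighted_edge = prior * normal_pdf(z, mu1)
    weighted_null = (1.0 - prior) * normal_pdf(z, 0.0)
    return weighted_edge / (weighted_edge + weighted_null)

# Same evidence, different priors: the informative prior raises the posterior.
print(round(edge_posterior(-2.0, prior=0.05), 3))  # 0.28
print(round(edge_posterior(-2.0, prior=0.5), 3))   # 0.881
```

Under this toy model, the same z-score yields a posterior of about 0.28 with a 5% prior and about 0.88 with a 50% prior, which is how prior information shifts the result.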

arXiv.org e-Print Archive

University of Washington: UW Tacoma Digital Commons

Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size

Author: Wenbin Guo
Cristiane P. G. Calixto
Nikoleta Tzioutziou
Ping Lin
Robbie Waugh
John W. S. Brown
Runxuan Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2017

Abstract. Background: Co-expression has been widely used to identify novel regulatory relationships using high-throughput measurements, such as microarray and RNA-seq data. Evaluation studies on co-expression network analysis methods mostly focus on networks of small or medium size, of up to a few hundred nodes. For large networks, simulated expression data usually consist of hundreds or thousands of profiles with different perturbations or knock-outs, which is uncommon in real experiments due to their cost and the amount of work required. Thus, the performance of co-expression network analysis methods on large co-expression networks consisting of a few thousand nodes, with only a small number of profiles with a single perturbation, which more accurately reflect normal experimental conditions, is generally uncharacterized. Methods: We proposed a novel network inference method based on Relevance Low order Partial Correlation (RLowPC). The RLowPC method uses a two-step approach to select the high-confidence edges: it first reduces the search space by picking only the top-ranked genes from an initial partial correlation analysis, and then computes the partial correlations in the confined search space by removing only the linear dependencies from the shared neighbours, largely ignoring the genes showing lower association. Results: We selected six co-expression-based methods with good performance in evaluation studies from the literature: partial correlation, PCIT, ARACNE, MRNET, MRNETB and CLR. These methods were evaluated on simulated time-series data with network sizes ranging from 100 to 3000 nodes. Simulation results show low precision and recall for all of the above methods for large networks with a small number of expression profiles. We improved the inference significantly by refining the top-weighted edges in the pre-inferred partial correlation networks using RLowPC.
We found improved performance by partitioning large networks into smaller co-expressed modules when assessing method performance within these modules. Conclusions: The evaluation results show that current methods suffer from low precision and recall for large co-expression networks where only a small number of profiles are available. The proposed RLowPC method effectively reduces the indirect edges predicted as regulatory relationships and increases the precision of top-ranked predictions. Partitioning large networks into smaller, highly co-expressed modules also helps to improve the performance of network inference methods. The RLowPC R package for network construction, refinement and evaluation is available at GitHub: https://github.com/wyguo/RLowPC
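A simplified Python sketch of the two-step idea follows. It is not the RLowPC R package. It ranks pairs by plain Pearson correlation instead of an initial partial correlation analysis. As an assumption, it refines each candidate edge by removing the linear effect of each shared top-k neighbour in turn and keeping the weakest first-order partial correlation. The function names and the parameters `n_edges` and `k` are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def correlation_matrix(expr):
    """expr: dict gene -> expression profile. Returns a nested dict of correlations."""
    return {a: {b: 1.0 if a == b else pearson(expr[a], expr[b]) for b in expr} for a in expr}

def first_order_pcor(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z.
    Undefined if z is perfectly correlated with x or y; treated here as 0."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return 0.0 if denom == 0.0 else (r_xy - r_xz * r_yz) / denom

def top_neighbours(corr, gene, k):
    """The k genes most strongly associated with `gene` (largest |r|)."""
    others = [g for g in corr[gene] if g != gene]
    return set(sorted(others, key=lambda g: -abs(corr[gene][g]))[:k])

def refine(corr, n_edges, k):
    """Step 1: keep the n_edges strongest pairs. Step 2: for each pair, condition
    on every shared top-k neighbour in turn and keep the weakest partial correlation."""
    pairs = sorted(((a, b) for a in corr for b in corr if a < b),
                   key=lambda p: -abs(corr[p[0]][p[1]]))[:n_edges]
    refined = {}
    for x, y in pairs:
        shared = (top_neighbours(corr, x, k) & top_neighbours(corr, y, k)) - {x, y}
        weight = corr[x][y]
        for z in shared:
            p = first_order_pcor(corr[x][y], corr[x][z], corr[y][z])
            if abs(p) < abs(weight):
                weight = p
        refined[(x, y)] = weight
    return refined

# Toy correlations consistent with a chain x - z - y.
corr = {"x": {"x": 1.0, "y": 0.4, "z": 0.8},
        "y": {"x": 0.4, "y": 1.0, "z": 0.5},
        "z": {"x": 0.8, "y": 0.5, "z": 1.0}}
for pair, w in refine(corr, n_edges=3, k=2).items():
    print(pair, round(w, 3))
```

In this toy matrix the x-y correlation (0.4) equals the product of the x-z and y-z correlations (0.8 × 0.5), which is what a pure chain x-z-y would produce. Conditioning on z therefore removes the x-y edge entirely, while the direct edges keep most of their weight. With real data, `correlation_matrix` would supply `corr`.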

Crossref

Directory of Open Access Journals

University of Dundee Online Publications

Model-based clustering with data correction for removing artifacts in gene expression data

Author: Raftery Adrian E.
Yeung Ka Yee
Young William Chad
Publication date: 19/02/2016

The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes, and the data for the resulting pairs of genes are deconvolved. The raw data are sometimes inadequate for reliable deconvolution, leading to artifacts in the final processed data. These include the expression levels of paired genes being flipped or given the same value, and clusters of values that are not at the true expression level. We propose a new method called model-based clustering with data correction (MCDC) that is able to identify and correct these three kinds of artifacts simultaneously. We show that MCDC improves the resulting gene expression data in terms of agreement with external baselines, as well as improving results from subsequent analysis. Comment: 28 pages
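MCDC builds on model-based clustering. As a hedged illustration of that building block only, the sketch below fits a one-dimensional Gaussian mixture by expectation-maximization (EM). It does not illustrate MCDC's data-correction step, and it does not model flipped or duplicated gene pairs. One could imagine fitting such a mixture to a gene's expression values to separate a spurious cluster from the true expression level, but how MCDC actually does this is not described in the abstract. `gmm_em_1d` and its parameters are hypothetical.

```python
import math

def gmm_em_1d(xs, init_means, n_iter=100):
    """Fit a one-dimensional Gaussian mixture by EM.

    Returns (weights, means, variances, responsibilities)."""
    n, k = len(xs), len(init_means)
    means = list(init_means)
    mu = sum(xs) / n
    var0 = sum((x - mu) ** 2 for x in xs) / n or 1.0
    variances = [var0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step (in log space to avoid underflow): P(component j | x)
        resp = []
        for x in xs:
            logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - 0.5 * (x - m) ** 2 / v
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            ws = [math.exp(lg - top) for lg in logs]
            total = sum(ws)
            resp.append([w / total for w in ws])
        # M-step: re-estimate weights, means and variances from responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, xs)) / nj, 1e-6)
    return weights, means, variances, resp

xs = [1.0, 1.1, 0.9, 1.2, 0.8, 5.0, 5.1, 4.9, 5.2, 4.8]
weights, means, variances, resp = gmm_em_1d(xs, [0.0, 6.0])
labels = [max(range(len(means)), key=lambda j: r[j]) for r in resp]
print([round(m, 3) for m in means], labels)
```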

arXiv.org e-Print Archive

University of Washington: UW Tacoma Digital Commons

Gene Regulatory Network Inference Using Machine Learning Techniques

Author: Kamgnia Wonkap Stephanie
Publication date: 06/07/2020

Systems Biology is a field that models complex biological systems in order to better understand the working of cells and organisms. One of the systems modeled is the gene regulatory network, which plays the critical role of controlling an organism's response to changes in its environment. Ideally, we would like a model of the complete gene regulatory network. In recent years, several advances in technology have permitted the collection of an unprecedented amount and variety of data such as genomes, gene expression data, time-series data, and perturbation data. This has stimulated research into computational methods that reconstruct, or infer, models of the gene regulatory network from the data. Many solutions have been proposed, yet there remain open challenges in utilising the range of available data, as it is inherently noisy and must be integrated by the inference techniques. The thesis seeks to contribute to this discourse by investigating challenges of performance, scale, and data integration. We propose a new algorithm, BENIN, that views network inference as feature selection to address issues of scale, uses elastic net regression for improved performance, and adapts elastic net to integrate different types of biological data. The BENIN algorithm is benchmarked on a synthetic dataset from the DREAM4 challenge and on real expression data for the human HeLa cell cycle. On the DREAM4 dataset, BENIN outperformed all DREAM4 competitors on the size 100 subchallenge, and is also competitive with more recent state-of-the-art methods. Moreover, on the HeLa cell cycle data, BENIN could infer known regulatory interactions and propose new interactions that warrant further experimental investigation. Keywords: gene regulatory network, network inference, feature selection, elastic net regression
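Treating network inference as feature selection means regressing each gene on all the others and reading nonzero coefficients as candidate regulators. The sketch below illustrates that idea with a plain coordinate-descent elastic net in pure Python. It is not the BENIN implementation and omits its data-integration step. It assumes centred expression data with no intercept, and the penalty values `lam` and `alpha` are arbitrary. `elastic_net` and `infer_regulators` are hypothetical names.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t (the proximal operator of t*|.|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - X b||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||^2),
    assuming centred columns and no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product (divided by n) of feature j with the residual that excludes j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def infer_regulators(expr, lam=0.1, alpha=0.5):
    """Regress each gene on all others; nonzero coefficients become
    (regulator, target, weight) edges. expr maps gene -> centred profile."""
    genes = sorted(expr)
    n = len(expr[genes[0]])
    edges = []
    for target in genes:
        regs = [g for g in genes if g != target]
        X = [[expr[g][i] for g in regs] for i in range(n)]
        beta = elastic_net(X, expr[target], lam, alpha)
        edges += [(r, target, b) for r, b in zip(regs, beta) if b != 0.0]
    return edges

expr = {"a": [1.0, -1.0, 2.0, -2.0],
        "b": [2.0, -2.0, 4.0, -4.0],   # b = 2a
        "c": [1.0, 1.0, -1.0, -1.0]}   # uncorrelated with a and b
for edge in infer_regulators(expr):
    print(edge)
```

In this symmetric toy case the sketch reports both a→b and b→a, because correlation alone does not fix edge direction. The unrelated gene c receives no edges.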

Concordia University Research Repository

Learning the structure of Bayesian Networks: A quantitative assessment of the effect of different algorithmic schemes

Author: Beretta Stefano
Castelli Mauro
Goncalves Ivo
Henriques Roberto
Ramazzotti Daniele
Publication date: 01/01/2018

One of the most challenging tasks when adopting Bayesian Networks (BNs) is that of learning their structure from data. This task is complicated by the huge search space of possible solutions and by the fact that the problem is NP-hard. Hence, full enumeration of all possible solutions is not always feasible and approximations are often required. However, to the best of our knowledge, a quantitative analysis of the performance and characteristics of the different heuristics for solving this problem has never been done before. For this reason, in this work, we provide a detailed comparison of many different state-of-the-art methods for structural learning on simulated data, considering BNs with both discrete and continuous variables and with different rates of noise in the data. In particular, we investigate the performance of different widespread scores and algorithmic approaches proposed for the inference, and the statistical pitfalls within them.

arXiv.org e-Print Archive

Directory of Open Access Journals

Repositório da Universidade Nova de Lisboa

Estudo Geral

Inference of regulatory networks with a convergence improved MCMC sampler

Publication venue: BioMed Central
Publication date: 24/09/2015

Springer - Publisher Connector

A tree-like Bayesian structure learning algorithm for small-sample datasets from complex biological model systems

Publication venue: BioMed Central
Publication date: 28/08/2015

Springer - Publisher Connector

Development and evaluation of machine learning algorithms for biomedical applications

Author: Turki Turki Talal
Publication venue: Digital Commons @ NJIT
Publication date: 01/04/2017

Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps a physician gain a full understanding of which treatments are effective for patients. Both problems have been widely studied, though current solutions are far from perfect. More research is needed to improve the accuracy of existing approaches. This dissertation develops machine learning and data mining algorithms and applies them to solve these two important biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques for selecting topological features suitable for link prediction in gene networks; (ii) a graph sparsification method for network sampling; (iii) combined supervised and unsupervised methods to infer gene networks; and (iv) sampling and boosting techniques for reverse engineering gene networks. For the drug sensitivity prediction problem, the dissertation presents (i) an instance selection technique and hybrid method for drug sensitivity prediction; (ii) a link prediction approach to drug sensitivity prediction; (iii) a noise-filtering method for drug sensitivity prediction; and (iv) transfer learning approaches for enhancing the performance of drug sensitivity prediction. Substantial experiments are conducted to evaluate the effectiveness and efficiency of the proposed algorithms. Experimental results demonstrate the feasibility of the algorithms and their superiority over the existing approaches.
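The abstract does not say which topological features the dissertation selects. Common-neighbour count and the Jaccard coefficient are two standard link-prediction features, and the sketch below computes them for every unlinked gene pair. They are hypothetical examples of the kind of feature involved, not the dissertation's feature set, and `link_features` is an illustrative name.

```python
from itertools import combinations

def link_features(edges):
    """Common-neighbour count and Jaccard coefficient for every unlinked pair
    of nodes in an undirected network given as an edge list."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    feats = {}
    for u, v in combinations(sorted(nbrs), 2):
        if v in nbrs[u]:
            continue  # already linked: not a candidate for prediction
        common = nbrs[u] & nbrs[v]
        union = nbrs[u] | nbrs[v]  # non-empty: every node here has a neighbour
        feats[(u, v)] = (len(common), len(common) / len(union))
    return feats

edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
for pair, (cn, jac) in sorted(link_features(edges).items()):
    print(pair, cn, round(jac, 2))
```

A supervised link predictor would use such per-pair features as input columns, with known regulatory links as positive labels.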

Digital Commons @ New Jersey Institute of Technology (NJIT)

Distributed Bayesian networks reconstruction on the whole genome scale

Author: Alina Frolova
Bartek Wilczyński
Publication venue: 'PeerJ'
Publication date: 01/10/2018

Background: Bayesian networks are directed acyclic graphical models widely used to represent the probabilistic relationships between random variables. They have been applied in various biological contexts, including gene regulatory networks and protein–protein interaction inference. Generally, learning Bayesian networks from experimental data is NP-hard, leading to widespread use of heuristic search methods that give suboptimal results. However, when the acyclicity of the graph can be externally ensured, it is possible to find the optimal network in polynomial time. While our previously developed tool BNFinder implements a polynomial-time algorithm, reconstructing networks from large amounts of experimental data still leads to exceedingly long computations on a single CPU. Results: In the present paper we propose a parallelized algorithm designed for multi-core and distributed systems and its implementation in an improved version of BNFinder, a tool for learning optimal Bayesian networks. The new algorithm has been tested on different simulated and experimental datasets, showing much better parallelization efficiency than the previous version. BNFinder gives results comparable in accuracy to current state-of-the-art inference methods, with a significant advantage in cases where external information such as a list of regulators or prior edge probabilities can be introduced, particularly for datasets with static gene expression observations. Conclusions: We show that the new method can be used to reconstruct networks in the size range of thousands of genes, making it practically applicable to whole-genome datasets of prokaryotic systems and large components of eukaryotic genomes. Our benchmarking results on realistic datasets indicate that the tool should be useful to a wide audience of researchers interested in discovering dependencies in their large-scale transcriptomic datasets.
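Externally ensured acyclicity makes the problem tractable and parallelizable because each target's parent set can then be optimized independently. The sketch below illustrates this under stated assumptions. It uses discrete data and a BIC-style score (BNFinder's own scoring functions are not reproduced here). It also assumes a regulator list disjoint from the targets and an exhaustive search over parent sets of bounded size, which is polynomial in the number of regulators. Python threads stand in for the multi-core and distributed execution described in the abstract. All function names are hypothetical.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def bic_score(data, child, parents):
    """BIC-style score of a discrete child given a tuple of parents: log-likelihood
    of the conditional counts minus (log N / 2) * free parameters (counted over
    observed parent configurations, a simplification)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[cfg]) for (cfg, _), c in joint.items())
    child_states = len({row[child] for row in data})
    n_params = len(parent_counts) * (child_states - 1)
    return loglik - 0.5 * math.log(n) * n_params

def best_parents(data, child, regulators, max_parents=2):
    """Exhaustively score all parent sets of size <= max_parents."""
    best_score, best_set = bic_score(data, child, ()), ()
    for size in range(1, max_parents + 1):
        for ps in combinations(regulators, size):
            score = bic_score(data, child, ps)
            if score > best_score:
                best_score, best_set = score, ps
    return child, best_set

def learn_network(data, targets, regulators, workers=4):
    """Regulators are never targets, so any combination of parent sets is acyclic
    and each target can be optimized independently (here, in threads)."""
    if set(targets) & set(regulators):
        raise ValueError("targets and regulators must be disjoint")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda t: best_parents(data, t, regulators), targets))

# Toy data: T copies R1; R2 is unrelated.
rows = [{"R1": a, "R2": b, "T": a} for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)] * 5]
print(learn_network(rows, ["T"], ["R1", "R2"]))  # {'T': ('R1',)}
```

Threads keep the sketch self-contained, but CPU-bound scoring in CPython would need processes or separate machines to gain real speed-up.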

Directory of Open Access Journals

Inferring sparse networks for noisy transient processes

Author: Satish T.S. Bukkapatnam
Tran Hoang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Inferring causal structures of real-world complex networks from measured time-series signals remains an open issue. Current approaches are inadequate for discerning direct from indirect influences (i.e., the presence or absence of a directed arc connecting two nodes) in the presence of noise, sparse interactions, and the nonlinear and transient dynamics of real-world processes. We report a sparse regression approach (referred to as ℓ1-min) with theoretical bounds on the constraints on the allowable perturbation to recover the network structure, which guarantees sparsity and robustness to noise. We also introduce averaging and perturbation procedures to further enhance prediction scores (i.e., reduce inference errors) and the numerical stability of the ℓ1-min approach. Extensive investigations have been conducted with multiple benchmark simulated genetic regulatory networks and Michaelis-Menten dynamics, as well as real-world datasets from the DREAM5 challenge. These investigations suggest that our approach can significantly improve, oftentimes by 5 orders of magnitude, over the methods reported previously for inferring the structure of dynamic networks, such as Bayesian network, network deconvolution, silencing and modular response analysis methods, based on optimizing for sparsity, transients, noise and high dimensionality issues.
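Sparse-regression network inference usually solves one regression per node. Row i of a design matrix A holds the node states at sample i, y holds one node's response (for example its rate of change), and nonzero entries of the solution x are read as incoming arcs. The paper's formulation adds constraints on the allowable perturbation, plus averaging and perturbation procedures; the sketch below omits all of that. It shows only a generic ℓ1-regularized least-squares solver, ISTA (iterative soft-thresholding), a standard technique swapped in for illustration and not the authors' algorithm. `ista` and `lam` are hypothetical names.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z towards zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def ista(A, y, lam, n_iter=500):
    """Minimize 0.5*||y - A x||^2 + lam*||x||_1 by iterative soft-thresholding."""
    m, n = len(A), len(A[0])
    # ||A||_F^2 upper-bounds the largest eigenvalue of A^T A, so 1/L is a safe step size.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(n_iter):
        resid = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        step = [x[j] + sum(A[i][j] * resid[i] for i in range(m)) / L for j in range(n)]
        x = [soft_threshold(s, lam / L) for s in step]
    return x

# With A = I the exact solution is soft_threshold(y_j, lam), which makes it easy to check.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print([round(v, 6) for v in ista(A, [3.0, 0.5, -2.0], lam=1.0)])  # [2.0, 0.0, -1.0]
```

The middle coefficient is driven exactly to zero because its response (0.5) is smaller than the penalty, which is the mechanism that prunes weak or indirect arcs.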

Texas A&M Repository

PubMed Central