Comparative study of discretization methods of microarray data for inferring transcriptional regulatory networks

Author: Bai Xi
Cai Hua
Guo Dianjing
Ji Wei
Li Yong
Liu Lili
Zhu Yanming
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Microarray data discretization is a basic preprocessing step for many gene regulatory network inference algorithms. Common discretization methods from informatics are used to discretize microarray data, but the choice of method is often arbitrary, and no systematic comparison of discretization methods has been conducted in the context of gene regulatory network inference from time-series gene expression data. Results: In this study, we propose a new discretization method, "bikmeans", and compare its performance with four other widely used discretization methods across different datasets, modeling algorithms and numbers of intervals. Sensitivities, specificities and total accuracies were calculated, and statistical analysis was carried out. The bikmeans method always gave high total accuracies. Conclusions: Our results indicate that proper discretization methods can consistently improve gene regulatory network inference, independent of network modeling algorithms and datasets. Our new method, bikmeans, resulted in significantly better total accuracies than the other methods.
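As a rough illustration of the workflow the abstract describes, the sketch below discretizes an expression profile and scores an inferred network against a reference. The abstract does not define bikmeans or name the four comparison methods, so equal-width binning is used here purely as a stand-in; the function names `equal_width_discretize` and `edge_metrics` are hypothetical.

```python
import numpy as np


def equal_width_discretize(values, n_intervals):
    """Map continuous expression values to integer levels 0..n_intervals-1.

    Equal-width binning is an assumption for illustration; it is not
    the bikmeans method from the paper.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant profile carries no information; put everything in level 0.
        return np.zeros(values.shape, dtype=int)
    edges = np.linspace(lo, hi, n_intervals + 1)
    # Only the interior edges are needed to assign levels.
    return np.digitize(values, edges[1:-1])


def edge_metrics(true_edges, predicted_edges, genes):
    """Sensitivity, specificity and total accuracy over directed gene pairs."""
    # Every ordered pair of distinct genes is a candidate regulatory edge.
    candidates = {(a, b) for a in genes for b in genes if a != b}
    true_edges = set(true_edges) & candidates
    predicted_edges = set(predicted_edges) & candidates
    tp = len(true_edges & predicted_edges)
    fp = len(predicted_edges - true_edges)
    fn = len(true_edges - predicted_edges)
    tn = len(candidates) - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(candidates) if candidates else 0.0,
    }
```

Varying `n_intervals` and swapping in other discretizers would mirror the kind of comparison the study reports, although the actual experimental protocol is not given in the abstract.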

Springer - Publisher Connector

Directory of Open Access Journals

Development and evaluation of machine learning algorithms for biomedical applications

Author: Turki Turki Talal
Publication venue: Digital Commons @ NJIT
Publication date: 01/04/2017

Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps physicians understand which treatments are effective for patients. Both problems have been widely studied, though current solutions are far from perfect, and more research is needed to improve the accuracy of existing approaches. This dissertation develops machine learning and data mining algorithms and applies them to these two biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques for selecting topological features suitable for link prediction in gene networks; (ii) a graph sparsification method for network sampling; (iii) combined supervised and unsupervised methods to infer gene networks; and (iv) sampling and boosting techniques for reverse engineering gene networks. For the drug sensitivity prediction problem, the dissertation presents (i) an instance selection technique and hybrid method for drug sensitivity prediction; (ii) a link prediction approach to drug sensitivity prediction; (iii) a noise-filtering method for drug sensitivity prediction; and (iv) transfer learning approaches for enhancing the performance of drug sensitivity prediction. Substantial experiments are conducted to evaluate the effectiveness and efficiency of the proposed algorithms. Experimental results demonstrate the feasibility of the algorithms and their superiority over existing approaches.
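To make "topological features for link prediction" concrete, here is a minimal sketch of two standard features computed on an undirected gene network. The abstract does not say which features the dissertation selects, so common neighbors and the Jaccard coefficient are assumptions chosen for illustration, and all function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def jaccard_coefficient(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0
```

Scores like these can be stacked into a feature vector per candidate gene pair and passed to any classifier; a higher score suggests, but does not establish, a missing link.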

Digital Commons @ New Jersey Institute of Technology (NJIT)

Genome-scale bacterial transcriptional regulatory networks: reconstruction and integrated analysis with metabolic models

Author: Alkema
Babu
Balaji
Balleza
Banta
Bar-Joseph
Basso
Bonneau
Bonneau
Brazhnik
Brazma
Brazma
Bro
C. S. Henry
Cantone
Casadesus
Castro-Melchor
Cerulo
Chen
Chen
Chuang
Cloots
Covert
Covert
Covert
Covert
Covert
Croucher
de Jong
De Smet
di Bernardo
Edgar
Edwards
Engelen
Ernst
F. Xia
Faith
Feist
Friedman
Friedman
Gardner
Gelfand
Geurts
Greenfield
Gustafsson
Henry
Herring
Hohmann
I. Rocha
Ihmels
Iyer
J. P. Faria
Jha
Karlebach
Karr
Kauffman
Kim
Kim
Leek
Lemmens
Llaneras
Lozada-Chavez
M. Rocha
Machado
Madan Babu
Madan Babu
Madar
Marbach
McCue
Michoel
Min Lee
Mironov
Moreno-Campuzano
Mortazavi
Mwangi
Narendra
Nudler
Oberto
Otero
Overbeek
Overbeek
Palsson
Papin
Pareja
Perkins
Pilpel
Price
Price
Price
Price
Prill
R. Overbeek
Reed
Ren
Robison
Rodionov
Rodionov
Rodionov
Roh
Roth
Ruppin
Segal
Sherlock
Shlomi
Simons
Stolovitzky
Struhl
Su
Tan
Teichmann
Terzer
Tomita
Tompa
Varner
Velculescu
Velculescu
Vilaca
Wade
Wang
Willenbrock
Wu
Yeung
Yip
Yoon
Yoon
You
Young
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2014

Advances in sequencing technology are resulting in the rapid emergence of large numbers of complete genome sequences. High-throughput annotation and metabolic modeling of these genomes is now a reality. The high-throughput reconstruction and analysis of genome-scale transcriptional regulatory networks represents the next frontier in microbial bioinformatics. Progress on this frontier will depend upon the integration of numerous data sources relating to the mechanisms, components, and behavior of the transcriptional regulatory machinery, as well as the integration of the regulatory machinery into genome-scale cellular models. Here we review existing repositories for different types of transcriptional regulatory data, including expression data, transcription factor data, and binding site locations, and we explore how these data are being used for the reconstruction of new regulatory networks. From template-network-based methods to de novo reverse engineering from expression data, we discuss how regulatory networks can be reconstructed and integrated with metabolic models to improve model predictions and performance. Finally, we explore the impact these integrated models can have in simulating phenotypes, optimizing the production of compounds of interest or paving the way to a whole-cell model.

J.P.F. acknowledges funding from [SFRH/BD/70824/2010] of the FCT (Portuguese Foundation for Science and Technology) PhD program. The work was supported in part by the ERDF (European Regional Development Fund) through the COMPETE Programme (operational programme for competitiveness), National Funds through the FCT within projects [FCOMP-01-0124-FEDER015079] (ToMEGIM: Computational Tools for Metabolic Engineering using Genome-scale Integrated Models) and [FCOMP-01-0124-FEDER009707] (HeliSysBio: molecular Systems Biology in Helicobacter pylori), the U.S. Department of Energy under contract [DE-AC02-06CH11357] and the National Science Foundation under [0850546].

Algebraic shortcuts for leave-one-out cross-validation in supervised network inference

Author: Airola Antti
De Baets Bernard
Pahikkala Tapio
Stock Michiel
Waegeman Willem
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2020

Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to predict new interactions in a given network and using it to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might differ dramatically between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models.
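The following sketch illustrates the idea of a cross-validation shortcut for the "new vertex" setting, where one row vertex is held out together with all of its interactions. It assumes the usual two-step formulation, in which fitted values are `H_u @ Y @ H_v` with hat matrices `H = K (K + lam I)^-1`, and applies the standard leave-one-out identity for ridge regression to the first step. The abstract does not spell out these formulas, so the code is an illustration under those assumptions rather than the authors' implementation; all function names are hypothetical.

```python
import numpy as np


def hat_matrix(K, lam):
    """Ridge hat matrix K (K + lam I)^-1 for a symmetric PSD kernel K."""
    return K @ np.linalg.inv(K + lam * np.eye(K.shape[0]))


def two_step_krr_fit(K, G, Y, lam_u, lam_v):
    """Fitted values of two-step kernel ridge regression on the training pairs."""
    return hat_matrix(K, lam_u) @ Y @ hat_matrix(G, lam_v)


def loo_rows_shortcut(K, G, Y, lam_u, lam_v):
    """Predictions for each row vertex when that row is left out of training.

    Uses the identity f_i^(-i) = (f_i - H_ii * y_i) / (1 - H_ii) on the
    first step, so no model is refit.
    """
    Hu = hat_matrix(K, lam_u)
    d = np.diag(Hu)[:, None]
    first_step = (Hu @ Y - d * Y) / (1.0 - d)
    return first_step @ hat_matrix(G, lam_v)


def loo_rows_brute_force(K, G, Y, lam_u, lam_v):
    """Reference implementation: refit the first step once per held-out row."""
    n = K.shape[0]
    Hv = hat_matrix(G, lam_v)
    out = np.empty(Y.shape, dtype=float)
    for i in range(n):
        keep = np.arange(n) != i
        K_train = K[np.ix_(keep, keep)]
        coef = np.linalg.solve(K_train + lam_u * np.eye(n - 1), Y[keep])
        out[i] = K[i, keep] @ coef @ Hv
    return out
```

On small random kernels the shortcut and the brute-force loop should agree to numerical precision, while the shortcut needs only one matrix inversion per kernel instead of one refit per held-out vertex.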