
10,747 research outputs found

Patterns of subnet usage reveal distinct scales of regulation in the transcriptional regulatory network of Escherichia coli

Author: A Travers
C Marr
Carsten Marr
DP Sangurdekar
E Krause
Fabian J. Theis
G Balázsi
H Yu
J Vogel
J Ward Jr
JD Glasner
JJ Faith
Larry S. Liebovitch
M. Madan Babu
Marc-Thorsten Hütt
MJ Herrgard
N Blot
N Sonnenschein
NM Luscombe
O Alter
Q Cui
R Milo
RM Gutierrez-Rios
S Gama-Castro
S Gottesman
S Mangan
SS Shen-Orr
T Beissbarth
VG Tusher
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2010

The set of regulatory interactions between genes, mediated by transcription factors, forms a species' transcriptional regulatory network (TRN). By comparing this network with measured gene expression data, one can identify functional properties of the TRN and gain general insight into transcriptional control. We define the subnet of a node as the subgraph consisting of all nodes topologically downstream of the node, including itself. Using a large set of microarray expression data of the bacterium Escherichia coli, we find that the gene expression in different subnets exhibits a structured pattern in response to environmental changes and genotypic mutation. Subnets with fewer changes in their expression pattern have a higher fraction of feed-forward loop motifs and a lower fraction of small RNA targets within them. Our study implies that the TRN consists of several scales of regulatory organization: 1) subnets with more varying gene expression controlled by both transcription factors and post-transcriptional RNA regulation, and 2) subnets with less varying gene expression having more feed-forward loops and less post-transcriptional RNA regulation. Comment: 14 pages, 8 figures, to be published in PLoS Computational Biology
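The subnet definition in this abstract (all nodes topologically downstream of a node, including the node itself) can be sketched as a breadth-first traversal of a directed graph. This is only an illustration under assumptions, not the authors' code: the function name `subnet`, the edge-list representation, and the toy gene names are hypothetical.

```python
from collections import deque


def subnet(edges, root):
    """Return the set of nodes reachable from `root`, including `root` itself."""
    # Build an adjacency list: regulator -> list of regulated targets.
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, []).append(target)

    # Breadth-first traversal; `seen` also guards against cycles.
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in targets.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical toy network, not real E. coli interactions.
toy_edges = [("tfA", "tfB"), ("tfB", "g1"), ("tfA", "g2"), ("tfC", "g3")]
print(sorted(subnet(toy_edges, "tfA")))
```

A node without outgoing edges yields a subnet containing only itself, which matches the "including itself" clause of the definition.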

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

City University of New York

Crossref

Directory of Open Access Journals

PubMed Central

PuSH

An introduction to Graph Data Management

Author: A Dries
A Gutiérrez
A Iosup
A Morari
A Poulovassilis
AD Zhu
AO Mendelzon
B Amann
B Elser
C Berge
C Vicknair
C Watters
C Weiss
CS Chang
D Conte
D Dominguez-Sal
D Theodoratos
DC Faye
DW Shipman
EF Codd
FW Tompa
G Malewicz
GM Kuper
H He
HS Kunii
IF Cruz
J Hidders
J Paredaens
J Peckham
J. Hidders
Jonathan Hayes
K Zeng
L Kowalik
L Zou
M Atre
M Ciglan
M Consens
M Gemis
M Gyssens
M Han
M Levene
M Mainguenaud
M Schmidt
M Yannakakis
MA Bornea
MA Rodriguez
Marc Andries
MP Consens
N Kiesel
N Roussopoulos
O Erling
P Barceló Baeza
P Buneman
P Yuan
Philippe Cudré-Mauroux
PPS Chen
PT Wood
R Agrawal
R Angles
R Brijder
R Ronen
RH Güting
RS Xin
S Abiteboul
T Neumann
W Fan
W Kim
Y Guo
Y Low
Y Papakonstantinou
Y Tian
Y Zhao
YA Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/12/2017

A graph database is a database where the data structures for the schema and/or instances are modeled as a (labeled) (directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give a historical overview of their main developments, and study the main current systems that implement them.

arXiv.org e-Print Archive

Crossref

Practical Bayesian Optimization of Machine Learning Algorithms

Author: Ryan P. Adams
Hugo Larochelle
Jasper Snoek
Publication date: 01/01/2012

Machine learning algorithms frequently require careful tuning of model hyperparameters, regularization terms, and optimization parameters. Unfortunately, this tuning is often a "black art" that requires expert experience, unwritten rules of thumb, or sometimes brute-force search. Much more appealing is the idea of developing automatic approaches which can optimize the performance of a given learning algorithm to the task at hand. In this work, we consider the automatic tuning problem within the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). The tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next. Here we show how the effects of the Gaussian process prior and the associated inference procedure can have a large impact on the success or failure of Bayesian optimization. We show that thoughtful choices can lead to results that exceed expert-level performance in tuning machine learning algorithms. We also describe new algorithms that take into account the variable cost (duration) of learning experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
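To make "choices about what parameters to try next" concrete, the sketch below scores candidate settings with the closed-form expected improvement for a Gaussian posterior. Expected improvement is a common acquisition function in Bayesian optimization, but the abstract does not name it, so treating it as the selection rule here is an assumption. The posterior means and standard deviations are hypothetical numbers standing in for the output of a fitted GP, and the objective is assumed to be a validation error that is minimized.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    return (best - mu) * cdf + sigma * pdf


# Hypothetical posterior summaries (mean, std) of validation error per candidate.
candidates = {
    "lr=0.1": (0.30, 0.01),
    "lr=0.01": (0.28, 0.05),
    "lr=0.001": (0.35, 0.10),
}
best_so_far = 0.29  # lowest validation error observed so far (assumed)

scores = {name: expected_improvement(m, s, best_so_far)
          for name, (m, s) in candidates.items()}
next_setting = max(scores, key=scores.get)
print(next_setting, scores[next_setting])
```

The score rewards both a low predicted error and high uncertainty, which is how a candidate with a worse mean can still be worth evaluating.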

arXiv.org e-Print Archive

CiteSeerX

Multiple instance learning for sequence data with across bag dependencies

Author: Sabeur Aridhi
Mondher Maddouri
Engelbert Mephu Nguifo
Manel Zoghlami
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

In the Multiple Instance Learning (MIL) problem for sequence data, the instances inside the bags are sequences. In some real-world applications such as bioinformatics, comparing a random pair of sequences makes no sense. In fact, each instance may have structural and/or functional relations with instances of other bags. Thus, the classification task should take into account this across-bag relation. In this work, we present two novel MIL approaches for sequence data classification named ABClass and ABSim. ABClass extracts motifs from related instances and uses them to encode sequences. A discriminative classifier is then applied to compute a partial classification result for each set of related sequences. ABSim uses a similarity measure to discriminate the related instances and to compute a score matrix. For both approaches, an aggregation method is applied in order to generate the final classification result. We applied both approaches to solve the problem of bacterial Ionizing Radiation Resistance prediction. The experimental results of the presented approaches are satisfactory.

arXiv.org e-Print Archive

HAL Clermont Université

INRIA a CCSD electronic archive server

Ranking and significance of variable-length similarity-based time series motifs

Author: Josep Lluis Arcos
Álvaro Corral
Isabel Serra
Joan Serrà
Publication venue: 'Elsevier BV'
Publication date: 06/03/2015

The detection of very similar patterns in a time series, commonly called motifs, has received continuous and increasing attention from diverse scientific communities. In particular, recent approaches for discovering similar motifs of different lengths have been proposed. In this work, we show that such variable-length similarity-based motifs cannot be directly compared, and hence ranked, by their normalized dissimilarities. Specifically, we find that length-normalized motif dissimilarities still have intrinsic dependencies on the motif length, and that the lowest dissimilarities are particularly affected by this dependency. Moreover, we find that such dependencies are generally non-linear and change with the considered data set and dissimilarity measure. Based on these findings, we propose a solution to rank those motifs and measure their significance. This solution relies on a compact but accurate model of the dissimilarity space, using a beta distribution with three parameters that depend on the motif length in a non-linear way. We believe the incomparability of variable-length dissimilarities could go beyond the field of time series, and that modeling strategies similar to the one used here could help in a broader context. Comment: 20 pages, 10 figures

arXiv.org e-Print Archive

Digital.CSIC

Validating module network learning algorithms using simulated data

Author: A Battle
A Butte
AA Petti
AJ Butte
Anagha Joshi
AP Gasch
CE Shannon
CT Harbison
D Pe'er
E Segal
Eric Bonnet
HW Ma
J Kasturi
J Sinkkonen
K Basso
K Lemmens
KA Heller
Kathleen Marchal
Koenraad Van Leemput
LH Hartwell
M Ashburner
MA Beer
Martin Kuiper
MJL de Hoon
N Friedman
NM Luscombe
Piet van Remortel
S Maere
Steven Maere
T Ideker
T Van den Bulcke
Tim Van den Bulcke
Tom Michoel
X Xu
Y Garten
Yvan Saeys
Yves Van de Peer
Z Bar-Joseph
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance. Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators. Comment: 13 pages, 6 figures + 2 pages, 2 figures supplementary information
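As a rough illustration of how a conditional entropy measure can rank candidate regulators, the sketch below computes the textbook quantity H(Y | X) for discrete labels: a regulator whose state leaves little uncertainty about which side of a split an experiment falls on gets a low value. This is a generic definition under assumed discretized inputs, not LeMoNe's actual scoring function; the function name and the toy labels are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(xs, ys):
    """H(Y | X) in bits for two equally long sequences of discrete labels."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    marginal_x = Counter(xs)       # counts of x alone
    h = 0.0
    for (x, _y), count in joint.items():
        # Each term is p(x, y) * log2(p(x) / p(x, y)).
        h += (count / n) * math.log2(marginal_x[x] / count)
    return h


# Hypothetical discretized data: split side of four experiments,
# and the state of two candidate regulators in those experiments.
split_side = ["left", "left", "right", "right"]
informative = ["low", "low", "high", "high"]
uninformative = ["low", "high", "low", "high"]

print(conditional_entropy(informative, split_side))
print(conditional_entropy(uninformative, split_side))
```

Under this reading, candidates would be sorted by increasing conditional entropy, so the informative regulator ranks ahead of the uninformative one.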

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

Edinburgh Research Explorer