Search CORE

955 research outputs found

Information content of colored motifs in complex networks

Author: Arend Hintze
Christoph Adami
Jifeng Qian
Kolmogorov A.
Matthew Rupp
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2011
Field of study

We study complex networks in which the nodes are tagged with different colors depending on their functionality (colored graphs), using information theory applied to the distribution of motifs in such networks. We find that colored motifs can be viewed as the building blocks of the networks (much more so than uncolored structural motifs) and that the relative frequency with which these motifs appear in the network can be used to define the information content of the network. This information is defined so that a network with random coloration (but with the same relative number of nodes of each color) has zero color information content. Thus, colored motif information captures the exceptionality of coloring in the motifs that is maintained via selection. We study the motif information content of the C. elegans brain as well as the evolution of colored motif information in networks that reflect the interaction between instructions in genomes of digital life organisms. While colored motif information appears to capture essential functionality in the C. elegans brain (where the color assignment of nodes is straightforward), it is not obvious whether colored motif information content always increases during evolution, as would be expected from a measure that captures network complexity. Rather, for a single choice of color assignment of instructions in the digital life form Avida, we find that colored motif information content increases or decreases during evolution depending on how the genomes are organized, and it could therefore be an interesting tool to dissect genomic rearrangements. Comment: 21 pages, 8 figures, to appear in Artificial Lif
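To make the idea of color information content concrete, here is a minimal Python sketch restricted to the smallest colored motifs, single edges. The measure (a KL divergence between observed colored-edge frequencies and those expected when the same color counts are shuffled uniformly over the nodes) and the function name `colored_edge_information` are assumptions for illustration, not the authors' definition, which works with larger motifs. It does share the stated property of being zero whenever the observed frequencies equal their expectation under random coloration.

```python
from collections import Counter
from math import log2


def colored_edge_information(edges, color):
    """Information (bits) in the coloring of 2-node motifs, i.e. edges.

    Hypothetical measure, not the paper's exact definition: the KL divergence
    between the observed distribution of unordered color pairs on edges and
    the distribution expected if the same color counts were shuffled
    uniformly at random over the nodes. Assumes a simple undirected graph.
    """
    n = len(color)
    sizes = Counter(color.values())  # number of nodes of each color
    observed = Counter(tuple(sorted((color[u], color[v]))) for u, v in edges)
    total = sum(observed.values())
    info = 0.0
    for (a, b), k in observed.items():
        p_obs = k / total
        # probability that a fixed edge gets colors {a, b} under random shuffling
        if a == b:
            p_exp = sizes[a] * (sizes[a] - 1) / (n * (n - 1))
        else:
            p_exp = 2 * sizes[a] * sizes[b] / (n * (n - 1))
        info += p_obs * log2(p_obs / p_exp)
    return info


# Complete graph on 4 nodes: colored-edge frequencies match random expectation.
color = {1: "R", 2: "R", 3: "B", 4: "B"}
k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(colored_edge_information(k4, color))  # 0.0
# Only same-color edges: a strongly non-random coloring.
print(colored_edge_information([(1, 2), (3, 4)], color))  # log2(3), about 1.585
```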

arXiv.org e-Print Archive

CiteSeerX

Crossref

Assessing the Exceptionality of Coloured Motifs in Networks

Author: Lacroix Vincent
Sagot Marie-France
Schbath Sophie
Publication venue: BioMed Central
Publication date: 26/10/2008
Field of study

Various methods have recently been employed to characterise the structure of biological networks. In particular, the concept of network motif and the related one of coloured motif have proven useful to model the notion of a functional/evolutionary building block. However, algorithms that enumerate all the motifs of a network may produce a very large output, and methods to decide which motifs should be selected for downstream analysis are needed. A widely used method is to assess whether a motif is exceptional, that is, over- or under-represented with respect to a null hypothesis. Over the last thirty years, much effort has gone into deriving P-values for the frequencies of topological motifs, that is, fixed subgraphs. These rely either on (compound) Poisson and Gaussian approximations for the motif count distribution in Erdős-Rényi random graphs or on simulations in other models. We focus on a different definition of graph motifs that corresponds to coloured motifs. A coloured motif is a connected subgraph with fixed vertex colours but unspecified topology. Our work is the first analytical attempt to assess the exceptionality of coloured motifs in networks without any simulation. We first establish analytical formulae for the mean and the variance of the count of a coloured motif in an Erdős-Rényi random graph model. Using simulations under this model, we further show that a Pólya-Aeppli distribution bette
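Under the common reading that an occurrence of a coloured motif is a vertex set with the right colour multiset whose induced subgraph is connected, the mean count in G(n, p) with fixed vertex colours has a short closed form: every k-vertex set induces a G(k, p), so the mean is (number of colour-compatible k-sets) times P(G(k, p) is connected). The Python sketch below computes this with Gilbert's recursion. It illustrates that assumption and is not necessarily the paper's formula; it does not cover the variance, and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def prob_connected(k, p):
    """P(G(k, p) is connected), by Gilbert's recursion.

    A graph on m vertices is disconnected iff the component containing vertex 1
    has some size j < m; that component is connected and has no edges to the
    other m - j vertices.
    """
    c = [0.0, 1.0]  # c[1] = 1: a single vertex is connected
    for m in range(2, k + 1):
        disconnected = sum(
            comb(m - 1, j - 1) * c[j] * (1 - p) ** (j * (m - j))
            for j in range(1, m)
        )
        c.append(1.0 - disconnected)
    return c[k]


def expected_colored_motif_count(motif, color_sizes, p):
    """Mean number of vertex sets with colour multiset `motif` that induce a
    connected subgraph in G(n, p), with a fixed colouring of the n vertices."""
    need = Counter(motif)
    n_sets = 1
    for c, m in need.items():
        n_sets *= comb(color_sizes.get(c, 0), m)  # colour-compatible vertex sets
    return n_sets * prob_connected(sum(need.values()), p)


print(prob_connected(3, 0.5))  # 0.5, i.e. 3p^2 - 2p^3 at p = 0.5
print(expected_colored_motif_count(["R", "R", "B"], {"R": 10, "B": 5}, 0.1))
```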

CiteSeerX

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server

Directory of Open Access Journals

PubMed Central

HAL Descartes

UPF Digital Repository

An R Implementation of the Polya-Aeppli Distribution

Author: Burden Conrad J.
Publication venue
Publication date: 11/06/2014
Field of study

An efficient implementation of the Polya-Aeppli, or geometric compound Poisson, distribution in the statistical programming language R is presented. The implementation is available as the package polyaAeppli and consists of functions for the mass function, cumulative distribution function, quantile function and random variate generation, with the parameters conventionally provided for standard univariate probability distributions in the stats package in R. Comment: 9 pages, 2 figures
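The geometric compound Poisson construction can be sketched in Python as follows, assuming this parametrization: N ~ Poisson(lam) clumps, each of geometric size on {1, 2, ...} with continuation probability p (0 <= p < 1). The function names are hypothetical and are not the polyaAeppli package's R API, whose parametrization may also differ.

```python
import math
import random


def polya_aeppli_pmf(n, lam, p):
    """P(X = n) for X = Y_1 + ... + Y_N with N ~ Poisson(lam) and i.i.d.
    Y_i ~ Geometric on {1, 2, ...}, P(Y = j) = (1 - p) * p**(j - 1).

    p = 0 makes every Y_i equal to 1, so X reduces to Poisson(lam).
    Intended for moderate n (binomial coefficients are converted to float).
    """
    if n == 0:
        return math.exp(-lam)
    total = 0.0
    poisson_weight = 1.0  # lam**k / k!, updated incrementally to avoid overflow
    for k in range(1, n + 1):
        poisson_weight *= lam / k
        # k geometric summands add up to n with negative-binomial probability
        total += poisson_weight * math.comb(n - 1, k - 1) * (1 - p) ** k * p ** (n - k)
    return math.exp(-lam) * total


def polya_aeppli_sample(lam, p, rng=random):
    """Draw one variate by simulating the compound Poisson construction."""
    # Poisson(lam) by multiplying uniforms (Knuth's method; fine for small lam)
    threshold, n_clumps, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        n_clumps += 1
        prod *= rng.random()
    total = 0
    for _ in range(n_clumps):
        size = 1
        while rng.random() < p:  # geometric clump size on {1, 2, ...}
            size += 1
        total += size
    return total


pmf = [polya_aeppli_pmf(n, 2.0, 0.3) for n in range(150)]
print(sum(pmf))                               # close to 1.0
print(sum(n * q for n, q in enumerate(pmf)))  # close to lam / (1 - p) = 2.857...
```

Computing lam**k / k! as a running product keeps the mass function finite for larger n, where evaluating the power and factorial separately would overflow.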

arXiv.org e-Print Archive

CiteSeerX

Approximating stationary distributions of fast mixing Glauber dynamics, with applications to exponential random graphs

Author: Reinert Gesine
Ross Nathan
Publication venue
Publication date: 11/10/2018
Field of study

We provide a general bound on the Wasserstein distance between two arbitrary distributions of sequences of Bernoulli random variables. The bound is expressed in terms of a mixing quantity for the Glauber dynamics of one of the sequences and a simple expectation of the other. The result is applied to estimate, with explicit error, expectations of functions of random vectors for some Ising models and exponential random graphs in "high temperature" regimes. Comment: Ver3: 24 pages, major revision with new results; Ver2: updated reference; Ver1: 19 pages, 1 figure
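To illustrate the Glauber dynamics the bound refers to, the sketch below runs heat-bath Glauber updates for an Ising-type measure on 0/1 vectors and averages a function along the chain. The parametrization (fields `h`, symmetric couplings `J`) and the function names are assumptions. The sketch estimates expectations by simulation only and does not compute the paper's Wasserstein bound.

```python
import math
import random


def prob_one(x, i, J, h):
    """P(x_i = 1 | all other coordinates) for the Ising-type measure
    pi(x) proportional to exp(sum_i h[i] s_i + sum_{i<j} J[i][j] s_i s_j),
    with spins s_i = 2 x_i - 1. J must be symmetric with a zero diagonal."""
    field = h[i] + sum(J[i][j] * (2 * x[j] - 1) for j in range(len(x)) if j != i)
    return 1.0 / (1.0 + math.exp(-2.0 * field))


def glauber_average(f, x0, J, h, steps, burn_in, seed=0):
    """Estimate E[f(X)] by heat-bath Glauber dynamics: at each step pick a
    coordinate uniformly at random and resample it from its conditional law."""
    rng = random.Random(seed)
    x = list(x0)
    acc = 0.0
    for t in range(burn_in + steps):
        i = rng.randrange(len(x))
        x[i] = 1 if rng.random() < prob_one(x, i, J, h) else 0
        if t >= burn_in:
            acc += f(x)
    return acc / steps


n = 5
J = [[0.0 if i == j else 0.1 for j in range(n)] for i in range(n)]  # weak coupling
h = [0.0] * n
# With h = 0 the measure is symmetric under flipping all spins, so E[sum] = n / 2.
print(glauber_average(sum, [0] * n, J, h, steps=20000, burn_in=1000))
```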

arXiv.org e-Print Archive

Oxford University Research Archive

Variational Bayes model averaging for graphon functions and motif frequencies inference in W-graph models

Author: Latouche P.
Robin S
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/11/2015
Field of study

W-graph refers to a general class of random graph models that can be seen as random graph limits. It is characterized by both its graphon function and its motif frequencies. In this paper, relying on an existing variational Bayes algorithm for stochastic block models, along with the corresponding weights for model averaging, we derive an estimate of the graphon function as an average of stochastic block models with an increasing number of blocks. In the same framework, we derive the variational posterior frequency of any motif. A simulation study and an illustration on a social network complete our work.
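For a stochastic block model (SBM), the graphon is a step function and motif frequencies have closed forms. The Python sketch below assumes that "motif frequency" means the homomorphism density of a triangle and computes it for individual SBMs and as a weighted average across models. The weights are set by hand here; in the paper they come from the variational Bayes model-averaging scheme, which is not reproduced. Function names are hypothetical.

```python
from itertools import product


def sbm_graphon(alpha, pi):
    """Step-function graphon of an SBM: block q occupies an interval of [0, 1]
    of length alpha[q], and w(u, v) = pi[q(u)][q(v)]."""
    bounds, acc = [], 0.0
    for a in alpha:
        acc += a
        bounds.append(acc)

    def block(u):
        for q, b in enumerate(bounds):
            if u < b:
                return q
        return len(alpha) - 1  # u == 1 (or rounding) falls in the last block

    return lambda u, v: pi[block(u)][block(v)]


def triangle_density(alpha, pi):
    """Homomorphism density of the triangle in the SBM graphon:
    sum over blocks q, l, m of alpha_q alpha_l alpha_m pi_ql pi_lm pi_mq."""
    K = len(alpha)
    return sum(
        alpha[q] * alpha[l] * alpha[m] * pi[q][l] * pi[l][m] * pi[m][q]
        for q, l, m in product(range(K), repeat=3)
    )


def averaged_triangle_density(models, weights):
    """Model-averaged motif frequency: sum_k weight_k * density(model_k)."""
    return sum(w * triangle_density(a, p) for (a, p), w in zip(models, weights))


er = ([1.0], [[0.3]])                         # one block: Erdős-Rényi with p = 0.3
two = ([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])  # two disconnected cliques
print(triangle_density(*er))                  # 0.3**3, about 0.027
print(averaged_triangle_density([er, two], [0.25, 0.75]))
```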

arXiv.org e-Print Archive

HAL Descartes

HAL-Paris1

Finding and counting vertex-colored subtrees

Author: A. Björklund
A.M. Ambalath
E. Alm
F. Hüffner
Florian Sikora
G. Blin
I. Koutis
I. Koutis
J. Flum
J. Flum
J. Nederlof
M.R. Fellows
N. Alon
N. Betzler
R. Dondi
R. Dondi
R. Karp
R. Sharan
R. Williams
S. Bruckner
S. Böcker
S. Guillemot
S. Schbath
Sylvain Guillemot
V. Arvind
V. Lacroix
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/02/2012
Field of study

The problems studied in this article originate from the Graph Motif problem introduced by Lacroix et al. in the context of biological networks. The problem is to decide whether a vertex-colored graph has a connected subgraph whose colors equal a given multiset of colors M. It is a variant of graph pattern matching in which the structure of the pattern's occurrence is irrelevant; the only requirement is connectedness. Using an algebraic framework recently introduced by Koutis et al., we obtain new FPT algorithms for Graph Motif and its variants, with improved running times. We also obtain results on the counting versions of this problem, proving that the counting problem is FPT if M is a set, but becomes W[1]-hard if M is a multiset with two colors. Finally, we present an experimental evaluation of this approach on real datasets, showing that its performance compares favorably with existing software. Comment: Conference version in International Symposium on Mathematical Foundations of Computer Science (MFCS), Brno, Czech Republic (2010); journal version in Algorithmic
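The decision and counting problems can be stated as an exhaustive Python baseline: enumerate all vertex sets of size |M|, keep those whose color multiset equals M, and test connectivity of the induced subgraph. This is a deliberately naive O(n^k) sketch with hypothetical function names, not the FPT algebraic algorithm of the paper, and it is only practical for checking results on tiny graphs.

```python
from collections import Counter
from itertools import combinations


def is_connected(vertices, adj):
    """Depth-first search restricted to `vertices` (the induced subgraph)."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def motif_occurrences(adj, color, motif):
    """Yield every vertex set whose colors equal the multiset `motif` and whose
    induced subgraph is connected. Exhaustive: O(n^k) subsets for |motif| = k."""
    target = Counter(motif)
    for subset in combinations(adj, len(motif)):
        if Counter(color[v] for v in subset) == target and is_connected(subset, adj):
            yield subset


# Path a - b - c colored R, B, R
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
color = {"a": "R", "b": "B", "c": "R"}
print(any(motif_occurrences(adj, color, ["R", "R"])))             # False: {a, c} is not connected
print(sum(1 for _ in motif_occurrences(adj, color, ["R", "B"])))  # 2
```

The decision version is `any(...)` over the generator, which stops at the first occurrence; the counting version exhausts it.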

arXiv.org e-Print Archive

Crossref

Variational principle for scale-free network motifs

Author: Stegehuis Clara
van der Hofstad Remco
van Leeuwaarden Johan S. H.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

For scale-free networks with degrees following a power law with an exponent $\tau\in(2,3)$, the structures of motifs (small subgraphs) are not yet well understood. We introduce a method designed to identify the dominant structure of any given motif as the solution of an optimization problem. The unique optimizer describes the degrees of the vertices that together span the most likely motif, resulting in explicit asymptotic formulas for the motif count and its fluctuations. We then classify all motifs into two categories: motifs with small and large fluctuations.
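To see concretely how vertex weights drive motif counts when τ is in (2, 3), one can compute the expected triangle count in an assumed rank-1 inhomogeneous random graph (Chung-Lu type), where vertices i and j are joined independently with probability min(1, w_i w_j / L) and L is the total weight. This Python sketch does not implement the paper's variational principle; the model, the Pareto weights and the function names are illustrative assumptions.

```python
import random
from itertools import combinations


def power_law_weights(n, tau, rng):
    """n i.i.d. Pareto weights with P(W > w) = w**-(tau - 1) for w >= 1,
    i.e. a degree-like power law with exponent tau."""
    return [rng.paretovariate(tau - 1) for _ in range(n)]


def expected_triangles(weights):
    """Expected triangle count in a rank-1 random graph where i and j are
    joined independently with probability min(1, w_i * w_j / L)."""
    L = sum(weights)

    def p(i, j):
        return min(1.0, weights[i] * weights[j] / L)

    return sum(
        p(i, j) * p(j, k) * p(i, k)
        for i, j, k in combinations(range(len(weights)), 3)
    )


rng = random.Random(1)
w = power_law_weights(100, tau=2.5, rng=rng)
print(expected_triangles(w))
```

The cap at 1 matters in this regime: with τ below 3, products of two hub weights can exceed L, so those pairs are connected with certainty.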

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Time series motifs statistical significance

Author: Azevedo Paulo J.
Castro Nuno Constantino
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2011
Field of study

Time series motif discovery is the task of extracting previously unknown recurrent patterns from time series data. It is an important problem in applications ranging from finance to health. Many algorithms have been proposed for finding motifs efficiently. Surprisingly, most of these proposals do not address how to evaluate the discovered motifs; they are typically evaluated by human experts. This is infeasible even for moderately sized datasets, since the number of discovered motifs tends to be prohibitively large. Statistical significance tests are widely used in the bioinformatics and association rule mining communities to evaluate extracted patterns. In this work we present an approach to calculate the statistical significance of time series motifs. Our proposal leverages work from the bioinformatics community by using a symbolic definition of time series motifs to derive each motif's p-value. We estimate the expected frequency of a motif using Markov chain models. The p-value is then assessed by comparing the actual frequency to the estimated one using statistical hypothesis tests. Our contribution makes it possible to apply a powerful technique, statistical testing, to time series. This provides researchers and practitioners with an important tool to automatically evaluate the degree of relevance of each extracted motif.
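The pipeline described above (symbolize the series, estimate a motif's expected frequency with a Markov chain, compare it with the observed count) can be sketched in Python as follows. Equal-width binning, a first-order chain and a Poisson upper-tail test are stand-ins chosen for brevity; the paper's exact symbolization and hypothesis tests may differ, and all function names are hypothetical.

```python
import math
from collections import Counter


def discretize(series, n_symbols=3):
    """Map values to letters 'a', 'b', ... by equal-width binning
    (a simple stand-in for a symbolic representation such as SAX)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_symbols or 1.0  # guard against a constant series
    return "".join(chr(ord("a") + min(int((x - lo) / width), n_symbols - 1)) for x in series)


def count_word(s, w):
    """Overlapping occurrences of word w in string s."""
    return sum(1 for i in range(len(s) - len(w) + 1) if s[i:i + len(w)] == w)


def expected_count_markov1(s, w):
    """Expected occurrences of w under a first-order Markov chain fitted to s."""
    trans = Counter(zip(s, s[1:]))
    out = Counter(s[:-1])  # how often each symbol is followed by another
    prob = Counter(s)[w[0]] / len(s)
    for a, b in zip(w, w[1:]):
        if out[a] == 0:
            return 0.0
        prob *= trans[(a, b)] / out[a]
    return (len(s) - len(w) + 1) * prob


def poisson_upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu), used here as an assumed null for counts."""
    term, cdf = math.exp(-mu), 0.0
    for i in range(k):
        cdf += term
        term *= mu / (i + 1)
    return max(0.0, 1.0 - cdf)


s = discretize([0.1, 0.9, 0.2, 1.0, 0.1, 0.8, 0.3, 0.9, 0.5, 0.4])
obs, exp_ = count_word(s, "ac"), expected_count_markov1(s, "ac")
print(s, obs, round(exp_, 2), poisson_upper_tail(obs, exp_))
```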

CiteSeerX

Universidade do Minho: RepositoriUM

Crossref

SIGffRid: A tool to search for sigma factor binding sites in bacterial genomes using comparative approach and biologically driven statistics

Background: Many programs have been developed to identify transcription factor binding sites. However, most of them cannot infer two-word motifs with variable spacer lengths. This case arises for RNA polymerase sigma (σ) factor binding sites (SFBSs), which are usually composed of two boxes, called -35 and -10 in reference to the transcription initiation point. Our goal is to design an algorithm that detects SFBSs by using combinatorial and statistical constraints deduced from biological observations. Results: We describe a new approach to identify SFBSs by comparing two related bacterial genomes. The method, named SIGffRid (SIGma Factor binding sites Finder using R'MES to select Input Data), performs a simultaneous analysis of pairs of promoter regions of orthologous genes. SIGffRid uses a prior identification of over-represented patterns in whole genomes as selection criteria for potential -35 and -10 boxes. These patterns are then grouped using pairs of short seeds (one of which may be gapped), allowing a variable-length spacer between them. Next, the motifs are extended under statistical guidance, which ensures that the selected motifs have statistically relevant properties. We applied our method to the pair of related bacterial genomes of Streptomyces coelicolor and Streptomyces avermitilis. A cross-check against the well-defined SFBSs of the SigR regulon in S. coelicolor is detailed, validating the algorithm. SFBSs for HrdB and BldN were also found, and the results suggested some new targets for these σ factors. In addition, consensus motifs for BldD and new SFBSs were defined, overlapping previously proposed consensuses. Relevant tests were also carried out on bacteria with moderate GC content (i.e. the Escherichia coli/Salmonella typhimurium and Bacillus subtilis/Bacillus licheniformis pairs). Motifs of housekeeping σ factors were found, as well as other SFBSs such as that of SigW in Bacillus strains. Conclusion: We demonstrate that our approach, combining statistical and biological criteria, successfully predicts SFBSs. The method's versatility allows the recognition of other kinds of two-box regulatory sites.
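The core difficulty named above, two boxes separated by a variable-length spacer, can be illustrated with a simple Python scan that allows a few mismatches per box. The textbook E. coli σ70 consensus boxes TTGACA (-35) and TATAAT (-10), the spacer range 15 to 19 and the function names are assumptions for this example. SIGffRid itself selects candidate boxes from over-represented patterns and compares orthologous promoters, which this sketch does not do.

```python
def mismatches(a, b):
    """Hamming distance between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_two_box_sites(seq, box35, box10, spacers=range(15, 20), max_mm=1):
    """Return (start, spacer) pairs where box35 and box10 each match with at most
    max_mm mismatches and are separated by a spacer length from `spacers`
    (assumed to be increasing). Boxes are given in upper case."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(box35) + 1):
        if mismatches(seq[i:i + len(box35)], box35) > max_mm:
            continue
        for spacer in spacers:
            j = i + len(box35) + spacer  # start of the -10 box
            if j + len(box10) > len(seq):
                break  # longer spacers cannot fit either
            if mismatches(seq[j:j + len(box10)], box10) <= max_mm:
                hits.append((i, spacer))
    return hits


promoter = "TTGACA" + "G" * 17 + "TATAAT"
print(find_two_box_sites(promoter, "TTGACA", "TATAAT"))  # [(0, 17)]
```

Scanning every allowed spacer for each candidate -35 box is what distinguishes this from a single fixed-length motif search.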

HAL - Lille 3

Crossref

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

HAL Descartes

Queensland University of Technology ePrints Archive

Swinburne Research Bank

ProdInra