Information content of colored motifs in complex networks
We study complex networks in which the nodes of the network are tagged with
different colors depending on the functionality of the nodes (colored graphs),
using information theory applied to the distribution of motifs in such
networks. We find that colored motifs can be viewed as the building blocks of
the networks (much more so than the uncolored structural motifs can be) and
that the relative frequency with which these motifs appear in the network can
be used to define the information content of the network. This information is
defined in such a way that a network with random coloration (but keeping the
relative number of nodes with different colors the same) has zero color
information content. Thus, colored motif information captures the
exceptionality of coloring in the motifs that is maintained via selection. We
study the motif information content of the C. elegans brain as well as the
evolution of colored motif information in networks that reflect the interaction
between instructions in genomes of digital life organisms. While we find that
colored motif information appears to capture essential functionality in the C.
elegans brain (where the color assignment of nodes is straightforward), it is
not obvious whether the colored motif information content always increases
during evolution, as would be expected from a measure that captures network
complexity. For a single choice of color assignment of instructions in the
digital life form Avida, we find instead that colored motif information content
can increase or decrease during evolution, depending on how the genomes are
organized; it could therefore be an interesting tool for dissecting genomic
rearrangements. Comment: 21 pages, 8 figures, to appear in Artificial Life
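The zero-information property of random coloration can be illustrated with the smallest colored motif, a single edge. The sketch below rests on an assumption and does not reproduce the paper's definition. It measures, in bits, the Kullback-Leibler divergence between the observed frequencies of endpoint color pairs and the frequencies expected when the same colors are randomly permuted over the nodes. Under such a permutation the expected pair frequencies can be computed exactly, so a graph whose wiring ignores color, such as a complete graph, scores exactly zero. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def pair_key(a, b):
    """Unordered color pair as a sorted tuple."""
    return (a, b) if a <= b else (b, a)


def expected_pair_probs(colors):
    """Probability that an edge joins each unordered color pair when node colors
    are randomly permuted (color counts preserved). Assumes a simple graph."""
    counts = Counter(colors.values())
    n = sum(counts.values())
    ordered_pairs = n * (n - 1)
    probs = {}
    for a, b in combinations_with_replacement(sorted(counts), 2):
        if a == b:
            probs[(a, b)] = counts[a] * (counts[a] - 1) / ordered_pairs
        else:
            probs[(a, b)] = 2 * counts[a] * counts[b] / ordered_pairs
    return probs


def edge_color_information(edges, colors):
    """KL divergence (bits) of observed edge color-pair frequencies from the
    random-coloring expectation; a hypothetical stand-in for motif information."""
    observed = Counter(pair_key(colors[u], colors[v]) for u, v in edges)
    m = sum(observed.values())
    expected = expected_pair_probs(colors)
    return sum((c / m) * math.log2((c / m) / expected[pair])
               for pair, c in observed.items())
```

In a bipartite red/blue graph, every edge is red-blue. This is more often than random coloring predicts, so the score becomes positive.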
Assessing the Exceptionality of Coloured Motifs in Networks
Various methods have recently been employed to characterise the structure of biological networks. In particular, the concept of a network motif and the related concept of a coloured motif have proven useful for modelling the notion of a functional/evolutionary building block. However, algorithms that enumerate all the motifs of a network may produce a very large output, so methods are needed to decide which motifs should be selected for downstream analysis. A widely used method is to assess whether a motif is exceptional, that is, over- or under-represented with respect to a null hypothesis. Much effort has been devoted over the last thirty years to deriving P-values for the frequencies of topological motifs, that is, fixed subgraphs. These P-values rely either on (compound) Poisson and Gaussian approximations for the motif count distribution in Erdős-Rényi random graphs or on simulations in other models. We focus on a different definition of graph motifs that corresponds to coloured motifs. A coloured motif is a connected subgraph with fixed vertex colours but unspecified topology. Our work is the first analytical attempt to assess the exceptionality of coloured motifs in networks without any simulation. We first establish analytical formulae for the mean and the variance of the count of a coloured motif in an Erdős-Rényi random graph model. Using simulations under this model, we further show that a Pólya-Aeppli distribution better …
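For the smallest coloured motif, two vertices with colours {a, b}, an occurrence is simply an edge whose endpoints carry those colours. In an Erdős-Rényi graph G(n, p) with a fixed colouring, the candidate vertex pairs are independent, so the count is binomial. Its mean is N p and its variance is N p (1 - p), where N = n_a n_b for a ≠ b and N = n_a (n_a - 1) / 2 for a = b. The sketch below (hypothetical helper names) checks these two moments by simulation and forms a z-score. It does not reproduce the paper's general formulae; for larger motifs, overlapping occurrences make the variance much harder to derive.

```python
import math
import random
from collections import Counter


def size2_moments(colour_counts, a, b, p):
    """Mean and variance of the number of edges joining colours a and b in G(n, p)
    with a fixed colouring; each candidate pair is an independent Bernoulli(p)."""
    if a == b:
        pairs = colour_counts[a] * (colour_counts[a] - 1) // 2
    else:
        pairs = colour_counts[a] * colour_counts[b]
    return pairs * p, pairs * p * (1 - p)


def count_size2(edges, colours, a, b):
    """Number of edges whose endpoint colours form the multiset {a, b}."""
    target = Counter((a, b))
    return sum(1 for u, v in edges if Counter((colours[u], colours[v])) == target)


def sample_gnp(n, p, rng):
    """Erdos-Renyi G(n, p) edge list on vertices 0..n-1."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


def z_score(observed, mean, var):
    """Standardised deviation of an observed count from its null mean."""
    return (observed - mean) / math.sqrt(var)


def simulate_counts(colours, a, b, p, reps, seed=0):
    """Empirical mean and variance of the size-2 coloured motif count."""
    rng = random.Random(seed)
    n = len(colours)
    counts = [count_size2(sample_gnp(n, p, rng), colours, a, b) for _ in range(reps)]
    mean = sum(counts) / reps
    var = sum((c - mean) ** 2 for c in counts) / (reps - 1)
    return mean, var
```

A large |z_score| flags the motif as over- or under-represented. This only works for counts that are close to Gaussian, which is precisely the approximation the abstract questions.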
An R Implementation of the Polya-Aeppli Distribution
An efficient implementation of the Polya-Aeppli, or geometric compound
Poisson, distribution in the statistical programming language R is presented.
The implementation is available as the package polyaAeppli and consists of
functions for the mass function, cumulative distribution function, quantile
function and random variate generation with those parameters conventionally
provided for standard univariate probability distributions in the stats package
in R. Comment: 9 pages, 2 figures
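A geometric compound Poisson variable can be written as X = Y_1 + ... + Y_N. Here N ~ Poisson(λ), and the cluster sizes Y_i are i.i.d. on {1, 2, ...} with P(Y = j) = (1 - θ) θ^(j-1). Conditioning on N = k gives a negative-binomial term, so P(X = n) = Σ_{k=1}^{n} e^(-λ) λ^k / k! · C(n-1, k-1) (1-θ)^k θ^(n-k) for n ≥ 1, and P(X = 0) = e^(-λ). The mean is λ / (1 - θ). The Python sketch below implements this mass function and a sampler. The (λ, θ) parametrization was chosen here as an assumption and may not match the arguments of the polyaAeppli R package. The function names are hypothetical.

```python
import math
import random


def polya_aeppli_pmf(n, lam, theta):
    """P(X = n) for X = Y_1 + ... + Y_N with N ~ Poisson(lam) and i.i.d.
    Y_i on {1, 2, ...}, P(Y = j) = (1 - theta) * theta**(j - 1).
    Assumed parametrization: lam > 0, 0 < theta < 1."""
    if n < 0:
        return 0.0
    if n == 0:
        return math.exp(-lam)  # no clusters at all
    total = 0.0
    for k in range(1, n + 1):
        # log of Poisson(k; lam) * P(k geometric cluster sizes sum to n)
        log_term = (-lam + k * math.log(lam) - math.lgamma(k + 1)
                    + math.log(math.comb(n - 1, k - 1))
                    + k * math.log1p(-theta) + (n - k) * math.log(theta))
        total += math.exp(log_term)
    return total


def polya_aeppli_sample(lam, theta, rng):
    """Draw one variate: a Poisson number of clusters, each of geometric size."""
    # Poisson draw by Knuth's product-of-uniforms method (adequate for small lam)
    limit = math.exp(-lam)
    clusters, prod = 0, rng.random()
    while prod > limit:
        clusters += 1
        prod *= rng.random()
    total = 0
    for _ in range(clusters):
        # inversion: P(Y > j) = theta**j, using 1 - U in (0, 1]
        total += 1 + int(math.log(1.0 - rng.random()) / math.log(theta))
    return total
```

The terms are summed in log space, which keeps λ^k / k! and the binomial coefficient from overflowing for large n.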
Approximating stationary distributions of fast mixing Glauber dynamics, with applications to exponential random graphs
We provide a general bound on the Wasserstein distance between two arbitrary
distributions of sequences of Bernoulli random variables. The bound is in terms
of a mixing quantity for the Glauber dynamics of one of the sequences, and a
simple expectation of the other. The result is applied to estimate, with
explicit error, expectations of functions of random vectors for some Ising
models and exponential random graphs in "high temperature" regimes. Comment: Ver3: 24 pages, major revision with new results; Ver2: updated
reference; Ver1: 19 pages, 1 figure
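Glauber dynamics for an Ising model updates one spin at a time. It picks a vertex i uniformly at random and resamples its spin from the conditional distribution given its neighbours: P(s_i = +1 | rest) = 1 / (1 + exp(-2β(h + Σ_{j~i} s_j))). The sketch below is a plain heat-bath implementation with unit couplings, inverse temperature β and external field h, all assumed for illustration. It shows only the dynamics whose mixing the bound is stated in terms of, not the Wasserstein bound itself. For exponential random graphs, the sites would instead be edge indicators.

```python
import math
import random


def glauber_step(spins, neighbours, beta, h, rng):
    """One heat-bath update: resample a uniformly chosen spin from its
    conditional law given its neighbours (unit couplings assumed)."""
    i = rng.randrange(len(spins))
    field = h + sum(spins[j] for j in neighbours[i])
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * field))
    spins[i] = 1 if rng.random() < p_plus else -1


def mean_magnetisation(neighbours, beta, h, steps, burn_in=1000, seed=0):
    """Time average of the magnetisation (mean spin) along one Glauber run."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in neighbours]
    total = 0.0
    for t in range(burn_in + steps):
        glauber_step(spins, neighbours, beta, h, rng)
        if t >= burn_in:
            total += sum(spins) / len(spins)
    return total / steps
```

For an isolated vertex, each step draws directly from the stationary law, so the average approaches tanh(βh). This gives a simple check of the update rule.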
Variational Bayes model averaging for graphon functions and motif frequencies inference in W-graph models
W-graph refers to a general class of random graph models that can be seen as
a random graph limit. It is characterized by both its graphon function and its
motif frequencies. In this paper, relying on an existing variational Bayes
algorithm for stochastic block models, along with the corresponding weights
for model averaging, we derive an estimate of the graphon function as an
average of stochastic block models with increasing number of blocks. In the
same framework, we derive the variational posterior frequency of any motif. A
simulation study and an illustration on a social network complete our work.
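Suppose a graphon W is a stochastic block model with block proportions π_1, ..., π_K and connection probabilities W_{ql}. Then the homomorphism density of a motif F on vertices {1, ..., k} reduces to a finite sum: t(F, W) = Σ_{q_1,...,q_k} Π_i π_{q_i} Π_{(i,j)∈E(F)} W_{q_i q_j}. The sketch below evaluates this sum and then averages the per-model frequencies with model weights, as in Bayesian model averaging. Two things here are assumptions: treating homomorphism densities as the "motif frequencies", and the example weights themselves. In the paper, the weights come from the variational Bayes procedure. All names are hypothetical.

```python
from itertools import product


def hom_density(motif_edges, k, pi, W):
    """t(F, W) for a motif F on vertices 0..k-1 when W is a stochastic block
    model graphon with block proportions pi and connection matrix W."""
    K = len(pi)
    total = 0.0
    for blocks in product(range(K), repeat=k):  # block label of each motif vertex
        weight = 1.0
        for q in blocks:
            weight *= pi[q]
        for i, j in motif_edges:
            weight *= W[blocks[i]][blocks[j]]
        total += weight
    return total


def averaged_frequency(models, weights, motif_edges, k):
    """Model-averaged motif frequency sum_m w_m * t(F, W_m); weights assumed to sum to 1."""
    return sum(w * hom_density(motif_edges, k, pi, W)
               for (pi, W), w in zip(models, weights))


EDGE = [(0, 1)]
TRIANGLE = [(0, 1), (1, 2), (2, 0)]
```

With a single block, W is constant and equal to p. Any motif then has density p raised to its number of edges, which recovers the Erdős-Rényi case.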
Finding and counting vertex-colored subtrees
The problems studied in this article originate from the Graph Motif problem
introduced by Lacroix et al. in the context of biological networks. The problem
is to decide if a vertex-colored graph has a connected subgraph whose colors
equal a given multiset of colors M. It is a variant of graph pattern matching
in which the structure of the occurrence of the pattern is not of
interest; the only requirement is connectedness. Using an algebraic
framework recently introduced by Koutis et al., we obtain new FPT algorithms
for Graph Motif and variants, with improved running times. We also obtain
results on the counting versions of this problem, proving that the counting
problem is FPT if M is a set, but becomes W[1]-hard if M is a multiset with two
colors. Finally, we present an experimental evaluation of this approach on real
datasets, showing that its performance compares favorably with existing
software. Comment: Conference version in International Symposium on Mathematical
Foundations of Computer Science (MFCS), Brno, Czech Republic (2010); journal
version in Algorithmica
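The Graph Motif question can be stated as a brute-force search. Try every vertex set of size |M|, keep those whose colour multiset equals M, and test whether they induce a connected subgraph. The sketch below does exactly that and runs in time exponential in the input. It is not the algebraic FPT algorithm of the paper, only a reference implementation for checking small cases. Counting here means counting vertex sets, which is an assumption that may differ from the counting variants studied in the paper. The names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def is_connected(vertices, adj):
    """True if the subgraph induced by `vertices` is connected (BFS)."""
    vs = set(vertices)
    start = next(iter(vs))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in vs and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == vs


def graph_motif_occurrences(adj, colours, motif):
    """Yield every vertex set whose colours equal the multiset `motif` and that
    induces a connected subgraph (exhaustive search)."""
    target, k = Counter(motif), len(motif)
    if k == 0:
        return
    for subset in combinations(sorted(adj), k):
        if Counter(colours[v] for v in subset) == target and is_connected(subset, adj):
            yield subset


def has_graph_motif(adj, colours, motif):
    return any(True for _ in graph_motif_occurrences(adj, colours, motif))


def count_graph_motif(adj, colours, motif):
    return sum(1 for _ in graph_motif_occurrences(adj, colours, motif))
```

On the path r-g-b-r, the multiset {r, g, b} occurs twice, as {0, 1, 2} and {1, 2, 3}. The multiset {r, r} does not occur, because the two red vertices are not adjacent.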
Variational principle for scale-free network motifs
For scale-free networks with degrees following a power law with exponent …,
the structures of motifs (small subgraphs) are not yet well
understood. We introduce a method designed to identify the dominant structure
of any given motif as the solution of an optimization problem. The unique
optimizer describes the degrees of the vertices that together span the most
likely motif, resulting in explicit asymptotic formulas for the motif count and
its fluctuations. We then classify all motifs into two categories: motifs with
small and large fluctuations.
Time series motifs statistical significance
Time series motif discovery is the task of extracting previously unknown recurrent patterns from time series data. It is an important problem in applications ranging from finance to health. Many algorithms have been proposed for efficiently finding motifs. Surprisingly, most of these proposals do not address how to evaluate the discovered motifs, which are typically evaluated by human experts. This is infeasible even for moderately sized datasets, since the number of discovered motifs tends to be prohibitively large. Statistical significance tests are widely used in the bioinformatics and association-rule mining communities to evaluate extracted patterns. In this work we present an approach to calculating the statistical significance of time series motifs. Our proposal leverages work from the bioinformatics community by using a symbolic definition of time series motifs to derive each motif's p-value. We estimate the expected frequency of a motif using Markov chain models. The p-value is then assessed by comparing the actual frequency to the estimated one using statistical hypothesis tests. Our contribution makes it possible to apply a powerful technique, statistical testing, to time series. This gives researchers and practitioners an important tool for automatically evaluating the relevance of each extracted motif.
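The pipeline described above has three steps: a symbolic representation, a Markov-chain estimate of the expected frequency, and a hypothesis test. It can be sketched as follows. Every concrete choice is an assumption. The series is discretised by equal-frequency binning rather than by a specific scheme such as SAX. The background model is a first-order Markov chain fitted to the symbol string. The p-value uses a Poisson approximation P(N ≥ observed), which ignores self-overlap of words and is only rough. The function names are hypothetical.

```python
import math
from collections import Counter


def discretise(series, alphabet="abc"):
    """Map each value to a symbol by equal-frequency (empirical quantile) bins."""
    ranked, a = sorted(series), len(alphabet)
    cuts = [ranked[(len(ranked) * q) // a] for q in range(1, a)]
    return "".join(alphabet[sum(x >= c for c in cuts)] for x in series)


def fit_markov(symbols):
    """Symbol frequencies and first-order transition probabilities."""
    n = len(symbols)
    init = {s: c / n for s, c in Counter(symbols).items()}
    from_counts = Counter(symbols[:-1])
    trans = Counter(zip(symbols, symbols[1:]))
    cond = {(x, y): c / from_counts[x] for (x, y), c in trans.items()}
    return init, cond


def expected_count(word, n, init, cond):
    """Expected number of (possibly overlapping) occurrences in a string of length n."""
    p = init.get(word[0], 0.0)
    for x, y in zip(word, word[1:]):
        p *= cond.get((x, y), 0.0)
    return (n - len(word) + 1) * p


def observed_count(word, symbols):
    m = len(word)
    return sum(1 for i in range(len(symbols) - m + 1) if symbols[i:i + m] == word)


def poisson_upper_tail(k, mu):
    """P(N >= k) for N ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    term = cdf = math.exp(-mu)
    for i in range(1, k):
        term *= mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def motif_p_value(word, symbols):
    """(observed count, expected count, approximate p-value) for one symbolic motif."""
    init, cond = fit_markov(symbols)
    mu = expected_count(word, len(symbols), init, cond)
    obs = observed_count(word, symbols)
    return obs, mu, poisson_upper_tail(obs, mu)
```

A real pipeline would also correct for the many motifs tested, for example with a Bonferroni or false-discovery-rate adjustment. That step is left out here.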
SIGffRid: A tool to search for sigma factor binding sites in bacterial genomes using comparative approach and biologically driven statistics
Background: Many programs have been developed to identify transcription factor binding sites. However, most of them cannot infer two-word motifs with variable spacer lengths. This case arises for RNA polymerase sigma (σ) factor binding sites (SFBSs), which are usually composed of two boxes, called -35 and -10 in reference to the transcription initiation point. Our goal is to design an algorithm that detects SFBSs using combinatorial and statistical constraints deduced from biological observations.
Results: We describe a new approach to identify SFBSs by comparing two related bacterial genomes. The method, named SIGffRid (SIGma Factor binding sites Finder using R'MES to select Input Data), performs a simultaneous analysis of pairs of promoter regions of orthologous genes. SIGffRid uses a prior identification of over-represented patterns in whole genomes as a selection criterion for potential -35 and -10 boxes. These patterns are then grouped using pairs of short seeds (one of which may be gapped), allowing a variable-length spacer between them. Next, the motifs are extended under statistical guidance, which ensures that the selected motifs have statistically relevant properties. We applied our method to the pair of related bacterial genomes of Streptomyces coelicolor and Streptomyces avermitilis. A cross-check with the well-defined SFBSs of the SigR regulon in S. coelicolor is detailed, validating the algorithm. SFBSs for HrdB and BldN were also found, and the results suggested some new targets for these σ factors. In addition, consensus motifs for BldD and new SFBSs were defined, overlapping previously proposed consensuses. Relevant tests were also carried out on bacteria with moderate GC content (i.e. the Escherichia coli/Salmonella typhimurium and Bacillus subtilis/Bacillus licheniformis pairs). Motifs of housekeeping σ factors were found, as well as other SFBSs such as that of SigW in Bacillus strains.
Conclusion: We demonstrate that our approach, combining statistical and biological criteria, successfully predicts SFBSs. The method's versatility allows it to recognise other kinds of two-box regulatory sites.
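A naive scan can illustrate the two-box structure with a variable spacer. It finds a -35 box, then looks for a -10 box at every allowed spacer length, tolerating a few mismatches in each box. This only illustrates the motif shape; it is not SIGffRid, which pairs orthologous promoters and builds its boxes from over-represented words found by R'MES. The boxes TTGACA and TATAAT in the usage example are the familiar E. coli σ70 consensus and serve purely as illustration. The spacer range and mismatch threshold are assumptions, and the function names are hypothetical.

```python
def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_two_box_sites(seq, box35, box10, spacer_range, max_mm=1):
    """Return (start35, spacer, start10) for every hit where both boxes match with
    at most `max_mm` mismatches each and the spacer length lies in `spacer_range`
    (inclusive)."""
    seq = seq.upper()
    lo, hi = spacer_range
    hits = []
    for i in range(len(seq) - len(box35) + 1):
        if mismatches(seq[i:i + len(box35)], box35) > max_mm:
            continue
        for spacer in range(lo, hi + 1):
            j = i + len(box35) + spacer
            window = seq[j:j + len(box10)]
            if len(window) == len(box10) and mismatches(window, box10) <= max_mm:
                hits.append((i, spacer, j))
    return hits


# Usage: an exact -35 box, a 17-nt spacer, then an exact -10 box.
example = "GGTTGACA" + "C" * 17 + "TATAATCC"
print(find_two_box_sites(example, "TTGACA", "TATAAT", (15, 19), max_mm=0))
```

Allowing mismatches quickly inflates the number of hits in long sequences. This is one reason tools such as SIGffRid add comparative and statistical filters.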