2,475 research outputs found
Generating constrained random graphs using multiple edge switches
The generation of random graphs using edge swaps provides a reliable method
to draw uniformly random samples of sets of graphs respecting some simple
constraints, e.g. degree distributions. However, in general, it is not
necessarily possible to access all graphs obeying some given constraints
through a classical switching procedure calling on pairs of edges. We therefore
propose to get round this issue by generalizing this classical approach through
the use of higher-order edge switches. This method, which we denote by "k-edge
switching", makes it possible to progressively improve the covered portion of
a set of constrained graphs, thereby providing an increasing, asymptotically
certain confidence on the statistical representativeness of the obtained
sample. Comment: 15 pages
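The classical pairwise switch that the abstract generalizes can be sketched as follows. This is a minimal illustration of the standard two-edge swap on a simple undirected graph, not the authors' k-edge switching procedure; the function names `double_edge_swap` and `degrees` and the parameters `n_swaps`, `seed`, and `max_tries` are hypothetical.

```python
import random


def degrees(edges):
    """Return a dict mapping each vertex to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def double_edge_swap(edges, n_swaps, seed=0, max_tries=10_000):
    """Apply up to n_swaps pairwise edge switches to a simple undirected graph.

    Each accepted switch replaces edges (a, b) and (c, d) with (a, d) and
    (c, b), which leaves every vertex degree unchanged. Proposals that would
    create a self-loop or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both orientations of the second edge
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: the switch could create a self-loop
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a multi-edge
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list


ring = [(k, (k + 1) % 8) for k in range(8)]
shuffled = double_edge_swap(ring, n_swaps=20)
```

The degree sequence of `shuffled` matches that of `ring`. As the abstract notes, pairwise switches alone do not necessarily reach every graph satisfying a given set of constraints, which is what motivates switches involving more than two edges.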
Sampling motif-constrained ensembles of networks
The statistical significance of network properties is conditioned on null
models that satisfy specified properties but are otherwise random.
Exponential random graph models are a principled theoretical framework to
generate such constrained ensembles, but they often fail in practice, either
due to model inconsistency or due to the impossibility of sampling networks from
them. These problems affect the important case of networks with prescribed
clustering coefficient or number of small connected subgraphs (motifs). In this
paper we use the Wang-Landau method to obtain a multicanonical sampling that
overcomes both these problems. We sample, in polynomial time, networks with
arbitrary degree sequences from ensembles with imposed motif counts. Applying
this method to social networks, we investigate the relation between
transitivity and homophily, and we quantify the correlation between different
types of motifs, finding that single motifs can explain up to 60% of the
variation of motif profiles. Comment: Updated version, as published in the journal. 7 pages, 5 figures, one
Supplemental Material
Some results on more flexible versions of Graph Motif
The problems studied in this paper originate from Graph Motif, a problem
introduced in 2006 in the context of biological networks. Informally speaking,
it consists in deciding if a multiset of colors occurs in a connected subgraph
of a vertex-colored graph. Due to the high rate of noise in the biological
data, more flexible definitions of the problem have been outlined. We present
in this paper two inapproximability results for two different optimization
variants of Graph Motif: one where the size of the solution is maximized, the
other where the number of substitutions of colors to obtain the motif from the
solution is minimized. We also study a decision version of Graph Motif where
the connectivity constraint is replaced by the well-known notion of graph
modularity. While the problem remains NP-complete, it admits FPT algorithms
for biologically relevant parameterizations.
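The basic decision problem described informally above (does a multiset of colors occur in a connected subgraph of a vertex-colored graph?) can be illustrated with a brute-force check. This is only a sketch for small instances, running in exponential time; it is not one of the algorithms from the paper, and the names `graph_motif`, `is_connected`, `adj`, and `color` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_connected(vertices, adj):
    """Check whether the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def graph_motif(adj, color, motif):
    """Return a vertex tuple whose colors equal `motif` as a multiset and
    whose induced subgraph is connected, or None if no such set exists."""
    target = Counter(motif)
    k = sum(target.values())
    if k == 0:
        raise ValueError("motif must be non-empty")
    for subset in combinations(sorted(adj), k):
        if Counter(color[v] for v in subset) != target:
            continue  # color multiset does not match the motif
        if is_connected(subset, adj):
            return subset
    return None


# Path graph 1 - 2 - 3 - 4 with colors red, green, blue, red.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
color = {1: "red", 2: "green", 3: "blue", 4: "red"}
found = graph_motif(adj, color, ["red", "green"])
missing = graph_motif(adj, color, ["red", "red"])
```

Here `found` is a connected pair colored red and green, while `missing` is `None` because the two red vertices are not adjacent.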
A generic algorithm for layout of biological networks
Background: Biological networks are widely used to represent processes in biological systems and to capture interactions and dependencies between biological entities. Their size and complexity are steadily increasing due to the ongoing growth of knowledge in the life sciences. To aid understanding of biological networks, several algorithms for laying out and graphically representing networks and network analysis results have been developed. However, current algorithms are specialized to particular layout styles, and therefore different algorithms are required for each kind of network and/or style of layout. This increases implementation effort and means that new algorithms must be developed for new layout styles. Furthermore, additional effort is necessary to compose different layout conventions in the same diagram. Also, the user cannot usually customize the placement of nodes to tailor the layout to their particular need or task, and there is little support for interactive network exploration. Results: We present a novel algorithm to visualize different biological networks and network analysis results in meaningful ways depending on network types and analysis outcome. Our method is based on constrained graph layout, and we demonstrate how it can handle the drawing conventions used in biological networks. Conclusion: The presented algorithm offers the ability to produce many of the fundamental popular drawing styles while allowing the flexibility of constraints to further tailor these layouts.
Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity
Background: Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the “twilight” or “midnight” zones, where pair-wise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results: Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biological motivation of our study is to recognize common structure patterns in “immunoevasins”, proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusions: We present a theoretical framework, offer a practical software implementation for incorporating prior domain knowledge, such as substitution matrices as studied here, and devise an efficient algorithm to identify approximately matched frequent subgraphs. By doing so, we significantly expand the analytical power of sophisticated data mining algorithms in dealing with large volumes of complicated and noisy protein structure data. Without loss of generality, the choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty.
A computer system to perform structure comparison using TOPS representations of protein structure
We describe the design and implementation of a fast topology-based method
for protein structure comparison. The approach uses the TOPS topological representation
of protein structure, aligning two structures using a common discovered
pattern and generating a measure of distance derived from an insert score. Heavy
use is made of a constraint-based pattern matching algorithm for TOPS diagrams
that we have designed and described elsewhere (Gilbert et al., 1999). The comparison
system is maintained at the European Bioinformatics Institute and is available
over the Web at tops.ebi.ac.uk/tops. Users submit a structure description in
Protein Data Bank (PDB) format and can compare it with structures in the entire
PDB or a representative subset of protein domains, receiving the results by email
- …