Subgraph covers -- An information theoretic approach to motif analysis in networks
Many real world networks contain a statistically surprising number of certain
subgraphs, called network motifs. In the prevalent approach to motif analysis,
network motifs are detected by comparing subgraph frequencies in the original
network with a statistical null model. In this paper we propose an alternative
approach to motif analysis where network motifs are defined to be connectivity
patterns that occur in a subgraph cover that represents the network using
minimal total information. A subgraph cover is defined to be a set of subgraphs
such that every edge of the graph is contained in at least one of the subgraphs
in the cover. Some recently introduced random graph models that can incorporate
significant densities of motifs have natural formulations in terms of subgraph
covers and the presented approach can be used to match networks with such
models. To prove the practical value of our approach we also present a
heuristic for the resulting NP-hard optimization problem and give results for
several real world networks.
Comment: 10 pages, 7 tables, 1 Figure
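The definition of a subgraph cover given in this abstract can be checked mechanically. The sketch below is only an illustration of that definition, not the paper's heuristic or its information measure: the function names, the edge-list representation, and the example graph are all assumptions made here, and each subgraph is represented simply by its edge set.

```python
def normalize(edge):
    """Store an undirected edge with its endpoints in sorted order."""
    u, v = edge
    return (u, v) if u <= v else (v, u)


def is_subgraph_cover(graph_edges, subgraphs):
    """Return True if the subgraphs form a cover of the graph.

    Every subgraph may only use edges of the graph, and every edge of
    the graph must appear in at least one subgraph.
    """
    graph = {normalize(e) for e in graph_edges}
    covered = set()
    for sub in subgraphs:
        sub_edges = {normalize(e) for e in sub}
        if not sub_edges <= graph:
            # The candidate uses an edge that is not in the graph.
            return False
        covered |= sub_edges
    return covered == graph


# Hypothetical example: a triangle on {1, 2, 3} plus a pendant edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
cover = [[(1, 2), (2, 3), (3, 1)], [(3, 4)]]

print(is_subgraph_cover(edges, cover))       # triangle + single edge
print(is_subgraph_cover(edges, cover[:1]))   # triangle alone misses (3, 4)
```

In this toy example the cover consists of one triangle and one single edge; the approach described in the abstract would then compare such covers by the total information needed to encode them, which this sketch does not attempt.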
Detecting communities of triangles in complex networks using spectral optimization
The study of the sub-structure of complex networks is of major importance to
relate topology and functionality. Many efforts have been devoted to the
analysis of the modular structure of networks using the quality function known
as modularity. However, generally speaking, the relation between topological
modules and functional groups is still unknown, and depends on the semantics of
the links. Sometimes, we know in advance that many connections are transitive
and, as a consequence, triangles have a specific meaning. Here we propose the
study of the modular structure of networks considering triangles as the
building blocks of modules. The method generalizes the standard modularity and
uses spectral optimization to find its maximum. We compare the partitions
obtained with those resulting from the optimization of the standard modularity
in several real networks. The results show that the information reported by the
analysis of modules of triangles complements the information of the classical
modularity analysis.
Comment: Computer Communications (in press)
Reciprocity in Social Networks with Capacity Constraints
Directed links -- representing asymmetric social ties or interactions (e.g.,
"follower-followee") -- arise naturally in many social networks and other
complex networks, giving rise to directed graphs (or digraphs) as basic
topological models for these networks. Reciprocity, defined for a digraph as
the percentage of edges with a reciprocal edge, is a key metric that has been
used in the literature to compare different directed networks and provide
"hints" about their structural properties: for example, are reciprocal edges
generated randomly by chance or are there other processes driving their
generation? In this paper we study the problem of maximizing achievable
reciprocity for an ensemble of digraphs with the same prescribed in- and
out-degree sequences. We show that the maximum reciprocity hinges crucially on
the in- and out-degree sequences, which may be intuitively interpreted as
constraints on some "social capacities" of nodes and impose fundamental limits
on achievable reciprocity. We show that it is NP-complete to decide the
achievability of a simple upper bound on maximum reciprocity, and provide
conditions for achieving it. We demonstrate that many real networks exhibit
reciprocities surprisingly close to the upper bound, which implies that users
in these social networks are in a sense more "social" than suggested by the
empirical reciprocity alone in that they are more willing to reciprocate,
subject to their "social capacity" constraints. We find some surprising linear
relationships between empirical reciprocity and the bound. We also show that a
particular type of small network motif that we call 3-paths is the major
source of loss in reciprocity for real networks.
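The reciprocity metric defined in this abstract is simple to compute from an edge list. The sketch below does that, and also evaluates one elementary degree-based bound: a node cannot take part in more reciprocated pairs than the smaller of its in-degree and out-degree. Whether this is exactly the "simple upper bound" studied in the paper is an assumption; the function names and the example digraph are likewise hypothetical.

```python
from collections import Counter


def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) also exists.

    Assumes a simple digraph without self-loops; duplicate edges are ignored.
    """
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    reciprocated = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return reciprocated / len(edge_set)


def degree_upper_bound(edges):
    """Degree-based bound on reciprocity: sum of min(in, out) over nodes, per edge.

    Each reciprocated edge leaving a node consumes one unit of that node's
    out-degree and one unit of its in-degree, hence the min().
    """
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    out_deg = Counter(u for u, _ in edge_set)
    in_deg = Counter(v for _, v in edge_set)
    nodes = set(out_deg) | set(in_deg)
    return sum(min(in_deg[n], out_deg[n]) for n in nodes) / len(edge_set)


# Hypothetical follower graph: a and b follow each other, a -> c, c -> d.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
print(reciprocity(edges))         # 0.5
print(degree_upper_bound(edges))  # 0.75
```

Here two of the four edges are reciprocated, while the degree sequences alone would permit at most three, which illustrates the gap between empirical reciprocity and what the "social capacity" constraints allow.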
Node similarity within subgraphs of protein interaction networks
We propose a biologically motivated quantity, twinness, to evaluate local
similarity between nodes in a network. The twinness of a pair of nodes is the
number of connected, labeled subgraphs of size n in which the two nodes possess
identical neighbours. The graph animal algorithm is used to estimate twinness
for each pair of nodes (for subgraph sizes n=4 to n=12) in four different
protein interaction networks (PINs). These include an Escherichia coli PIN and
three Saccharomyces cerevisiae PINs -- each obtained using state-of-the-art
high throughput methods. In almost all cases, the average twinness of node
pairs is vastly higher than expected from a null model obtained by switching
links. For all n, we observe a difference in the ratio of type A twins (which
are unlinked pairs) to type B twins (which are linked pairs) distinguishing the
prokaryote E. coli from the eukaryote S. cerevisiae. Interaction similarity is
expected due to gene duplication, and whole genome duplication paralogues in S.
cerevisiae have been reported to co-cluster into the same complexes. Indeed, we
find that these paralogous proteins are over-represented as twins compared to
pairs chosen at random. These results indicate that twinness can detect
ancestral relationships from currently available PIN data.
Comment: 10 pages, 5 figures. Edited for typos, clarity, figures improved for
readability
A methodology for determining amino-acid substitution matrices from set covers
We introduce a new methodology for the determination of amino-acid
substitution matrices for use in the alignment of proteins. The new methodology
is based on a pre-existing set cover on the set of residues and on the
undirected graph that describes residue exchangeability given the set cover.
For fixed functional forms indicating how to obtain edge weights from the set
cover and, after that, substitution-matrix elements from weighted distances on
the graph, the resulting substitution matrix can be checked for performance
against some known set of reference alignments and for given gap costs. Finding
the appropriate functional forms and gap costs can then be formulated as an
optimization problem that seeks to maximize the performance of the substitution
matrix on the reference alignment set. We give computational results on the
BAliBASE suite using a genetic algorithm for optimization. Our results indicate
that it is possible to obtain substitution matrices whose performance is either
comparable to or surpasses that of several others, depending on the particular
scenario under consideration.
The Graph Motif problem parameterized by the structure of the input graph
The Graph Motif problem was introduced in 2006 in the context of biological
networks. It consists of deciding whether or not a multiset of colors occurs in
a connected subgraph of a vertex-colored graph. Graph Motif has been mostly
analyzed from the standpoint of parameterized complexity. The main parameters
which came into consideration were the size of the multiset and the number of
colors. However, in many applications of Graph Motif, the input graph
originates from real-life data and has structure. Motivated by this prosaic
observation, we systematically study its complexity relative to graph
structural parameters. For a wide range of parameters, we give new or improved
FPT algorithms, or show that the problem remains intractable. For the FPT
cases, we also give some kernelization lower bounds as well as some ETH-based
lower bounds on the worst case running time. Interestingly, we establish that
Graph Motif is W[1]-hard (while in W[P]) for parameter max leaf number, which
is, to the best of our knowledge, the first problem to behave this way.
Comment: 24 pages, accepted in DAM, conference version in IPEC 201
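The decision problem described in this abstract can be made concrete with a brute-force check. The sketch below is not one of the FPT algorithms from the paper: it enumerates all vertex subsets of the right size, so it runs in exponential time. It also assumes the usual exact-match reading of the problem, in which the colors of the chosen connected vertex set must equal the motif as a multiset. The function names, the adjacency-dict representation, and the example graph are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_connected(vertices, adjacency):
    """Check whether the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency.get(node, ()):
            if neighbour in vertices and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == vertices


def graph_motif(adjacency, colors, motif):
    """Brute-force Graph Motif: is there a connected vertex set whose
    multiset of colors equals `motif`? Exponential in the motif size."""
    target = Counter(motif)
    k = sum(target.values())
    for subset in combinations(colors, k):
        if Counter(colors[v] for v in subset) != target:
            continue
        if is_connected(subset, adjacency):
            return True
    return False


# Hypothetical vertex-colored path a - b - c - d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
colors = {"a": "red", "b": "blue", "c": "red", "d": "green"}

print(graph_motif(adjacency, colors, ["red", "blue", "red"]))  # {a, b, c}
print(graph_motif(adjacency, colors, ["blue", "green"]))       # b and d are not adjacent
```

The first query succeeds because the vertices a, b, c induce a connected path with the required colors; the second fails because the only blue and green vertices are not adjacent.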
Some results on more flexible versions of Graph Motif
The problems studied in this paper originate from Graph Motif, a problem
introduced in 2006 in the context of biological networks. Informally speaking,
it consists in deciding if a multiset of colors occurs in a connected subgraph
of a vertex-colored graph. Due to the high rate of noise in the biological
data, more flexible definitions of the problem have been outlined. We present
in this paper two inapproximability results for two different optimization
variants of Graph Motif: one where the size of the solution is maximized, the
other where the number of substitutions of colors to obtain the motif from the
solution is minimized. We also study a decision version of Graph Motif where
the connectivity constraint is replaced by the well known notion of graph
modularity. While the problem remains NP-complete, it allows algorithms in FPT
for biologically relevant parameterizations.