Fingerprint for Network Topologies
A network's topology information can be given as an adjacency matrix. The
bitmap of sorted adjacency matrix (BOSAM) is a network visualisation tool that
can emphasise different network structures, which become visible simply by
looking at the reordered adjacency matrix. A BOSAM picture resembles the shape
of a flower and is characterised by a series of 'leaves'. Here we show and
mathematically prove that for most networks, there is a self-similar relation
between the envelopes of the BOSAM leaves. This self-similar property allows
us to use a single envelope to predict all other envelopes and therefore
reconstruct the outline of a network's BOSAM picture. We liken the BOSAM
envelope to a human fingerprint, as the two share a number of common features:
both are simple, easy to obtain, and strongly characteristic, encoding
essential information for identification.
Comment: 12 pages, 3 figures, in press
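The abstract does not say how the adjacency matrix is sorted. The sketch below assumes that nodes are reordered by decreasing degree (ties kept in their original order) and prints the permuted matrix as a text bitmap, with `#` for an edge and `.` for no edge. The function name `bosam_bitmap` is hypothetical, and the published BOSAM ordering and rendering may differ.

```python
from typing import List


def bosam_bitmap(adj: List[List[int]]) -> List[str]:
    """Rows of a degree-sorted adjacency matrix rendered as a text bitmap.

    Assumption: nodes are ordered by decreasing degree, ties broken by the
    original index (Python's sort is stable). '#' = edge, '.' = no edge.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    # Highest-degree nodes come first.
    order = sorted(range(n), key=lambda v: -degrees[v])
    # Apply the same permutation to rows and columns.
    return ["".join("#" if adj[i][j] else "." for j in order) for i in order]


if __name__ == "__main__":
    # Hypothetical example: a star centred on node 3 plus the edge 0-1.
    edges = [(3, 0), (3, 1), (3, 2), (0, 1)]
    adj = [[0] * 4 for _ in range(4)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    for row in bosam_bitmap(adj):
        print(row)
```

On larger networks the same degree ordering pushes most edges towards the top-left corner of the image, which is where the 'leaf' shapes described above would be looked for.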
Going the distance for protein function prediction: a new distance metric for protein interaction networks
Due to an error introduced in the production process, the x-axes in the first panels of Figure 1 and Figure 7 are not formatted correctly. The correct Figure 1 can be viewed here: http://dx.doi.org/10.1371/annotation/343bf260-f6ff-48a2-93b2-3cc79af518a9
In protein-protein interaction (PPI) networks, functional similarity is often inferred from the functions of directly interacting proteins or, more generally, from some notion of interaction-network proximity among proteins in a local neighborhood. Prior methods typically measure proximity as the shortest-path distance in the network, but this metric has only a limited ability to capture fine-grained neighborhood distinctions: most proteins are close to each other, so there are many ties in proximity. We introduce diffusion state distance (DSD), a new metric based on a graph diffusion property, designed to capture finer-grained distinctions in proximity for transfer of functional annotation in PPI networks. We present a tool that, given a PPI network as input, outputs the DSD between every pair of proteins. We show that replacing the shortest-path metric with DSD improves the performance of classical function prediction methods across the board.
MC, HZ, NMD and LJC were supported in part by National Institutes of Health (NIH) R01 grant GM080330. JP was supported in part by NIH grant R01 HD058880. This material is based upon work supported by the National Science Foundation under grant numbers CNS-0905565, CNS-1018266, CNS-1012910, and CNS-1117039, and supported by the Army Research Office under grant W911NF-11-1-0227 (to MEC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
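As a rough illustration of a diffusion-based distance, the sketch below computes a k-step variant: for each node it accumulates the expected number of visits that a simple random walk makes to every node within k steps, then takes the L1 distance between two such vectors. The step count `k`, the function names and the handling of isolated nodes are assumptions for illustration; the paper's exact definition (for example, whether a limit over steps is taken) may differ.

```python
from typing import Dict, List

# Undirected graph as an adjacency list: node -> list of neighbours.
Graph = Dict[int, List[int]]


def expected_visits(graph: Graph, start: int, k: int) -> Dict[int, float]:
    """Expected number of visits to each node during a k-step random walk.

    Step 0 counts as one visit to `start`; at every step the walker moves to a
    uniformly chosen neighbour. Mass sitting on a node with no neighbours is dropped.
    """
    dist = {v: 0.0 for v in graph}
    dist[start] = 1.0
    visits = dict(dist)  # contribution of step 0
    for _ in range(k):
        nxt = {v: 0.0 for v in graph}
        for u, p in dist.items():
            if p and graph[u]:
                share = p / len(graph[u])
                for w in graph[u]:
                    nxt[w] += share
        dist = nxt
        for v, p in dist.items():
            visits[v] += p
    return visits


def dsd(graph: Graph, a: int, b: int, k: int = 5) -> float:
    """L1 distance between the expected-visit vectors of nodes a and b."""
    ha = expected_visits(graph, a, k)
    hb = expected_visits(graph, b, k)
    return sum(abs(ha[v] - hb[v]) for v in graph)


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3: nodes 0 and 2 are both one hop from node 1,
    # so shortest-path distance ties them, but their DSD values differ.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(dsd(path, 1, 0), dsd(path, 1, 2))
```

The path-graph example mirrors the tie problem described above: both neighbours of node 1 sit at shortest-path distance 1, while the diffusion profiles separate the endpoint from the interior node.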
Uncovering the overlapping community structure of complex networks in nature and society
Many complex systems in nature and society can be described in terms of
networks capturing the intricate web of connections among the units they are
made of. A key question is how to interpret the global organization of such
networks as the coexistence of their structural subunits (communities)
associated with more highly interconnected parts. Identifying these a priori
unknown building blocks (such as functionally related proteins, industrial
sectors and groups of people) is crucial to the understanding of the structural
and functional properties of networks. Existing deterministic methods used
for large networks find separated communities, whereas most real networks
are made of highly overlapping cohesive groups of nodes. Here we introduce an
approach to analysing the main statistical features of the interwoven sets of
overlapping communities that takes a step towards uncovering the modular
structure of complex systems. After defining a set of new characteristic
quantities for the statistics of communities, we apply an efficient technique
for exploring overlapping communities on a large scale. We find that overlaps
are significant, and the distributions we introduce reveal universal features
of networks. Our studies of collaboration, word-association and protein
interaction graphs show that the web of communities has non-trivial
correlations and specific scaling properties.
Comment: The free academic research software, CFinder, used for the
publication is available at the website of the publication:
http://angel.elte.hu/clusterin
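The abstract does not name its community-finding technique. CFinder is associated with the clique percolation method, in which a k-clique community is a union of k-cliques that can be reached from one another through cliques sharing k-1 nodes. The brute-force sketch below illustrates that idea on small graphs only (enumerating all k-cliques this way is exponential); it is not CFinder's implementation, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def k_clique_communities(graph: Dict[int, Set[int]], k: int) -> List[Set[int]]:
    """Brute-force k-clique percolation for small undirected graphs.

    A community is the union of k-cliques connected through chains of
    adjacent k-cliques (two cliques are adjacent if they share k-1 nodes).
    A node may belong to several communities, so overlaps are allowed.
    """
    nodes = sorted(graph)
    # Enumerate every k-clique by checking all node subsets of size k.
    cliques: List[FrozenSet[int]] = [
        frozenset(c) for c in combinations(nodes, k)
        if all(v in graph[u] for u, v in combinations(c, 2))
    ]
    parent = list(range(len(cliques)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge adjacent cliques with a union-find structure.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups: Dict[int, Set[int]] = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())


if __name__ == "__main__":
    # Two triangles sharing only node 2: two communities overlapping at node 2.
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
    print(k_clique_communities(g, 3))
```

In the example, node 2 ends up in both communities, which is the kind of overlap that partitioning methods cannot express.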
A simple yet effective baseline for non-attributed graph classification
Graphs are complex objects that do not lend themselves easily to typical
learning tasks. Recently, a range of approaches based on graph kernels or graph
neural networks has been developed for graph classification and for
representation learning on graphs in general. As the developed methodologies
become more sophisticated, it is important to understand which components of
the increasingly complex methods are necessary or most effective.
As a first step, we develop a simple yet meaningful graph representation, and
explore its effectiveness in graph classification. We test our baseline
representation for the graph classification task on a range of graph datasets.
Interestingly, this simple representation achieves performance similar to
state-of-the-art graph kernels and graph neural networks for non-attributed
graph classification. Its performance on classifying attributed graphs is
slightly weaker because it does not incorporate attributes. However, given its
simplicity and efficiency, we believe that it still serves as an effective
baseline for attributed graph classification. Our graph representation is
efficient (linear-time) to compute. We also provide a simple connection with
the graph neural networks.
Note that these observations apply only to the task of graph classification,
whereas existing methods are often designed for a broader scope including node
embedding and link prediction. The results are also likely biased due to the
limited number of benchmark datasets available. Nevertheless, the good
performance of our simple baseline calls for the development of new, more
comprehensive benchmark datasets so as to better evaluate and analyze different
graph learning methods. Furthermore, given the computational efficiency of our
graph summary, we believe that it is a good candidate as a baseline method for
future graph classification (or even other graph learning) studies.
Comment: 13 pages. Shorter version appears at 2019 ICLR Workshop:
Representation Learning on Graphs and Manifolds. arXiv admin note: text
overlap with arXiv:1810.00826 by other authors
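The abstract does not define the representation itself. As one hedged example of a linear-time, attribute-free graph summary, the sketch below assumes per-node features made of the node's degree and the minimum, maximum and mean degree of its neighbours, and concatenates fixed-width histograms of each feature into one vector. The feature set, the bin count, the value range and the function names are all assumptions, not the authors' specification.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # undirected adjacency list
Features = Tuple[int, int, int, float]


def node_features(graph: Graph) -> List[Features]:
    """Per-node (degree, min, max, mean neighbour degree) in O(|V| + |E|)."""
    deg = {v: len(nbrs) for v, nbrs in graph.items()}
    feats: List[Features] = []
    for v, nbrs in graph.items():
        nd = [deg[u] for u in nbrs] or [0]  # isolated node: neighbour stats are 0
        feats.append((deg[v], min(nd), max(nd), sum(nd) / len(nd)))
    return feats


def graph_summary(graph: Graph, bins: int = 4, max_value: float = 8.0) -> List[float]:
    """Concatenate a normalised fixed-width histogram of each node feature.

    Values at or above `max_value` fall into the last bin. Assumes a non-empty graph.
    """
    feats = node_features(graph)
    vector: List[float] = []
    for dim in range(4):
        counts = [0] * bins
        for f in feats:
            idx = min(int(f[dim] / max_value * bins), bins - 1)
            counts[idx] += 1
        vector.extend(c / len(feats) for c in counts)
    return vector


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}
    print(graph_summary(path))
```

Because the vector length depends only on the number of features and bins, graphs of different sizes map to comparable fixed-length inputs that a standard classifier could consume.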
On the Convexity of Latent Social Network Inference
In many real-world scenarios, it is nearly impossible to collect explicit
social network data. In such cases, whole networks must be inferred from
underlying observations. Here, we formulate the problem of inferring latent
social networks based on network diffusion or disease propagation data. We
consider contagions propagating over the edges of an unobserved social network,
where we only observe the times when nodes became infected, but not who
infected them. Given such node infection times, we then identify the optimal
network that best explains the observed data. We present a maximum likelihood
approach based on convex programming with an L1-like penalty term that
encourages sparsity. Experiments on real and synthetic data reveal that our
method near-perfectly recovers the underlying network structure as well as the
parameters of the contagion propagation model. Moreover, our approach scales
well: it can infer optimal networks of thousands of nodes in a matter of
minutes.
Comment: NIPS, 201
The BioGRID Interaction Database: 2011 update
The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein
interaction data from model organisms and humans
(http://www.thebiogrid.org). BioGRID currently holds 347 966
interactions (170 162 genetic, 177 804 protein) curated from both
high-throughput data sets and individual focused studies, as derived
from over 23 000 publications in the primary literature. Complete
coverage of the entire literature is maintained for budding yeast
(Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe)
and thale cress (Arabidopsis thaliana), and efforts to expand curation
across multiple metazoan species are underway. The BioGRID houses
48 831 human protein interactions that have been curated from 10 247
publications. Current curation drives are focused on particular areas
of biology to enable insights into conserved networks and pathways that
are relevant to human health. The BioGRID 3.0 web interface contains
new search and display features that enable rapid queries across
multiple data types and sources. An automated Interaction Management
System (IMS) is used to prioritize, coordinate and track curation
across international sites and projects. BioGRID provides interaction
data to several model organism databases, resources such as Entrez-Gene
and other interaction meta-databases. The entire BioGRID 3.0 data
collection may be downloaded in multiple file formats, including PSI-MI
XML. Source code for BioGRID 3.0 is freely available without any
restrictions.
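As a small illustration of consuming a downloaded interaction file, the sketch below reads tab-delimited records, counts interactions by type and builds an undirected adjacency map. The three-column layout, the sample data and the function name are hypothetical; the official BioGRID download formats carry many more fields and should be parsed according to their own documentation.

```python
import csv
import io
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical three-column layout (NOT an official BioGRID format):
# interactor_a <TAB> interactor_b <TAB> interaction type ("genetic" or "protein")
SAMPLE = "A\tB\tprotein\nB\tC\tgenetic\nA\tC\tprotein\n"


def summarise(text: str) -> Tuple[Counter, Dict[str, Set[str]]]:
    """Count interactions by type and build an undirected adjacency map."""
    counts: Counter = Counter()
    adjacency: Dict[str, Set[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        a, b, kind = row[:3]
        counts[kind] += 1
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return counts, adjacency


if __name__ == "__main__":
    counts, adjacency = summarise(SAMPLE)
    print(dict(counts))            # prints {'protein': 2, 'genetic': 1}
    print(sorted(adjacency["A"]))  # prints ['B', 'C']
```

Keeping the interaction type as a separate count makes it easy to report genetic and protein interactions apart, as the abstract does for the full collection.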