4,839 research outputs found
When is a Network a Network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks
We introduce a framework for the modeling of sequential data capturing
pathways of varying lengths observed in a network. Such data are important,
e.g., when studying click streams in information networks, travel patterns in
transportation systems, information cascades in social networks, biological
pathways or time-stamped social interactions. While it is common to apply graph
analytics and network analysis to such data, recent works have shown that
temporal correlations can invalidate the results of such methods. This raises a
fundamental question: when is a network abstraction of sequential data
justified? Addressing this open question, we propose a framework which combines
Markov chains of multiple, higher orders into a multi-layer graphical model
that captures temporal correlations in pathways at multiple length scales
simultaneously. We develop a model selection technique to infer the optimal
number of layers of such a model and show that it outperforms previously used
Markov order detection techniques. An application to eight real-world data sets
on pathways and temporal networks shows that it allows to infer graphical
models which capture both topological and temporal characteristics of such
data. Our work highlights fallacies of network abstractions and provides a
principled answer to the open question when they are justified. Generalizing
network representations to multi-order graphical models, it opens perspectives
for new data mining and knowledge discovery algorithms.Comment: 10 pages, 4 figures, 1 table, companion python package pathpy
available on gitHu
Clustering and Community Detection in Directed Networks: A Survey
Networks (or graphs) appear as dominant structures in diverse domains,
including sociology, biology, neuroscience and computer science. In most of the
aforementioned cases graphs are directed - in the sense that there is
directionality on the edges, making the semantics of the edges non symmetric.
An interesting feature that real networks present is the clustering or
community structure property, under which the graph topology is organized into
modules commonly called communities or clusters. The essence here is that nodes
of the same community are highly similar while on the contrary, nodes across
communities present low similarity. Revealing the underlying community
structure of directed complex networks has become a crucial and
interdisciplinary topic with a plethora of applications. Therefore, naturally
there is a recent wealth of research production in the area of mining directed
graphs - with clustering being the primary method and tool for community
detection and evaluation. The goal of this paper is to offer an in-depth review
of the methods presented so far for clustering directed networks along with the
relevant necessary methodological background and also related applications. The
survey commences by offering a concise review of the fundamental concepts and
methodological base on which graph clustering algorithms capitalize on. Then we
present the relevant work along two orthogonal classifications. The first one
is mostly concerned with the methodological principles of the clustering
algorithms, while the second one approaches the methods from the viewpoint
regarding the properties of a good cluster in a directed network. Further, we
present methods and metrics for evaluating graph clustering results,
demonstrate interesting application domains and provide promising future
research directions.Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear
Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction
Large-scale protein signalling networks are useful for exploring complex biochemical pathways but do not reveal how pathways respond to specific stimuli. Such specificity is critical for understanding disease and designing drugs. Here we describe a computational approach—implemented in the free CNO software—for turning signalling networks into logical models and calibrating the models against experimental data. When a literature-derived network of 82 proteins covering the immediate-early responses of human cells to seven cytokines was modelled, we found that training against experimental data dramatically increased predictive power, despite the crudeness of Boolean approximations, while significantly reducing the number of interactions. Thus, many interactions in literature-derived networks do not appear to be functional in the liver cells from which we collected our data. At the same time, CNO identified several new interactions that improved the match of model to data. Although missing from the starting network, these interactions have literature support. Our approach, therefore, represents a means to generate predictive, cell-type-specific models of mammalian signalling from generic protein signalling networks
Algorithms to Explore the Structure and Evolution of Biological Networks
High-throughput experimental protocols have revealed thousands of relationships amongst genes and proteins under various conditions. These putative associations are being aggressively mined to decipher the structural and functional architecture of the cell. One useful tool for exploring this data has been computational network analysis. In this thesis, we propose a collection of novel algorithms to explore the structure and evolution of large, noisy, and sparsely annotated biological networks.
We first introduce two information-theoretic algorithms to extract interesting patterns and modules embedded in large graphs. The first, graph summarization, uses the minimum description length principle to find compressible parts of the graph. The second, VI-Cut, uses the variation of information to non-parametrically find groups of topologically cohesive and similarly annotated nodes in the network. We show that both algorithms find structure in biological data that is consistent with known biological processes, protein complexes, genetic diseases, and operational taxonomic units. We also propose several algorithms to systematically generate an ensemble of near-optimal network clusterings and show how these multiple views can be used together to identify clustering dynamics that any single solution approach would miss.
To facilitate the study of ancient networks, we introduce a framework called ``network archaeology'') for reconstructing the node-by-node and edge-by-edge arrival history of a network. Starting with a present-day network, we apply a probabilistic growth model backwards in time to find high-likelihood previous states of the graph. This allows us to explore how interactions and modules may have evolved over time. In experiments with real-world social and biological networks, we find that our algorithms can recover significant features of ancestral networks that have long since disappeared.
Our work is motivated by the need to understand large and complex biological systems that are being revealed to us by imperfect data. As data continues to pour in, we believe that computational network analysis will continue to be an essential tool towards this end
Learning structure and schemas from heterogeneous domains in networked systems: a survey
The rapidly growing amount of available digital documents of various formats and the possibility to access these through internet-based technologies in distributed environments, have led to the necessity to develop solid methods to properly organize and structure documents in large digital libraries and repositories. Specifically, the extremely large size of document collections make it impossible to manually organize such documents. Additionally, most of the document sexist in an unstructured form and do not follow any schemas. Therefore, research efforts in this direction are being dedicated to automatically infer structure and schemas. This is essential in order to better organize huge collections as well as to effectively and efficiently retrieve documents in heterogeneous domains in networked system. This paper presents a survey of the state-of-the-art methods for inferring structure from documents and schemas in networked environments. The survey is organized around the most important application domains, namely, bio-informatics, sensor networks, social networks, P2Psystems, automation and control, transportation and privacy preserving for which we analyze the recent developments on dealing with unstructured data in such domains.Peer ReviewedPostprint (published version
Recommended from our members
Network modularity and local environment similarity as descriptors of protein structure
As the number of solved protein structures increases, the opportunities for meta-analysis of this dataset increase too. Here we explore two approaches for analysing protein structure, both starting from the three-dimensional co-ordinates of each atom within the structure, which are then abstracted into a more useful form.
The first method transforms the protein into a network in which its amino acids are the nodes, and where the edges are generated using a simple proximity test. By applying the Infomap community detection algorithm, we can fragment the protein into highly intra-connected subregions - these subregions are compact and globular, and can be compared with known structural and functional subunits of the protein (also known as domains). By performing this fragmentation process systematically across a large set of proteins, and checking for structurally conserved fragments, we can search for novel candidate domains. This method for automatically decomposing a protein into compact substructures may also be useful in coarse-graining molecular dynamics, analysing the protein’s topology, in de novo protein design, or in fitting electron density maps derived from single particle electron microscopy.
The second method calculates a descriptor for each atom of the protein based on its local environment, known as a Smooth Overlap of Atomic Positions (SOAP) descriptor. Using these descriptors we can perform overall comparisons of the subregions identified above. In addition, by comparing the descriptors of a set of proteins known to share common structural or functional features (such as binding of a particular ligand), we can automatically identify the most highly conserved atoms of the set. These atoms may line ligand binding pockets or correspond to allosteric sites, which could inform drug design
Main phase transition in lipid bilayers: phase coexistence and line tension in a soft, solvent-free, coarse-grained model
We devise a soft, solvent-free, coarse-grained model for lipid bilayer
membranes. The non-bonded interactions take the form of a weighted-density
functional which allows us to describe the thermodynamics of self-assembly and
packing effects of the coarse-grained beads in terms of a density expansion of
the equation of state and the weighting functions that regularize the
microscopic bead densities, respectively. Identifying the length and energy
scales via the bilayer thickness and the thermal energy scale, kT, the model
qualitatively reproduces key characteristics (e.g., bending rigidity, area per
lipid molecules, and compressibility) of lipid membranes. We employ this model
to study the main phase transition between the liquid and the gel phase of the
bilayer membrane. We accurately locate the phase coexistence using free energy
calculations and also obtain estimates for the bare and the thermodynamic line
tension.Comment: 21 pages, 12 figures. Submitted to J. Chem. Phy
- …