Search CORE

35,755 research outputs found

Discriminative Link Prediction using Local Links, Node Features and Community Structure

Author: Chakrabarti Soumen
De Abir
Ganguly Niloy
Publication venue
Publication date: 17/10/2013
Field of study

A link prediction (LP) algorithm is given a graph, and has to rank, for each node, other nodes that are candidates for new linkage. LP is strongly motivated by social search and recommendation applications. LP techniques often focus on global properties (graph conductance, hitting or commute times, Katz score) or local properties (Adamic-Adar and many variations, or node feature vectors), but rarely combine these signals. Furthermore, neither of these extremes exploit link densities at the intermediate level of communities. In this paper we describe a discriminative LP algorithm that exploits two new signals. First, a co-clustering algorithm provides community level link density estimates, which are used to qualify observed links with a surprise value. Second, links in the immediate neighborhood of the link to be predicted are not interpreted at face value, but through a local model of node feature similarities. These signals are combined into a discriminative link predictor. We evaluate the new predictor using five diverse data sets that are standard in the literature. We report on significant accuracy boosts compared to standard LP methods (including Adamic-Adar and random walk). Apart from the new predictor, another contribution is a rigorous protocol for benchmarking and reporting LP algorithms, which reveals the regions of strengths and weaknesses of all the predictors studied here, and establishes the new proposal as the most robust.Comment: 10 pages, 5 figure

arXiv.org e-Print Archive

Crossref

Efficient computation of the Weighted Clustering Coefficient

Author: Lattanzi Silvio
Leonardi Stefano
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2016
Field of study

The clustering coefficient of an unweighted network has been extensively used to quantify how tightly connected is the neighbor around a node and it has been widely adopted for assessing the quality of nodes in a social network. The computation of the clustering coefficient is challenging since it requires to count the number of triangles in the graph. Several recent works proposed efficient sampling, streaming and MapReduce algorithms that allow to overcome this computational bottleneck. As a matter of fact, the intensity of the interaction between nodes, that is usually represented with weights on the edges of the graph, is also an important measure of the statistical cohesiveness of a network. Recently various notions of weighted clustering coefficient have been proposed but all those techniques are hard to implement on large-scale graphs. In this work we show how standard sampling techniques can be used to obtain efficient estimators for the most commonly used measures of weighted clustering coefficient. Furthermore we also propose a novel graph-theoretic notion of clustering coefficient in weighted networks. © 2016, Copyright © Taylor & Francis Group, LL

Archivio della ricerca- Università di Roma La Sapienza

A nonuniform popularity-similarity optimization (nPSO) model to efficiently generate realistic complex networks with communities

Author: Cannistraci Carlo Vittorio
Muscoloni Alessandro
Publication venue: 'IOP Publishing'
Publication date: 23/07/2017
Field of study

The hidden metric space behind complex network topologies is a fervid topic in current network science and the hyperbolic space is one of the most studied, because it seems associated to the structural organization of many real complex systems. The Popularity-Similarity-Optimization (PSO) model simulates how random geometric graphs grow in the hyperbolic space, reproducing strong clustering and scale-free degree distribution, however it misses to reproduce an important feature of real complex networks, which is the community organization. The Geometrical-Preferential-Attachment (GPA) model was recently developed to confer to the PSO also a community structure, which is obtained by forcing different angular regions of the hyperbolic disk to have variable level of attractiveness. However, the number and size of the communities cannot be explicitly controlled in the GPA, which is a clear limitation for real applications. Here, we introduce the nonuniform PSO (nPSO) model that, differently from GPA, forces heterogeneous angular node attractiveness by sampling the angular coordinates from a tailored nonuniform probability distribution, for instance a mixture of Gaussians. The nPSO differs from GPA in other three aspects: it allows to explicitly fix the number and size of communities; it allows to tune their mixing property through the network temperature; it is efficient to generate networks with high clustering. After several tests we propose the nPSO as a valid and efficient model to generate networks with communities in the hyperbolic space, which can be adopted as a realistic benchmark for different tasks such as community detection and link prediction

arXiv.org e-Print Archive

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Protein docking refinement by convex underestimation in the low-dimensional subspace of encounter complexes

Author: Kozakov Dmytro
Li Keyong
Moghadasi Mohammad
Nan Feng
Paschalidis Ioannis Ch.
Roshandelpoor Athar
Vajda Sandor
Vakili Pirooz
Zarbafian Shahrooz
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/04/2018
Field of study

We propose a novel stochastic global optimization algorithm with applications to the refinement stage of protein docking prediction methods. Our approach can process conformations sampled from multiple clusters, each roughly corresponding to a different binding energy funnel. These clusters are obtained using a density-based clustering method. In each cluster, we identify a smooth “permissive” subspace which avoids high-energy barriers and then underestimate the binding energy function using general convex polynomials in this subspace. We use the underestimator to bias sampling towards its global minimum. Sampling and subspace underestimation are repeated several times and the conformations sampled at the last iteration form a refined ensemble. We report computational results on a comprehensive benchmark of 224 protein complexes, establishing that our refined ensemble significantly improves the quality of the conformations of the original set given to the algorithm. We also devise a method to enhance the ensemble from which near-native models are selected.Published versio

Boston University Institutional Repository (OpenBU)

Directory of Open Access Journals

Quantifying the consistency of scientific databases

Author: Bajec Marko
Boshkoska Biljana Mileva
Kastrin Andrej
Levnajić Zoran
Šubelj Lovro
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/05/2015
Field of study

Science is a social process with far-reaching impact on our modern society. In the recent years, for the first time we are able to scientifically study the science itself. This is enabled by massive amounts of data on scientific publications that is increasingly becoming available. The data is contained in several databases such as Web of Science or PubMed, maintained by various public and private entities. Unfortunately, these databases are not always consistent, which considerably hinders this study. Relying on the powerful framework of complex networks, we conduct a systematic analysis of the consistency among six major scientific databases. We found that identifying a single "best" database is far from easy. Nevertheless, our results indicate appreciable differences in mutual consistency of different databases, which we interpret as recipes for future bibliometric studies.Comment: 20 pages, 5 figures, 4 table

arXiv.org e-Print Archive

Directory of Open Access Journals

Repository of the University of Ljubljana

Degree Ranking Using Local Information

Author: Gera Ralucca
Iyengar S. R. S.
Saxena Akrati
Publication venue
Publication date: 10/06/2017
Field of study

Most real world dynamic networks are evolved very fast with time. It is not feasible to collect the entire network at any given time to study its characteristics. This creates the need to propose local algorithms to study various properties of the network. In the present work, we estimate degree rank of a node without having the entire network. The proposed methods are based on the power law degree distribution characteristic or sampling techniques. The proposed methods are simulated on synthetic networks, as well as on real world social networks. The efficiency of the proposed methods is evaluated using absolute and weighted error functions. Results show that the degree rank of a node can be estimated with high accuracy using only

1\%

samples of the network size. The accuracy of the estimation decreases from high ranked to low ranked nodes. We further extend the proposed methods for random networks and validate their efficiency on synthetic random networks, that are generated using Erd\H{o}s-R\'{e}nyi model. Results show that the proposed methods can be efficiently used for random networks as well

arXiv.org e-Print Archive

Calhoun, Institutional Archive of the Naval Postgraduate School