LiveRank: How to Refresh Old Crawls

Author: C Olston
M Bianchini
P Boldi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/12/2014

This paper considers the problem of refreshing a crawl. More precisely, given a collection of Web pages (with hyperlinks) gathered at some time, we want to identify a significant fraction of these pages that still exist at the present time. The liveness of an old page can be tested through an online query at the present time. We call LiveRank a ranking of the old pages such that active nodes are more likely to appear first. The quality of a LiveRank is measured by the number of queries necessary to identify a given fraction of the alive pages when using the LiveRank order. We study different scenarios, from a static setting where the LiveRank is computed before any query is made, to dynamic settings where the LiveRank can be updated as queries are processed. Our results show that building on PageRank can lead to efficient LiveRanks for Web graphs.
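The quality measure described above can be made concrete with a short function. This is a minimal sketch of the static setting only, assuming the set of alive pages is known for evaluation purposes; the function name `liverank_cost` and its parameters are hypothetical and not taken from the paper.

```python
import math
from typing import Hashable, Sequence, Set


def liverank_cost(order: Sequence[Hashable], alive: Set[Hashable], fraction: float) -> int:
    """Count the liveness queries issued, following `order`, until at least
    `fraction` of the alive pages have been identified (static setting)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = math.ceil(fraction * len(alive))
    if target == 0:
        return 0  # nothing to find, no query needed
    found = 0
    for queries, page in enumerate(order, start=1):
        if page in alive:  # each page tested costs one online query
            found += 1
            if found >= target:
                return queries
    raise ValueError("order does not cover enough alive pages")


if __name__ == "__main__":
    alive = {"b", "d"}
    print(liverank_cost(["a", "b", "c", "d"], alive, 1.0))  # 4 queries
    print(liverank_cost(["b", "d", "a", "c"], alive, 1.0))  # 2 queries
```

A lower cost means a better LiveRank: an ideal ordering lists every alive page first, so its cost equals the number of pages to be found.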

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

Spectral centrality measures in complex networks

Author: B. Bollobás
J. Kleinberg
J. Scott
M. Kendall
Nicola Perra
P. Boldi
S. Wasserman
Santo Fortunato
Publication venue: 'American Physical Society (APS)'
Publication date: 05/09/2008

Complex networks are characterized by heterogeneous distributions of the degree of nodes, which produce a large diversification of the roles of the nodes within the network. Several centrality measures have been introduced to rank nodes based on their topological importance within a graph. Here we review and compare centrality measures based on spectral properties of graph matrices. We shall focus on PageRank, eigenvector centrality and the hub/authority scores of HITS. We derive simple relations between the measures and the (in)degree of the nodes, in some limits. We also compare the rankings obtained with different centrality measures. Comment: 11 pages, 10 figures, 5 tables. Final version published in Physical Review
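PageRank, one of the spectral measures reviewed above, is the stationary vector of a damped random walk and can be computed by power iteration. The sketch below is a generic textbook version, not the authors' code; the damping factor `alpha = 0.85`, the uniform handling of dangling nodes, and the function name are assumptions.

```python
from typing import Dict, Hashable, List


def pagerank(graph: Dict[Hashable, List[Hashable]], alpha: float = 0.85,
             tol: float = 1e-12, max_iter: int = 1000) -> Dict[Hashable, float]:
    """Power-iteration PageRank. `graph` maps each node to its out-neighbours;
    every link target must also appear as a key."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Mass held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            out = graph[v]
            for w in out:
                new[w] += alpha * rank[v] / len(out)  # follow a random out-link
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    star = {"hub": [], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
    print(pagerank(star))  # the hub, with the highest in-degree, ranks first
```

On this small star the PageRank order coincides with the in-degree order, which is the kind of relation between spectral measures and (in)degree that the abstract refers to.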

arXiv.org e-Print Archive

Crossref

Greenwich Academic Literature Archive

The Number of Convex Permutominoes

Author: M. Santini
P. Boldi
R. Radicioni
V. Lonati
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

Permutominoes are polyominoes defined by suitable pairs of permutations. In this paper we provide a formula to count the number of convex permutominoes of given perimeter. To this end, we define the transform of a generic pair of permutations, we characterize the transform of any pair defining a convex permutomino, and we solve the counting problem in the transformed space.

CiteSeerX

Elsevier - Publisher Connector

AIR Universita degli studi di Milano

Ranking and clustering of nodes in networks with smart teleportation

Author: A. Langville
B. Gonçalves
L. Adamic
L. Pretto
M. Rosvall
P. Boldi
R. Baeza-Yates
R. Lambiotte
S. Fortunato
Publication venue: 'American Physical Society (APS)'
Publication date: 08/05/2012

Random teleportation is a necessary evil for ranking and clustering directed networks based on random walks. Teleportation enables ergodic solutions, but the solutions must necessarily depend on the exact implementation and parametrization of the teleportation. For example, in the commonly used PageRank algorithm, the teleportation rate must trade off a heavily biased solution against a uniform solution. Here we show that teleportation to links rather than nodes enables a much smoother trade-off and effectively more robust results. We also show that, by not recording the teleportation steps of the random walker, we can further reduce the effect of teleportation with dramatic effects on clustering. Comment: 10 pages, 7 figures
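The difference between teleporting to nodes and teleporting to links can be sketched by changing only the teleportation vector of a PageRank-style walk. This is an illustration under an assumed reading of "teleportation to links": the walker picks a link uniformly at random and lands on its target, so it lands on nodes in proportion to their in-degree. The function names are hypothetical, and the unrecorded-teleportation variant from the abstract is not implemented here.

```python
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]  # node -> out-neighbours; every target must be a key


def teleport_vector(graph: Graph, mode: str) -> Dict[Hashable, float]:
    """Where the walker lands when it teleports.
    'node': uniformly over nodes (standard PageRank).
    'link': onto the target of a uniformly chosen link (assumed reading),
            i.e. proportionally to in-degree."""
    nodes = list(graph)
    if mode == "node":
        return {v: 1.0 / len(nodes) for v in nodes}
    if mode == "link":
        indeg = {v: 0 for v in nodes}
        for v in nodes:
            for w in graph[v]:
                indeg[w] += 1
        m = sum(indeg.values())
        if m == 0:
            raise ValueError("link teleportation needs at least one link")
        return {v: indeg[v] / m for v in nodes}
    raise ValueError(f"unknown mode: {mode}")


def visit_rates(graph: Graph, alpha: float = 0.85, mode: str = "node",
                iters: int = 500) -> Dict[Hashable, float]:
    """Stationary visit rates of a walk that follows a random out-link with
    probability alpha and teleports otherwise; dangling nodes always teleport."""
    nodes = list(graph)
    tele = teleport_vector(graph, mode)
    p = dict(tele)
    for _ in range(iters):
        dangling = sum(p[v] for v in nodes if not graph[v])
        new = {v: ((1 - alpha) + alpha * dangling) * tele[v] for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new[w] += alpha * p[v] / len(graph[v])
        p = new
    return p


if __name__ == "__main__":
    g = {"s": ["t"], "t": ["s"], "u": ["t"]}
    print(visit_rates(g, mode="node"))  # u keeps a teleportation share
    print(visit_rates(g, mode="link"))  # u, with no in-links, gets 0
```

The example shows why link teleportation is less arbitrary: a node that nobody links to receives rank only from uniform node teleportation, not from the network structure.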

arXiv.org e-Print Archive

Crossref

Repository of the University of Namur

Efficiently Clustering Very Large Attributed Graphs

Author: Akoglu L.
Boldi P.
Combe D.
Deza M.M.
Diestel R.
Duong K.-C.
Protter M. H.
Villa-Vialaneix N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Attributed graphs model real networks by enriching their nodes with attributes accounting for properties. Several techniques have been proposed for partitioning these graphs into clusters that are homogeneous with respect to both semantic attributes and the structure of the graph. However, the time and space complexities of state-of-the-art algorithms limit their scalability to medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a fast and scalable algorithm for partitioning large attributed graphs. The approach is robust, being compatible with both categorical and quantitative attributes, and it is tailorable, allowing the user to weight the semantic and topological components. Further, the approach does not require the user to guess the number of clusters in advance. SToC relies on well-known approximation techniques such as bottom-k sketches, traditional graph-theoretic concepts, and a new perspective on the composition of heterogeneous distance measures. Experimental results demonstrate its ability to efficiently compute high-quality partitions of large-scale attributed graphs. Comment: This work has been published in ASONAM 2017. This version includes an appendix with validation of our attribute model and distance function, omitted in the conference version for lack of space. Please refer to the published version
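Bottom-k sketches, one of the approximation techniques SToC is said to rely on, summarize a set by the k smallest hash values of its elements and allow Jaccard similarity to be estimated without comparing the full sets. The sketch below shows the generic technique only; how SToC actually combines sketches with its semantic and topological distances is not described here, and the function names and the choice of BLAKE2b as hash are assumptions.

```python
import hashlib
import heapq
from typing import Iterable, List


def _h(x: str) -> int:
    """Deterministic 64-bit hash of a string element."""
    return int.from_bytes(hashlib.blake2b(x.encode("utf-8"), digest_size=8).digest(), "big")


def bottom_k(items: Iterable[str], k: int) -> List[int]:
    """Bottom-k sketch: the k smallest distinct hash values of the set."""
    return heapq.nsmallest(k, {_h(x) for x in items})


def jaccard_estimate(sa: List[int], sb: List[int], k: int) -> float:
    """Estimate |A & B| / |A | B| from two bottom-k sketches: among the k
    smallest hashes of the union, count those present in both sketches."""
    union_k = heapq.nsmallest(k, set(sa) | set(sb))
    if not union_k:
        raise ValueError("both sets are empty")
    both = set(sa) & set(sb)
    return sum(1 for h in union_k if h in both) / len(union_k)


if __name__ == "__main__":
    a = {str(i) for i in range(1000)}
    b = {str(i) for i in range(500, 1500)}  # true Jaccard similarity is 1/3
    print(jaccard_estimate(bottom_k(a, 256), bottom_k(b, 256), 256))
```

Each sketch has a fixed size k regardless of the set size, which is what makes this kind of summary attractive for large graphs.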

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Archivio della Ricerca - Università di Roma 3

A 5G mobile network architecture to support vertical industries

Author: Banchs Roca Albert
Boldi Mauro
Fuentes Manuel
Gutiérrez-Estévez David M.
Provvedi Silvia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

The telecom industry is moving from a "horizontal" service delivery model, where services are defined independently of their consumers, toward a "vertical" delivery model, where the provided services are tailored to specific industry sectors and verticals. In order to enable this transition, an end-to-end comprehensive 5G architecture is needed, with capabilities to support the use cases of the different vertical industries. A key feature of this architecture is the implementation of network slicing over a single infrastructure to provision highly heterogeneous vertical services, as well as a network slicing management system capable of handling simultaneous slices. On top of the network slicing technology, functionality needs to be devised to deploy the slices required by the different vertical players and provide them with a suitable interface to manage their slice. In this article, we design a 5G mobile network architecture to support vertical industries. The proposed architecture builds on ongoing standardization efforts at 3GPP and ETSI, and incorporates additional modules to provide enhanced MANO and control functionality as well as artificial-intelligence-based data analytics. On top of these modules, a service layer is provided to offer vertical players an easy-to-use interface to manage their services. This work was supported by the H2020 5G-TOURS European project (Grant Agreement No. 856950)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Evaluating the impact of topological protein features on the negative examples selection

Author: D. Malchiodi
M. Frasca
P. Boldi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2018

Supervised machine learning methods, when applied to the problem of automated protein-function prediction (AFP), require the availability of both positive examples (i.e., proteins which are known to possess a given protein function) and negative examples (corresponding to proteins not associated with that function). Unfortunately, publicly available proteome and genome data sources such as the Gene Ontology rarely store the functions not possessed by a protein. Thus negative selection, which consists of identifying informative negative examples, is currently a central and challenging problem in AFP. Several heuristics have been proposed through the years to solve this problem; nevertheless, despite their effectiveness, to the best of our knowledge no previous work has studied which protein features are most relevant to this task, that is, which protein features best help discriminate between reliable and unreliable negatives.

AIR Universita degli studi di Milano

Directory of Open Access Journals

Estimation of urban sensible heat flux using a dense wireless network of observations

Author: Barrenetxea G.
Boldi M.-O
Bou-Zeid E.
Brutsaert W.
Couach O.
Nadeau Daniel
Parlange M.
Selker J.
Vetterli M.
Publication venue
Publication date: 18/06/2018

The determination of the sensible heat flux over urban terrain is challenging due to irregular surface geometry and surface types. To address this, in 2006-07, a major field campaign (LUCE) took place at the École Polytechnique Fédérale de Lausanne campus, a moderately occupied urban site. A distributed network of 92 wireless weather stations was combined with routine atmospheric profiling, offering high temporal and spatial resolution meteorological measurements. The objective of this study is to estimate the sensible heat flux over the built environment under convective conditions. Calculations were based on Monin-Obukhov similarity for temperature in the surface layer. The results illustrate a good agreement between the sensible heat flux inferred from the thermal roughness length approach and independent calibrated measurements from a scintillometer located inside the urban canopy. The results also show that using only one well-selected station can provide a good estimate of the sensible heat flux over the campus under convective conditions. Overall, this study illustrates how an extensive network of meteorological measurements can be a useful tool to estimate the sensible heat flux in complex urban environments.
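The thermal roughness length approach mentioned above is commonly written, from Monin-Obukhov similarity for temperature, as the bulk formula $H = \rho c_p \kappa u_* (\theta_s - \theta_a) / [\ln((z - d)/z_{0h}) - \psi_h(\zeta)]$. The sketch below is a textbook version, not the authors' exact procedure: the air density, heat capacity and von Kármán constant are typical assumed values, the stability correction uses the Businger-Dyer/Paulson form for unstable conditions, and the stability parameter $\zeta$ is passed in rather than iterated (in practice the Obukhov length depends on $H$ itself). The function names are hypothetical.

```python
import math
from typing import Optional


def psi_h_unstable(zeta: float) -> float:
    """Integrated stability correction for heat under unstable (convective)
    conditions, Businger-Dyer form integrated as in Paulson (1970); zeta = (z - d) / L < 0."""
    if zeta >= 0:
        raise ValueError("this form applies only to unstable conditions (zeta < 0)")
    y = math.sqrt(1.0 - 16.0 * zeta)
    return 2.0 * math.log((1.0 + y) / 2.0)


def sensible_heat_flux(u_star: float, theta_s: float, theta_a: float, z: float,
                       z0h: float, d: float = 0.0, zeta: Optional[float] = None,
                       rho: float = 1.2, cp: float = 1005.0, kappa: float = 0.4) -> float:
    """Bulk sensible heat flux in W/m^2 (positive upward).
    u_star: friction velocity (m/s); theta_s, theta_a: surface and air potential
    temperatures (K); z: measurement height, d: displacement height, z0h: thermal
    roughness length (m). zeta=None means neutral stratification (psi_h = 0)."""
    psi = 0.0 if zeta is None else psi_h_unstable(zeta)
    denom = math.log((z - d) / z0h) - psi
    return rho * cp * kappa * u_star * (theta_s - theta_a) / denom


if __name__ == "__main__":
    print(sensible_heat_flux(0.3, 302.0, 300.0, 10.0, 0.01))             # neutral
    print(sensible_heat_flux(0.3, 302.0, 300.0, 10.0, 0.01, zeta=-0.5))  # convective
```

Under convective conditions $\psi_h > 0$, so the denominator shrinks and the same temperature difference yields a larger upward flux than in the neutral case.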

RERO DOC Digital Library

Variations of the Itai-Rodeh Algorithm for Computing Anonymous Ring Size

Author: A Itai
E Chang
G Tel
M Andrés
M Timmer
P Boldi
W Fokkink
WR Franklin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

VU Research Portal

Crossref

Risk-Averse Matchings over Uncertain Graph Databases

Author: A Khan
AE Roth
B Bollobás
D Liben-Nowell
G Kollios
J Edmonds
LG Valiant
M Kargar
M Kearns
M Potamias
N Bansal
N Chen
NJ Krogan
NN Dalvi
P Berman
P Boldi
RM Karp
S Asthana
YH Chan
Publication venue
Publication date: 09/01/2018

A large number of applications, such as querying sensor networks and analyzing protein-protein interaction (PPI) networks, rely on mining uncertain graph and hypergraph databases. In this work we study the following problem: given an uncertain, weighted (hyper)graph, how can we efficiently find a (hyper)matching with high expected reward and low risk? This problem naturally arises in the context of several important applications, such as online dating, kidney exchanges, and team formation. We introduce a novel formulation for finding matchings with maximum expected reward and bounded risk under a general model of uncertain weighted (hyper)graphs that we introduce in this work. Our model generalizes probabilistic models used in prior work, and captures both continuous and discrete probability distributions, thus allowing us to handle privacy-related applications that inject appropriately distributed noise into (hyper)edge weights. Given that our optimization problem is NP-hard, we turn our attention to designing efficient approximation algorithms. For the case of uncertain weighted graphs, we provide a $\frac{1}{3}$-approximation algorithm, and a $\frac{1}{5}$-approximation algorithm with near-optimal running time. For the case of uncertain weighted hypergraphs, we provide an $\Omega(\frac{1}{k})$-approximation algorithm, where $k$ is the rank of the hypergraph (i.e., any hyperedge includes at most $k$ nodes), that runs in almost linear time (modulo log factors). We complement our theoretical results by testing our approximation algorithms on a wide variety of synthetic experiments, where we observe in a controlled setting interesting findings on the trade-off between reward and risk. We also provide an application of our formulation for recommending teams that are likely to collaborate and to have high impact. Comment: 25 pages
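To make the reward/risk trade-off concrete, here is a simple greedy baseline. It is not the authors' $\frac{1}{3}$- or $\frac{1}{5}$-approximation algorithm: it assumes each edge carries an expected weight and a standard deviation, models "bounded risk" crudely as a per-edge cap on the standard deviation, and then runs the classic greedy matching by expected weight, which is a $\frac{1}{2}$-approximation of the maximum-weight matching on the edges that survive the cap. The names `Edge`, `max_std` and `greedy_risk_bounded_matching` are hypothetical.

```python
from typing import Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, float, float]  # (u, v, expected weight, std dev)


def greedy_risk_bounded_matching(edges: List[Edge], max_std: float) -> List[Edge]:
    """Greedy matching by expected weight, restricted to edges whose standard
    deviation does not exceed max_std (a simplified stand-in for a risk bound)."""
    matched = set()
    chosen: List[Edge] = []
    for u, v, mean, std in sorted(edges, key=lambda e: e[2], reverse=True):
        if std > max_std or u in matched or v in matched:
            continue  # too risky, or an endpoint is already matched
        chosen.append((u, v, mean, std))
        matched.update((u, v))
    return chosen


if __name__ == "__main__":
    edges = [("a", "b", 10.0, 5.0), ("a", "c", 8.0, 1.0),
             ("b", "d", 6.0, 1.0), ("c", "d", 3.0, 0.5)]
    print(greedy_risk_bounded_matching(edges, max_std=2.0))   # a-c, b-d: reward 14
    print(greedy_risk_bounded_matching(edges, max_std=10.0))  # a-b, c-d: reward 13
```

In this toy instance, tightening the risk cap excludes the single high-variance edge and, because of the greedy order, even increases the total expected reward; in general a tighter cap can only shrink the set of admissible edges.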

arXiv.org e-Print Archive

Crossref