860 research outputs found

Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics

Author: Boldi Paolo
Vigna Sebastiano
Publication date: 11/08/2016

Minimal-interval semantics associates with each query over a document a set of intervals, called witnesses, that are incomparable with respect to inclusion (i.e., they form an antichain): witnesses define the minimal regions of the document satisfying the query. Minimal-interval semantics makes it easy to define and compute several sophisticated proximity operators, provides snippets for user presentation, and can be used to rank documents. In this paper we provide algorithms for computing conjunction and disjunction that are linear in the number of intervals and logarithmic in the number of operands; for additional operators, such as ordered conjunction and Brouwerian difference, we provide linear algorithms. In all cases, space is linear in the number of operands. More importantly, we define a formal notion of optimal laziness, and either prove it, or prove its impossibility, for each algorithm. We cast our results in a general framework of antichains of intervals on total orders, making our algorithms directly applicable to other domains.

Comment: 24 pages, 4 figures. A preliminary (now outdated) version was presented at SPIRE 200
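
The notion of witnesses in the abstract above can be illustrated with a small sketch. This is a brute-force illustration only, not the lazy algorithms of the paper: the function names `minimal_intervals` and `conjunction_witnesses` and the representation of a term as a list of integer positions are assumptions made for the example.

```python
from itertools import product

def minimal_intervals(intervals):
    """Keep only intervals that contain no other interval (an antichain).

    Intervals are (left, right) pairs with left <= right. Quadratic
    brute force, chosen for clarity rather than efficiency.
    """
    unique = set(intervals)

    def contains_other(a):
        # True if some different interval b lies inside a.
        return any(b != a and a[0] <= b[0] and b[1] <= a[1] for b in unique)

    return sorted(iv for iv in unique if not contains_other(iv))

def conjunction_witnesses(*position_lists):
    """Witnesses for a conjunction of terms (hypothetical helper).

    Each argument lists the positions at which one term occurs. Every
    choice of one position per term spans an interval; the witnesses
    are the minimal such intervals.
    """
    candidates = [(min(choice), max(choice)) for choice in product(*position_lists)]
    return minimal_intervals(candidates)

# Term "a" occurs at positions 0 and 5, term "b" at positions 2 and 6.
witnesses = conjunction_witnesses([0, 5], [2, 6])
print(witnesses)  # [(0, 2), (2, 5), (5, 6)]
```

The interval (0, 6) also contains both terms, but it is dropped because it contains the smaller witness (0, 2).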

arXiv.org e-Print Archive

AIR Universita degli studi di Milano

Four Degrees of Separation, Really

Author: Boldi Paolo
Vigna Sebastiano
Publication date: 01/01/2012

We recently measured the average distance of users in the Facebook graph, spurring comments in the scientific community as well as in the general press ("Four Degrees of Separation"). A number of interesting criticisms have been made about the meaningfulness, methods and consequences of the experiment we performed. In this paper we want to discuss some methodological aspects that we deem important to underline in the form of answers to the questions we have read in newspapers, magazines, blogs, or heard from colleagues. We indulge in some reflections on the actual meaning of "average distance" and make a number of side observations showing that, yes, 3.74 "degrees of separation" are really few

arXiv.org e-Print Archive

CiteSeerX

AIR Universita degli studi di Milano

Entity-Linking via Graph-Distance Minimization

Author: Blanco Roi
Boldi Paolo
Marino Andrea
Publication venue: 'Open Publishing Association'
Publication date: 01/01/2014

Entity-linking is a natural-language-processing task that consists in identifying the entities mentioned in a piece of text, linking each to an appropriate item in some knowledge base; when the knowledge base is Wikipedia, the problem comes to be known as wikification (in this case, items are Wikipedia articles). One instance of entity-linking can be formalized as an optimization problem on the underlying concept graph, where the quantity to be optimized is the average distance between chosen items. Inspired by this application, we define a new graph problem which is a natural variant of the Maximum Capacity Representative Set. We prove that our problem is NP-hard for general graphs; nonetheless, under some restrictive assumptions, it turns out to be solvable in linear time. For the general case, we propose two heuristics: one tries to enforce the above assumptions and another one is based on the notion of hitting distance; we show experimentally how these approaches perform with respect to some baselines on a real-world dataset.

Comment: In Proceedings GRAPHITE 2014, arXiv:1407.7671. The second and third authors were supported by the EU-FET grant NADINE (GA 288956

arXiv.org e-Print Archive

AIR Universita degli studi di Milano

Directory of Open Access Journals

Archivio della Ricerca - Università di Pisa

Open Access Repository

Degree-degree correlations in random graphs with heavy-tailed degrees

Author: B. Bollobás
Nelly Litvak
P. Boldi
P. Constantine
Remco van der Hofstad
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2013

Mixing patterns in large self-organizing networks, such as the Internet, the World Wide Web, social and biological networks are often characterized by degree-degree dependencies between neighbouring nodes. One of the problems with the commonly used Pearson's correlation coefficient (termed as the assortativity coefficient) is that in disassortative networks its magnitude decreases with the network size. This makes it impossible to compare mixing patterns, for example, in two web crawls of different size. We start with a simple model of two heavy-tailed highly correlated random variables $X$ and $Y$, and show that the sample correlation coefficient converges in distribution either to a proper random variable on $[-1,1]$, or to zero, and if $X, Y \ge 0$ then the limit is non-negative. We next show that it is non-negative in the large graph limit when the degree distribution has an infinite third moment. We consider the alternative degree-degree dependency measure, based on the Spearman's rho, and prove that it converges to an appropriate limit under very general conditions. We verify that these conditions hold in common network models, such as configuration model and Preferential Attachment model. We conclude that rank correlations provide a suitable and informative method for uncovering network mixing patterns
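
The two dependency measures contrasted in this abstract can be computed side by side on a small graph. The sketch below is an illustration under stated assumptions, not the authors' code: the helper names (`pearson`, `average_ranks`, `degree_correlations`) are hypothetical, the graph is an undirected edge list, each edge is counted in both directions, and Spearman's rho is computed as Pearson's coefficient on average ranks.

```python
from collections import Counter
from math import sqrt

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant
    (for example, on a regular graph).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def degree_correlations(edges):
    """Return (Pearson, Spearman) correlation of degrees at edge endpoints."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    xs, ys = [], []
    for u, v in edges:
        # Count each undirected edge in both directions.
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    return pearson(xs, ys), pearson(average_ranks(xs), average_ranks(ys))

# A star graph is maximally disassortative: the hub links only to leaves.
star = [(0, leaf) for leaf in range(1, 5)]
r, rho = degree_correlations(star)
print(round(r, 6), round(rho, 6))
```

On the star graph both coefficients are -1 up to floating-point rounding; the size dependence discussed in the abstract only shows up on large heavy-tailed graphs, which this toy example does not attempt to reproduce.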

arXiv.org e-Print Archive

Repository TU/e

Crossref

Pure OAI Repository

University of Twente Research Information

A Network Model characterized by a Latent Attribute Structure with Competition

Author: Boldi Paolo
Crimaldi Irene
Monti Corrado
Publication date: 01/07/2014

The quest for a model that is able to explain, describe, analyze and simulate real-world complex networks is of utmost practical as well as theoretical interest. In this paper we introduce and study a network model that is based on a latent attribute structure: each node is characterized by a number of features and the probability of the existence of an edge between two nodes depends on the features they share. Features are chosen according to a process of Indian-Buffet type but with an additional random "fitness" parameter attached to each node, that determines its ability to transmit its own features to other nodes. As a consequence, a node's connectivity does not depend on its age alone, so also "young" nodes are able to compete and succeed in acquiring links. One of the advantages of our model for the latent bipartite "node-attribute" network is that it depends on few parameters with a straightforward interpretation. We provide some theoretical, as well as experimental, results regarding the power-law behaviour of the model and the estimation of the parameters. By experimental data, we also show how the proposed model for the attribute structure naturally captures most local and global properties (e.g., degree distributions, connectivity and distance distributions) real networks exhibit.

Keywords: Complex network, social network, attribute matrix, Indian Buffet process

Comment: 34 pages, second version (date of the first version: July, 2014). Submitte

arXiv.org e-Print Archive

CiteSeerX

IMT Institutional Repository