151 research outputs found

Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics

Authors: Boldi Paolo, Vigna Sebastiano
Publication date: 11/08/2016

Minimal-interval semantics associates with each query over a document a set of intervals, called witnesses, that are incomparable with respect to inclusion (i.e., they form an antichain): witnesses define the minimal regions of the document satisfying the query. Minimal-interval semantics makes it easy to define and compute several sophisticated proximity operators, provides snippets for user presentation, and can be used to rank documents. In this paper we provide algorithms for computing conjunction and disjunction that are linear in the number of intervals and logarithmic in the number of operands; for additional operators, such as ordered conjunction and Brouwerian difference, we provide linear algorithms. In all cases, space is linear in the number of operands. More importantly, we define a formal notion of optimal laziness, and either prove it, or prove its impossibility, for each algorithm. We cast our results in a general framework of antichains of intervals on total orders, making our algorithms directly applicable to other domains.
Comment: 24 pages, 4 figures. A preliminary (now outdated) version was presented at SPIRE 200
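
To make the notion of witnesses concrete, the following is a minimal sketch of conjunction under minimal-interval semantics. It is a plain, eager sliding-window computation, not the optimally lazy algorithm the abstract refers to. The function name `minimal_and_intervals` and the input layout (one sorted list of distinct token positions per query term) are assumptions made for illustration.

```python
def minimal_and_intervals(position_lists):
    """Return the minimal intervals (left, right) that contain at least one
    position from every list. Assumes positions are distinct across lists.
    Illustrative eager version; it sorts all positions up front."""
    # Merge all positions, remembering which term each one belongs to.
    events = sorted((p, i) for i, ps in enumerate(position_lists) for p in ps)
    k = len(position_lists)
    count = [0] * k      # occurrences of each term inside the current window
    covered = 0          # number of terms with at least one occurrence
    result = []
    left = 0
    for pos_r, term_r in events:
        count[term_r] += 1
        if count[term_r] == 1:
            covered += 1
        if covered < k:
            continue
        # Shrink from the left while the leftmost term is still duplicated.
        while count[events[left][1]] > 1:
            count[events[left][1]] -= 1
            left += 1
        candidate = (events[left][0], pos_r)
        # Keep only the first (shortest) interval for each left endpoint,
        # so that no reported interval contains another one.
        if not result or result[-1][0] < candidate[0]:
            result.append(candidate)
    return result


# Term A occurs at positions 1 and 5, term B at positions 3 and 9.
print(minimal_and_intervals([[1, 5], [3, 9]]))  # [(1, 3), (3, 5), (5, 9)]
```

The returned intervals form an antichain: their left and right endpoints are both strictly increasing, so none includes another.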

arXiv.org e-Print Archive

AIR Universita degli studi di Milano

Four Degrees of Separation, Really

Authors: Boldi Paolo, Vigna Sebastiano
Publication date: 01/01/2012

We recently measured the average distance of users in the Facebook graph, spurring comments in the scientific community as well as in the general press ("Four Degrees of Separation"). A number of interesting criticisms have been made about the meaningfulness, methods and consequences of the experiment we performed. In this paper we want to discuss some methodological aspects that we deem important to underline in the form of answers to the questions we have read in newspapers, magazines, blogs, or heard from colleagues. We indulge in some reflections on the actual meaning of "average distance" and make a number of side observations showing that, yes, 3.74 "degrees of separation" are really few

arXiv.org e-Print Archive

CiteSeerX

AIR Universita degli studi di Milano

Entity-Linking via Graph-Distance Minimization

Authors: Blanco Roi, Boldi Paolo, Marino Andrea
Publication venue: Open Publishing Association
Publication date: 01/01/2014

Entity-linking is a natural-language-processing task that consists in identifying the entities mentioned in a piece of text and linking each to an appropriate item in some knowledge base; when the knowledge base is Wikipedia, the problem is known as wikification (in this case, items are Wikipedia articles). One instance of entity-linking can be formalized as an optimization problem on the underlying concept graph, where the quantity to be optimized is the average distance between chosen items. Inspired by this application, we define a new graph problem which is a natural variant of the Maximum Capacity Representative Set. We prove that our problem is NP-hard for general graphs; nonetheless, under some restrictive assumptions, it turns out to be solvable in linear time. For the general case, we propose two heuristics: one tries to enforce the above assumptions and the other is based on the notion of hitting distance; we show experimentally how these approaches perform with respect to some baselines on a real-world dataset.
Comment: In Proceedings GRAPHITE 2014, arXiv:1407.7671. The second and third authors were supported by the EU-FET grant NADINE (GA 288956
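
The optimization problem described above can be illustrated with a brute-force sketch that picks one candidate item per mention so as to minimize the average pairwise distance between the chosen items. This exhaustive search is exponential in the number of mentions and is neither of the two heuristics proposed in the paper. The function name, the candidate sets, and the distance values are all hypothetical toy data.

```python
from itertools import combinations, product


def best_assignment(candidates, dist):
    """Choose one item per mention minimizing the average pairwise distance.

    candidates: list of candidate-item lists, one list per mention.
    dist: dict mapping frozenset({a, b}) to the graph distance between a and b.
    Exhaustive search, only practical for tiny inputs."""
    def d(a, b):
        # An item is at distance 0 from itself.
        return 0 if a == b else dist[frozenset((a, b))]

    best, best_cost = None, float("inf")
    for choice in product(*candidates):
        pairs = list(combinations(choice, 2))
        cost = sum(d(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
        if cost < best_cost:
            best, best_cost = choice, cost
    return best, best_cost


# Hypothetical toy instance: the mention "Paris" is ambiguous.
candidates = [["Paris_city", "Paris_Hilton"], ["France"], ["Eiffel_Tower"]]
dist = {
    frozenset(("Paris_city", "France")): 1,
    frozenset(("Paris_city", "Eiffel_Tower")): 1,
    frozenset(("Paris_Hilton", "France")): 3,
    frozenset(("Paris_Hilton", "Eiffel_Tower")): 3,
    frozenset(("France", "Eiffel_Tower")): 2,
}
choice, cost = best_assignment(candidates, dist)
print(choice, cost)
```

In this toy instance the city reading of "Paris" wins because it is closer, on average, to the other chosen items.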

arXiv.org e-Print Archive

Crossref

AIR Universita degli studi di Milano

Directory of Open Access Journals

Archivio della Ricerca - Università di Pisa

Open Access Repository

A Network Model characterized by a Latent Attribute Structure with Competition

Authors: Boldi Paolo, Crimaldi Irene, Monti Corrado
Publication date: 01/07/2014

The quest for a model that is able to explain, describe, analyze and simulate real-world complex networks is of utmost practical as well as theoretical interest. In this paper we introduce and study a network model that is based on a latent attribute structure: each node is characterized by a number of features and the probability of the existence of an edge between two nodes depends on the features they share. Features are chosen according to a process of Indian-Buffet type but with an additional random "fitness" parameter attached to each node, which determines its ability to transmit its own features to other nodes. As a consequence, a node's connectivity does not depend on its age alone, so "young" nodes are also able to compete and succeed in acquiring links. One of the advantages of our model for the latent bipartite "node-attribute" network is that it depends on few parameters with a straightforward interpretation. We provide some theoretical, as well as experimental, results regarding the power-law behaviour of the model and the estimation of the parameters. Using experimental data, we also show how the proposed model for the attribute structure naturally captures most local and global properties (e.g., degree distributions, connectivity and distance distributions) that real networks exhibit.
Keywords: complex network, social network, attribute matrix, Indian Buffet process
Comment: 34 pages, second version (date of the first version: July, 2014). Submitte

arXiv.org e-Print Archive

CiteSeerX

IMT Institutional Repository

HyperANF: Approximating the Neighbourhood Function of Very Large Graphs on a Budget

Authors: Boldi Paolo, Rosa Marco, Vigna Sebastiano
Publication date: 01/01/2011

The neighbourhood function N_G(t) of a graph G gives, for each t, the number of pairs of nodes (x, y) such that y is reachable from x in fewer than t hops. The neighbourhood function provides a wealth of information about the graph (e.g., it easily allows one to compute its diameter), but it is very expensive to compute exactly. Recently, the ANF algorithm (approximate neighbourhood function) has been proposed with the purpose of approximating N_G(t) on large graphs. We describe a breakthrough improvement over ANF in terms of speed and scalability. Our algorithm, called HyperANF, uses the new HyperLogLog counters and combines them efficiently through broadword programming; our implementation uses overdecomposition to exploit multi-core parallelism. With HyperANF, for the first time we can compute in a few hours the neighbourhood function of graphs with billions of nodes with a small error and good confidence using a standard workstation. Then, we turn to the study of the distribution of the shortest paths between reachable nodes (which can be efficiently approximated by means of HyperANF), and discover the surprising fact that its index of dispersion provides a clear-cut characterisation of proper social networks vs. web graphs. We thus propose the spid (Shortest-Paths Index of Dispersion) of a graph as a new, informative statistic that is able to discriminate between the above two types of graphs. We believe this is the first proposal of a significant new non-local structural index for complex networks whose computation is highly scalable
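
For intuition, the quantities named in this abstract can be computed exactly on a tiny graph with one breadth-first search per node. This is a sketch of the exact (expensive) computation, not of HyperANF, which approximates it with HyperLogLog counters. The function names are hypothetical, and two conventions are assumptions made here: the neighbourhood function counts the pairs (x, x), and the index of dispersion (variance divided by mean) is taken over pairs of distinct reachable nodes.

```python
from collections import Counter, deque


def distance_counts(adj):
    """Count ordered pairs (x, y), x != y, by shortest-path distance.
    adj maps every node to an iterable of its successors."""
    counts = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    counts[dist[v]] += 1
                    queue.append(v)
    return counts


def neighbourhood_function(counts, num_nodes, t):
    """Pairs (x, y) with y reachable from x in fewer than t hops.
    Assumption: the num_nodes pairs (x, x) at distance 0 are counted."""
    if t <= 0:
        return 0
    return num_nodes + sum(c for d, c in counts.items() if d < t)


def spid(counts):
    """Index of dispersion (variance / mean) of the distance distribution."""
    total = sum(counts.values())
    mean = sum(d * c for d, c in counts.items()) / total
    variance = sum(c * (d - mean) ** 2 for d, c in counts.items()) / total
    return variance / mean


# Undirected path 0 - 1 - 2, stored as a symmetric adjacency map.
adj = {0: [1], 1: [0, 2], 2: [1]}
counts = distance_counts(adj)
print([neighbourhood_function(counts, len(adj), t) for t in range(4)])  # [0, 3, 7, 9]
print(spid(counts))
```

Each BFS costs time linear in the size of the graph, so the exact computation is quadratic overall, which is why approximation is needed for graphs with billions of nodes.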

arXiv.org e-Print Archive

CiteSeerX

AIR Universita degli studi di Milano