Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics
Minimal-interval semantics associates with each query over a document a set
of intervals, called witnesses, that are incomparable with respect to inclusion
(i.e., they form an antichain): witnesses define the minimal regions of the
document satisfying the query. Minimal-interval semantics makes it easy to
define and compute several sophisticated proximity operators, provides snippets
for user presentation, and can be used to rank documents. In this paper we
provide algorithms for computing conjunction and disjunction that are linear in
the number of intervals and logarithmic in the number of operands; for
additional operators, such as ordered conjunction and Brouwerian difference, we
provide linear algorithms. In all cases, space is linear in the number of
operands. More importantly, we define a formal notion of optimal laziness, and
either prove it, or prove its impossibility, for each algorithm. We cast our
results in a general framework of antichains of intervals on total orders,
making our algorithms directly applicable to other domains.
Comment: 24 pages, 4 figures. A preliminary (now outdated) version was
presented at SPIRE 200
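The antichain condition in the abstract above can be illustrated with a minimal sketch. It only shows how a set of candidate intervals reduces to its inclusion-minimal witnesses; it is not the lazy algorithms of the paper. The function name `minimal_intervals` and the sample data are assumptions made for illustration.

```python
def minimal_intervals(intervals):
    """Return the intervals that contain no other interval of the input.

    Intervals are (left, right) pairs with left <= right. The result is an
    antichain with respect to inclusion, sorted by right endpoint.
    """
    # Sort by right endpoint; on ties, put the larger left endpoint first so
    # that the shorter interval is seen before any interval containing it.
    ordered = sorted(intervals, key=lambda iv: (iv[1], -iv[0]))
    result = []
    max_left = float("-inf")  # largest left endpoint kept so far
    for left, right in ordered:
        # Every kept interval ends at or before `right`. If one of them starts
        # at or after `left`, it lies inside (left, right), so we skip it.
        if left > max_left:
            result.append((left, right))
            max_left = left
    return result


# Hypothetical candidate regions of a document (token positions).
candidates = [(1, 5), (2, 3), (4, 6), (2, 7), (4, 6)]
witnesses = minimal_intervals(candidates)
```

Here `(1, 5)` and `(2, 7)` are dropped because each contains `(2, 3)`, and the duplicate `(4, 6)` is kept only once.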
Four Degrees of Separation, Really
We recently measured the average distance of users in the Facebook graph,
spurring comments in the scientific community as well as in the general press
("Four Degrees of Separation"). A number of interesting criticisms have been
made about the meaningfulness, methods and consequences of the experiment we
performed. In this paper we want to discuss some methodological aspects that we
deem important to underline in the form of answers to the questions we have
read in newspapers, magazines, blogs, or heard from colleagues. We indulge in
some reflections on the actual meaning of "average distance" and make a number
of side observations showing that, yes, 3.74 "degrees of separation" are really
few
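As a reading aid for the notion of "average distance" discussed above, the following sketch computes it exactly on a toy graph by running a breadth-first search from every node. This is an illustrative assumption, not the measurement procedure of the paper: exhaustive BFS would not scale to a graph of that size. The names `bfs_distances` and `average_distance` and the sample graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_distance(adj):
    """Mean distance over all ordered pairs of distinct, mutually reachable nodes."""
    total = 0
    pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else float("nan")


# Toy undirected path graph 0 - 1 - 2 as an adjacency list.
path = {0: [1], 1: [0, 2], 2: [1]}
avg = average_distance(path)  # distances 1, 2, 1 in each direction: 8 / 6
```

Unreachable pairs are left out of the average in this sketch; that choice is one of several possible conventions.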
Entity-Linking via Graph-Distance Minimization
Entity-linking is a natural-language-processing task that consists of
identifying the entities mentioned in a piece of text, linking each to an
appropriate item in some knowledge base; when the knowledge base is Wikipedia,
the problem is known as wikification (in this case, items are
Wikipedia articles). One instance of entity-linking can be formalized as an
optimization problem on the underlying concept graph, where the quantity to be
optimized is the average distance between chosen items. Inspired by this
application, we define a new graph problem which is a natural variant of the
Maximum Capacity Representative Set. We prove that our problem is NP-hard for
general graphs; nonetheless, under some restrictive assumptions, it turns out
to be solvable in linear time. For the general case, we propose two heuristics:
one tries to enforce the above assumptions and another one is based on the
notion of hitting distance; we show experimentally how these approaches perform
with respect to some baselines on a real-world dataset.
Comment: In Proceedings GRAPHITE 2014, arXiv:1407.7671. The second and third
authors were supported by the EU-FET grant NADINE (GA 288956
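The optimization view described above can be sketched as a brute-force search: pick one candidate item per mention so that the total pairwise distance between the chosen items is smallest (for a fixed number of mentions this is equivalent to minimizing the average). This is only an exhaustive baseline under assumed inputs, not the heuristics proposed in the paper. The function `best_assignment`, the item names, and the distance values are all hypothetical.

```python
from itertools import combinations, product


def best_assignment(candidates, dist):
    """Choose one item per mention minimizing the sum of pairwise distances.

    candidates: list of lists, the candidate items for each mention.
    dist: dict mapping frozenset({a, b}) to the distance between items a, b.
    Exhaustive search, exponential in the number of mentions.
    """
    best, best_cost = None, float("inf")
    for choice in product(*candidates):
        cost = sum(
            0 if a == b else dist[frozenset((a, b))]
            for a, b in combinations(choice, 2)
        )
        if cost < best_cost:
            best, best_cost = choice, cost
    return best, best_cost


# Hypothetical candidates for three mentions and made-up graph distances.
candidates = [
    ["Paris_France", "Paris_Texas"],
    ["Texas_state"],
    ["Dallas_city", "Dallas_TV"],
]
dist = {
    frozenset(("Paris_France", "Texas_state")): 4,
    frozenset(("Paris_Texas", "Texas_state")): 1,
    frozenset(("Texas_state", "Dallas_city")): 1,
    frozenset(("Texas_state", "Dallas_TV")): 2,
    frozenset(("Paris_France", "Dallas_city")): 4,
    frozenset(("Paris_France", "Dallas_TV")): 3,
    frozenset(("Paris_Texas", "Dallas_city")): 2,
    frozenset(("Paris_Texas", "Dallas_TV")): 3,
}
choice, cost = best_assignment(candidates, dist)
```

With these made-up distances the search selects `Paris_Texas`, `Texas_state`, and `Dallas_city`, with a total pairwise distance of 4.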
Degree-degree correlations in random graphs with heavy-tailed degrees
Mixing patterns in large self-organizing networks, such as the Internet, the
World Wide Web, social and biological networks, are often characterized by
degree-degree dependencies between neighbouring nodes. One of the problems
with the commonly used Pearson's correlation coefficient (termed the
assortativity coefficient) is that in disassortative networks its magnitude
decreases with the network size. This makes it impossible to compare mixing
patterns, for example, in two web crawls of different size.
We start with a simple model of two heavy-tailed highly correlated random
variables X and Y, and show that the sample correlation coefficient
converges in distribution either to a proper random variable on [-1, 1], or to
zero, and if X, Y ≥ 0 then the limit is non-negative. We next show that it is
non-negative in the large graph limit when the degree distribution has an
infinite third moment. We consider the alternative degree-degree dependency
measure, based on the Spearman's rho, and prove that it converges to an
appropriate limit under very general conditions. We verify that these
conditions hold in common network models, such as the configuration model and
the Preferential Attachment model. We conclude that rank correlations provide a
suitable and informative method for uncovering network mixing patterns
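The contrast between Pearson's coefficient and a rank-based measure can be seen in a small sketch. It computes both on a handful of made-up value pairs with one heavy-tailed outlier. It does not reproduce the paper's graph-based estimator or its limit results; the function names and the data are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up degrees at the two ends of five edges; one end has a hub of degree 100.
deg_a = [1, 2, 3, 4, 100]
deg_b = [1, 2, 3, 4, 5]
r = pearson(deg_a, deg_b)     # pulled well below 1 by the single large value
rho = spearman(deg_a, deg_b)  # the relationship is monotone, so rho is 1
```

The single hub dominates the variance in the Pearson computation, while the rank-based measure depends only on the ordering of the values.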
A Network Model characterized by a Latent Attribute Structure with Competition
The quest for a model that is able to explain, describe, analyze and simulate
real-world complex networks is of utmost practical as well as theoretical
interest. In this paper we introduce and study a network model that is based on
a latent attribute structure: each node is characterized by a number of
features and the probability of the existence of an edge between two nodes
depends on the features they share. Features are chosen according to a process
of Indian-Buffet type but with an additional random "fitness" parameter
attached to each node, that determines its ability to transmit its own features
to other nodes. As a consequence, a node's connectivity does not depend on its
age alone, so also "young" nodes are able to compete and succeed in acquiring
links. One of the advantages of our model for the latent bipartite
"node-attribute" network is that it depends on few parameters with a
straightforward interpretation. We provide some theoretical, as well as
experimental, results regarding the power-law behaviour of the model and the
estimation of the parameters. Using experimental data, we also show how the
proposed model for the attribute structure naturally captures most local and
global properties (e.g., degree distributions, connectivity and distance
distributions) real networks exhibit.
Keywords: complex network, social network, attribute matrix, Indian Buffet process.
Comment: 34 pages, second version (date of the first version: July, 2014).
Submitte