57,150 research outputs found
Stream Aggregation Through Order Sampling
This is paper introduces a new single-pass reservoir weighted-sampling stream
aggregation algorithm, Priority-Based Aggregation (PBA). While order sampling
is a powerful and e cient method for weighted sampling from a stream of
uniquely keyed items, there is no current algorithm that realizes the benefits
of order sampling in the context of stream aggregation over non-unique keys. A
naive approach to order sample regardless of key then aggregate the results is
hopelessly inefficient. In distinction, our proposed algorithm uses a single
persistent random variable across the lifetime of each key in the cache, and
maintains unbiased estimates of the key aggregates that can be queried at any
point in the stream. The basic approach can be supplemented with a Sample and
Hold pre-sampling stage with a sampling rate adaptation controlled by PBA. This
approach represents a considerable reduction in computational complexity
compared with the state of the art in adapting Sample and Hold to operate with
a fixed cache size. Concerning statistical properties, we prove that PBA
provides unbiased estimates of the true aggregates. We analyze the
computational complexity of PBA and its variants, and provide a detailed
evaluation of its accuracy on synthetic and trace data. Weighted relative error
is reduced by 40% to 65% at sampling rates of 5% to 17%, relative to Adaptive
Sample and Hold; there is also substantial improvement for rank queriesComment: 10 page
RankMerging: A supervised learning-to-rank framework to predict links in large social network
Uncovering unknown or missing links in social networks is a difficult task
because of their sparsity and because links may represent different types of
relationships, characterized by different structural patterns. In this paper,
we define a simple yet efficient supervised learning-to-rank framework, called
RankMerging, which aims at combining information provided by various
unsupervised rankings. We illustrate our method on three different kinds of
social networks and show that it substantially improves the performances of
unsupervised metrics of ranking. We also compare it to other combination
strategies based on standard methods. Finally, we explore various aspects of
RankMerging, such as feature selection and parameter estimation and discuss its
area of relevance: the prediction of an adjustable number of links on large
networks.Comment: 43 pages, published in Machine Learning Journa
Unsupervised Graph-based Rank Aggregation for Improved Retrieval
This paper presents a robust and comprehensive graph-based rank aggregation
approach, used to combine results of isolated ranker models in retrieval tasks.
The method follows an unsupervised scheme, which is independent of how the
isolated ranks are formulated. Our approach is able to combine arbitrary
models, defined in terms of different ranking criteria, such as those based on
textual, image or hybrid content representations.
We reformulate the ad-hoc retrieval problem as a document retrieval based on
fusion graphs, which we propose as a new unified representation model capable
of merging multiple ranks and expressing inter-relationships of retrieval
results automatically. By doing so, we claim that the retrieval system can
benefit from learning the manifold structure of datasets, thus leading to more
effective results. Another contribution is that our graph-based aggregation
formulation, unlike existing approaches, allows for encapsulating contextual
information encoded from multiple ranks, which can be directly used for
ranking, without further computations and post-processing steps over the
graphs. Based on the graphs, a novel similarity retrieval score is formulated
using an efficient computation of minimum common subgraphs. Finally, another
benefit over existing approaches is the absence of hyperparameters.
A comprehensive experimental evaluation was conducted considering diverse
well-known public datasets, composed of textual, image, and multimodal
documents. Performed experiments demonstrate that our method reaches top
performance, yielding better effectiveness scores than state-of-the-art
baseline methods and promoting large gains over the rankers being fused, thus
demonstrating the successful capability of the proposal in representing queries
based on a unified graph-based model of rank fusions
- …