Optimal algorithms for selecting top-k combinations of attributes : theory and applications

Author: Lin Chunbin
Lu Jiaheng
Wang Jianguo
Wei Zhewei
Xiao Xiaokui
Publication date: 01/01/2017

Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover k objects, e.g., top-k restaurants, with the highest overall scores aggregated from different attributes, e.g., price and location. However, emerging applications such as query recommendation require the best combinations of attributes instead of objects. A straightforward extension of existing top-k algorithms is prohibitively expensive for finding top-k combinations because it must enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k,m, which aims to find the top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k,m algorithms with different data access methods, i.e., sorted accesses and random accesses, and different query certainties, i.e., exact query processing and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound on the depth of accesses. We further develop optimizations for efficient query evaluation that reduce the computational cost, the memory cost and the number of accesses. We provide a case study on real applications of top-k,m queries for an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of top-k,m algorithms on multiple real-life datasets.
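For orientation, the classic threshold algorithm (TA) that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration, not the top-k,m algorithms of the article: it assumes that every object has a nonnegative score in every attribute list and that the aggregation function is a plain sum. The function name `threshold_algorithm` and the sample data are hypothetical.

```python
import heapq

def threshold_algorithm(lists, k):
    """Top-k objects by summed score (sketch of TA under the stated assumptions).

    lists: one dict per attribute, mapping object -> nonnegative score.
    """
    # Sorted access: each attribute list ordered by descending score.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = {}  # object -> exact aggregated score
    for depth in range(max(len(s) for s in sorted_lists)):
        last_scores = []
        for s in sorted_lists:
            if depth < len(s):
                obj, score = s[depth]
                last_scores.append(score)
                if obj not in seen:
                    # Random access: look up the object's score in every list.
                    seen[obj] = sum(l.get(obj, 0.0) for l in lists)
            else:
                last_scores.append(0.0)
        # No unseen object can score higher than the sum of the last-seen scores.
        threshold = sum(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])

# Hypothetical data: two attributes (price score, location score).
price = {"a": 0.9, "b": 0.8, "c": 0.1}
location = {"a": 0.2, "b": 0.7, "c": 0.95}
best_two = threshold_algorithm([price, location], k=2)
```

The early stop is what makes TA attractive: the scan ends as soon as the k-th best exact score reaches the threshold, so low-ranked objects are never aggregated.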

Crossref

Helsingin yliopiston digitaalinen arkisto

DR-NTU (Digital Repository of NTU)

Monte Carlo Methods for Top-k Personalized PageRank Lists and Name Disambiguation

Author: Danil Nemirovsky
Elena Smirnova
Konstantin Avrachenkov
Marina Sokol
Nelly Litvak
Publication date: 01/01/2010

We study the problem of quickly detecting top-k Personalized PageRank lists. This problem has a number of important applications, such as finding local cuts in large graphs, estimating similarity distance and name disambiguation. In particular, we apply our results to construct efficient algorithms for the person name disambiguation problem. We argue that two observations are important when finding top-k Personalized PageRank lists. Firstly, it is crucial to detect the top-k most important neighbours of a node quickly, while the exact order in the top-k list and the exact PageRank values are far less crucial. Secondly, a small number of wrong elements in a top-k list does not really degrade its quality, but tolerating them can lead to significant computational savings. Based on these two key observations, we propose Monte Carlo methods for fast detection of top-k Personalized PageRank lists. We provide a performance evaluation of the proposed methods and supply stopping criteria. Then, we apply the methods to the person name disambiguation problem. The algorithm developed for the person name disambiguation problem achieved second place in the WePS 2010 competition.
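The Monte Carlo idea can be illustrated with a short sketch: run many random walks from the source node, end each walk with probability 1 - damping at every step, and rank nodes by how often they were visited. This is an illustrative variant under assumptions, not the paper's exact method: it uses a fixed number of walks instead of the stopping criteria mentioned in the abstract, and the function name, the adjacency-list format and the default parameters are hypothetical.

```python
import random
from collections import Counter

def mc_personalized_pagerank_topk(graph, source, k, walks=10000, damping=0.85, seed=0):
    """Estimate the top-k Personalized PageRank list of `source` by random walks.

    graph: dict mapping node -> list of out-neighbours (assumed format).
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(walks):
        node = source
        while True:
            visits[node] += 1
            neighbours = graph.get(node, [])
            # End the walk at a dangling node or with probability 1 - damping.
            if not neighbours or rng.random() > damping:
                break
            node = rng.choice(neighbours)
    # Only the ranking is returned, since approximate order is all that is needed.
    return [n for n, _ in visits.most_common(k)]

# Hypothetical toy graph: 's' links to 'a' three times as often as to 'b'.
toy = {"s": ["a", "a", "a", "b"], "a": ["s"], "b": ["s"]}
top_two = mc_personalized_pagerank_topk(toy, "s", k=2)
```

Because only membership in the top-k list matters, far fewer walks are needed than for accurate PageRank values, which is the computational saving the abstract refers to.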

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

University of Twente Research Information

HAL-Rennes 1

Approximating the Largest Root and Applications to Interlacing Families

Author: Anari Nima
Gharan Shayan Oveis
Saberi Amin
Srivastava Nikhil
Publication date: 12/04/2017

We study the problem of approximating the largest root of a real-rooted polynomial of degree $n$ using its top $k$ coefficients and give nearly matching upper and lower bounds. We present algorithms with running time polynomial in $k$ that use the top $k$ coefficients to approximate the maximum root within a factor of $n^{1/k}$ and $1+O(\tfrac{\log n}{k})^2$ when $k\leq \log n$ and $k>\log n$ respectively. We also prove corresponding information-theoretic lower bounds of $n^{\Omega(1/k)}$ and $1+\Omega\left(\frac{\log \frac{2n}{k}}{k}\right)^2$, and show strong lower bounds for the noisy version of the problem in which one is given access to approximate coefficients. This problem has applications in the context of the method of interlacing families of polynomials, which was used for proving the existence of Ramanujan graphs of all degrees, the solution of the Kadison-Singer problem, and bounding the integrality gap of the asymmetric traveling salesman problem. All of these involve computing the maximum root of certain real-rooted polynomials for which the top few coefficients are accessible in subexponential time. Our results yield an algorithm with the running time of $2^{\tilde O(\sqrt[3]n)}$ for all of them.
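One classical way to obtain an $n^{1/k}$-factor estimate from the top $k$ coefficients is via power sums: Newton's identities recover $p_k = \sum_i \lambda_i^k$ from the first $k$ elementary symmetric polynomials, and for nonnegative roots $\lambda_{\max}^k \le p_k \le n\,\lambda_{\max}^k$. The sketch below illustrates that argument only and is not claimed to be the paper's algorithm. It assumes a monic polynomial with nonnegative real roots, and the function name is hypothetical.

```python
def largest_root_estimate(top_coeffs, k):
    """Return p_k ** (1/k), an estimate within a factor n**(1/k) of the largest root.

    top_coeffs: [1, a_1, ..., a_k], the leading coefficients of the monic
    polynomial x^n + a_1 x^(n-1) + ... (assumed to have nonnegative real roots).
    """
    # Elementary symmetric polynomials of the roots: e_i = (-1)^i * a_i.
    e = [1.0] + [(-1) ** i * top_coeffs[i] for i in range(1, k + 1)]
    p = [0.0] * (k + 1)  # p[j] = sum of the j-th powers of the roots
    for j in range(1, k + 1):
        # Newton's identity:
        # p_j = sum_{i=1}^{j-1} (-1)^(i-1) e_i p_{j-i} + (-1)^(j-1) j e_j
        s = (-1) ** (j - 1) * j * e[j]
        for i in range(1, j):
            s += (-1) ** (i - 1) * e[i] * p[j - i]
        p[j] = s
    return p[k] ** (1.0 / k)

# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6, whose largest root is 3.
estimate = largest_root_estimate([1, -6, 11, -6], k=2)  # sqrt(1 + 4 + 9)
```

Only the first k + 1 coefficients are read, which matches the setting where lower-order coefficients are too expensive to compute.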

arXiv.org e-Print Archive

Crossref

OPTIMAL CONSTRUCTION OF A LAYER-ORDERED HEAP AND ITS APPLICATIONS

Author: Pennington Jake
Publication venue: University of Montana, Maureen and Mike Mansfield Library
Publication date: 01/01/2021

The layer-ordered heap (LOH) is a simple data structure used in algorithms that perform optimal top-$k$ $X+Y$, algorithms with the best known runtime for top-$k$ $X_1+X_2+\cdots+X_m$, and the fastest method in practice for computing the most abundant isotopologue peaks in a chemical compound. In the analysis of these algorithms, the rank, $\alpha$, has been treated as a constant and $n$, the size of the array, has been treated as the sole parameter. Here, we explore the algorithmic complexity of LOH construction with $\alpha$ as a parameter, introduce a few algorithms for constructing LOHs, analyze their complexity in both $n$ and $\alpha$, and demonstrate that one algorithm is optimal in both $n$ and $\alpha$ for building a LOH of any rank. We then apply this to improve performance in applications where they are employed, find an estimate for the optimal $\alpha$ given an $n$ and $k$ for top-$k$ $X+Y$, and derive a novel algorithm for top-$k$ on a multinomial distribution. Finally, we show that the results of our LOH analysis correspond with empirical experiments of runtimes when applying the LOH construction algorithms to both a common task in machine learning and top-$k$ $X_1+X_2+\cdots+X_m$ and that our estimate of the optimal $\alpha$ for top-$k$ $X+Y$ corresponds well with empirical data.
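To make the data structure concrete, the sketch below builds a LOH the naive way, by sorting and then cutting the sorted array into layers. It rests on assumptions that the abstract does not spell out: that a LOH of rank $\alpha$ is a sequence of layers in which every element of one layer is no larger than every element of the next, and that layer sizes grow roughly geometrically by a factor of $\alpha$ starting from size 1. Sorting costs $O(n \log n)$ and is not the optimal construction analysed in the thesis. All function names are hypothetical.

```python
def build_loh_by_sorting(values, alpha=2.0):
    """Naive LOH construction: sort, then slice into geometrically growing layers."""
    ordered = sorted(values)
    layers = []
    start = 0
    size = 1.0  # assumed size rule: layer i holds about alpha**i elements
    while start < len(ordered):
        end = min(len(ordered), start + max(1, int(size)))
        layers.append(ordered[start:end])
        start = end
        size *= alpha
    return layers

def is_layer_ordered(layers):
    """Check the layer-order property: max of each layer <= min of the next."""
    return all(max(a) <= min(b) for a, b in zip(layers, layers[1:]))

layers = build_loh_by_sorting([5, 3, 8, 1, 9, 2, 7], alpha=2.0)
```

Sorting does more work than needed, because a LOH imposes no order inside a layer. That slack is why a construction can beat $O(n \log n)$.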

University of Montana

Surrogate Functions for Maximizing Precision at the Top

Author: Jain Prateek
Kar Purushottam
Narasimhan Harikrishna
Publication date: 26/05/2015

The problem of maximizing precision at the top of a ranked list, often dubbed Precision@k (prec@k), finds relevance in myriad learning applications such as ranking, multi-label classification, and learning with severe label imbalance. However, despite its popularity, there exist significant gaps in our understanding of this problem and its associated performance measure. The most notable of these is the lack of a convex upper bounding surrogate for prec@k. We also lack scalable perceptron and stochastic gradient descent algorithms for optimizing this performance measure. In this paper we make key contributions in these directions. At the heart of our results is a family of truly upper bounding surrogates for prec@k. These surrogates are motivated in a principled manner and enjoy attractive properties such as consistency to prec@k under various natural margin/noise conditions. These surrogates are then used to design a class of novel perceptron algorithms for optimizing prec@k with provable mistake bounds. We also devise scalable stochastic gradient descent style methods for this problem with provable convergence bounds. Our proofs rely on novel uniform convergence bounds which require an in-depth analysis of the structural properties of prec@k and its surrogates. We conclude with experimental results comparing our algorithms with state-of-the-art cutting plane and stochastic gradient algorithms for maximizing prec@k. Comment: To appear in the proceedings of the 32nd International Conference on Machine Learning (ICML 2015).
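For reference, the performance measure itself is simple to compute: rank items by score and count how many of the top k are positive. The sketch below uses the common normalisation by k, which is an assumption here because some papers use the unnormalised count. It shows only the metric, not the surrogates or the perceptron and SGD methods of the paper, and the function name is hypothetical.

```python
def precision_at_k(scores, labels, k):
    """Fraction of positives (label 1) among the k highest-scored items."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k

# Hypothetical scores and binary relevance labels.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
p_at_2 = precision_at_k(scores, labels, k=2)  # one of the top two is positive
```

The metric depends on the scores only through the sort, so it is piecewise constant in the model parameters. That is why optimizing it directly is hard and surrogates are needed.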

arXiv.org e-Print Archive

CiteSeerX

Distributed top-k aggregation queries at large

Author: A. Marian
Gerhard Weikum
H. David
I.F. Ilyas
K. Church
K. Schnaitter
Matthias Bender
N. Bruno
Peter Triantafillou
R. Akbarinia
R. Fagin
Ralf Schenkel
S. Chaudhuri
S. Madden
Sebastian Michel
T. Cormen
Thomas Neumann
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments with three different real-life datasets, using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
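The TPUT framework named above can be illustrated with a single-process simulation of its three phases. This is a simplified sketch of the baseline as commonly described, not the optimizations introduced in this paper. It assumes nonnegative local scores and sum aggregation, and it models each network node as a dict. The function name `tput` is hypothetical.

```python
import heapq
from collections import defaultdict

def tput(nodes, k):
    """Simulate three-phase uniform-threshold top-k over per-node score dicts."""
    m = len(nodes)

    def partial_sums(reported):
        sums = defaultdict(float)
        for rep in reported:
            for item, s in rep.items():
                sums[item] += s
        return sums

    def kth_largest(values):
        top = heapq.nlargest(k, values)
        return top[-1] if len(top) == k else 0.0

    # Phase 1: every node reports its local top-k items.
    reported = [dict(heapq.nlargest(k, n.items(), key=lambda kv: kv[1])) for n in nodes]
    threshold = kth_largest(partial_sums(reported).values()) / m

    # Phase 2: every node also reports all items scoring at least the threshold.
    for node, rep in zip(nodes, reported):
        rep.update({item: s for item, s in node.items() if s >= threshold})
    sums = partial_sums(reported)
    tau2 = kth_largest(sums.values())
    # Upper bound: an unreported local score is assumed to be at most the threshold.
    candidates = [item for item in sums
                  if sum(rep.get(item, threshold) for rep in reported) >= tau2]

    # Phase 3: fetch exact scores for the surviving candidates only.
    exact = {c: sum(n.get(c, 0.0) for n in nodes) for c in candidates}
    return heapq.nlargest(k, exact.items(), key=lambda kv: kv[1])

# Hypothetical data: three nodes holding local scores for items a-d.
nodes = [{"a": 10, "b": 8, "c": 1, "d": 0},
         {"a": 1, "b": 7, "c": 9, "d": 2},
         {"a": 5, "b": 6, "c": 2, "d": 3}]
result = tput(nodes, k=2)
```

The fixed number of communication rounds is the point of the design. The cost is that the uniform threshold ignores differences between nodes, which is the kind of slack that data-adaptive scan depths aim to remove.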

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Springer - Publisher Connector

Enlighten

MPG.PuRe

Quasi-Convex Scoring Functions in Branch-and-Bound Ranked Search

Author: Peter Poensgen
Ralf Möller
Publication venue: RonPub
Publication date: 01/01/2019

For answering top-k queries in which attributes are aggregated to a scalar value that defines a ranking, the well-known branch-and-bound principle can usually be used for efficient query answering. Standard algorithms (e.g., Branch-and-Bound Ranked Search, BRS for short) require scoring functions to be monotone, so that a top-k ranking can be computed in sublinear time in the average case. If monotonicity cannot be guaranteed, efficient query answering algorithms are not known. To make branch-and-bound effective with descending or ascending rankings (maximum top-k or minimum top-k queries, respectively), BRS must be able to identify bounds for exploring search partitions, and this is trivial only for monotone ranking functions. In this paper, we investigate the class of quasi-convex functions used for scoring objects, and we examine how bounds for exploring data partitions can be computed correctly and efficiently for quasi-convex functions in BRS for maximum top-k queries. Given that quasi-convex scoring functions can usefully be employed for ranking objects in a variety of applications, the mathematical findings presented in this paper are significant for practical top-k query answering.
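A standard property makes such bounds computable: a quasi-convex function satisfies f(tx + (1 - t)y) <= max(f(x), f(y)), so its maximum over an axis-aligned box is attained at one of the box's corners. The sketch below uses that fact to bound the score of any object inside a partition. It is an illustration under the assumption that partitions are axis-aligned bounding boxes, not necessarily the procedure derived in the paper, and the function name is hypothetical.

```python
from itertools import product

def box_upper_bound(score, lower, upper):
    """Upper bound of a quasi-convex `score` over the box [lower, upper].

    Evaluates all 2^d corners; for quasi-convex functions the maximum
    over the box is attained at one of them.
    """
    corners = product(*zip(lower, upper))
    return max(score(corner) for corner in corners)

# Hypothetical non-monotone but convex (hence quasi-convex) scoring function:
# squared distance from the point (3, 1).
def score(p):
    return (p[0] - 3) ** 2 + (p[1] - 1) ** 2

bound = box_upper_bound(score, lower=(0, 0), upper=(4, 4))
```

A branch-and-bound search for a maximum top-k query can skip any partition whose bound falls below the current k-th best score. Enumerating corners costs 2^d evaluations, so the sketch is practical only for a small number of attributes.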

RonPub -- Research Online Publishing