43 research outputs found

    Distributed top-k aggregation queries at large

    Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings, where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments with three different real-life datasets, using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
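    For orientation, the baseline TPUT framework named in the abstract can be sketched as a three-phase protocol. The sketch below is a single-process simulation under stated assumptions, not the paper's optimized method: scores are non-negative, aggregation is a plain sum, each "node" is a Python dict, and the names `tput_top_k` and `kth_largest` are hypothetical.

```python
import heapq
from collections import defaultdict


def kth_largest(values, k):
    """Return the k-th largest value, or 0.0 if fewer than k values exist."""
    ordered = sorted(values, reverse=True)
    return ordered[k - 1] if len(ordered) >= k else 0.0


def tput_top_k(nodes, k):
    """Simulate a TPUT-style top-k sum aggregation.

    nodes: list of dicts mapping item -> non-negative local score
           (assumption: a missing item counts as score 0).
    Returns the k items with the highest total score as (item, total) pairs.
    """
    m = len(nodes)
    partial = defaultdict(float)  # partial sums known to the coordinator
    seen = defaultdict(set)       # which nodes have reported each item

    # Phase 1: every node reports its local top-k items.
    for i, node in enumerate(nodes):
        for item, score in heapq.nlargest(k, node.items(), key=lambda kv: kv[1]):
            partial[item] += score
            seen[item].add(i)
    threshold = kth_largest(partial.values(), k) / m

    # Phase 2: every node reports all items scoring at least the threshold.
    for i, node in enumerate(nodes):
        for item, score in node.items():
            if score >= threshold and i not in seen[item]:
                partial[item] += score
                seen[item].add(i)
    tau2 = kth_largest(partial.values(), k)

    # Prune: each unreported local score is below the threshold, so
    # partial + (missing nodes) * threshold is an upper bound on the total.
    candidates = [
        item for item in partial
        if partial[item] + (m - len(seen[item])) * threshold >= tau2
    ]

    # Phase 3: fetch exact totals for the surviving candidates only.
    totals = {c: sum(node.get(c, 0) for node in nodes) for c in candidates}
    return heapq.nlargest(k, totals.items(), key=lambda kv: (kv[1], kv[0]))
```

    The uniform threshold in phase 2 is the kind of fixed choice that the abstract's data-adaptive scan depths are meant to improve on; the sketch makes no attempt to model that optimization.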

    On the Complexity of Query Result Diversification

    Query result diversification is a bi-criteria optimization problem for ranking query results. Given a database D, a query Q and a positive integer k, the task is to find a set of k tuples from Q(D) such that the tuples are as relevant as possible to the query and, at the same time, as diverse as possible from one another. Subsets of Q(D) are ranked by an objective function defined in terms of relevance and diversity. Query result diversification has found a variety of applications in databases, information retrieval and operations research. This paper studies the complexity of result diversification for relational queries. We identify three problems in connection with query result diversification: determining whether there exists a set of k tuples that is ranked above a bound with respect to relevance and diversity, assessing the rank of a given k-element set, and counting how many k-element sets are ranked above a given bound. We study these problems for a variety of query languages and for three objective functions. We establish the upper and lower bounds of these problems, all matching, for both combined complexity and data complexity. We also investigate several special settings of these problems, identifying tractable cases.
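    The three problems listed in the abstract (existence, rank assessment, counting) can be made concrete with a brute-force sketch over an already materialized Q(D). Everything specific here is an assumption for illustration: the objective is a simple weighted sum of total relevance and total pairwise distance, which need not match any of the paper's three objective functions, and the function names are hypothetical.

```python
from itertools import combinations


def score(subset, rel, dist, lam):
    """Assumed objective: (1 - lam) * total relevance + lam * total pairwise distance."""
    relevance = sum(rel[t] for t in subset)
    diversity = sum(dist(a, b) for a, b in combinations(sorted(subset), 2))
    return (1 - lam) * relevance + lam * diversity


def exists_above(results, k, bound, rel, dist, lam):
    """Decision problem: is there a k-element set scoring at least `bound`?"""
    return any(score(s, rel, dist, lam) >= bound for s in combinations(results, k))


def rank_of(subset, results, rel, dist, lam):
    """Rank assessment: 1 + number of same-size sets scoring strictly higher."""
    own = score(subset, rel, dist, lam)
    better = sum(
        1 for s in combinations(results, len(subset))
        if score(s, rel, dist, lam) > own
    )
    return better + 1


def count_above(results, k, bound, rel, dist, lam):
    """Counting problem: how many k-element sets score at least `bound`?"""
    return sum(
        1 for s in combinations(results, k)
        if score(s, rel, dist, lam) >= bound
    )
```

    Enumerating all k-element subsets is exponential in k, so the sketch only fixes the meaning of the three problems; it says nothing about the complexity bounds the paper establishes.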

    Top-k String Auto-Completion with Synonyms

    Auto-completion is one of the most prominent features of modern information systems. Existing auto-completion solutions provide suggestions based on the beginning of the currently input character sequence (i.e., the prefix). However, in many real applications, one entity often has synonyms or abbreviations. For example, "DBMS" is an abbreviation of "Database Management Systems". In this paper, we study a novel type of auto-completion that uses synonyms and abbreviations. We propose three trie-based algorithms to solve top-k auto-completion with synonyms, each with different space and time complexity trade-offs. Experiments on large-scale datasets show that it is possible to support effective and efficient synonym-based retrieval of completions of a million strings with thousands of synonym rules at about a microsecond per completion, with a small space overhead (i.e., 160-200 bytes per string).
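    To illustrate the problem setting only, here is a naive expansion-based sketch: every dictionary string is inserted into a trie together with variants in which a synonym rule's long form is replaced by its short form, and all variants point back to the original string. This is not claimed to be any of the paper's three algorithms; the class name `SynonymTrie`, the rule format `(short, full)`, and the case-insensitive matching are assumptions.

```python
import heapq


class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.targets = set()  # dictionary strings whose key ends here


class SynonymTrie:
    def __init__(self, scored_strings, rules):
        """scored_strings: dict string -> score; rules: list of (short, full)."""
        self.scores = dict(scored_strings)
        self.root = TrieNode()
        for s in self.scores:
            for variant in self._variants(s, rules):
                self._insert(variant, s)

    @staticmethod
    def _variants(s, rules):
        """The string itself plus one variant per applicable rule."""
        variants = {s}
        for short, full in rules:
            if full in s:
                variants.add(s.replace(full, short))
        return variants

    def _insert(self, key, target):
        node = self.root
        for ch in key.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.targets.add(target)

    def complete(self, prefix, k):
        """Return up to k dictionary strings reachable from the prefix, best first."""
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = set(), [node]
        while stack:  # collect every target in the subtree below the prefix
            cur = stack.pop()
            found |= cur.targets
            stack.extend(cur.children.values())
        return heapq.nlargest(k, found, key=lambda s: (self.scores[s], s))
```

    Expanding every variant up front keeps lookups simple but inflates the trie, which is one side of the space and time trade-off the abstract refers to.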

    Uses of the concept of private events from the standpoint of pragmatist assumptions

    The concept of private events has been treated in the behavior-analytic literature as central to addressing phenomena related to subjectivity, in the context of adopting instrumentality as a criterion of truth. This paper discusses uses of the concept of private events in light of questions raised by pragmatism, the philosophy with which that criterion of truth has been consistently identified. It examines in particular the focus on verbal relations in the analysis of concepts related to privacy, and how that focus reflects a rejection of mentalism and organicism. The paper then discusses the importance of the verbal community in producing the individual's "private world". Finally, it points out that some authors depart from a functional/instrumental reference when they address the problem of the imprecision of self-descriptions of private events.