
    The space complexity of inner product filters

    Motivated by the problem of filtering candidate pairs in inner product similarity joins, we study the following inner product estimation problem: given parameters $d \in \mathbf{N}$ and $\alpha > \beta \geq 0$, and unit vectors $x, y \in \mathbf{R}^d$, consider the task of distinguishing between the cases $\langle x, y \rangle \leq \beta$ and $\langle x, y \rangle \geq \alpha$, where $\langle x, y \rangle = \sum_{i=1}^{d} x_i y_i$ is the inner product of the vectors $x$ and $y$. The goal is to distinguish these cases based on information about each vector encoded independently in a bit string of the shortest possible length. In contrast to much work on compressing vectors using randomized dimensionality reduction, we seek to solve the problem deterministically, with no probability of error. Inner product estimation can be solved in general by estimating $\langle x, y \rangle$ with an additive error bounded by $\varepsilon = \alpha - \beta$. We show that $d \log_2\left(\tfrac{\sqrt{1-\beta}}{\varepsilon}\right) \pm \Theta(d)$ bits of information about each vector are necessary and sufficient. Our upper bound is constructive and improves a known upper bound of $d \log_2(1/\varepsilon) + O(d)$ by up to a factor of 2 when $\beta$ is close to 1. The lower bound holds even in a stronger model where one of the vectors is known exactly and an arbitrary estimation function is allowed. Comment: To appear at ICDT 202
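
    For intuition about the kind of encoding involved, here is a minimal sketch of the naive deterministic baseline: round every coordinate to a uniform grid fine enough that the quantization error in the inner product stays below $\varepsilon$. This is our own illustration (function names are hypothetical); it uses roughly $\log_2(4\sqrt{d}/\varepsilon)$ bits per coordinate and is therefore weaker than the paper's constructive bound, but it shows the shape of a deterministic, zero-error filter.

```python
import math

def encode(x, eps):
    """Round each coordinate of a unit vector x to the nearest multiple of
    delta = eps / (2 * sqrt(d)). Each rounded coordinate is off by at most
    delta/2, so by Cauchy-Schwarz the inner product of two encoded vectors
    deviates from <x, y> by less than eps in total."""
    d = len(x)
    delta = eps / (2 * math.sqrt(d))
    return [round(xi / delta) for xi in x]  # small integers, ~log2(4*sqrt(d)/eps) bits each

def decide(cx, cy, eps, d, alpha, beta):
    """Distinguish <x, y> >= alpha from <x, y> <= beta (with alpha - beta >= eps)
    using only the two codes: rescale, estimate, and threshold halfway."""
    delta = eps / (2 * math.sqrt(d))
    estimate = sum(a * b for a, b in zip(cx, cy)) * delta * delta
    return estimate > (alpha + beta) / 2
```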

    Proceedings of the First International VLDB Workshop on Management of Uncertain Data


    Building Wavelet Histograms on Large Data in MapReduce

    MapReduce is becoming the de facto framework for storing and processing massive data, owing to its excellent scalability, reliability, and elasticity. In many MapReduce applications, obtaining a compact, accurate summary of the data is essential. Among the various data summarization tools, histograms have proven particularly important and useful, and the wavelet histogram is one of the most widely used. In this paper, we investigate how to build wavelet histograms efficiently on large datasets in MapReduce. We measure the efficiency of the algorithms by both end-to-end running time and communication cost. We demonstrate that straightforward adaptations of existing exact and approximate methods for building wavelet histograms to MapReduce clusters are highly inefficient. To address this, we design new algorithms for computing exact and approximate wavelet histograms and discuss their implementation in MapReduce. We implement our techniques in Hadoop and compare them to the baseline solutions in extensive experiments on a heterogeneous Hadoop cluster of 16 nodes, using large real and synthetic datasets of up to hundreds of gigabytes. The results show significant (often orders-of-magnitude) performance improvements from our new algorithms. Comment: VLDB201
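
    For readers unfamiliar with the summary itself, the following single-machine sketch (ours, not the paper's MapReduce algorithms) shows what a wavelet histogram stores: the Haar wavelet transform of the data's frequency vector, truncated to the k coefficients of largest magnitude.

```python
import heapq

def haar_coefficients(freq):
    """Haar wavelet transform of a frequency vector whose length is a
    power of two: repeatedly replace adjacent pairs by their average and
    half-difference, collecting the differences as detail coefficients."""
    coeffs = []
    level = list(freq)
    while len(level) > 1:
        averages = [(level[i] + level[i + 1]) / 2 for i in range(0, len(level), 2)]
        details = [(level[i] - level[i + 1]) / 2 for i in range(0, len(level), 2)]
        coeffs = details + coeffs  # finer-resolution details go first
        level = averages
    return level + coeffs          # overall average, then all details

def wavelet_histogram(freq, k):
    """Lossy summary: keep the k coefficients of largest magnitude,
    each with its position so the vector can be approximately rebuilt."""
    return heapq.nlargest(k, enumerate(haar_coefficients(freq)),
                          key=lambda t: abs(t[1]))
```

    (For simplicity this ranks un-normalized coefficients; the standard construction ranks them after level-wise normalization, so that keeping the top k minimizes reconstruction error.)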

    Superlative Quantifiers as Modifiers of Meta-Speech Acts

    The superlative quantifiers, at least and at most, are commonly assumed to have the same truth conditions as the comparative quantifiers more than and fewer than. However, as Geurts & Nouwen (2007) have demonstrated, this is wrong, and several theories have been proposed to account for them. In this paper we propose that superlative quantifiers are illocutionary operators; specifically, they modify meta-speech acts. Meta-speech acts are operators that do not express a speech act, but a willingness to make or refrain from making a certain speech act. The classic example is speech act denegation, e.g. I don't promise to come, where the speaker is explicitly refraining from performing the speech act of promising. What denegations do is delimit the future development of the conversation; that is, they delimit future admissible speech acts. Hence we call them meta-speech acts. They are not moves in a game, but rather commitments to behave in certain ways in the future. We formalize the notion of meta-speech acts as commitment development spaces, which are rooted graphs: the root of the graph describes the commitment development up to the current point in the conversation; the continuations from the root describe the admissible future directions. We define and formalize the meta-speech act GRANT, which indicates that the speaker, while not necessarily subscribing to a proposition, refrains from asserting its negation. We propose that superlative quantifiers are quantifiers over GRANTs. Thus, Mary petted at least three rabbits means that the minimal number n such that the speaker GRANTs that Mary petted n rabbits is n = 3. In other words, the speaker denies that Mary petted two, one, or no rabbits, but GRANTs that she petted more. We formalize this interpretation of superlative quantifiers in terms of commitment development spaces and show how the truth conditions derived from it are partly entailed and partly conversationally implicated. We demonstrate how the theory accounts for a wide variety of phenomena regarding the interpretation of superlative quantifiers, their distribution, and the contexts in which they can be embedded.
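
    The "minimal GRANTed count" reading can be made concrete with a toy model; this is our own illustration, not the paper's formal apparatus of commitment development spaces.

```python
def at_least(n, domain):
    """Toy rendering of 'at least n': the speaker denies every count
    below n and GRANTs (refrains from denying) every count from n up."""
    return {k: k >= n for k in domain}

# 'Mary petted at least three rabbits': counts 0-2 are denied,
# counts 3+ are GRANTed, so the minimal GRANTed count is 3.
grants = at_least(3, range(10))
assert min(k for k, granted in grants.items() if granted) == 3
```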

    The AFIT ENgineer, Volume 5, Issue 2

    In this issue: Quantum Information Science (QIS) Research at AFIT; Engineers Week Returns to AFIT; AFIT Joins U.S. Space Command's Academic Engagement Enterprise; Digital Innovation and Integration Center of Excellence (DIICE); FY22 External Sponsor Funding Summary.

    Engineering Aggregation Operators for Relational In-Memory Database Systems

    In this thesis we study the design and implementation of aggregation operators in the context of relational in-memory database systems. In particular, we identify and address the following challenges: cache efficiency, CPU friendliness, parallelism within and across processors, robust handling of skewed data, adaptive processing, processing with constrained memory, and integration with modern database architectures. Our resulting algorithm outperforms the state of the art by up to 3.7x.
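
    For orientation, the baseline that such engineering improves upon is plain hash aggregation; the sketch below (ours, not the thesis's operator) computes a group-by sum with a single hash table. The listed challenges arise precisely where this baseline degrades: the table outgrows the CPU cache, parallel threads contend on it, and skewed keys unbalance any naive partitioning.

```python
from collections import defaultdict

def hash_aggregate(rows):
    """Minimal single-threaded hash aggregation, i.e.
    SELECT key, SUM(val) FROM rows GROUP BY key."""
    sums = defaultdict(int)
    for key, val in rows:
        sums[key] += val
    return dict(sums)
```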