85 research outputs found

    Learned cardinalities: Estimating correlated joins with deep learning

    Get PDF
    We describe a new deep learning approach to cardinality estimation. MSCN is a multi-set convolutional network, tailored to representing relational query plans, that employs set semantics to capture query features and true cardinalities. MSCN builds on sampling-based estimation, addressing its weaknesses when no sampled tuples qualify a predicate, and in capturing join-crossing correlations. Our evaluation of MSCN using a real-world dataset shows that deep learning signiicantly enhances the quality of cardinality estimation, which is the core problem in query optimization

    Estimating cardinalities with deep sketches

    Get PDF
    We introduce Deep Sketches, which are compact models of databases that allow us to estimate the result sizes of SQL queries. Deep Sketches are powered by a new deep learning approach to cardinality estimation that can capture correlations between columns, even across tables. Our demonstration allows users to define such sketches on the TPC-H and IMDb datasets, monitor the training process, and run ad-hoc queries against trained sketches. We also estimate query cardinalities with HyPer and PostgreSQL to visualize the gains over traditional cardinality estimators

    Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines

    Get PDF
    Increasing single instruction multiple data (SIMD) capabilities in modern hardware allows for the compilation of data-parallel query pipelines. This means GPU-alike challenges arise: control flow divergence causes the underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for the fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines so that tuples are never evicted from registers. We evaluate our approach with three query types: (i) a table scan query based on TPC-H Query 1, that performs up to 34% faster when addressing underutilization, (ii) a hashjoin query, where we observe up to 25% higher performance, and (iii) an approximate geospatial join query, which shows performance improvements of up to 30%

    Make the most out of your SIMD investments: Counter control flow divergence in compiled query pipelines

    Get PDF
    Increasing single instruction multiple data (SIMD) capabilities in modern hardware allows for compiling efficient data-parallel query pipelines. This means GPU-alike challenges arise: control flow divergence causes underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines without introducing inefficient memory materializations. We evaluate our approach with a high-performance geospatial join query, which shows performance improvements of up to 35%

    On-the-fly mobility event detection over aircraft trajectories

    Get PDF
    We present an application framework that consumes streaming positions from a large fleet of flying aircrafts monitored in real time over a wide geographical area. Tailored for aviation surveillance, this online processing scheme only retains locations conveying salient mobility events along each flight, and annotates them as stop, change of speed, heading or altitude, etc. Such evolving trajectory synopses must keep in pace with the incoming raw streams so as to get incrementally annotated with minimal loss in accuracy. We also develop one-pass heuristics to eliminate inherent noise and provide reliable trajectory representations. Our prototype implementation on top of Apache Flink and Kafka has been tested against various real and synthetic datasets offering concrete evidence of its timeliness, scalability, and compression efficiency, with tolerable concessions to the quality of resulting trajectory approximations. K. Patroumpas, N. Pelekis, and Y. Theodoridis: "On-the-fly Mobility Event Detection over Aircraft Trajectories". In proceeding of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2018), November 6 - 9, 2018 Seattle, Washington, USA Document type: Conference objec

    Adaptive geospatial joins for modern hardware

    Get PDF
    Geospatial joins are a core building block of connected mobility applications. An especially challenging problem are joins between streaming points and static polygons. Since points are not known beforehand, they cannot be indexed. Nevertheless, points need to be mapped to polygons with low latencies to enable real-time feedback. We present an adaptive geospatial join that uses true hit filtering to avoid expensive geometric computations in most cases. Our technique uses a quadtree-based hierarchical grid to approximate polygons and stores these approximations in a specialized radix tree. We emphasize on an approximate version of our algorithm that guarantees a user-defined precision. The exact version of our algorithm can adapt to the expected point distribution by refining the index. We optimized our implementation for modern hardware architectures with wide SIMD vector processing units, including Intel’s brand new Knights Landing. Overall, our approach can perform up to two orders of magnitude faster than existing techniques

    Approximate geospatial joins with precision guarantees

    Get PDF
    Geospatial joins are a core building block of con- nected mobility applications. An especially challenging problem are joins between streaming points and static polygons. Since points are not known beforehand, they cannot be indexed. Nevertheless, points need to be mapped to polygons with low latencies to enable real-time feedback. We present an approximate geospatial join that guarantees a user-defined precision. Our technique uses a quadtree-based hierarchical grid to approximate polygons and stores these approximations in a specialized radix tree. Our approach can perform up to several orders of magnitude faster than existing techniques while providing sufficiently precise results for many applications

    Poet: Product-oriented Video Captioner for E-commerce

    Full text link
    In e-commerce, a growing number of user-generated videos are used for product promotion. How to generate video descriptions that narrate the user-preferred product characteristics depicted in the video is vital for successful promoting. Traditional video captioning methods, which focus on routinely describing what exists and happens in a video, are not amenable for product-oriented video captioning. To address this problem, we propose a product-oriented video captioner framework, abbreviated as Poet. Poet firstly represents the videos as product-oriented spatial-temporal graphs. Then, based on the aspects of the video-associated product, we perform knowledge-enhanced spatial-temporal inference on those graphs for capturing the dynamic change of fine-grained product-part characteristics. The knowledge leveraging module in Poet differs from the traditional design by performing knowledge filtering and dynamic memory modeling. We show that Poet achieves consistent performance improvement over previous methods concerning generation quality, product aspects capturing, and lexical diversity. Experiments are performed on two product-oriented video captioning datasets, buyer-generated fashion video dataset (BFVD) and fan-generated fashion video dataset (FFVD), collected from Mobile Taobao. We will release the desensitized datasets to promote further investigations on both video captioning and general video analysis problems.Comment: 10 pages, 3 figures, to appear in ACM MM 2020 proceeding
    • …
    corecore