85 research outputs found

Learned cardinalities: Estimating correlated joins with deep learning

Author: Boncz P.A. (Peter)
Kemper A. (Alfons)
Kipf A. (Andreas)
Kipf T. (Thomas)
Leis V. (Viktor)
Radke B. (Bernhard)
Publication date: 01/01/2019

We describe a new deep learning approach to cardinality estimation. MSCN is a multi-set convolutional network, tailored to representing relational query plans, that employs set semantics to capture query features and true cardinalities. MSCN builds on sampling-based estimation, addressing its weaknesses when no sampled tuples qualify a predicate, and in capturing join-crossing correlations. Our evaluation of MSCN using a real-world dataset shows that deep learning significantly enhances the quality of cardinality estimation, which is the core problem in query optimization.

CWI's Institutional Repository

Learned Cardinalities: Estimating Correlated Joins with Deep Learning

Author: Boncz P.A. (Peter)
Kemper A. (Alfons)
Kipf A. (Andreas)
Kipf T. (Thomas)
Leis V. (Viktor)
Radke B. (Bernhard)
Publication date: 18/12/2018

CWI's Institutional Repository

Estimating cardinalities with deep sketches

Author: Boncz P.A. (Peter)
Kemper A. (Alfons)
Kipf A. (Andreas)
Kipf T. (Thomas)
Leis V. (Viktor)
Müller J. (Jonas)
Neumann T. (Thomas)
Radke B. (Bernhard)
Vorona (Dimitri)
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/06/2019

We introduce Deep Sketches, which are compact models of databases that allow us to estimate the result sizes of SQL queries. Deep Sketches are powered by a new deep learning approach to cardinality estimation that can capture correlations between columns, even across tables. Our demonstration allows users to define such sketches on the TPC-H and IMDb datasets, monitor the training process, and run ad-hoc queries against trained sketches. We also estimate query cardinalities with HyPer and PostgreSQL to visualize the gains over traditional cardinality estimators.

CWI's Institutional Repository

Make the most out of your SIMD investments: Counter control flow divergence in compiled query pipelines

Author: Boncz P.A. (Peter)
Kemper A. (Alfons)
Kipf A. (Andreas)
Lang H. (Harald)
Neumann T. (Thomas)
Passing L.K. (Linnea)
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/06/2018

Increasing single instruction multiple data (SIMD) capabilities in modern hardware allow for compiling efficient data-parallel query pipelines. This means GPU-like challenges arise: control flow divergence causes underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines without introducing inefficient memory materializations. We evaluate our approach with a high-performance geospatial join query, which shows performance improvements of up to 35%.

Crossref

CWI's Institutional Repository

Scipedia

On-the-fly mobility event detection over aircraft trajectories

Author: Carbone P.
Hagedorn S.
Kipf A.
Patroumpas K.
Publication date: 05/11/2018

We present an application framework that consumes streaming positions from a large fleet of flying aircraft monitored in real time over a wide geographical area. Tailored for aviation surveillance, this online processing scheme retains only locations conveying salient mobility events along each flight, and annotates them as stop, change of speed, heading or altitude, etc. Such evolving trajectory synopses must keep pace with the incoming raw streams so as to get incrementally annotated with minimal loss in accuracy. We also develop one-pass heuristics to eliminate inherent noise and provide reliable trajectory representations. Our prototype implementation on top of Apache Flink and Kafka has been tested against various real and synthetic datasets, offering concrete evidence of its timeliness, scalability, and compression efficiency, with tolerable concessions to the quality of the resulting trajectory approximations. K. Patroumpas, N. Pelekis, and Y. Theodoridis: "On-the-fly Mobility Event Detection over Aircraft Trajectories". In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2018), November 6-9, 2018, Seattle, Washington, USA. Document type: Conference object
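As a rough illustration of retaining only positions that convey a mobility event, the sketch below annotates a stream of position reports in a single pass. It is not the paper's method or its Flink/Kafka prototype: the `Position` record, the function `annotate`, the event names, and all thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Position:
    t: float         # timestamp in seconds
    speed: float     # ground speed in knots
    heading: float   # degrees in [0, 360)
    altitude: float  # feet


def heading_diff(a: float, b: float) -> float:
    """Smallest absolute angle between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def annotate(
    stream: Iterable[Position],
    stop_speed: float = 1.0,    # hypothetical threshold: at or below this counts as stopped
    speed_ratio: float = 0.25,  # hypothetical threshold: relative speed change
    turn_deg: float = 10.0,     # hypothetical threshold: heading change
    climb_ft: float = 500.0,    # hypothetical threshold: altitude change
) -> Iterator[Tuple[Position, List[str]]]:
    """Yield (position, events) only for positions that convey an event.

    Each position is compared with the last retained one, so gradual drift
    accumulates until it crosses a threshold.
    """
    last = None
    for p in stream:
        if last is None:
            last = p
            yield p, ["start"]
            continue
        events: List[str] = []
        if p.speed <= stop_speed:
            if last.speed > stop_speed:
                events.append("stop")
        elif abs(p.speed - last.speed) >= speed_ratio * max(last.speed, stop_speed):
            events.append("change_of_speed")
        if heading_diff(p.heading, last.heading) >= turn_deg:
            events.append("change_of_heading")
        if abs(p.altitude - last.altitude) >= climb_ft:
            events.append("change_of_altitude")
        if events:
            last = p
            yield p, events


flight = [
    Position(0, 200, 90, 10000),
    Position(10, 205, 91, 10050),
    Position(20, 210, 105, 10100),
    Position(30, 212, 106, 10700),
    Position(40, 150, 106, 10700),
]
retained = list(annotate(flight))
for pos, events in retained:
    print(pos.t, events)
```

In this toy stream the second report is dropped because it differs too little from the first, which is the compression effect the abstract refers to; noise filtering is not modeled here.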

Crossref

Scipedia

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines

Author: Boncz P.A. (Peter)
Kemper A. (Alfons)
Kipf A. (Andreas)
Lang H. (Harald)
Neumann T. (Thomas)
Passing L.K. (Linnea)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/07/2019

Increasing single instruction multiple data (SIMD) capabilities in modern hardware allow for the compilation of data-parallel query pipelines. This means GPU-like challenges arise: control flow divergence causes the underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for the fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines so that tuples are never evicted from registers. We evaluate our approach with three query types: (i) a table scan query based on TPC-H Query 1, which runs up to 34% faster when addressing underutilization, (ii) a hash join query, where we observe up to 25% higher performance, and (iii) an approximate geospatial join query, which shows performance improvements of up to 30%.
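To make the underutilization problem concrete, the following scalar simulation counts vector steps with and without refilling idle lanes. It is only a sketch under stated assumptions: it contains no AVX-512 code and does not reproduce the paper's algorithms, and the function names and the per-tuple `work` model (number of loop iterations a tuple needs) are hypothetical.

```python
from typing import List


def steps_without_refill(work: List[int], width: int) -> int:
    """Each batch of `width` tuples runs until its slowest tuple finishes,
    so lanes that finish early sit idle for the rest of the batch."""
    return sum(max(work[i:i + width]) for i in range(0, len(work), width))


def steps_with_refill(work: List[int], width: int) -> int:
    """Before every step, idle lanes are refilled with the next pending tuples."""
    lanes = [0] * width  # remaining iterations per lane; 0 means idle
    nxt = 0              # index of the next tuple that has not been assigned
    steps = 0
    while True:
        for i in range(width):
            # Keep pulling tuples until the lane has work or the input is exhausted.
            while lanes[i] == 0 and nxt < len(work):
                lanes[i] = work[nxt]
                nxt += 1
        if not any(lanes):
            return steps  # all lanes idle and no tuples left
        lanes = [max(r - 1, 0) for r in lanes]
        steps += 1


# Two expensive tuples (8 iterations) among cheap ones (1 iteration).
work = [1, 8, 1, 1, 1, 1, 8, 1]
print(steps_without_refill(work, 4))  # 16
print(steps_with_refill(work, 4))     # 9
```

In the example, refilling lets the cheap tuples of the second batch run in lanes that would otherwise idle next to the first expensive tuple. The real cost of moving tuples between lanes is ignored in this model.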

CWI's Institutional Repository

Adaptive geospatial joins for modern hardware

Author: Boncz P.A. (Peter)
Kemper A. (Alfons)
Kipf A. (Andreas)
Lang H. (Harald)
Neumann T. (Thomas)
Pandey V.N. (Varun)
Persa R.A. (Raul Alexandru)
Publication date: 26/02/2018

Geospatial joins are a core building block of connected mobility applications. A particularly challenging problem is the join between streaming points and static polygons. Since points are not known beforehand, they cannot be indexed. Nevertheless, points need to be mapped to polygons with low latencies to enable real-time feedback. We present an adaptive geospatial join that uses true hit filtering to avoid expensive geometric computations in most cases. Our technique uses a quadtree-based hierarchical grid to approximate polygons and stores these approximations in a specialized radix tree. We emphasize an approximate version of our algorithm that guarantees a user-defined precision. The exact version of our algorithm can adapt to the expected point distribution by refining the index. We optimized our implementation for modern hardware architectures with wide SIMD vector processing units, including Intel’s brand new Knights Landing. Overall, our approach can perform up to two orders of magnitude faster than existing techniques.

CWI's Institutional Repository

Approximate geospatial joins with precision guarantees

Author: Boncz P.A. (Peter)
Kemper A. (Alfons)
Kipf A. (Andreas)
Lang H. (Harald)
Neumann T. (Thomas)
Pandey V.N. (Varun)
Persa R.A. (Raul Alexandru)
Publication date: 16/04/2018

Geospatial joins are a core building block of connected mobility applications. A particularly challenging problem is the join between streaming points and static polygons. Since points are not known beforehand, they cannot be indexed. Nevertheless, points need to be mapped to polygons with low latencies to enable real-time feedback. We present an approximate geospatial join that guarantees a user-defined precision. Our technique uses a quadtree-based hierarchical grid to approximate polygons and stores these approximations in a specialized radix tree. Our approach can perform up to several orders of magnitude faster than existing techniques while providing sufficiently precise results for many applications.
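The idea of mapping a point to a polygon through quadtree cells can be sketched as follows. This is a minimal illustration, not the paper's data structure: a plain dictionary keyed by cell-id prefixes stands in for the specialized radix tree, the polygon coverings are given by hand rather than computed from geometry, and the names `cell_id`, `lookup`, and `index` are hypothetical.

```python
from typing import Dict, Optional


def cell_id(x: float, y: float, level: int) -> str:
    """Quadtree cell containing (x, y) in the unit square, as base-4 digits.

    Each digit selects one quadrant of the current cell, so a shorter id is
    always a prefix of the ids of the cells nested inside it.
    """
    digits = []
    x0, y0, size = 0.0, 0.0, 1.0
    for _ in range(level):
        size /= 2
        qx = 1 if x >= x0 + size else 0
        qy = 1 if y >= y0 + size else 0
        digits.append(str(2 * qy + qx))
        x0 += qx * size
        y0 += qy * size
    return "".join(digits)


def lookup(index: Dict[str, str], x: float, y: float, max_level: int) -> Optional[str]:
    """Return the polygon whose covering contains the point, or None.

    Walks from coarse to fine cells, as a prefix search in a radix tree would.
    """
    cid = cell_id(x, y, max_level)
    for level in range(max_level + 1):
        hit = index.get(cid[:level])
        if hit is not None:
            return hit
    return None


# Hand-made coverings: polygon "A" covers the whole lower-left quadrant,
# polygon "B" is approximated by one level-2 cell in the upper-right quadrant.
index = {"0": "A", "30": "B"}
print(lookup(index, 0.1, 0.2, 2))  # A
print(lookup(index, 0.6, 0.6, 2))  # B
print(lookup(index, 0.9, 0.9, 2))  # None
```

In this sketch the finest cells have side length 2^-max_level, so a point wrongly assigned to a polygon through a boundary cell lies within one cell diagonal of that polygon; choosing `max_level` is how a precision bound would be set here.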

Crossref

CWI's Institutional Repository

Poet: Product-oriented Video Captioner for E-commerce

Author: Banerjee Satanjeev
Das Pradipto
David
Kipf Thomas N
Lin Chin-Yew
Liu Jingyuan
Liu Ziwei
Lu Jiasen
Papineni Kishore
Regneri Michaela
Sigurdsson Gunnar A.
Speer Robyn
Wang Bairui
Weston Jason
Whitehead Spencer
Yao Li
Zeng Kuo-Hao
Zhang Junchao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 16/08/2020

In e-commerce, a growing number of user-generated videos are used for product promotion. Generating video descriptions that narrate the user-preferred product characteristics depicted in the video is vital for successful promotion. Traditional video captioning methods, which focus on routinely describing what exists and happens in a video, are not well suited to product-oriented video captioning. To address this problem, we propose a product-oriented video captioner framework, abbreviated as Poet. Poet first represents the videos as product-oriented spatial-temporal graphs. Then, based on the aspects of the video-associated product, we perform knowledge-enhanced spatial-temporal inference on those graphs to capture the dynamic change of fine-grained product-part characteristics. The knowledge leveraging module in Poet differs from the traditional design by performing knowledge filtering and dynamic memory modeling. We show that Poet achieves consistent performance improvement over previous methods concerning generation quality, product aspects capturing, and lexical diversity. Experiments are performed on two product-oriented video captioning datasets, the buyer-generated fashion video dataset (BFVD) and the fan-generated fashion video dataset (FFVD), collected from Mobile Taobao. We will release the desensitized datasets to promote further investigations on both video captioning and general video analysis problems. Comment: 10 pages, 3 figures, to appear in ACM MM 2020 proceedings.

arXiv.org e-Print Archive

Crossref