Learned cardinalities: Estimating correlated joins with deep learning
We describe a new deep learning approach to cardinality estimation. MSCN is a multi-set convolutional network, tailored to representing relational query plans, that employs set semantics to capture query features and true cardinalities. MSCN builds on sampling-based estimation, addressing its weaknesses when no sampled tuples qualify a predicate, and in capturing join-crossing correlations. Our evaluation of MSCN using a real-world dataset shows that deep learning significantly enhances the quality of cardinality estimation, which is the core problem in query optimization.
Estimating cardinalities with deep sketches
We introduce Deep Sketches, which are compact models of databases that allow us to estimate the result sizes of SQL queries. Deep Sketches are powered by a new deep learning approach to cardinality estimation that can capture correlations between columns, even across tables. Our demonstration allows users to define such sketches on the TPC-H and IMDb datasets, monitor the training process, and run ad-hoc queries against trained sketches. We also estimate query cardinalities with HyPer and PostgreSQL to visualize the gains over traditional cardinality estimators.
Make the most out of your SIMD investments: Counter control flow divergence in compiled query pipelines
Increasing single instruction multiple data (SIMD) capabilities in modern hardware allows for compiling efficient data-parallel query pipelines. This means GPU-alike challenges arise: control flow divergence causes underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines without introducing inefficient memory materializations. We evaluate our approach with a high-performance geospatial join query, which shows performance improvements of up to 35%.
On-the-fly mobility event detection over aircraft trajectories
We present an application framework that consumes streaming positions from a large fleet of flying aircraft monitored in real time over a wide geographical area. Tailored for aviation surveillance, this online processing scheme only retains locations conveying salient mobility events along each flight, and annotates them as stop, change of speed, heading or altitude, etc. Such evolving trajectory synopses must keep pace with the incoming raw streams so as to get incrementally annotated with minimal loss in accuracy. We also develop one-pass heuristics to eliminate inherent noise and provide reliable trajectory representations. Our prototype implementation on top of Apache Flink and Kafka has been tested against various real and synthetic datasets, offering concrete evidence of its timeliness, scalability, and compression efficiency, with tolerable concessions to the quality of resulting trajectory approximations. K. Patroumpas, N. Pelekis, and Y. Theodoridis: "On-the-fly Mobility Event Detection over Aircraft Trajectories". In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2018), November 6-9, 2018, Seattle, Washington, USA.
Document type: Conference object
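The one-pass retention of salient positions described in the abstract above can be illustrated with a minimal sketch. It is plain Python rather than the Flink and Kafka prototype the authors report, and every name and threshold in it (`Position`, `detect_events`, `speed_ratio`, `heading_deg`, `altitude_ft`, `stop_speed`) is a hypothetical choice made for illustration, not taken from the paper. Each incoming position is compared against the last retained one and is kept only if it signals a stop or a notable change in speed, heading or altitude.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Position:
    t: float         # timestamp in seconds
    speed: float     # ground speed in knots
    heading: float   # degrees in [0, 360)
    altitude: float  # feet


def heading_diff(a: float, b: float) -> float:
    """Smallest angular difference between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def detect_events(
    stream: Iterable[Position],
    speed_ratio: float = 0.25,   # assumed threshold: relative speed change
    heading_deg: float = 10.0,   # assumed threshold: heading change in degrees
    altitude_ft: float = 500.0,  # assumed threshold: altitude change in feet
    stop_speed: float = 1.0,     # assumed threshold: speeds at or below this count as stopped
) -> Iterator[Tuple[Position, List[str]]]:
    """Single pass over one flight: yield only positions that carry an event."""
    ref = None  # last retained position, used as the comparison baseline
    for p in stream:
        if ref is None:
            ref = p  # the first position only initializes the baseline
            continue
        events: List[str] = []
        if p.speed <= stop_speed < ref.speed:
            events.append("stop")
        elif abs(p.speed - ref.speed) / max(ref.speed, stop_speed) >= speed_ratio:
            events.append("change_of_speed")
        if heading_diff(p.heading, ref.heading) >= heading_deg:
            events.append("change_of_heading")
        if abs(p.altitude - ref.altitude) >= altitude_ft:
            events.append("change_of_altitude")
        if events:
            ref = p
            yield p, events
```

Positions that trigger no annotation are dropped, which is where the compression of the synopsis comes from in this simplified model. Noise elimination, which the abstract also mentions, is not covered here.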
Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines
Increasing single instruction multiple data (SIMD) capabilities in modern hardware allows for the compilation of data-parallel query pipelines. This means GPU-alike challenges arise: control flow divergence causes the underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for the fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines so that tuples are never evicted from registers. We evaluate our approach with three query types: (i) a table scan query based on TPC-H Query 1, which performs up to 34% faster when addressing underutilization, (ii) a hash join query, where we observe up to 25% higher performance, and (iii) an approximate geospatial join query, which shows performance improvements of up to 30%.
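To make the underutilization problem concrete, the following is a small scalar simulation, not AVX-512 code and not the authors' algorithm. It assumes a hypothetical workload in which each tuple needs a different number of loop iterations (the source of divergence), and compares a pipeline that reloads lanes only when all of them are idle with one that assigns new tuples to idle lanes before every iteration. The function name `simulate` and its parameters are illustrative.

```python
from typing import List, Tuple


def simulate(work: List[int], lanes: int = 8, refill: bool = True) -> Tuple[int, float]:
    """Return (iterations, lane utilization) for a simulated vectorized loop.

    work[i] is the number of iterations tuple i needs (must be >= 1).
    refill=False: load a new batch only when every lane is idle.
    refill=True:  hand new tuples to idle lanes before each iteration.
    """
    pending = iter(work)
    remaining = [0] * lanes  # iterations left per lane; 0 means the lane is idle
    iterations = 0
    busy_slots = 0
    exhausted = False
    while True:
        if not exhausted and (refill or not any(remaining)):
            for i in range(lanes):
                if remaining[i] == 0:
                    nxt = next(pending, None)
                    if nxt is None:
                        exhausted = True
                        break
                    remaining[i] = nxt
        active = sum(1 for r in remaining if r > 0)
        if active == 0:
            break
        iterations += 1
        busy_slots += active
        remaining = [r - 1 if r > 0 else 0 for r in remaining]
    utilization = busy_slots / (iterations * lanes) if iterations else 0.0
    return iterations, utilization


# Usage: alternating cheap and expensive tuples on a 4-lane "vector".
no_refill = simulate([1, 4, 1, 4, 1, 4, 1, 4], lanes=4, refill=False)
with_refill = simulate([1, 4, 1, 4, 1, 4, 1, 4], lanes=4, refill=True)
```

In this toy workload the total amount of useful work is identical in both modes. Refilling only reduces the number of iterations in which lanes sit idle. The simulation ignores the cost of the refill itself, which is what the algorithms in the paper address.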
Adaptive geospatial joins for modern hardware
Geospatial joins are a core building block of connected
mobility applications. An especially challenging problem
is the join between streaming points and static polygons. Since
points are not known beforehand, they cannot be indexed.
Nevertheless, points need to be mapped to polygons with low
latencies to enable real-time feedback.
We present an adaptive geospatial join that uses true hit
filtering to avoid expensive geometric computations in most
cases. Our technique uses a quadtree-based hierarchical grid
to approximate polygons and stores these approximations in a
specialized radix tree. We emphasize an approximate version
of our algorithm that guarantees a user-defined precision. The
exact version of our algorithm can adapt to the expected point
distribution by refining the index. We optimized our implementation
for modern hardware architectures with wide SIMD vector
processing units, including Intel’s brand new Knights Landing.
Overall, our approach can perform up to two orders of magnitude
faster than existing techniques.
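The idea of true hit filtering mentioned in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single-resolution uniform grid stored in a Python dict in place of the quadtree-based hierarchical grid and specialized radix tree the authors describe, and all function names (`build_grid`, `contains`, `segment_hits_box`) are hypothetical. Cells that no polygon edge touches and that lie inside the polygon are marked as interior, so a point falling into such a cell is a hit without any geometric computation. Only points in boundary cells fall back to the exact point-in-polygon test.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Exact test by ray casting (even-odd rule)."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def segment_hits_box(a: Point, b: Point, x0: float, y0: float, x1: float, y1: float) -> bool:
    """Liang-Barsky clipping: does segment a-b touch the closed box?"""
    t0, t1 = 0.0, 1.0
    dx, dy = b[0] - a[0], b[1] - a[1]
    for p, q in ((-dx, a[0] - x0), (dx, x1 - a[0]), (-dy, a[1] - y0), (dy, y1 - a[1])):
        if p == 0:
            if q < 0:
                return False  # parallel to this side and outside of it
        else:
            t = q / p
            if p < 0:
                if t > t1:
                    return False
                t0 = max(t0, t)
            else:
                if t < t0:
                    return False
                t1 = min(t1, t)
    return True


def build_grid(poly: List[Point], cell: float) -> Dict[Tuple[int, int], str]:
    """Classify grid cells over the polygon's bounding box as interior or boundary."""
    xs, ys = [x for x, _ in poly], [y for _, y in poly]
    edges = [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]
    grid: Dict[Tuple[int, int], str] = {}
    for cx in range(math.floor(min(xs) / cell), math.floor(max(xs) / cell) + 1):
        for cy in range(math.floor(min(ys) / cell), math.floor(max(ys) / cell) + 1):
            x0, y0 = cx * cell, cy * cell
            if any(segment_hits_box(a, b, x0, y0, x0 + cell, y0 + cell) for a, b in edges):
                grid[(cx, cy)] = "boundary"
            elif point_in_polygon((x0 + cell / 2, y0 + cell / 2), poly):
                grid[(cx, cy)] = "interior"  # untouched by edges, so fully inside
    return grid


def contains(p: Point, poly: List[Point], grid: Dict[Tuple[int, int], str], cell: float) -> Tuple[bool, bool]:
    """Return (is_inside, needed_exact_test)."""
    kind = grid.get((math.floor(p[0] / cell), math.floor(p[1] / cell)))
    if kind == "interior":
        return True, False  # true hit: no geometric computation needed
    if kind == "boundary":
        return point_in_polygon(p, poly), True
    return False, False  # cell not covered by the polygon at all
```

An approximate variant in the spirit of the abstract would skip the exact test in boundary cells and accept the error bounded by the cell size. That trade-off is not implemented in this sketch.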
Approximate geospatial joins with precision guarantees
Geospatial joins are a core building block of connected
mobility applications. An especially challenging problem
is the join between streaming points and static polygons. Since
points are not known beforehand, they cannot be indexed.
Nevertheless, points need to be mapped to polygons with low
latencies to enable real-time feedback.
We present an approximate geospatial join that guarantees
a user-defined precision. Our technique uses a quadtree-based
hierarchical grid to approximate polygons and stores these
approximations in a specialized radix tree. Our approach can
perform up to several orders of magnitude faster than existing
techniques while providing sufficiently precise results for many
applications.
Poet: Product-oriented Video Captioner for E-commerce
In e-commerce, a growing number of user-generated videos are used for product
promotion. How to generate video descriptions that narrate the user-preferred
product characteristics depicted in the video is vital for successful
promotion. Traditional video captioning methods, which focus on routinely
describing what exists and happens in a video, are not amenable to
product-oriented video captioning. To address this problem, we propose a
product-oriented video captioner framework, abbreviated as Poet. Poet first
represents the videos as product-oriented spatial-temporal graphs. Then, based
on the aspects of the video-associated product, we perform knowledge-enhanced
spatial-temporal inference on those graphs for capturing the dynamic change of
fine-grained product-part characteristics. The knowledge leveraging module in
Poet differs from the traditional design by performing knowledge filtering and
dynamic memory modeling. We show that Poet achieves consistent performance
improvement over previous methods in generation quality, product
aspect capture, and lexical diversity. Experiments are performed on two
product-oriented video captioning datasets, buyer-generated fashion video
dataset (BFVD) and fan-generated fashion video dataset (FFVD), collected from
Mobile Taobao. We will release the desensitized datasets to promote further
investigations on both video captioning and general video analysis problems.
Comment: 10 pages, 3 figures, to appear in ACM MM 2020 proceedings.