71 research outputs found

    Book Reviews

    The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. But although the default choice of a Gaussian distribution for both the prior and posterior represents a mathematically convenient distribution often leading to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is more suitable for capturing data with a hyperspherical latent structure, while outperforming a normal, $\mathcal{N}$-VAE, in low dimensions on other data types. Comment: GitHub repository: http://github.com/nicola-decao/s-vae-tf, Blogpost: https://nicola-decao.github.io/s-va
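
For reference, the standard density of the von Mises-Fisher distribution mentioned in this abstract can be written as below. The symbols are assumptions not used in the abstract itself: $z$ is a unit vector in $\mathbb{R}^m$, $\mu$ is the unit mean direction, $\kappa \ge 0$ is the concentration, and $I_v$ is the modified Bessel function of the first kind of order $v$.

```latex
q(z \mid \mu, \kappa) = C_m(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} z\right),
\qquad
C_m(\kappa) = \frac{\kappa^{m/2 - 1}}{(2\pi)^{m/2}\, I_{m/2 - 1}(\kappa)}
```

At $\kappa = 0$ the density is uniform on the hypersphere, which is the usual reason this family is paired with a uniform prior on the sphere.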

    Learned cardinalities: Estimating correlated joins with deep learning

    We describe a new deep learning approach to cardinality estimation. MSCN is a multi-set convolutional network, tailored to representing relational query plans, that employs set semantics to capture query features and true cardinalities. MSCN builds on sampling-based estimation, addressing its weaknesses when no sampled tuples qualify a predicate, and in capturing join-crossing correlations. Our evaluation of MSCN using a real-world dataset shows that deep learning significantly enhances the quality of cardinality estimation, which is the core problem in query optimization

    Estimating cardinalities with deep sketches

    We introduce Deep Sketches, which are compact models of databases that allow us to estimate the result sizes of SQL queries. Deep Sketches are powered by a new deep learning approach to cardinality estimation that can capture correlations between columns, even across tables. Our demonstration allows users to define such sketches on the TPC-H and IMDb datasets, monitor the training process, and run ad-hoc queries against trained sketches. We also estimate query cardinalities with HyPer and PostgreSQL to visualize the gains over traditional cardinality estimators

    Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines

    Increasing single instruction multiple data (SIMD) capabilities in modern hardware allows for the compilation of data-parallel query pipelines. This means GPU-alike challenges arise: control flow divergence causes the underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for the fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines so that tuples are never evicted from registers. We evaluate our approach with three query types: (i) a table scan query based on TPC-H Query 1, which performs up to 34% faster when addressing underutilization, (ii) a hash join query, where we observe up to 25% higher performance, and (iii) an approximate geospatial join query, which shows performance improvements of up to 30%
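
The effect of assigning new tuples to idle lanes can be illustrated with a small scalar simulation. This is only a sketch under stated assumptions, not the AVX-512 algorithms from the paper: the function names, the `work` list (iterations each tuple needs, for example the length of a hash chain), and the step-counting model are all hypothetical.

```python
from typing import List


def steps_without_refill(work: List[int], lanes: int) -> int:
    """Fixed batches: each batch runs until its slowest lane has finished."""
    return sum(max(work[i:i + lanes]) for i in range(0, len(work), lanes))


def steps_with_refill(work: List[int], lanes: int) -> int:
    """Hand the next pending tuple to a lane as soon as that lane is idle."""
    remaining = [0] * lanes  # iterations left per lane; 0 means idle
    next_tuple = 0
    steps = 0
    while True:
        # Refill every idle lane from the input stream.
        for lane in range(lanes):
            while remaining[lane] == 0 and next_tuple < len(work):
                remaining[lane] = work[next_tuple]
                next_tuple += 1
        if not any(remaining):
            return steps  # input exhausted and all lanes idle
        # One simulated vector step advances every active lane by one iteration.
        remaining = [max(r - 1, 0) for r in remaining]
        steps += 1


# Hypothetical workload: two long-running tuples among short ones, 4 lanes.
work = [1, 4, 1, 1, 4, 1, 1, 1]
print(steps_without_refill(work, lanes=4))  # 8
print(steps_with_refill(work, lanes=4))     # 5
```

In this toy workload the same 14 iterations of useful work take 8 vector steps in fixed batches and 5 with refilling, because lanes that finish early no longer wait for the slowest tuple in their batch.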

    Make the most out of your SIMD investments: Counter control flow divergence in compiled query pipelines

    Increasing single instruction multiple data (SIMD) capabilities in modern hardware allows for compiling efficient data-parallel query pipelines. This means GPU-alike challenges arise: control flow divergence causes underutilization of vector-processing units. In this paper, we present efficient algorithms for the AVX-512 architecture to address this issue. These algorithms allow for fine-grained assignment of new tuples to idle SIMD lanes. Furthermore, we present strategies for their integration with compiled query pipelines without introducing inefficient memory materializations. We evaluate our approach with a high-performance geospatial join query, which shows performance improvements of up to 35%

    Community detection‐based deep neural network architectures: A fully automated framework based on Likert‐scale data.

    Deep neural networks (DNNs) have emerged as a state‐of‐the‐art tool in very different research fields because they adapt to the decision space without presupposing any linear relationship in the data. Two of the main disadvantages of these models are that the choice of the underlying network architecture profoundly influences the performance of the model, and that designing the architecture requires prior knowledge of the field of study. Questionnaires are widely used in the social and behavioral sciences. The main contribution of this work is to automate the design of a DNN architecture by using an agglomerative hierarchical algorithm that mimics the conceptual structure of such surveys. Although the network was trained for regression, it can easily be converted to handle classification tasks. Our proposed methodology is tested with a database containing socio‐demographic data and the responses to five psychometric Likert scales related to the prediction of happiness. These scales have already been used to design a DNN architecture based on the subdimensions of the scales. We show that our new network configurations outperform the previously existing DNN architectures

    Adaptive geospatial joins for modern hardware

    Geospatial joins are a core building block of connected mobility applications. Joins between streaming points and static polygons are an especially challenging problem. Since points are not known beforehand, they cannot be indexed. Nevertheless, points need to be mapped to polygons with low latencies to enable real-time feedback. We present an adaptive geospatial join that uses true hit filtering to avoid expensive geometric computations in most cases. Our technique uses a quadtree-based hierarchical grid to approximate polygons and stores these approximations in a specialized radix tree. We emphasize an approximate version of our algorithm that guarantees a user-defined precision. The exact version of our algorithm can adapt to the expected point distribution by refining the index. We optimized our implementation for modern hardware architectures with wide SIMD vector processing units, including Intel’s brand new Knights Landing. Overall, our approach can perform up to two orders of magnitude faster than existing techniques
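
The idea of true hit filtering can be sketched as follows. This is a simplified illustration, not the paper's method: it swaps the quadtree-based hierarchical grid and radix tree for a uniform grid stored in a dictionary, assumes a single convex polygon, and uses hypothetical names (`contains`, `build_index`, `probe`) throughout.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def contains(poly: List[Point], p: Point) -> bool:
    """Exact ray-casting point-in-polygon test (the expensive step)."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def build_index(poly: List[Point], cell: float) -> Dict[Tuple[int, int], bool]:
    """Map each grid cell overlapping the polygon's bounding box to
    True (cell lies fully inside: a true hit) or False (candidate cell).
    Assumes a convex polygon, so four inside corners imply an inside cell."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    index = {}
    for i in range(math.floor(min(xs) / cell), math.floor(max(xs) / cell) + 1):
        for j in range(math.floor(min(ys) / cell), math.floor(max(ys) / cell) + 1):
            corners = [(i * cell, j * cell), ((i + 1) * cell, j * cell),
                       (i * cell, (j + 1) * cell), ((i + 1) * cell, (j + 1) * cell)]
            index[(i, j)] = all(contains(poly, c) for c in corners)
    return index


def probe(index: Dict[Tuple[int, int], bool], poly: List[Point],
          cell: float, p: Point) -> Tuple[bool, bool]:
    """Return (is_hit, exact_test_was_needed) for a streaming point."""
    key = (math.floor(p[0] / cell), math.floor(p[1] / cell))
    if key not in index:
        return False, False           # outside the bounding box: cheap miss
    if index[key]:
        return True, False            # true hit: no geometry computed
    return contains(poly, p), True    # candidate cell: fall back to exact test


# Hypothetical polygon: a square that does not align with the unit grid.
square = [(0.5, 0.5), (7.5, 0.5), (7.5, 7.5), (0.5, 7.5)]
idx = build_index(square, cell=1.0)
print(probe(idx, square, 1.0, (3.2, 4.1)))  # (True, False)
print(probe(idx, square, 1.0, (0.7, 0.2)))  # (False, True)
```

Only points that fall into cells straddling the polygon boundary pay for the exact test; a finer grid shrinks that boundary region at the cost of a larger index, which is the trade-off a hierarchical grid is meant to manage.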