Estimating Cardinalities with Deep Sketches
We introduce Deep Sketches, which are compact models of databases that allow
us to estimate the result sizes of SQL queries. Deep Sketches are powered by a
new deep learning approach to cardinality estimation that can capture
correlations between columns, even across tables. Our demonstration allows
users to define such sketches on the TPC-H and IMDb datasets, monitor the
training process, and run ad-hoc queries against trained sketches. We also
estimate query cardinalities with HyPer and PostgreSQL to visualize the gains
over traditional cardinality estimators. Comment: To appear in SIGMOD'1
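The abstract gives no implementation details, so the following is only a hedged illustration of learned cardinality estimation in general, not the Deep Sketches architecture: a small regression model is trained to predict the cardinality of range predicates over one synthetic column. The featurization, model choice, and all names are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor  # stand-in model; the paper's architecture differs

rng = np.random.default_rng(0)

# Synthetic single-column table with a skewed value distribution.
column = rng.zipf(1.5, size=100_000)
column = column[column < 1000]

def featurize(lo, hi):
    # Encode a range predicate lo <= col < hi as normalized bounds (an assumed featurization).
    return [lo / 1000.0, hi / 1000.0]

# Training data: random range predicates paired with their true result sizes.
queries, cards = [], []
for _ in range(2000):
    lo, hi = sorted(rng.integers(0, 1000, size=2))
    queries.append(featurize(lo, hi))
    cards.append(np.count_nonzero((column >= lo) & (column < hi)))

# Predict log-cardinality, a common trick to cope with the large dynamic range.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
model.fit(queries, np.log1p(cards))

est = np.expm1(model.predict([featurize(100, 200)]))[0]
true = np.count_nonzero((column >= 100) & (column < 200))
print(f"estimated ~{est:.0f} rows, true {true} rows")
```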
Statistical structures for internet-scale data management
Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental evaluation of our contributions in terms of efficiency, accuracy, and scalability.
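As a hedged, purely local illustration of the synopses named above (not the paper's decentralized overlay algorithms), the sketch below computes the basic aggregates and an equi-width histogram over an in-memory array; all names are placeholders and the peer-to-peer aspects are omitted.

```python
import numpy as np

def equi_width_histogram(values, num_buckets, lo, hi):
    """Equi-width histogram: equal-size value ranges with per-bucket counts.

    In the paper these counts would be assembled in a decentralized way over a
    structured overlay; this local version only illustrates the synopsis itself.
    """
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), num_buckets - 1)  # clamp out-of-range values into the edge buckets
        counts[idx] += 1
    return counts

data = np.random.default_rng(1).normal(50, 15, size=10_000)

# The key aggregates from the abstract, computed locally for illustration.
count = len(data)
total = float(np.sum(data))
average = total / count
count_distinct = len(set(np.round(data, 1)))  # distinct values after coarse rounding, for demonstration

print("Count:", count, "Sum:", round(total, 1), "Average:", round(average, 2), "CountDistinct:", count_distinct)
print("Equi-width histogram:", equi_width_histogram(data, 10, 0, 100))
```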
One Table to Count Them All: Parallel Frequency Estimation on Single-Board Computers
Sketches are probabilistic data structures that can provide approximate
results within mathematically proven error bounds while using orders of
magnitude less memory than traditional approaches. They are tailored for
streaming data analysis, even on architectures with limited memory such as the
single-board computers widely used for IoT and edge computing.
Since these devices offer multiple cores, efficient parallel sketching
schemes enable them to handle high-volume data streams. However, since
their caches are relatively small, careful parallelization is required. In
this work, we focus on the frequency estimation problem and evaluate the
performance of a high-end server, a 4-core Raspberry Pi and an 8-core Odroid.
As a sketch, we employed the widely used Count-Min Sketch. To hash the stream
in parallel and in a cache-friendly way, we applied a novel tabulation approach
and rearranged the auxiliary tables into a single one. To parallelize the
process efficiently, we modified the workflow and applied a form of
buffering between hash computations and sketch updates. Today, many
single-board computers have heterogeneous processors in which slow and fast
cores are equipped together. To utilize all these cores to their full
potential, we proposed a dynamic load-balancing mechanism which significantly
increased the performance of frequency estimation. Comment: 12 pages, 4 figures, 3 algorithms, 1 table, submitted to EuroPar'1
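As a hedged illustration of the underlying data structure, the sketch below implements a basic sequential Count-Min Sketch; the paper's tabulation hashing, merged tables, buffering, and load balancing are not reproduced, and the simple seeded hash is an assumption.

```python
import numpy as np

class CountMinSketch:
    """Basic Count-Min Sketch: depth rows of width counters, one hash per row.

    The paper merges the tabulation-hashing tables and buffers updates for
    cache efficiency; this sequential version shows only the core structure.
    """

    def __init__(self, width=2048, depth=4, seed=7):
        self.width, self.depth = width, depth
        self.table = np.zeros((depth, width), dtype=np.uint32)
        # Per-row hash seeds (simple seeded hashing stands in for tabulation hashing).
        self.seeds = np.random.default_rng(seed).integers(1, 2**31, size=depth)

    def _index(self, row, item):
        return hash((int(self.seeds[row]), item)) % self.width

    def update(self, item, count=1):
        for row in range(self.depth):
            self.table[row, self._index(row, item)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum across rows is the tightest estimate.
        return int(min(self.table[row, self._index(row, item)] for row in range(self.depth)))

# Usage: feed a skewed stream and query a heavy hitter's frequency.
cms = CountMinSketch()
stream = np.random.default_rng(3).zipf(1.3, size=50_000)
for x in stream:
    cms.update(int(x))
print("estimated freq of 1:", cms.estimate(1), "true:", int(np.count_nonzero(stream == 1)))
```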
Cross-Paced Representation Learning with Partial Curricula for Sketch-based Image Retrieval
In this paper we address the problem of learning robust cross-domain
representations for sketch-based image retrieval (SBIR). While most SBIR
approaches focus on extracting low- and mid-level descriptors for direct
feature matching, recent works have shown the benefit of learning coupled
feature representations to describe data from two related sources. However,
cross-domain representation learning methods are typically cast as non-convex
minimization problems that are difficult to optimize, leading to unsatisfactory
performance. Inspired by self-paced learning, a learning methodology designed
to overcome convergence issues related to local optima by exploiting the
samples in a meaningful order (i.e. easy to hard), we introduce the cross-paced
partial curriculum learning (CPPCL) framework. Compared with existing
self-paced learning methods which only consider a single modality and cannot
deal with prior knowledge, CPPCL is specifically designed to assess the
learning pace by jointly handling data from dual sources and modality-specific
prior information provided in the form of partial curricula. Additionally,
thanks to the learned dictionaries, we demonstrate that the proposed CPPCL
embeds robust coupled representations for SBIR. Our approach is extensively
evaluated on four publicly available datasets (CUFS, Flickr15K, QueenMary
SBIR, and TU-Berlin Extension), showing superior performance over
competing SBIR methods.
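As a hedged illustration of the generic self-paced selection rule that motivates CPPCL (not the CPPCL objective itself, which jointly handles two modalities and partial curricula), the snippet below applies the standard hard-threshold weighting in which a growing pace parameter admits progressively harder samples; all names and values are illustrative.

```python
import numpy as np

def self_paced_weights(losses, lam):
    """Hard self-paced weights: include a sample only if its loss is below the pace lambda."""
    return (losses < lam).astype(float)

rng = np.random.default_rng(0)
losses = rng.gamma(shape=2.0, scale=1.0, size=10)  # stand-in per-sample losses

# As the pace parameter lambda grows, harder (higher-loss) samples are admitted.
for lam in (0.5, 1.5, 3.0):
    weights = self_paced_weights(losses, lam)
    print(f"lambda={lam}: {int(weights.sum())} of {len(losses)} samples selected")
```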