    CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads

    Index tuning, i.e., selecting the indexes appropriate for a workload, is a crucial problem in database system tuning. In this paper, we solve index tuning for the large problem instances that are common in practice, e.g., thousands of queries in the workload, thousands of candidate indexes, and several hard and soft constraints. Our work is the first to reveal that the index tuning problem has a well-structured space of solutions, and that this space can be explored efficiently with well-known techniques from linear optimization. Experimental results demonstrate that our approach outperforms state-of-the-art commercial and research techniques by a significant margin (up to an order of magnitude).
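
    The abstract does not spell out the formulation, but index selection is naturally cast as a binary integer program; a minimal sketch of that shape, under illustrative assumptions (the variables, cost model, and single storage constraint below are ours, not necessarily CoPhy's exact model):

        \min_{x,\,y} \;\; \sum_{q \in Q} \sum_{i \in I} c_{q,i}\, y_{q,i}
        \quad \text{s.t.} \quad
        \sum_{i \in I} y_{q,i} = 1 \;\; \forall q \in Q, \qquad
        y_{q,i} \le x_i, \qquad
        \sum_{i \in I} s_i\, x_i \le B, \qquad
        x_i,\, y_{q,i} \in \{0, 1\}

    Here x_i = 1 iff index i is materialized, y_{q,i} = 1 iff query q is served by index i, c_{q,i} is the optimizer-estimated cost of q under i, s_i is the size of index i, and B is the storage budget; I is assumed to include a zero-size "sequential scan" option so every query has an access path. Handing such a program to an off-the-shelf solver, or relaxing it to a linear program and rounding, is the kind of well-known linear-optimization machinery the abstract alludes to.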

    SeeDB: automatically generating query visualizations

    Data analysts operating on large volumes of data often rely on visualizations to interpret the results of queries. However, finding the right visualization for a query is a laborious and time-consuming task. We demonstrate SeeDB, a system that partially automates this task: given a query, SeeDB explores the space of all possible visualizations, and automatically identifies and recommends to the analyst those visualizations it finds to be most "interesting" or "useful". In our demonstration, conference attendees will see SeeDB in action for a variety of queries on multiple real-world datasets.
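
    The abstract does not define "interesting"; one plausible instantiation, used here purely as an assumption, is to score each candidate group-by/aggregate view by how far the query subset's distribution deviates from the full dataset's. A minimal sketch (column names and the KL-divergence choice are illustrative):

        import numpy as np
        import pandas as pd

        def kl_divergence(p, q, eps=1e-9):
            # KL(p || q) over aligned, normalized histograms.
            p = p / p.sum()
            q = q / q.sum()
            return float(np.sum(p * np.log((p + eps) / (q + eps))))

        def score_views(full: pd.DataFrame, subset: pd.DataFrame, dims, measures):
            # Enumerate (dimension, measure) candidate views and rank by
            # how much the subset's aggregate deviates from the baseline.
            scores = {}
            for d in dims:
                for m in measures:
                    target = subset.groupby(d)[m].sum()
                    ref = full.groupby(d)[m].sum()
                    # Align both distributions on the same categories.
                    target, ref = target.align(ref, fill_value=0.0)
                    scores[(d, m)] = kl_divergence(target.values, ref.values)
            return sorted(scores.items(), key=lambda kv: -kv[1])

    Rendering the few highest-scoring views as charts approximates the recommendation loop the demonstration describes.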

    Web information management with access control

    We investigate the problem of sharing private information on the Web, where the information is hosted on different machines that may use different access control and distribution schemes. We introduce a distributed knowledge-base model, termed WebdamExchange, that comprises logical statements for specifying data, access control, distribution, and knowledge about other peers. The statements can be communicated, replicated, queried, and updated, while keeping track of time and provenance. This unified base allows applications to reason declaratively about what data is accessible, where it resides, and how to retrieve it securely.
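
    The paper's logical statements are not reproduced in this abstract; a minimal sketch of what one such statement might look like as a data structure (the field names and kinds are assumptions, not WebdamExchange's actual syntax):

        from dataclasses import dataclass, field
        import time

        @dataclass(frozen=True)
        class Statement:
            # A logical statement in the spirit of WebdamExchange: it can
            # describe data, an access-control grant, placement of data,
            # or knowledge about another peer.
            kind: str              # "data" | "access" | "placement" | "peer"
            issuer: str            # peer asserting the statement
            subject: str           # resource or principal it is about
            payload: dict          # the assertion itself
            timestamp: float = field(default_factory=time.time)
            provenance: tuple = ()  # chain of peers it was relayed through

        # Example: peer "alice" grants peer "bob" read access to a document.
        grant = Statement(kind="access", issuer="alice", subject="doc:42",
                          payload={"grantee": "bob", "right": "read"})

    Because statements are immutable and carry a timestamp and provenance chain, replicating and querying them while tracking time and origin, as the abstract describes, reduces to exchanging these records.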

    Improving Differentially Private Models with Active Learning

    Broad adoption of machine learning techniques has increased privacy concerns for models trained on sensitive data such as medical records. Existing techniques for training differentially private (DP) models give rigorous privacy guarantees, but applying these techniques to neural networks can severely degrade model performance. This performance reduction is an obstacle to deploying private models in the real world. In this work, we improve the performance of DP models by fine-tuning them through active learning on public data. We introduce two new techniques, DIVERSEPUBLIC and NEARPRIVATE, for doing this fine-tuning in a privacy-aware way. For the MNIST and SVHN datasets, these techniques improve state-of-the-art accuracy for DP models while retaining privacy guarantees.
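
    The abstract names DIVERSEPUBLIC and NEARPRIVATE without detail; the simplest ingredient such fine-tuning builds on is uncertainty-driven selection of public examples, sketched below (the model interface and batch size are assumptions; nothing here touches the private data, so the selection itself spends no additional privacy budget):

        import numpy as np

        def predictive_entropy(probs):
            # probs: (n_examples, n_classes) softmax outputs of the DP model.
            return -np.sum(probs * np.log(probs + 1e-12), axis=1)

        def select_public_batch(dp_model, public_x, k=256):
            # Pick the k public examples the privately trained model is
            # least sure about; it reads only the already-released model
            # and public inputs, so no private record is consulted.
            probs = dp_model.predict_proba(public_x)  # assumed interface
            uncertain = np.argsort(-predictive_entropy(probs))[:k]
            return public_x[uncertain]

        # Schematic fine-tuning loop (labels for the public batch come
        # from the public dataset itself in the MNIST/SVHN setting):
        # for _ in range(rounds):
        #     batch = select_public_batch(dp_model, public_x)
        #     dp_model.fit(batch, public_labels_for(batch))  # hypothetical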

    Predictable performance and high query concurrency for data analytics

    Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention and thrashing, because the physical plans, unaware of each other, compete for access to the underlying I/O and computation resources. As a result, while modern systems can efficiently optimize and evaluate a single complex data analysis query, their performance suffers significantly and can be highly erratic when multiple complex queries run at the same time. We present in this paper Cjoin, a new design that substantially improves throughput in large-scale data analytics systems processing many concurrent join queries. In contrast to the conventional query-at-a-time model, our approach employs a single physical plan that shares I/O, computation, and tuple storage across all in-flight join queries. We use an "always on" pipeline of non-blocking operators, managed by a controller that continuously examines the current query mix and optimizes the pipeline on the fly. Our design enables data analytics engines to scale gracefully to large data sets, provide predictable execution times, and reduce contention. We implemented Cjoin as an extension to the PostgreSQL DBMS. This prototype outperforms conventional commercial systems by an order of magnitude for tens to hundreds of concurrent queries.
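
    The key mechanism here is one shared pipeline in which every tuple carries a marker of which in-flight queries it still satisfies; a toy sketch of that bookkeeping with bitmasks (the predicate representation and names are assumptions, not Cjoin's operator code):

        def shared_filter_pipeline(tuples, query_predicates):
            # Push every tuple through one pipeline shared by all queries.
            # Each tuple is tagged with a bitmask of the queries it still
            # satisfies; it is dropped as soon as the mask reaches zero,
            # so work common to many queries is done exactly once.
            all_live = (1 << len(query_predicates)) - 1
            for t in tuples:
                mask = all_live
                for qid, pred in enumerate(query_predicates):
                    if not pred(t):
                        mask &= ~(1 << qid)
                        if mask == 0:
                            break  # no query still wants this tuple
                if mask:
                    yield t, mask  # downstream operators AND these masks

        # Example: two concurrent queries share one scan of the same rows.
        rows = [{"region": "EU", "amount": 10}, {"region": "US", "amount": 99}]
        preds = [lambda r: r["region"] == "EU", lambda r: r["amount"] > 50]
        for row, mask in shared_filter_pipeline(rows, preds):
            print(row, bin(mask))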

    Towards a Workload for Evolutionary Analytics

    Emerging data analysis involves the ingestion and exploration of new data sets, the application of complex functions, and frequent query revisions based on observing prior query answers. We call this new type of analysis evolutionary analytics and identify its properties. This type of analysis is not well represented by current benchmark workloads. In this paper, we present a workload and identify several metrics to test system support for evolutionary analytics. Along with our metrics, we present methodologies for running the workload that capture this analytical scenario.
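
    The abstract does not list its metrics; one measure in the spirit of "frequent query revisions" might quantify how much each query in a session reuses its predecessor, e.g. token-level Jaccard similarity between consecutive SQL strings (a hypothetical measure for illustration, not the paper's):

        import re

        def sql_tokens(q):
            # Crude tokenizer: identifiers, keywords, and numeric literals.
            return set(re.findall(r"[A-Za-z_][A-Za-z0-9_.]*|\d+", q.lower()))

        def revision_similarity(session):
            # Jaccard similarity between each query and its predecessor;
            # values near 1 suggest incremental revision, near 0 a fresh start.
            sims = []
            for prev, cur in zip(session, session[1:]):
                a, b = sql_tokens(prev), sql_tokens(cur)
                sims.append(len(a & b) / len(a | b) if a | b else 1.0)
            return sims

        session = ["SELECT region, SUM(amount) FROM sales GROUP BY region",
                   "SELECT region, SUM(amount) FROM sales WHERE year = 2013 "
                   "GROUP BY region"]
        print(revision_similarity(session))  # [~0.73]: mostly a revision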