EFQ: Why-Not Answer Polynomials in Action
One important issue in modern database applications is supporting the user with efficient tools to debug and fix queries, because such tasks are demanding in both time and skill. One particular problem, known as the Why-Not question, focuses on the reasons for tuples missing from query results. The EFQ platform demonstrated here has been designed in this context to efficiently leverage Why-Not answer polynomials, a novel approach that provides the user with complete explanations to Why-Not questions and allows for automatic, relevant query refinements.
PigReuse: A Reuse-based Optimizer for Pig Latin
Pig Latin is a popular language widely used for parallel processing of massive data sets. Currently, subexpressions occurring repeatedly in Pig Latin scripts are executed as many times as they appear, and the current Pig Latin optimizer does not identify reuse opportunities. We present a novel optimization approach aimed at identifying and reusing repeated subexpressions in Pig Latin scripts. Our optimization algorithm, named PigReuse, operates on a particular algebraic representation of Pig Latin scripts. PigReuse identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and reuses their results as needed in order to compute exactly the same output as the original scripts. Our experiments demonstrate the effectiveness of our approach.
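The abstract does not spell out PigReuse's algebraic representation or cost function, but the core reuse idea can be illustrated with a minimal sketch: hash each operator subtree to a canonical key so that structurally identical subexpressions are detected and evaluated only once. The `Node` class, the `dedup` helper, and the toy LOAD/FILTER scripts below are illustrative assumptions, not the paper's actual algorithm.

```python
class Node:
    """A tiny dataflow operator: an op name, parameters, and input nodes."""
    def __init__(self, op, params=(), inputs=()):
        self.op, self.params, self.inputs = op, tuple(params), tuple(inputs)

    def key(self):
        # Canonical key: identical keys mean structurally identical subexpressions.
        return (self.op, self.params, tuple(i.key() for i in self.inputs))

def dedup(roots):
    """Merge structurally identical subexpressions so each is evaluated once."""
    seen = {}
    def visit(node):
        k = node.key()
        if k in seen:                       # reuse the node built earlier
            return seen[k]
        node.inputs = tuple(visit(i) for i in node.inputs)
        seen[k] = node
        return node
    return [visit(r) for r in roots], len(seen)

# Two scripts that both LOAD and FILTER the same data, then aggregate differently.
load = lambda: Node("LOAD", ("logs.csv",))
shared = lambda: Node("FILTER", ("status == 200",), (load(),))
script_a = Node("GROUP", ("user",), (shared(),))
script_b = Node("DISTINCT", (), (shared(),))
merged, distinct_ops = dedup([script_a, script_b])
assert merged[0].inputs[0] is merged[1].inputs[0]   # the FILTER node is now shared
```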
Combining Programming-by-Example with Transformation Discovery from large Databases
Data transformation discovery is one of the most tedious tasks in data preparation. In particular, the generation of transformation programs for semantic transformations is tricky because additional sources for look-up operations are necessary. Current systems for semantic transformation discovery face two major problems: either they follow a program synthesis approach that only scales to a small set of input tables, or they rely on extracting transformation functions from large corpora, which requires identifying exact transformations in those resources and is prone to noisy data. In this paper, we combine both approaches to benefit from large corpora as well as from the sophistication of program synthesis. To do so, we devise an ensemble of retrieval and pruning strategies that extracts the most relevant tables for a given transformation task. The extracted resources can then be processed by a program synthesis engine to generate more accurate transformation results than the state of the art.
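As a rough illustration of the retrieval-and-pruning idea (the paper's actual strategy ensemble is not detailed in this abstract), the sketch below ranks candidate look-up tables by how many of the user's input/output examples they cover and prunes the rest before handing them to a synthesis engine; all names, thresholds, and the toy corpus are assumptions.

```python
def coverage(table, examples):
    """Fraction of (input, output) example pairs that some row of the table maps."""
    index = {}
    for row in table:                       # build a value -> rows lookup over all cells
        for cell in row:
            index.setdefault(cell, []).append(row)
    hits = sum(1 for src, dst in examples
               if any(dst in row for row in index.get(src, [])))
    return hits / len(examples)

def retrieve_and_prune(corpus, examples, k=5, min_cov=0.6):
    """Keep the top-k corpus tables whose example coverage exceeds a threshold."""
    scored = [(coverage(t, examples), t) for t in corpus]
    scored = [(s, t) for s, t in scored if s >= min_cov]
    return [t for s, t in sorted(scored, key=lambda x: -x[0])[:k]]

# Toy corpus: a country->capital table and an unrelated table.
corpus = [
    [("Germany", "Berlin"), ("France", "Paris"), ("Spain", "Madrid")],
    [("A", "1"), ("B", "2")],
]
examples = [("Germany", "Berlin"), ("France", "Paris")]
candidates = retrieve_and_prune(corpus, examples)   # only the capital table survives
```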
Silentium! Run-Analyse-Eradicate the Noise out of the DB/OS Stack
When multiple tenants compete for resources, database performance tends to suffer. Yet there are scenarios where guaranteed sub-millisecond latencies are crucial, such as in real-time data processing, on IoT devices, or when operating in safety-critical environments. In this paper, we study how to make query latencies deterministic in the face of noise (whether caused by other tenants or by unrelated operating system tasks). We perform controlled experiments with an in-memory database engine in a multi-tenant setting, where we successively eradicate noisy interference from within the system software stack, to the point where the engine runs close to bare metal on the underlying hardware. We show that we can achieve query latencies comparable to those of the database engine running as the sole tenant, without noticeably impacting the workload of competing tenants. We discuss these results in the context of ongoing efforts to build custom operating systems for database workloads, and point out that for certain use cases the margin for improvement is rather narrow. In fact, for scenarios like ours, existing operating systems might just be good enough, provided that they are expertly configured. We then critically discuss these findings in the light of a broader family of database systems (e.g., including disk-based engines) and how to extend the approach of this paper accordingly.
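One concrete example of the kind of "expert configuration" alluded to above is pinning a latency-critical worker to a dedicated core that has been shielded from other tenants (e.g., via the Linux isolcpus boot parameter) and then inspecting tail-latency jitter. The snippet below is a minimal, Linux-only sketch of that idea, not the paper's experimental setup; the probe workload and the choice of core 3 are assumptions.

```python
import os
import time

def pin_to_core(core):
    """Restrict this process to a single CPU core (Linux only)."""
    os.sched_setaffinity(0, {core})

def measure_latencies(work, runs=10_000):
    """Time a small unit of work repeatedly and report p50/p99 in microseconds."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter_ns()
        work()
        samples.append((time.perf_counter_ns() - t0) / 1_000)
    samples.sort()
    return samples[len(samples) // 2], samples[int(len(samples) * 0.99)]

if __name__ == "__main__":
    pin_to_core(3)                       # assumes core 3 was isolated from other tenants
    probe = lambda: sum(range(1_000))    # stand-in for a short in-memory query
    p50, p99 = measure_latencies(probe)
    print(f"p50={p50:.1f}us  p99={p99:.1f}us")
```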
From Plate to Prevention: A Dietary Nutrient-aided Platform for Health Promotion in Singapore
Singapore has been striving to improve the provision of healthcare services to its people. In the course of these efforts, the government has taken note of deficiencies in regulating and supervising people's nutrient intake, which is identified as a contributing factor in the development of chronic diseases; consequently, this issue has garnered significant attention. In this paper, we share our experience in addressing this issue and obtaining medical-grade nutrient intake information to benefit Singaporeans in various ways. To this end, we develop the FoodSG platform to incubate diverse healthcare-oriented applications as a service in Singapore, taking into account their shared requirements. We further recognize the importance of localized food datasets and systematically clean and curate a localized Singaporean food dataset, FoodSG-233. To overcome the recognition-performance hurdle posed by Singapore's multifarious food dishes, we propose to integrate supervised contrastive learning into our food recognition model FoodSG-SCL for its intrinsic capability to mine hard positive/negative samples and thereby boost accuracy. Through a comprehensive evaluation, we present performance results of the proposed model and insights on food-related healthcare applications. The FoodSG-233 dataset has been released at https://foodlg.comp.nus.edu.sg/.
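The abstract does not give FoodSG-SCL's exact formulation, so the following is a minimal sketch of the standard supervised contrastive (SupCon) objective it builds on: each anchor is pulled toward same-class embeddings and pushed away from all others, so hard positives and negatives dominate the gradient. The function name, temperature, and tensor shapes are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of projection embeddings.

    embeddings: (N, D) outputs of the projection head
    labels:     (N,) integer class labels (here: food dish ids)
    """
    z = F.normalize(embeddings, dim=1)                  # unit-norm features
    sim = z @ z.T / temperature                         # pairwise similarities
    n = z.size(0)
    not_self = ~torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
    sim = sim.masked_fill(~not_self, float("-inf"))     # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_count = pos.sum(dim=1)
    valid = pos_count > 0                               # anchors with >= 1 positive
    sum_log_prob_pos = log_prob.masked_fill(~pos, 0.0).sum(dim=1)
    mean_log_prob_pos = sum_log_prob_pos[valid] / pos_count[valid]
    return -mean_log_prob_pos.mean()

# Usage: embeddings come from the encoder + projection head, labels are dish classes.
emb = torch.randn(8, 128)
lbl = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = supcon_loss(emb, lbl)
```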
it - Information Technology: Special Issue on Data Integration
Wondering why data are missing from query results? Ask Conseil Why-Not
In analyzing and debugging data transformations, or more specifically relational queries, a subproblem is to understand why some data are not part of the query result. This problem has recently been addressed from different perspectives for various fragments of relational queries. The different perspectives yield different, yet complementary, explanations of such missing answers. This paper first aims at unifying the different approaches by defining a new type of explanation, called hybrid explanation, that encompasses the variety of previously defined types of explanations. This solution goes beyond simply forming the union of explanations produced by different algorithms and is shown to explain a larger set of missing answers. Second, we present Conseil, an algorithm to generate hybrid explanations. Conseil is also the first algorithm to handle non-monotonic queries. Experiments on efficiency and explanation quality show that Conseil is comparable to and even outperforms previous algorithms.
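To make the two explanation flavors that hybrid explanations unify more concrete, here is a toy Why-Not scenario for a single select-join query: an instance-based explanation points at missing or incompatible source tuples, while a query-based explanation points at the operator that pruned them. The relations, query, and `why_not` helper are invented for illustration; the actual Conseil algorithm handles far more general (including non-monotonic) queries.

```python
# Toy Why-Not scenario: why is ("Alice", "Paris") missing from
#   SELECT e.name, c.city FROM emp e JOIN city c ON e.dept = c.dept WHERE e.salary > 50
emp = [{"name": "Alice", "dept": "R&D", "salary": 40}]
city = [{"dept": "Sales", "city": "Paris"}]

def why_not(missing_name, missing_city):
    explanations = []
    e = next((r for r in emp if r["name"] == missing_name), None)
    c = next((r for r in city if r["city"] == missing_city), None)
    # Instance-based explanation: which source tuples are missing altogether?
    if e is None:
        explanations.append(f"no emp tuple for {missing_name}")
    if c is None:
        explanations.append(f"no city tuple for {missing_city}")
    # Query-based explanation: which operator rejected the existing tuples?
    if e and c and e["dept"] != c["dept"]:
        explanations.append("join condition e.dept = c.dept pruned the pair")
    if e and e["salary"] <= 50:
        explanations.append("selection e.salary > 50 pruned the emp tuple")
    return explanations

print(why_not("Alice", "Paris"))
# ['join condition e.dept = c.dept pruned the pair',
#  'selection e.salary > 50 pruned the emp tuple']
```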