Scientific discovery through weighted sampling

Boncz, P.A. (Peter); Kersten, M.L. (Martin); Sidirourgos, E. (Eleftherios)

Publication date: 25 November 2013

Abstract

Scientific discovery has shifted from an exercise in theory and computation to the exploration of an ocean of observational data. Scientists explore data originating from modern scientific instruments in order to discover its interesting aspects and formulate their hypotheses. Such workloads call for new database functionality. We aim to sample scientific databases to create many different impressions of the data, on which scientists can quickly evaluate exploratory queries. However, scientific databases pose different challenges for sample construction than classical business analytics applications do. We propose adaptive weighted sampling as an alternative to uniform sampling. With weighted sampling, only the most informative data is sampled, so more of the data relevant to the scientific discovery is available for examining a hypothesis. Relevant data are the focal points of the scientific search and can be defined either a priori, using functions, or by monitoring the query workload. We study such query workloads and detail different families of weight functions. Finally, we give a quantitative and qualitative evaluation of weighted sampling.
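The abstract does not spell out the sampling algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It contrasts uniform sampling with one standard scheme for weighted sampling without replacement (the Efraimidis and Spirakis key method), and shows the two sources of weights the abstract mentions: an a-priori weight function and weights derived from a logged query workload. All names (`weighted_sample`, `apriori_weight`, `workload_weight`, `query_log`) and the weight formulas are hypothetical assumptions chosen for illustration.

```python
import heapq
import random


def weighted_sample(rows, weight, k, rng=random):
    """Draw up to k rows without replacement, biased by weight(row).

    Sketch using Efraimidis-Spirakis keys u ** (1 / w) with u ~ U[0, 1):
    keeping the k largest keys is equivalent to repeatedly picking a
    remaining row with probability proportional to its weight.
    Rows with weight <= 0 are never sampled.
    """
    keyed = []
    for row in rows:
        w = weight(row)
        if w > 0:
            keyed.append((rng.random() ** (1.0 / w), row))
    return [row for _, row in heapq.nlargest(k, keyed, key=lambda t: t[0])]


def apriori_weight(lo, hi, inside=1.0, outside=0.01):
    """Hypothetical a-priori weight function: favour a known region of interest."""
    return lambda v: inside if lo <= v <= hi else outside


def workload_weight(query_log, base=0.01):
    """Hypothetical workload-derived weight: a value's weight grows with the
    number of logged range predicates (lo, hi) that selected it."""
    def weight(v):
        return base + sum(1 for lo, hi in query_log if lo <= v <= hi)
    return weight


if __name__ == "__main__":
    rng = random.Random(42)
    data = list(range(1000))
    query_log = [(100, 199), (150, 249), (120, 180)]  # made-up range queries
    focused = weighted_sample(data, workload_weight(query_log), k=50, rng=rng)
    uniform = rng.sample(data, 50)

    def in_focus(sample):
        return sum(100 <= v <= 249 for v in sample)

    print("weighted:", in_focus(focused), "of 50 in the queried region")
    print("uniform: ", in_focus(uniform), "of 50 in the queried region")
```

With the made-up workload above, almost all of the weighted sample falls inside the frequently queried range, whereas a uniform sample of the same size covers it only in proportion to its size (about 15% of the data). The small non-zero `base` weight is an assumed design choice that keeps rarely queried data reachable, so the sample does not collapse entirely onto past queries.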

Available Versions

CWI's Institutional Repository

oai:cwi.nl:25204

Last time updated on 18/04/2020