Scientific discovery has shifted from an exercise in theory
and computation to the exploration of an ocean of
observational data. Scientists explore data
originating from modern scientific instruments in order to
discover interesting aspects of it and formulate their hypotheses.
Such workloads press for new database functionality. We aim
at sampling scientific databases to create many different
impressions of the data, on which the scientists can quickly evaluate
exploratory queries. However, scientific databases introduce
different challenges for sample construction compared to
classical business analytical applications. We propose adaptive
weighted sampling as an alternative to uniform sampling. With
weighted sampling, only the most informative data is sampled,
so more data relevant to the scientific discovery is
available for examining a hypothesis. Relevant data is considered
to be the focal points of the scientific search, and can be defined
either a priori with the use of functions, or by monitoring
the query workload. We study such query workloads, and we
detail different families of weight functions. Finally, we give a
quantitative and qualitative evaluation of weighted sampling
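
The contrast between weighted and uniform sampling can be illustrated with a minimal sketch. It is not the method evaluated in this work: it assumes a hypothetical Gaussian weight function centred on a single focal point (the names `focal_weight`, `focal_point` and `bandwidth` are illustrative only), and it draws the sample with the Efraimidis-Spirakis key scheme for weighted sampling without replacement.

```python
import heapq
import math
import random


def focal_weight(value, focal_point, bandwidth):
    """Hypothetical a priori weight: Gaussian bump centred on a focal point."""
    return math.exp(-((value - focal_point) ** 2) / (2.0 * bandwidth ** 2))


def weighted_sample(values, weight_fn, k, rng):
    """Draw up to k values without replacement, favouring high weights.

    Efraimidis-Spirakis scheme: each item gets the key u ** (1 / w) with
    u uniform in (0, 1], and the k items with the largest keys are kept.
    The logarithm of the key is used here, which preserves the ordering.
    """
    keyed = []
    for v in values:
        w = weight_fn(v)
        if w <= 0.0:
            continue  # zero-weight items can never be sampled
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is defined
        keyed.append((math.log(u) / w, v))
    return [v for _, v in heapq.nlargest(k, keyed)]


def mean_distance(sample, focal_point):
    """Average absolute distance of the sampled values from the focal point."""
    return sum(abs(v - focal_point) for v in sample) / len(sample)


# Illustrative data: 1000 evenly spaced measurements, focal point at 50.
rng = random.Random(0)
values = [i / 10 for i in range(1000)]
weighted = weighted_sample(values, lambda v: focal_weight(v, 50.0, 5.0), 100, rng)
uniform = rng.sample(values, 100)
print(mean_distance(weighted, 50.0), mean_distance(uniform, 50.0))
```

Under these assumptions the weighted sample concentrates around the focal point, whereas the uniform sample spreads over the whole value range. A workload-driven variant would replace `focal_weight` with weights derived from the values that past queries touched.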