2 research outputs found
MISS: Finding Optimal Sample Sizes for Approximate Analytics
Nowadays, sampling-based Approximate Query Processing (AQP) is widely
regarded as a promising way to achieve interactivity in big data analytics. To
build such an AQP system, finding the minimal sample size for a query regarding
given error constraints in general, called Sample Size Optimization (SSO), is
an essential yet unsolved problem. Ideally, the goal of solving the SSO problem
is to achieve statistical accuracy, computational efficiency and broad
applicability all at the same time. Existing approaches either make idealistic
assumptions on the statistical properties of the query, or completely disregard
them. This may result in overemphasizing only one of the three goals while
neglecting the others.
To overcome these limitations, we first carefully examine the statistical
properties shared by common analytical queries. Based on these properties,
we propose a linear model, called the error model, that describes the
relationship between sample sizes and the approximation errors of a query. We
then propose a Model-guided Iterative Sample Selection (MISS) framework to solve the
SSO problem generally. Afterwards, based on the MISS framework, we propose a
concrete algorithm, called Miss, to find optimal sample sizes under the
norm error metric. Moreover, we extend the Miss algorithm to handle
other error metrics. Finally, we show theoretically and empirically that the
Miss algorithm and its extensions achieve satisfactory accuracy and
efficiency for a considerably wide range of analytical queries.
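The idea of letting a fitted error model guide the choice of the next sample size can be illustrated with a minimal sketch. This is not the Miss algorithm from the paper. It assumes, for illustration only, that the error is estimated by bootstrapping and that the error model is a straight line in log-log space (for a sample mean, the standard error shrinks roughly as n^(-1/2), which is linear after taking logarithms). The names `bootstrap_error`, `fit_line`, and `find_sample_size`, along with all parameters, are hypothetical.

```python
import math
import random
import statistics


def bootstrap_error(sample, estimator, rounds=100, rng=None):
    """Estimate the standard error of `estimator` on `sample` by bootstrapping."""
    rng = rng or random.Random(0)
    n = len(sample)
    estimates = [estimator(rng.choices(sample, k=n)) for _ in range(rounds)]
    return statistics.pstdev(estimates)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def find_sample_size(data, estimator, target_error,
                     init_sizes=(100, 200, 400), max_iters=10, seed=0):
    """Iteratively probe sample sizes, refit the error model, and jump to the
    size the model predicts will meet `target_error` (assumed log-log linear)."""
    rng = random.Random(seed)
    observations = []  # pairs of (log sample size, log estimated error)

    def probe(n):
        err = bootstrap_error(rng.sample(data, n), estimator, rng=rng)
        observations.append((math.log(n), math.log(err)))
        return err

    for n in init_sizes:
        err = probe(n)

    for _ in range(max_iters):
        a, b = fit_line(*zip(*observations))
        if b >= 0:
            # The fit does not show error decreasing with size; just double.
            n = min(2 * n, len(data))
        else:
            # Solve a + b * log(n) = log(target_error) for n, capped at |data|.
            log_n = min((math.log(target_error) - a) / b, math.log(len(data)))
            n = max(2, min(math.ceil(math.exp(log_n)), len(data)))
        err = probe(n)
        if err <= target_error or n == len(data):
            break
    return n, err
```

For example, on 20,000 values drawn from a normal distribution with standard deviation 10, calling `find_sample_size(data, statistics.fmean, target_error=0.2)` should settle on a size in the low thousands, since the standard error of the mean is about 10 / sqrt(n). Each probe adds a new observation, so the fitted line is refined where it matters most, near the target error.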
Visualization by Example
While visualizations play a crucial role in gaining insights from data,
generating useful visualizations from a complex dataset is far from an easy
task. Besides understanding the functionality provided by existing
visualization libraries, generating the desired visualization also requires
reshaping and aggregating the underlying data as well as composing different
visual elements to achieve the intended visual narrative. This paper aims to
simplify visualization tasks by automatically synthesizing the required program
from simple visual sketches provided by the user. Specifically, given an input
data set and a visual sketch that demonstrates how to visualize a very small
subset of this data, our technique automatically generates a program that can
be used to visualize the entire data set.
Automating visualization poses several challenges. First, because many
visualization tasks require data wrangling in addition to generating plots, we
need to decompose the end-to-end synthesis task into two separate sub-problems.
Second, because the intermediate specification that results from the
decomposition is necessarily imprecise, the data wrangling task is
particularly challenging in our context. In this paper, we address these
problems by developing a new compositional visualization-by-example technique
that (a) decomposes the end-to-end task into two different synthesis problems
over different DSLs and (b) leverages bi-directional program analysis to deal
with the complexity that arises from having an imprecise intermediate
specification.
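To make the data wrangling sub-problem concrete, the following toy sketch searches for an aggregation program whose output is consistent with a few points read off a user's sketch. It is a brute-force enumeration over a tiny, made-up space of group-by programs, not the DSLs or the bi-directional program analysis described above. The names `AGGREGATORS`, `run_program`, and `synthesize`, as well as the example table, are hypothetical.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny set of aggregation operators.
AGGREGATORS = {
    "sum": sum,
    "count": len,
    "max": max,
    "min": min,
    "mean": lambda values: sum(values) / len(values),
}


def run_program(rows, key_col, val_col, agg):
    """Group `rows` by `key_col` and aggregate `val_col` with `agg`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(row[val_col])
    return {key: AGGREGATORS[agg](values) for key, values in groups.items()}


def synthesize(rows, sketch_points):
    """Return every (key_col, val_col, agg) whose output contains all
    (x, y) points of the partial sketch."""
    columns = list(rows[0])
    numeric = [c for c in columns
               if all(isinstance(r[c], (int, float)) for r in rows)]
    solutions = []
    for key_col in columns:
        for val_col in numeric:
            if key_col == val_col:
                continue
            for agg in AGGREGATORS:
                output = run_program(rows, key_col, val_col, agg)
                if all(output.get(x) == y for x, y in sketch_points):
                    solutions.append((key_col, val_col, agg))
    return solutions


rows = [
    {"region": "east", "year": 2019, "sales": 10},
    {"region": "east", "year": 2020, "sales": 15},
    {"region": "west", "year": 2019, "sales": 7},
    {"region": "west", "year": 2020, "sales": 5},
]
```

With the sketch point `("east", 25)`, only the program that groups by `region` and sums `sales` is consistent. With `("east", 2)`, two programs are consistent (counting either `year` or `sales` per region), which illustrates why an imprecise specification can admit several candidate programs and why ranking the candidates matters.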
We implemented our visualization-by-example algorithm in a tool called Viser
and evaluated it on 83 visualization tasks collected from online forums and
tutorials. Viser can solve 84% of these benchmarks within a 600-second time
limit, and, for those
tasks that can be solved, the desired visualization is among the top-5
generated by Viser in 70% of the cases.