A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation
Many visual representations, such as volume-rendered images and metro maps,
feature a noticeable amount of information loss. At a glance, there seem to be
numerous opportunities for viewers to misinterpret the data being visualized,
hence undermining the benefits of these visual representations. In practice,
there is little doubt that these visual representations are useful. The
recently-proposed information-theoretic measure for analyzing the cost-benefit
ratio of visualization processes can explain such usefulness experienced in
practice, and postulate that the viewers' knowledge can reduce the potential
distortion (e.g., misinterpretation) due to information loss. This suggests
that viewers' knowledge can be estimated by comparing the potential distortion
without any knowledge and the actual distortion with some knowledge. In this
paper, we describe several case studies for collecting instances that can (i)
support the evaluation of several candidate measures for estimating the
potential distortion in visualization, and (ii) demonstrate their
applicability in practical scenarios. Because the theoretical discourse on
choosing an appropriate bounded measure for estimating the potential distortion
is not yet conclusive, real-world data about visualization can further
inform the selection of a bounded measure, providing practical evidence to aid
a theoretical conclusion. Meanwhile, once we can measure the potential
distortion in a bounded manner, we can interpret the numerical values
characterizing the benefit of visualization more intuitively.

Comment: Following the SciVis 2020 reviewers' request for more explanation and
clarification, the original article, "A Bounded Measure for Estimating the
Benefit of Visualization, arXiv:2002.05282", has been split into two
articles, on "Theoretical Discourse and Conceptual Evaluation" and "Case
Studies and Empirical Evaluation" respectively. This is the second article.
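On the question of boundedness: the Kullback-Leibler divergence commonly used to quantify informational distortion is unbounded, which is what motivates seeking a bounded measure in the first place. The sketch below contrasts it with the Jensen-Shannon divergence, one well-known bounded candidate; this illustrates the boundedness issue only, and is not necessarily the measure the authors adopt.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits; unbounded as q_i -> 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by 1 bit."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.9, 0.1]  # e.g., distribution implied by the ground-truth data
q = [0.1, 0.9]  # e.g., distribution a viewer reconstructs from the visual
print(kl(p, q))  # grows without bound as the distributions diverge
print(js(p, q))  # always stays in [0, 1]
```

Because the JS divergence stays in [0, 1], numerical values of potential distortion computed with it can be read on a fixed scale, which is the intuitive interpretability the abstract alludes to.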
Rapid Sampling for Visualizations with Ordering Guarantees
Visualizations are frequently used as a means to understand trends and gather
insights from datasets, but often take a long time to generate. In this paper,
we focus on the problem of rapidly generating approximate visualizations while
preserving crucial visual properties of interest to analysts. Our primary
focus will be on sampling algorithms that preserve the visual property of
ordering; our techniques will also apply to some other visual properties. For
instance, our algorithms can be used to generate an approximate visualization
of a bar chart very rapidly, where the comparisons between any two bars are
correct. We formally show that our sampling algorithms are generally applicable
and provably optimal in theory, in that they do not take more samples than
necessary to generate the visualizations with ordering guarantees. They also
work well in practice, correctly ordering output groups while taking orders of
magnitude fewer samples and much less time than conventional sampling schemes.

Comment: Tech Report. 17 pages. Condensed version to appear in VLDB Vol. 8 No.
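The ordering guarantee described above can be illustrated with a round-based sampling sketch: keep drawing samples from each group until confidence intervals around the group means are pairwise disjoint, at which point the empirical ordering of the bars is correct with high probability. This is a minimal illustration using Hoeffding bounds, not the paper's exact algorithm; the function name and group generators below are hypothetical.

```python
import math
import random

def order_sample(groups, delta=0.05, batch=100, max_rounds=1000):
    """Sample each group (values assumed in [0, 1]) in rounds until
    Hoeffding confidence intervals are pairwise disjoint, so the
    estimated ordering of group means holds w.p. >= 1 - delta.
    Hypothetical sketch, not the paper's algorithm."""
    sums = [0.0] * len(groups)
    n = 0
    for _ in range(max_rounds):
        for i, draw in enumerate(groups):
            sums[i] += sum(draw() for _ in range(batch))
        n += batch
        means = [s / n for s in sums]
        # Hoeffding half-width with a union bound over the k groups
        eps = math.sqrt(math.log(2 * len(groups) / delta) / (2 * n))
        ivals = sorted((m - eps, m + eps) for m in means)
        if all(ivals[j][1] < ivals[j + 1][0] for j in range(len(ivals) - 1)):
            return means, n  # intervals disjoint: ordering is settled
    return means, n

random.seed(0)
groups = [lambda: random.random() * 0.5,        # true mean 0.25
          lambda: 0.5 + random.random() * 0.5]  # true mean 0.75
means, n = order_sample(groups)
```

The key property is that sampling stops as soon as the ordering is resolved, so well-separated groups need far fewer samples than a fixed-size scheme would take.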
Database Learning: Toward a Database that Becomes Smarter Every Time
In today's databases, previous query answers rarely benefit answering future
queries. For the first time, to the best of our knowledge, we change this
paradigm in an approximate query processing (AQP) context. We make the
following observation: the answer to each query reveals some degree of
knowledge about the answer to another query because their answers stem from the
same underlying distribution that has produced the entire dataset. Exploiting
and refining this knowledge should allow us to answer queries more
analytically, rather than by reading enormous amounts of raw data. Also,
processing more queries should continuously enhance our knowledge of the
underlying distribution, and hence lead to increasingly faster response times
for future queries.
We call this novel idea---learning from past query answers---Database
Learning. We exploit the principle of maximum entropy to produce answers, which
are in expectation guaranteed to be more accurate than existing sample-based
approximations. Empowered by this idea, we build a query engine on top of Spark
SQL, called Verdict. We conduct extensive experiments on real-world query
traces from a large customer of a major database vendor. Our results
demonstrate that Verdict supports 73.7% of these queries, speeding them up by
up to 23.0x for the same accuracy level compared to existing AQP systems.

Comment: This manuscript is an extended report of the work published in ACM
SIGMOD conference 201
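As a much-simplified illustration of reusing a past answer: if a previous query's estimate and a fresh sample-based estimate target the same aggregate, combining them by inverse-variance weighting yields an estimate whose variance is below either input, which is the intuition behind "in expectation more accurate". Verdict's actual inference applies the maximum-entropy principle across correlated queries; the helper below is a hypothetical stand-in for that machinery.

```python
def combine(est1, var1, est2, var2):
    """Inverse-variance (precision-weighted) combination of two
    independent unbiased estimates of the same quantity.
    A simplified stand-in for Verdict's max-entropy inference."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    est = (w1 * est1 + w2 * est2) / (w1 + w2)
    var = 1.0 / (w1 + w2)  # always below min(var1, var2)
    return est, var

# Hypothetical numbers: a past query's answer and a fresh small sample.
# Past answer:  AVG(x) ~= 10.2 with variance 4.0
# Fresh sample: AVG(x) ~= 9.6  with variance 1.0
est, var = combine(10.2, 4.0, 9.6, 1.0)
```

The combined variance 1/(w1 + w2) is strictly smaller than both inputs, so each additional past answer can only tighten the estimate, matching the abstract's claim that answering more queries makes the system progressively more accurate.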