
    A Bounded Measure for Estimating the Benefit of Visualization: Case Studies and Empirical Evaluation

    Many visual representations, such as volume-rendered images and metro maps, feature a noticeable amount of information loss. At a glance, there seem to be numerous opportunities for viewers to misinterpret the data being visualized, hence undermining the benefits of these visual representations. In practice, there is little doubt that these visual representations are useful. The recently-proposed information-theoretic measure for analyzing the cost-benefit ratio of visualization processes can explain such usefulness experienced in practice, and postulates that the viewers' knowledge can reduce the potential distortion (e.g., misinterpretation) due to information loss. This suggests that viewers' knowledge can be estimated by comparing the potential distortion without any knowledge and the actual distortion with some knowledge. In this paper, we describe several case studies for collecting instances that can (i) support the evaluation of several candidate measures for estimating the potential distortion in visualization, and (ii) demonstrate their applicability in practical scenarios. Because the theoretical discourse on choosing an appropriate bounded measure for estimating the potential distortion is not yet conclusive, real-world data about visualization can further inform the selection of a bounded measure, providing practical evidence to aid a theoretical conclusion. Meanwhile, once we can measure the potential distortion in a bounded manner, we can interpret the numerical values characterizing the benefit of visualization more intuitively. Comment: Following the SciVis 2020 reviewers' request for more explanation and clarification, the original article, "A Bounded Measure for Estimating the Benefit of Visualization, arxiv:2002.05282", has been split into two articles, on "Theoretical Discourse and Conceptual Evaluation" and "Case Studies and Empirical Evaluation" respectively. This is the second article.
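    The contrast between an unbounded and a bounded distortion measure can be sketched as follows. This is an illustrative example only: the distributions are made up, and the Jensen-Shannon divergence stands in for the paper's candidate bounded measures, which the abstract does not name.

    ```python
    import math

    def kl_divergence(p, q):
        """Kullback-Leibler divergence D_KL(P || Q) in bits; unbounded as q_i -> 0."""
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    def js_divergence(p, q):
        """Jensen-Shannon divergence: a symmetric alternative bounded by 1 bit."""
        m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
        return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

    # Hypothetical ground-truth distribution vs. a viewer's reconstruction
    # after information loss in the visualization.
    truth = [0.7, 0.2, 0.1]
    naive_viewer = [0.34, 0.33, 0.33]    # no knowledge: near-uniform guess
    informed_viewer = [0.6, 0.25, 0.15]  # some knowledge: closer to the truth

    potential = js_divergence(truth, naive_viewer)    # distortion without knowledge
    actual = js_divergence(truth, informed_viewer)    # distortion with knowledge
    knowledge_gain = potential - actual
    ```

    Because the measure is bounded, the gap between potential and actual distortion can be read on a fixed scale, which is the intuition behind comparing the two to estimate the viewer's knowledge.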

    Rapid Sampling for Visualizations with Ordering Guarantees

    Visualizations are frequently used as a means to understand trends and gather insights from datasets, but often take a long time to generate. In this paper, we focus on the problem of rapidly generating approximate visualizations while preserving crucial visual properties of interest to analysts. Our primary focus will be on sampling algorithms that preserve the visual property of ordering; our techniques will also apply to some other visual properties. For instance, our algorithms can be used to generate an approximate visualization of a bar chart very rapidly, where the comparisons between any two bars are correct. We formally show that our sampling algorithms are generally applicable and provably optimal in theory, in that they do not take more samples than necessary to generate the visualizations with ordering guarantees. They also work well in practice, correctly ordering output groups while taking orders of magnitude fewer samples and much less time than conventional sampling schemes. Comment: Tech Report. 17 pages. Condensed version to appear in VLDB Vol. 8 No.
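    The idea of sampling until an ordering is safe to report can be sketched with a simple stopping rule: draw samples in batches and stop once the confidence intervals for two group means no longer overlap. This is a hypothetical illustration using Hoeffding bounds, not the paper's actual algorithm; the function names and parameters are assumptions.

    ```python
    import math
    import random

    def hoeffding_ci(n, delta=0.05):
        """Half-width of a Hoeffding confidence interval for values in [0, 1]."""
        return math.sqrt(math.log(2 / delta) / (2 * n))

    def order_two_groups(sample_a, sample_b, delta=0.05, batch=50, max_n=20000):
        """Sample both groups in batches until their confidence intervals
        separate, then report the estimated ordering ('a<b' or 'a>b')."""
        xs, ys, n = [], [], 0
        while n < max_n:
            xs.extend(sample_a() for _ in range(batch))
            ys.extend(sample_b() for _ in range(batch))
            n += batch
            mean_a, mean_b = sum(xs) / n, sum(ys) / n
            if abs(mean_a - mean_b) > 2 * hoeffding_ci(n, delta):
                # Intervals no longer overlap: the ordering is reliable.
                return ('a<b' if mean_a < mean_b else 'a>b'), n
        return ('a<b' if sum(xs) <= sum(ys) else 'a>b'), n  # budget exhausted

    random.seed(0)
    result, samples_used = order_two_groups(
        lambda: random.random() * 0.6,         # group A: true mean 0.3
        lambda: 0.2 + random.random() * 0.6)   # group B: true mean 0.5
    ```

    Groups whose true means are far apart resolve after a few batches, while close pairs consume more of the sample budget, which is why such schemes can use orders of magnitude fewer samples than drawing a fixed large sample for every bar.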

    Database Learning: Toward a Database that Becomes Smarter Every Time

    In today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query because their answers stem from the same underlying distribution that has produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries. We call this novel idea---learning from past query answers---Database Learning. We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that Verdict supports 73.7% of these queries, speeding them up by up to 23.0x for the same accuracy level compared to existing AQP systems. Comment: This manuscript is an extended report of the work published in ACM SIGMOD conference 201
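    The maximum-entropy principle behind this idea can be illustrated in a toy setting: given the sums returned by past range queries, the maximum-entropy estimate spreads each known sum uniformly over its buckets, and a new overlapping query can then be answered without touching raw data. This is a minimal sketch under the simplifying assumption of disjoint constraints; it is not Verdict's implementation, and the function name is hypothetical.

    ```python
    def maxent_bucket_estimate(n_buckets, constraints):
        """Maximum-entropy per-bucket estimate given past query answers.
        Each constraint is (set_of_bucket_indices, known_sum). For disjoint
        constraints, the max-entropy solution spreads each sum uniformly."""
        estimate = [0.0] * n_buckets
        for indices, total in constraints:
            share = total / len(indices)
            for i in indices:
                estimate[i] = share
        return estimate

    # Past answers: SUM over buckets 0-3 was 80, SUM over buckets 4-7 was 40.
    est = maxent_bucket_estimate(8, [({0, 1, 2, 3}, 80.0), ({4, 5, 6, 7}, 40.0)])

    # A new query over buckets 2-5 is answered from the estimate alone,
    # with no scan of the underlying data.
    new_answer = sum(est[i] for i in range(2, 6))
    ```

    Overlapping constraints require an iterative fitting procedure rather than this one-pass assignment, but the flavor is the same: each answered query tightens the inferred distribution, so later queries get faster and more accurate.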