    Representation Independent Analytics Over Structured Data

    Database analytics algorithms leverage quantifiable structural properties of the data to predict interesting concepts and relationships. The same information, however, can be represented using many different structures, and the structural properties observed over particular representations do not necessarily hold for alternative structures. Thus, there is no guarantee that current database analytics algorithms will provide the correct insights regardless of which structures are chosen to organize the database. Because these algorithms tend to be highly effective over some choices of structure, such as that of the databases used to validate them, but not so effective with others, database analytics has largely remained the province of experts who can find the desired forms for these algorithms. We argue that in order to make database analytics usable, we should use or develop algorithms that are effective over a wide range of choices of structural organizations. We introduce the notion of representation independence, study its fundamental properties for a wide range of data analytics algorithms, and empirically analyze the amount of representation independence of some popular database analytics algorithms. Our results indicate that most algorithms are not generally representation independent, and we identify the characteristics of heuristics that are more representation independent under certain representational shifts.

    How Does User Behavior Evolve During Exploratory Visual Analysis?

    Exploratory visual analysis (EVA) is an essential stage of the data science pipeline, where users often lack clear analysis goals at the start and iteratively refine them as they learn more about their data. Accurate models of users' exploration behavior are becoming increasingly vital to developing responsive and personalized tools for exploratory visual analysis. Yet we observe a discrepancy between the static view of human exploration behavior adopted by many computational models and the dynamic nature of EVA. In this paper, we explore potential parallels between the evolution of users' interactions with visualization tools during data exploration and assumptions made in popular online learning techniques. Through a series of empirical analyses, we seek to answer the question: how might users' exploration behavior evolve in response to what they have learned from the data during EVA? We present our findings and discuss their implications for the future of user modeling for system design.

    Effective Entity Augmentation By Querying External Data Sources

    Users often want to augment and enrich entities in their datasets with relevant information from external data sources. As many external sources are accessible only via keyword-search interfaces, a user usually has to manually formulate a keyword query that extracts relevant information for each entity. This approach is challenging as many data sources contain numerous tuples, only a small fraction of which may contain entity-relevant information. Furthermore, different datasets may represent the same information in distinct forms and under different terms (e.g., different data sources may use different names to refer to the same person). In such cases, it is difficult to formulate a query that precisely retrieves information relevant to an entity. Current methods for information enrichment mainly rely on lengthy and resource-intensive manual effort to formulate queries to discover relevant information. However, in increasingly many settings, it is important for users to get initial answers quickly and without substantial investment in resources (such as human attention). We propose a progressive approach to discovering entity-relevant information from external sources with minimal expert intervention. It leverages end users' feedback to progressively learn how to retrieve information relevant to each entity in a dataset from external data sources. Our empirical evaluation shows that our approach learns accurate strategies to deliver relevant information quickly.