Representation Independent Analytics Over Structured Data
Database analytics algorithms leverage quantifiable structural properties of
the data to predict interesting concepts and relationships. The same
information, however, can be represented using many different structures and
the structural properties observed over particular representations do not
necessarily hold for alternative structures. Thus, there is no guarantee that
current database analytics algorithms will still provide the correct insights,
no matter what structures are chosen to organize the database. Because these
algorithms tend to be highly effective over some choices of structure, such as
that of the databases used to validate them, but not so effective with others,
database analytics has largely remained the province of experts who can find
the desired forms for these algorithms. We argue that in order to make database
analytics usable, we should use or develop algorithms that are effective over a
wide range of choices of structural organizations. We introduce the notion of
representation independence, study its fundamental properties for a wide range
of data analytics algorithms, and empirically analyze the amount of
representation independence of some popular database analytics algorithms. Our
results indicate that most algorithms are not generally representation
independent, and they identify the characteristics of heuristics that are more
representation independent under certain representational shifts.
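To make the idea of representation dependence concrete, here is a minimal sketch in Python. It is not taken from the paper: the toy data, the two schemas, and the common-neighbors similarity score are all assumptions chosen for illustration. Both edge lists encode the same facts (which paper appeared at which conference), yet the score for the same pair of papers differs between them.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, a, b):
    """Similarity score: number of neighbors shared by nodes a and b."""
    return len(adj[a] & adj[b])


# Representation A (hypothetical schema): papers link directly to conferences.
edges_a = [("p1", "confX"), ("p2", "confX"), ("p3", "confY")]

# Representation B (hypothetical schema): papers link to a proceedings volume,
# and each volume links to its conference. The paper-to-conference facts are
# the same as in representation A.
edges_b = [
    ("p1", "procX1"), ("p2", "procX2"), ("p3", "procY1"),
    ("procX1", "confX"), ("procX2", "confX"), ("procY1", "confY"),
]

graph_a = build_graph(edges_a)
graph_b = build_graph(edges_b)

# Same pair of papers, same underlying information, different scores.
score_a = common_neighbors(graph_a, "p1", "p2")
score_b = common_neighbors(graph_b, "p1", "p2")
print(score_a, score_b)
```

In this toy case the score depends on whether the shared conference is one hop or two hops away, which is a property of the schema rather than of the information itself.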
How Does User Behavior Evolve During Exploratory Visual Analysis?
Exploratory visual analysis (EVA) is an essential stage of the data science
pipeline, where users often lack clear analysis goals at the start and
iteratively refine them as they learn more about their data. Accurate models of
users' exploration behavior are becoming increasingly vital to developing
responsive and personalized tools for exploratory visual analysis. Yet we
observe a discrepancy between the static view of human exploration behavior
adopted by many computational models versus the dynamic nature of EVA. In this
paper, we explore potential parallels between the evolution of users'
interactions with visualization tools during data exploration and assumptions
made in popular online learning techniques. Through a series of empirical
analyses, we seek to answer the question: how might users' exploration behavior
evolve in response to what they have learned from the data during EVA? We
present our findings and discuss their implications for the future of user
modeling for system design.
Effective Entity Augmentation By Querying External Data Sources
Users often want to augment and enrich entities in their datasets with relevant information from external data sources. As many external sources are accessible only via keyword-search interfaces, a user usually has to manually formulate a keyword query that extracts relevant information for each entity. This approach is challenging as many data sources contain numerous tuples, only a small fraction of which may contain entity-relevant information. Furthermore, different datasets may represent the same information in distinct forms and under different terms (e.g., different data sources may use different names to refer to the same person). In such cases, it is difficult to formulate a query that precisely retrieves information relevant to an entity. Current methods for information enrichment mainly rely on lengthy and resource-intensive manual effort to formulate queries to discover relevant information. However, in increasingly many settings, it is important for users to get initial answers quickly and without substantial investment in resources (such as human attention). We propose a progressive approach to discovering entity-relevant information from external sources with minimal expert intervention. It leverages end users' feedback to progressively learn how to retrieve information relevant to each entity in a dataset from external data sources. Our empirical evaluation shows that our approach learns accurate strategies to deliver relevant information quickly.
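The abstract does not say which learning method is used, so the following is only an illustrative sketch of learning a query strategy from feedback, assuming an epsilon-greedy bandit over a few candidate query templates. The class name, the templates, and the simulated feedback rule are all hypothetical and are not drawn from the paper.

```python
import random


class EpsilonGreedyQueryLearner:
    """Hypothetical learner: picks a query template, then updates its
    estimated usefulness from user feedback (reward between 0 and 1)."""

    def __init__(self, templates, epsilon=0.1, seed=0):
        self.templates = list(templates)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {t: 0 for t in self.templates}
        self.mean_reward = {t: 0.0 for t in self.templates}

    def choose(self):
        # Try every template once before relying on the estimates.
        untried = [t for t in self.templates if self.pulls[t] == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.templates)
        return max(self.templates, key=lambda t: self.mean_reward[t])

    def update(self, template, reward):
        # Incremental update of the running mean reward for this template.
        self.pulls[template] += 1
        n = self.pulls[template]
        self.mean_reward[template] += (reward - self.mean_reward[template]) / n


def simulated_feedback(template):
    # Assumption for this sketch: the simulated user marks results as relevant
    # only when the query includes the entity's affiliation.
    return 1.0 if "affiliation" in template else 0.0


learner = EpsilonGreedyQueryLearner(
    ["{name}", "{name} {city}", "{name} {affiliation}"]
)
for _ in range(200):
    template = learner.choose()
    learner.update(template, simulated_feedback(template))

best_template = max(learner.templates, key=lambda t: learner.mean_reward[t])
print(best_template)
```

The point of the sketch is the feedback loop: early answers are returned right away, and each piece of feedback shifts later queries toward templates that have worked before.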