155 research outputs found
rTisane: Externalizing conceptual models for data analysis increases engagement with domain knowledge and improves statistical model quality
Statistical models should accurately reflect analysts' domain knowledge about
variables and their relationships. While recent tools let analysts express
these assumptions and use them to derive a statistical model, it
remains unclear what analysts want to express and how externalization impacts
statistical model quality. This paper addresses these gaps. We first conduct an
exploratory study of analysts using a domain-specific language (DSL) to express
conceptual models. We observe a preference for detailing how variables relate
and a desire to allow, and then later resolve, ambiguity in their conceptual
models. We leverage these findings to develop rTisane, a DSL for expressing
conceptual models augmented with an interactive disambiguation process. In a
controlled evaluation, we find that rTisane's DSL helps analysts engage more
deeply with and accurately externalize their assumptions. rTisane also leads to
statistical models that match analysts' assumptions, maintain analysis intent,
and better fit the data.
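The core idea of compiling an externalized conceptual model into a statistical model can be illustrated with a toy sketch. The variable names and the `to_formula` helper are hypothetical; rTisane itself is an R DSL with a much richer disambiguation process.

```python
# Hypothetical conceptual model: declared causal relationships
# between variables, each a (cause, effect) pair.
causes = [("tutoring", "test_score"), ("sleep", "test_score")]

def to_formula(outcome, relationships):
    """Compile declared relationships into an R-style model formula."""
    predictors = [c for c, e in relationships if e == outcome]
    return f"{outcome} ~ " + " + ".join(predictors)

formula = to_formula("test_score", causes)
# formula == "test_score ~ tutoring + sleep"
```

The point of the externalization step is that the analyst states relationships once, in domain terms, and the model specification is derived from them rather than written by hand.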
Orion: A system for modeling, transformation and visualization of multidimensional heterogeneous networks
The study of complex activities such as scientific production and software development often requires modeling connections among heterogeneous entities including people, institutions and artifacts. Despite numerous advances in algorithms and visualization techniques for understanding such social networks, the process of constructing network models and performing exploratory analysis remains difficult and time-consuming. In this paper we present Orion, a system for interactive modeling, transformation and visualization of network data. Orion's interface enables the rapid manipulation of large graphs, including the specification of complex linking relationships, using simple drag-and-drop operations with desired node types. Orion maps these user interactions to statements in a declarative workflow language that incorporates both relational operators (e.g., selection, aggregation and joins) and network analytics (e.g., centrality measures). We demonstrate how these features enable analysts to flexibly construct and compare networks in domains such as online health communities, academic collaboration and distributed software development.
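The relational view of network construction that Orion exposes, joining heterogeneous records into links and aggregating duplicates into edge weights, can be sketched in a few lines. The authorship table is made up for illustration; Orion's actual workflow language and UI are far more general.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (person, paper) authorship table: a two-mode,
# heterogeneous network of people and artifacts.
authorship = [
    ("ana", "p1"), ("bo", "p1"),
    ("ana", "p2"), ("cy", "p2"),
    ("bo", "p3"), ("cy", "p3"), ("ana", "p3"),
]

# Relational "join" on paper: group co-authors of each paper.
by_paper = {}
for person, paper in authorship:
    by_paper.setdefault(paper, []).append(person)

# Project to a person-person network; "aggregate" repeated
# pairs into edge weights.
edges = Counter()
for people in by_paper.values():
    for a, b in combinations(sorted(people), 2):
        edges[(a, b)] += 1

# A simple network analytic: weighted degree centrality.
degree = Counter()
for (a, b), w in edges.items():
    degree[a] += w
    degree[b] += w
```

The same join-then-aggregate pattern yields many derived networks (co-authorship, shared-institution, person-artifact) from one underlying table, which is what makes comparing alternative network models cheap.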
EVM: Incorporating Model Checking into Exploratory Visual Analysis
Visual analytics (VA) tools support data exploration by helping analysts
quickly and iteratively generate views of data which reveal interesting
patterns. However, these tools seldom enable explicit checks of the resulting
interpretations of data -- e.g., whether patterns can be accounted for by a
model that implies a particular structure in the relationships between
variables. We present EVM, a data exploration tool that enables users to
express and check provisional interpretations of data in the form of
statistical models. EVM integrates support for visualization-based model checks
by rendering distributions of model predictions alongside user-generated views
of data. In a user study with data scientists practicing in the private and
public sector, we evaluate how model checks influence analysts' thinking during
data exploration. Our analysis characterizes how participants use model checks
to scrutinize expectations about the data generating process and surfaces
further opportunities to scaffold model exploration in VA tools.
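A minimal version of the visualization-based model check described above is to fit a provisional model, simulate predictive datasets from it, and compare a summary statistic of the simulations against the observed data. This sketch uses a toy linear model and NumPy; it illustrates the idea only and is not EVM's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "observed" dataset: y depends linearly on x with noise.
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + rng.normal(0, 1.5, size=200)

# Provisional interpretation of the pattern: a linear model y ~ x.
slope, intercept = np.polyfit(x, y, deg=1)
resid_sd = np.std(y - (slope * x + intercept))

# Model check: simulate predictive datasets under the fitted model.
n_draws = 500
sims = slope * x + intercept + rng.normal(0, resid_sd, size=(n_draws, x.size))

# Compare a summary statistic (here, the maximum) of the simulations
# against the observed data; plotting sims alongside y would give the
# visual version of this check.
observed_stat = np.max(y)
sim_stats = sims.max(axis=1)
p_value = np.mean(sim_stats >= observed_stat)
```

If `p_value` is extreme (near 0 or 1), the provisional model fails to account for the observed pattern, which is exactly the kind of discrepancy a rendered model check is meant to surface during exploration.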
ScatterShot: Interactive In-context Example Curation for Text Transformation
The in-context learning capabilities of LLMs like GPT-3 allow annotators to
customize an LLM to their specific tasks with a small number of examples.
However, users tend to include only the most obvious patterns when crafting
examples, resulting in underspecified in-context functions that fall short on
unseen cases. Further, it is hard to know when "enough" examples have been
included even for known patterns. In this work, we present ScatterShot, an
interactive system for building high-quality demonstration sets for in-context
learning. ScatterShot iteratively slices unlabeled data into task-specific
patterns, samples informative inputs from underexplored or not-yet-saturated
slices in an active learning manner, and helps users label more efficiently
with the help of an LLM and the current example set. In simulation studies on
two text perturbation scenarios, ScatterShot sampling improves the resulting
few-shot functions by 4-5 percentage points over random sampling, with less
variance as more examples are added. In a user study, ScatterShot greatly helps
users in covering different patterns in the input space and labeling in-context
examples more efficiently, resulting in better in-context learning and less
user effort.
Comment: IUI 2023: 28th International Conference on Intelligent User Interfaces
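The slice-then-sample loop described above can be illustrated with a toy heuristic: prefer unlabeled examples from slices with the fewest labeled items. The pool, the pre-assigned slice labels, and the `sample_next` policy are all hypothetical; ScatterShot infers slices automatically and uses a more sophisticated sampler and an LLM labeling aid.

```python
from collections import defaultdict

# Hypothetical unlabeled pool for a text-transformation task,
# pre-tagged with a coarse "slice" (pattern) label.
pool = [
    {"text": "resched mtg to fri", "slice": "abbreviation"},
    {"text": "call mom 5pm", "slice": "time"},
    {"text": "email TPS rpt", "slice": "abbreviation"},
    {"text": "dentist 9am tue", "slice": "time"},
    {"text": "buy milk", "slice": "plain"},
]

def sample_next(pool, labeled, k=2):
    """Pick the k candidates from the least-saturated slices:
    an underexplored-slice heuristic, not ScatterShot's exact policy."""
    counts = defaultdict(int)
    for ex in labeled:
        counts[ex["slice"]] += 1
    candidates = [ex for ex in pool if ex not in labeled]
    candidates.sort(key=lambda ex: counts[ex["slice"]])
    return candidates[:k]

labeled = [pool[1]]  # one "time" example already labeled
picks = sample_next(pool, labeled)
# Both picks come from slices other than the saturated "time" slice.
```

Sampling from under-covered slices is what counters the tendency, noted in the abstract, for users to include only the most obvious patterns in their demonstration set.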
How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study
Data analysis is challenging as analysts must navigate nuanced decisions that
may yield divergent conclusions. AI assistants have the potential to support
analysts in planning their analyses, enabling more robust decision making.
Though AI-based assistants that target code execution (e.g., Github Copilot)
have received significant attention, limited research addresses assistance for
both analysis execution and planning. In this work, we characterize helpful
planning suggestions and their impacts on analysts' workflows. We first review
the analysis planning literature and crowd-sourced analysis studies to
categorize suggestion content. We then conduct a Wizard-of-Oz study (n=13) to
observe analysts' preferences and reactions to planning assistance in a
realistic scenario. Our findings highlight subtleties in contextual factors
that impact suggestion helpfulness, emphasizing design implications for
supporting different abstractions of assistance, forms of initiative, increased
engagement, and alignment of goals between analysts and assistants.
Comment: Accepted to CHI 202