Supporting Annotators with Affordances for Efficiently Labeling Conversational Data
Without well-labeled ground truth data, machine learning-based systems would
not be as ubiquitous as they are today; these systems rely on substantial
amounts of correctly labeled data. Unfortunately, crowdsourced labeling is
time-consuming and expensive. To address the concerns of effort and tedium, we
designed CAL, a novel interface to aid in data labeling. We made several key
design decisions for CAL, which include preventing inapt labels from being
selected, guiding users in selecting an appropriate label when they need
assistance, incorporating labeling documentation into the interface, and
providing an efficient means to view previous labels. We built a
production-quality implementation of CAL and report a user-study evaluation
comparing CAL to a standard spreadsheet. Key findings include that users of CAL
reported lower cognitive load without an increase in task time, rated CAL
easier to use, and preferred CAL over the spreadsheet.
Conversational Challenges in AI-Powered Data Science: Obstacles, Needs, and Design Opportunities
Large Language Models (LLMs) are being increasingly employed in data science
for tasks like data preprocessing and analytics. However, data scientists
encounter substantial obstacles when conversing with LLM-powered chatbots and
acting on their suggestions and answers. We conducted a mixed-methods study,
including contextual observations, semi-structured interviews (n=14), and a
survey (n=114), to identify these challenges. Our findings highlight key issues
faced by data scientists, including contextual data retrieval, formulating
prompts for complex tasks, adapting generated code to local environments, and
refining prompts iteratively. Based on these insights, we propose actionable
design recommendations, such as data brushing to support context selection and
inquisitive feedback loops to improve communication with AI-based assistants
in data-science tools.

Comment: 24 pages, 8 figures
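To make the data brushing recommendation above concrete, here is a minimal Python sketch of how a chat interface might serialize only the user-selected (brushed) slice of a table into the prompt. The function names, prompt wording, and pandas-based selection are illustrative assumptions, not part of the study.

import pandas as pd

def brushed_context(df: pd.DataFrame, rows: slice, cols: list) -> str:
    # Serialize only the region of the data the user has brushed/selected.
    return df.loc[rows, cols].to_csv(index=False)

def build_prompt(question: str, df: pd.DataFrame, rows: slice, cols: list) -> str:
    # Hypothetical prompt format: the brushed subset is the only context sent.
    context = brushed_context(df, rows, cols)
    return (
        "Relevant data (user-selected subset):\n"
        f"{context}\n"
        f"Task: {question}"
    )

# Toy example: brush two rows and two columns before asking the assistant.
df = pd.DataFrame({"country": ["US", "DE", "JP"], "gdp": [25.4, 4.1, 4.2]})
print(build_prompt("Plot GDP by country.", df, slice(0, 1), ["country", "gdp"]))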
CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs
Timely, personalized feedback is essential for students learning programming.
LLM-powered tools like ChatGPT offer instant support, but reveal direct answers
with code, which may hinder deep conceptual engagement. We developed CodeAid,
an LLM-powered programming assistant delivering helpful, technically correct
responses, without revealing code solutions. CodeAid answers conceptual
questions, generates pseudo-code with line-by-line explanations, and annotates
students' incorrect code with fix suggestions. We deployed CodeAid in a
programming class of 700 students for a 12-week semester. We performed a
thematic analysis of 8,000 CodeAid usages, enriched by weekly surveys and 22
student interviews, and then interviewed eight programming educators to
gain further insights. Our findings reveal four design considerations for
future educational AI assistants: D1) exploiting AI's unique benefits; D2)
simplifying query formulation while promoting cognitive engagement; D3)
avoiding direct responses while encouraging motivated learning; and D4)
maintaining transparency and control for students to assess and steer AI
responses.

Comment: CHI 2024 paper; 17 pages, 8 figures, 2 tables, and a 2-page appendix
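As a rough illustration of the kind of scaffolding such an assistant needs (this is not CodeAid's actual implementation; the prompt text and the call_llm callable are assumptions), a tutoring wrapper might constrain the model as follows:

SYSTEM_PROMPT = (
    "You are a programming tutor. Never return compilable solution code. "
    "Answer conceptual questions, give numbered pseudo-code with a short "
    "explanation per step, and when shown incorrect code, point out the "
    "faulty lines and suggest fixes in words only."
)

def tutor_response(call_llm, student_query: str, student_code: str = "") -> str:
    # call_llm is a placeholder for whatever LLM client is in use.
    user_msg = student_query
    if student_code:
        user_msg += "\n\nStudent code to review:\n" + student_code
    return call_llm(system=SYSTEM_PROMPT, user=user_msg)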
Semantically Aligned Question and Code Generation for Automated Insight Generation
Automated insight generation is a common tactic for helping knowledge
workers, such as data scientists, to quickly understand the potential value of
new and unfamiliar data. Unfortunately, when large language models produce
automated insights, the accompanying code may not correctly correspond (or
align) to the insight. In this paper, we leverage the semantic knowledge of
large language models to generate targeted and insightful questions about data
and the corresponding code to answer those questions. Then through an empirical
study on data from Open-WikiTable, we show that embeddings can be effectively
used for filtering out semantically unaligned pairs of question and code.
Additionally, we found that generating questions and code together yields more
diverse questions.
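A minimal sketch of the filtering idea, assuming some embed() function that maps text to a vector (the embedding model and the 0.8 threshold are placeholders; the paper's exact setup is not reproduced here):

import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_aligned(pairs, embed, threshold: float = 0.8):
    # Keep only (question, code) pairs whose embeddings are sufficiently similar.
    return [
        (question, code)
        for question, code in pairs
        if cosine(embed(question), embed(code)) >= threshold
    ]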
To Fix or to Learn? How Production Bias Affects Developers’ Information Foraging during Debugging
Developers performing maintenance activities must balance their efforts to learn the code vs. their efforts to actually change it. This balancing act is consistent with the “production bias” that, according to Carroll’s minimalist learning theory, generally affects software users during everyday tasks. This suggests that developers’ focus on efficiency should have marked effects on how they forage for the information they think they need to fix bugs. To investigate how developers balance fixing versus learning during debugging, we conducted the first empirical investigation of the interplay between production bias and information foraging. Our theory-based study involved 11 participants: half tasked with fixing a bug, and half tasked with learning enough to help someone else fix it. Despite the subtle difference between their tasks, participants foraged remarkably differently, making foraging decisions from different types of “patches,” with different types of information, and succeeding with different foraging tactics.
Interface Fluctuations on a Hierarchical Lattice
We consider interface fluctuations on a two-dimensional layered lattice where
the couplings follow a hierarchical sequence. This problem is equivalent to the
diffusion process of a quantum particle in the presence of a one-dimensional
hierarchical potential. According to a modified Harris criterion this type of
perturbation is relevant and one expects anomalous fluctuating behavior. By
transfer-matrix techniques and by an exact renormalization group transformation
we have obtained analytical results for the interface fluctuation exponents,
which are discontinuous at the homogeneous lattice limit.

Comment: 14 pages, plain TeX; one figure available upon request; Phys. Rev. E (in print)
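For readers outside the area, the fluctuation exponent discussed here is conventionally defined through the scaling of the interface width with the lateral system size L; this definition is standard textbook material, not taken from the paper:

\[
  w(L) \;=\; \sqrt{\langle h^2 \rangle - \langle h \rangle^2} \;\sim\; L^{\zeta},
\]

where h is the local interface height. On a homogeneous two-dimensional lattice the thermal value is \zeta = 1/2, and the abstract reports that hierarchical couplings change this exponent discontinuously at that limit.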
A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits
Context: Tangled commits are changes to software that address multiple
concerns at once. For researchers interested in bugs, tangled commits mean that
they actually study not only bugs but also other concerns that are irrelevant
to the study of bugs.
Objective: We want to improve our understanding of the prevalence of tangling
and the types of changes that are tangled within bug fixing commits.
Methods: We use a crowdsourcing approach for manual labeling to validate
which changes contribute to bug fixes for each line in bug fixing commits. Each
line is labeled by four participants. If at least three participants agree on
the same label, we have consensus.
Results: We estimate that between 17% and 32% of all changes in bug fixing
commits modify the source code to fix the underlying problem. However, when we
only consider changes to the production code files, this ratio increases to 66%
to 87%. We find that about 11% of lines are hard to label, leading to active
disagreements between participants. Due to confirmed tangling and the
uncertainty in our data, we estimate that 3% to 47% of data is noisy without
manual untangling, depending on the use case.
Conclusion: Tangled commits have a high prevalence in bug fixes and can lead
to a large amount of noise in the data. Prior research indicates that this
noise may alter results. As researchers, we should be skeptical and assume that
unvalidated data is likely very noisy until proven otherwise.

Comment: Accepted at Empirical Software Engineering
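The consensus rule described in the Methods above (four labels per line, at least three participants must agree) can be sketched in a few lines of Python; the names and label values are illustrative only:

from collections import Counter

def consensus_label(labels, required: int = 3):
    # Each line is labeled by four participants; return the agreed label,
    # or None if fewer than `required` participants chose the same label.
    assert len(labels) == 4, "expected four labels per line"
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= required else None

# Example: three of four participants call the line a bug fix -> consensus.
print(consensus_label(["bugfix", "bugfix", "bugfix", "refactoring"]))  # bugfix
print(consensus_label(["bugfix", "test", "doc", "whitespace"]))        # None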
Yestercode: Improving code-change support in visual dataflow programming environments
In this paper, we present the Yestercode tool for supporting code changes in visual dataflow programming environments. In a formative investigation of LabVIEW programmers, we found that making code changes posed a significant challenge. To address this issue, we designed Yestercode to enable the efficient recording, retrieval, and juxtaposition of visual dataflow code while making code changes. To evaluate Yestercode, we implemented our design as a prototype extension to the LabVIEW programming environment, and ran a user study involving 14 professional LabVIEW programmers that compared Yestercode-extended LabVIEW to the standard LabVIEW IDE. Our results showed that Yestercode users introduced fewer bugs during tasks, completed tasks in about the same time, and experienced lower cognitive loads on tasks. Moreover, participants generally reported that Yestercode was easy to use and that it helped in making change tasks easier.
The Patchworks code editor
Increasingly, people are faced with navigating large information spaces, and making such navigation efficient is of paramount concern. In this paper, we focus on the problems programmers face in navigating large code bases, and propose a novel code editor, Patchworks, that addresses the problems. In particular, Patchworks leverages two new interface idioms, the patch grid and the ribbon, to help programmers navigate more quickly, make fewer navigation errors, and spend less time arranging their code. To validate Patchworks, we conducted a user study that compared Patchworks to two existing code editors: the traditional file-based editor, Eclipse, and the newer canvas-based editor, Code Bubbles. Our results showed (1) that programmers using Patchworks were able to navigate significantly faster than with Eclipse (and comparably with Code Bubbles), (2) that programmers using Patchworks made significantly fewer navigation errors than with Code Bubbles or Eclipse, and (3) that programmers using Patchworks spent significantly less time arranging their code than with Code Bubbles (and comparably with Eclipse).