Parachute: Evaluating Interactive Human-LM Co-writing Systems
A surge of advances in language models (LMs) has led to significant interest
in using LMs to build co-writing systems, in which humans and LMs interactively
contribute to a shared writing artifact. However, there is a lack of studies
assessing co-writing systems in interactive settings. We propose a
human-centered evaluation framework, Parachute, for interactive co-writing
systems. Parachute showcases an integrative view of interaction evaluation,
where each evaluation aspect consists of categorized practical metrics.
Furthermore, we present Parachute with a use case to demonstrate how to
evaluate and compare co-writing systems.
Comment: Accepted by the CHI'23 In2Writing Workshop
Is AI the better programming partner? Human-Human Pair Programming vs. Human-AI pAIr Programming
The emergence of large-language models (LLMs) that excel at code generation
and commercial products such as GitHub's Copilot has sparked interest in
human-AI pair programming (referred to as "pAIr programming") where an AI
system collaborates with a human programmer. While traditional pair programming
between humans has been extensively studied, it remains uncertain whether its
findings can be applied to human-AI pair programming. We compare human-human
and human-AI pair programming, exploring their similarities and differences in
interaction, measures, benefits, and challenges. We find that the effectiveness
of both approaches is mixed in the literature (though the measures used for
pAIr programming are not as comprehensive). We summarize moderating factors on
the success of human-human pair programming, which provides opportunities for
pAIr programming research. For example, mismatched expertise makes pair
programming less productive; a well-designed AI programming assistant could
therefore adapt to differences in expertise levels.
Comment: 8 pages (without references), 2 tables
Synergi: A Mixed-Initiative System for Scholarly Synthesis and Sensemaking
Efficiently reviewing scholarly literature and synthesizing prior art are
crucial for scientific progress. Yet, the growing scale of publications and the
burden of knowledge make synthesis of research threads more challenging than
ever. While significant research has been devoted to helping scholars interact
with individual papers, building research threads scattered across multiple
papers remains a challenge. Most top-down synthesis approaches (including
LLM-generated summaries) make it difficult to personalize and iterate on the
output, while bottom-up synthesis
is costly in time and effort. Here, we explore a new design space of
mixed-initiative workflows. In doing so we develop a novel computational
pipeline, Synergi, that ties together user input of relevant seed threads with
citation graphs and LLMs, to expand and structure them, respectively. Synergi
allows scholars to start with an entire threads-and-subthreads structure
generated from papers relevant to their interests, and to iterate on and
customize it as they wish. In our evaluation, we find that Synergi helps scholars
efficiently make sense of relevant threads, broaden their perspectives, and
increase their curiosity. We discuss future design implications for
thread-based, mixed-initiative scholarly synthesis support tools.
Comment: ACM UIST'23
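The citation-graph expansion step described above can be sketched as a simple breadth-first walk from user-selected seed papers. This is an illustrative stand-in, not Synergi's actual pipeline: the function name, the toy graph, and the hop limit are all assumptions.

```python
from collections import deque

def expand_seeds(citation_graph, seeds, max_hops=2):
    """Breadth-first expansion of seed papers through a citation graph."""
    thread = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        paper, hops = frontier.popleft()
        if hops == max_hops:
            continue  # stop expanding beyond the hop budget
        for cited in citation_graph.get(paper, []):
            if cited not in thread:
                thread.add(cited)
                frontier.append((cited, hops + 1))
    return thread

# toy citation graph: each paper maps to the papers it cites
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}
print(sorted(expand_seeds(graph, ["A"], max_hops=2)))  # ['A', 'B', 'C', 'D', 'E']
```

In the real system, an LLM would then structure the collected papers into threads and subthreads; the graph walk only gathers candidates.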
ScatterShot: Interactive In-context Example Curation for Text Transformation
The in-context learning capabilities of LLMs like GPT-3 allow annotators to
customize an LLM to their specific tasks with a small number of examples.
However, users tend to include only the most obvious patterns when crafting
examples, resulting in underspecified in-context functions that fall short on
unseen cases. Further, it is hard to know when "enough" examples have been
included even for known patterns. In this work, we present ScatterShot, an
interactive system for building high-quality demonstration sets for in-context
learning. ScatterShot iteratively slices unlabeled data into task-specific
patterns, samples informative inputs from underexplored or not-yet-saturated
slices in an active learning manner, and helps users label more efficiently
with the help of an LLM and the current example set. In simulation studies on
two text perturbation scenarios, ScatterShot sampling improves the resulting
few-shot functions by 4-5 percentage points over random sampling, with less
variance as more examples are added. In a user study, ScatterShot greatly helps
users in covering different patterns in the input space and labeling in-context
examples more efficiently, resulting in better in-context learning and less
user effort.
Comment: IUI 2023: 28th International Conference on Intelligent User Interfaces
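The slice-then-sample loop described above can be sketched as follows, under simplified assumptions: the slicing function and data are stand-ins, and ScatterShot's real pipeline uses learned task-specific slices and LLM-assisted labeling rather than this toy heuristic.

```python
from collections import Counter, defaultdict

def next_example(unlabeled, labeled, slice_fn):
    """Pick the next input to label from the least-saturated slice."""
    coverage = Counter(slice_fn(x) for x in labeled)
    slices = defaultdict(list)
    for x in unlabeled:
        slices[slice_fn(x)].append(x)
    # choose the slice with the fewest labeled examples so far
    target = min(slices, key=lambda s: coverage[s])
    return slices[target][0]

# toy slicing: inputs containing digits vs. pure text
slice_fn = lambda s: "digit" if any(c.isdigit() for c in s) else "text"
unlabeled = ["born in 1960", "lives in Paris", "age 42"]
labeled = ["met in 2001"]  # the "digit" slice already has some coverage
print(next_example(unlabeled, labeled, slice_fn))  # "lives in Paris"
```

Sampling from under-covered slices, rather than uniformly at random, is what gives the 4-5 point improvement the abstract reports in simulation.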
Providing Suggestions of Expanded Text from Abbreviated Text Input
This disclosure describes techniques to provide suggestions of expanded text from abbreviated or compressed text that has been input by a user. A language model is used to determine and present the most likely full words and phrases that match user intent based on the user’s abbreviated text input, such as the first letter of each word of a phrase and/or omission of one or more words of the phrase. The described techniques can greatly improve the speed of text entry on devices via a keyboard or other input modality.
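The first-letter matching idea can be sketched as below. This is a hypothetical illustration: a real system would score candidates with a language model, whereas here a static frequency table stands in for model likelihoods.

```python
def expand(abbrev, phrase_freq):
    """Return candidate phrases whose word initials match, most likely first."""
    initials = lambda p: "".join(w[0] for w in p.split())
    matches = [p for p in phrase_freq if initials(p) == abbrev.lower()]
    return sorted(matches, key=phrase_freq.get, reverse=True)

# toy frequency table standing in for language-model scores
phrases = {"on my way": 120, "out of memory": 30, "oh my word": 5}
print(expand("omw", phrases))  # ['on my way', 'oh my word']
```

Handling omitted words (rather than just initials) would require a fuzzier match, e.g. scoring every phrase whose words subsume the typed tokens.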
Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong
Large Language Models (LLMs) are increasingly used for accessing information
on the web. Their truthfulness and factuality are thus of great interest. To
help users make the right decisions about the information they're getting, LLMs
should not only provide but also help users fact-check information. In this
paper, we conduct experiments with 80 crowdworkers in total to compare language
models with search engines (information retrieval systems) at facilitating
fact-checking by human users. We prompt LLMs to validate a given claim and
provide corresponding explanations. Users reading LLM explanations are
significantly more efficient than those using search engines, with similar
accuracy. However, they tend to over-rely on the LLMs when the explanation is
wrong. To
reduce over-reliance on LLMs, we ask LLMs to provide contrastive information:
explanations of why the claim might be true and why it might be false. We then
present both sides of
the explanation to users. This contrastive explanation mitigates users'
over-reliance on LLMs, but cannot significantly outperform search engines.
Furthermore, showing both search engine results and LLM explanations offers no
complementary benefit compared to search engines alone. Taken together,
natural language explanations by LLMs may not be a reliable replacement for
reading the retrieved passages yet, especially in high-stakes settings where
over-relying on wrong AI explanations could lead to critical consequences.
Comment: preprint
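The contrastive-explanation setup described above can be sketched as follows: prompt the model for both a supporting and a refuting explanation, then show the user both sides. `query_llm` is a placeholder for any model call, and the prompt wording is an assumption, not the paper's exact prompt.

```python
def contrastive_prompts(claim):
    """Build one supporting and one refuting prompt for a claim."""
    return {
        "support": f'Explain why the claim "{claim}" is true.',
        "refute": f'Explain why the claim "{claim}" is false.',
    }

def present_both_sides(claim, query_llm):
    """Query the model for both sides and return them together."""
    prompts = contrastive_prompts(claim)
    return {side: query_llm(prompt) for side, prompt in prompts.items()}

# usage with a stub model that just echoes the prompt
stub = lambda prompt: f"[model answer to: {prompt}]"
for side, answer in present_both_sides("The Nile is the longest river", stub).items():
    print(side, "->", answer)
```

Presenting both sides forces the reader to weigh conflicting evidence instead of accepting a single confident answer, which is the mitigation mechanism the paper studies.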
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Many recent advances in natural language generation have been fueled by
training large language models on internet-scale data. However, this paradigm
can lead to models that generate toxic, inaccurate, and unhelpful content, and
automatic evaluation metrics often fail to identify these behaviors. As models
become more capable, human feedback is an invaluable signal for evaluating and
improving models. This survey aims to provide an overview of the recent
research that has leveraged human feedback to improve natural language
generation. First, we introduce an encompassing formalization of feedback, and
identify and organize existing research into a taxonomy following this
formalization. Next, we discuss how feedback can be described by its format and
objective, and cover the two approaches proposed to use feedback (either for
training or decoding): directly using the feedback or training feedback models.
We also discuss existing datasets for human-feedback data collection, and
concerns surrounding feedback collection. Finally, we provide an overview of
the nascent field of AI feedback, which exploits large language models to make
judgments based on a set of principles and minimize the need for human
intervention.
Comment: Work in Progress
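The two uses of feedback the survey distinguishes can be sketched under simplified assumptions: "direct use" applies explicit human-authored feedback (here, a rule) to filter outputs, while a trained feedback (reward) model scores candidates at decoding time and the best-of-n candidate is kept. Both feedback functions below are toy stand-ins, not trained models.

```python
def direct_feedback_filter(candidates, is_acceptable):
    """Direct use: keep only outputs that human-authored feedback accepts."""
    return [c for c in candidates if is_acceptable(c)]

def best_of_n(candidates, reward_model):
    """Feedback-model use: rerank candidates by a learned reward score."""
    return max(candidates, key=reward_model)

candidates = ["a helpful answer", "an unhelpful rant!!", "a short reply"]
no_shouting = lambda c: "!" not in c          # explicit human rule
toy_reward = lambda c: -c.count("!")          # stand-in reward model;
                                              # ties go to the first candidate
print(direct_feedback_filter(candidates, no_shouting))
print(best_of_n(candidates, toy_reward))
```

In practice the reward model would itself be trained on human preference data and used either at decoding time, as here, or as a training signal (e.g. in RLHF).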