Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection
Linguistically diverse datasets are critical for training and evaluating
robust machine learning systems, but data collection is a costly process that
often requires experts. Crowdsourcing the process of paraphrase generation is
an effective means of expanding natural language datasets, but there has been
limited analysis of the trade-offs that arise when designing tasks. In this
paper, we present the first systematic study of the key factors in
crowdsourcing paraphrase collection. We consider variations in instructions,
incentives, data domains, and workflows. We manually analyzed paraphrases for
correctness, grammaticality, and linguistic diversity. Our observations provide
new insight into the trade-offs between accuracy and diversity in crowd
responses that arise as a result of task design, and offer guidance for future
paraphrase generation procedures.
Comment: Published at ACL 2017
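The paper's diversity analysis is manual; as a rough automated proxy (purely illustrative, not the authors' protocol), one could score a batch of collected paraphrases by their mean pairwise lexical distance. A minimal sketch, assuming a batch of at least two paraphrases:

```python
from itertools import combinations

def jaccard_distance(a: str, b: str) -> float:
    """Lexical distance between two sentences: 1 minus token-set overlap."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(ta & tb) / len(ta | tb)

def diversity(paraphrases: list[str]) -> float:
    """Mean pairwise lexical distance across a batch of collected paraphrases."""
    pairs = list(combinations(paraphrases, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

batch = [
    "book me a flight to Boston",
    "I need a plane ticket to Boston",
    "reserve a Boston flight for me",
]
print(f"diversity: {diversity(batch):.2f}")  # higher = more varied wording
```

A task design that trades accuracy for diversity would show up as a higher score under such a metric, at the cost of more paraphrases failing the correctness check.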
Scalable and Quality-Aware Training Data Acquisition for Conversational Cognitive Services
Dialog systems (or simply bots) have recently become a popular human-computer interface for performing users' tasks by invoking the appropriate back-end APIs (Application Programming Interfaces) based on the user's request in natural language. Building task-oriented bots, which aim at performing real-world tasks (e.g., booking flights), has become feasible with the continuous advances in Natural Language Processing (NLP), Artificial Intelligence (AI), and the countless devices that allow third-party software systems to invoke their back-end APIs.
Nonetheless, bot development technologies are still in their preliminary stages, with several unsolved theoretical and technical challenges stemming from the ambiguous nature of human languages. Given the richness of natural language, supervised models require a large number of user utterances paired with their corresponding tasks -- called intents.
To build a bot, developers need to manually translate APIs to utterances (called canonical utterances) and paraphrase them to obtain a diverse set of utterances. Crowdsourcing has been widely used to obtain such datasets,
by paraphrasing the initial utterances generated by the bot developers for each task. However, there are several unsolved issues. First, generating canonical utterances requires manual efforts, making bot development both expensive and hard to scale. Second, since crowd workers may be anonymous and are asked to provide open-ended text (paraphrases), crowdsourced paraphrases may be noisy and incorrect (not conveying the same intent as the given task).
This thesis first surveys the state-of-the-art approaches for collecting large sets of training utterances for task-oriented bots. Next, we conduct an empirical study to identify quality issues in crowdsourced utterances (e.g., grammatical errors, semantic completeness). We then propose novel approaches for identifying unqualified crowd workers and eliminating malicious workers from crowdsourcing tasks. In particular, we propose a technique that promotes the diversity of crowdsourced paraphrases by dynamically generating word suggestions while crowd workers paraphrase a given utterance. We further propose a technique to automatically translate APIs into canonical utterances. Finally, we present our platform for automatically generating bots from API specifications, and we conduct thorough experiments to validate the proposed techniques and models.
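To picture the word-suggestion idea in the abstract, here is a minimal sketch of suggesting only wording that has not yet appeared in previously collected paraphrases. The synonym table is a hypothetical stand-in for a real lexical resource; the thesis generates suggestions dynamically with a learned model, which this does not attempt:

```python
# Hypothetical synonym table; a stand-in for a real lexical resource.
SYNONYMS = {
    "book": ["reserve", "schedule", "arrange"],
    "flight": ["airfare", "plane ticket"],
}

def word_suggestions(utterance: str, collected: list[str]) -> dict[str, list[str]]:
    """Suggest alternatives for each word that have not yet been used in
    previously collected paraphrases, nudging workers toward novel wording."""
    used = set(" ".join(collected).lower().split())
    suggestions = {}
    for token in utterance.lower().split():
        fresh = [s for s in SYNONYMS.get(token, []) if s not in used]
        if fresh:
            suggestions[token] = fresh
    return suggestions

print(word_suggestions("book a flight", ["please reserve a flight to Paris"]))
# {'book': ['schedule', 'arrange'], 'flight': ['airfare', 'plane ticket']}
```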
Towards Democratizing Data Science with Natural Language Interfaces
Data science has the potential to reshape many sectors of modern society. This potential can be realized to its maximum only when data science becomes democratized, instead of centralized in a small group of expert data scientists. However, with data becoming more massive and heterogeneous, standing in stark contrast to the spreading demand for data science is the growing gap between human users and data: every type of data requires extensive specialized training, either to learn a specific query language or a data analytics software package. Towards the democratization of data science, in this dissertation we systematically investigate a promising research direction, natural language interfaces, to bridge the gap between users and data, and make it easier for users who are less technically proficient to access the data analytics power needed for on-demand problem solving and decision making.
One of the largest obstacles for general users to access data is the proficiency requirement on the formal languages (e.g., SQL) that machines use. By automatically parsing natural language commands from users into formal languages, natural language interfaces can thus play a critical role in democratizing data science. However, a pressing question that is largely left unanswered so far is: how do we bootstrap a natural language interface for a new domain? The high cost of data collection and the data-hungry nature of mainstream neural network models significantly limit the wide application of natural language interfaces. The main technical contribution of this dissertation is a systematic framework for bootstrapping natural language interfaces for new domains. Specifically, the proposed framework consists of three complementary methods: (1) collecting data at a low cost via crowdsourcing, (2) leveraging existing NLI data from other domains via transfer learning, and (3) letting a bootstrapped model interact with real users so that it can refine itself over time. Combining the three methods forms a closed data loop for bootstrapping and refining natural language interfaces for any domain.
The methodologies and frameworks developed in this dissertation hence pave the path for building data science platforms that everyone can use to process, query, and analyze their data without extensive specialized training. With such AI-powered platforms, users can stay focused on high-level thinking and decision making, instead of being overwhelmed by low-level implementation and programming details: "Let machines understand human thinking. Don't let humans think like machines."
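The core contract of a natural language interface is to map an utterance to a formal query. A toy rule-based sketch of that contract (real NLIs use learned semantic parsers; the pattern, table, and column names here are invented for illustration):

```python
import re

# One hard-coded pattern; a learned semantic parser would generalize far
# beyond this, but the input/output contract is the same.
PATTERN = re.compile(r"how many (\w+) (?:are there )?in (\w+)", re.IGNORECASE)

def parse(question: str) -> str | None:
    """Map a natural language question to SQL, or None if out of grammar."""
    match = PATTERN.match(question.strip())
    if match is None:
        return None
    entity, region = match.groups()
    return f"SELECT COUNT(*) FROM {entity} WHERE region = '{region}';"

print(parse("How many customers in Europe"))
# SELECT COUNT(*) FROM customers WHERE region = 'Europe';
```

The bootstrapping problem the dissertation targets is exactly what this toy version dodges: getting enough (utterance, query) pairs for a new domain to train the learned parser that replaces such hand-written rules.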
Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models
We introduce VL2NL, a Large Language Model (LLM) framework that generates
rich and diverse NL datasets using only Vega-Lite specifications as input,
thereby streamlining the development of Natural Language Interfaces (NLIs) for
data visualization. To synthesize relevant chart semantics accurately and
enhance syntactic diversity in each NL dataset, we leverage 1) guided
discovery incorporated into prompting so that LLMs can steer themselves to
create faithful NL datasets in a self-directed manner; and 2) score-based
paraphrasing to augment NL syntax along four language axes. We also
present a new collection of 1,981 real-world Vega-Lite specifications with
greater diversity and complexity than existing chart collections. When tested
on our chart collection, VL2NL extracted chart semantics and generated L1/L2
captions with 89.4% and 76.0% accuracy, respectively. It also demonstrated
generating and paraphrasing utterances and questions with greater diversity
compared to the benchmarks. Finally, we discuss how our NL datasets and framework
can be utilized in real-world scenarios. The code and chart collection are
available at https://github.com/hyungkwonko/chart-llm.
Comment: 22 pages, 5 figures
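The prompting step described in the abstract amounts to building one instruction string from a Vega-Lite spec. The sketch below shows only that shape; the prompt wording is invented, `call_llm` is a hypothetical stand-in for whatever completion API is available, and none of it is VL2NL's actual template:

```python
import json

# Toy Vega-Lite spec: a bar chart of GDP by country.
spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "country", "type": "nominal"},
        "y": {"field": "gdp", "type": "quantitative"},
    },
}

# Build one instruction string from the spec, asking for an L1
# (chart-reading) caption.
prompt = (
    "You are describing a chart to a reader.\n"
    f"Vega-Lite spec:\n{json.dumps(spec, indent=2)}\n"
    "Write one sentence stating what the chart encodes (an L1 caption)."
)
# caption = call_llm(prompt)  # hypothetical LLM call, left abstract here
print(prompt)
```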
LINGO: Visually Debiasing Natural Language Instructions to Support Task Diversity
Cross-task generalization is a significant outcome that defines mastery in
natural language understanding. Humans show a remarkable aptitude for this, and
can solve many different types of tasks, given definitions in the form of
textual instructions and a small set of examples. Recent work with pre-trained
language models mimics this learning style: users can define and exemplify a
task for the model to attempt as a series of natural language prompts or
instructions. While prompting approaches have led to higher cross-task
generalization compared to traditional supervised learning, analyzing 'bias' in
the task instructions given to the model is a difficult problem, and has thus
been relatively unexplored. For instance, are we truly modeling a task, or are
we modeling a user's instructions? To help investigate this, we develop LINGO,
a novel visual analytics interface that supports an effective, task-driven
workflow to (1) help identify bias in natural language task instructions, (2)
alter (or create) task instructions to reduce bias, and (3) evaluate
pre-trained model performance on debiased task instructions. To robustly
evaluate LINGO, we conduct a user study with both novice and expert instruction
creators, over a dataset of 1,616 linguistic tasks and their natural language
instructions, spanning 55 different languages. For both user groups, LINGO
promotes the creation of tasks that are more difficult for pre-trained models
and that combine higher linguistic diversity with lower instruction bias. We additionally
discuss how the insights learned in developing and evaluating LINGO can aid in
the design of future dashboards that aim to minimize the effort involved in
prompt creation across multiple domains.
Comment: 13 pages, 6 figures, EuroVis 2023
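One crude surface-level picture of instruction bias (purely illustrative; this is not LINGO's analysis): if many task instructions reuse the same n-grams, a model may be learning the shared template rather than the individual tasks.

```python
from collections import Counter

def top_shared_trigrams(instructions: list[str], k: int = 3):
    """Count trigrams across all instructions; heavy repeats suggest that
    shared surface phrasing, not task content, dominates what a model sees."""
    counts = Counter()
    for text in instructions:
        tokens = text.lower().split()
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts.most_common(k)

instructions = [
    "given a sentence, answer the question about it",
    "given a sentence, classify its sentiment",
    "given a sentence, translate it to French",
]
print(top_shared_trigrams(instructions))
# [(('given', 'a', 'sentence,'), 3), ...]
```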
Conversational Question Answering on Heterogeneous Sources
Conversational question answering (ConvQA) tackles sequential information needs where context in follow-up questions is left implicit. Current ConvQA systems operate over homogeneous sources of information: either a knowledge base (KB), or a text corpus, or a collection of tables. This paper addresses the novel issue of jointly tapping into all of these together, thereby boosting answer coverage and confidence. We present CONVINSE, an end-to-end pipeline for ConvQA over heterogeneous sources, operating in three stages: i) learning an explicit structured representation of an incoming question and its conversational context, ii) harnessing this frame-like representation to uniformly capture relevant evidence from KB, text, and tables, and iii) running a fusion-in-decoder model to generate the answer. We construct and release the first benchmark, ConvMix, for ConvQA over heterogeneous sources, comprising 3000 real-user conversations with 16000 questions, along with entity annotations, completed question utterances, and question paraphrases. Experiments demonstrate the viability and advantages of our method compared to state-of-the-art baselines.
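Read as code, the three stages map onto three functions. The skeleton below is a toy rendering of that shape only; the frame fields, data, and answers are hard-coded stand-ins, not the CONVINSE models:

```python
from dataclasses import dataclass

@dataclass
class IntentFrame:
    """Stage i: explicit structured view of the question plus dialog context."""
    entity: str
    relation: str

def build_frame(question: str, history: list[str]) -> IntentFrame:
    # Learned in the paper; hard-coded here for a single example.
    return IntentFrame(entity="Avatar", relation="director")

def retrieve_evidence(frame: IntentFrame) -> list[str]:
    """Stage ii: pool evidence uniformly from KB, text, and tables."""
    kb = [f"{frame.entity} | {frame.relation} | James Cameron"]
    text = [f"{frame.entity} was directed by James Cameron."]
    table = [f"title={frame.entity}, director=James Cameron"]
    return kb + text + table

def generate_answer(question: str, evidence: list[str]) -> str:
    """Stage iii: a fusion-in-decoder model in the paper; a stub here."""
    return "James Cameron"

frame = build_frame("Who directed it?", ["Tell me about Avatar"])
print(generate_answer("Who directed it?", retrieve_evidence(frame)))
```

The point of the uniform evidence representation is visible even in the stub: once KB facts, text snippets, and table rows are flattened into the same form, a single generation model can consume all of them at once.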