78 research outputs found

    Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection

    Full text link
    Linguistically diverse datasets are critical for training and evaluating robust machine learning systems, but data collection is a costly process that often requires experts. Crowdsourcing the process of paraphrase generation is an effective means of expanding natural language datasets, but there has been limited analysis of the trade-offs that arise when designing tasks. In this paper, we present the first systematic study of the key factors in crowdsourcing paraphrase collection. We consider variations in instructions, incentives, data domains, and workflows. We manually analyzed paraphrases for correctness, grammaticality, and linguistic diversity. Our observations provide new insight into the trade-offs between accuracy and diversity in crowd responses that arise as a result of task design, providing guidance for future paraphrase generation procedures.Comment: Published at ACL 201

    Scalable and Quality-Aware Training Data Acquisition for Conversational Cognitive Services

    Full text link
    Dialog Systems (or simply bots) have recently become a popular human-computer interface for performing user's tasks, by invoking the appropriate back-end APIs (Application Programming Interfaces) based on the user's request in natural language. Building task-oriented bots, which aim at performing real-world tasks (e.g., booking flights), has become feasible with the continuous advances in Natural Language Processing (NLP), Artificial Intelligence (AI), and the countless number of devices which allow third-party software systems to invoke their back-end APIs. Nonetheless, bot development technologies are still in their preliminary stages, with several unsolved theoretical and technical challenges stemming from the ambiguous nature of human languages. Given the richness of natural language, supervised models require a large number of user utterances paired with their corresponding tasks -- called intents. To build a bot, developers need to manually translate APIs to utterances (called canonical utterances) and paraphrase them to obtain a diverse set of utterances. Crowdsourcing has been widely used to obtain such datasets, by paraphrasing the initial utterances generated by the bot developers for each task. However, there are several unsolved issues. First, generating canonical utterances requires manual efforts, making bot development both expensive and hard to scale. Second, since crowd workers may be anonymous and are asked to provide open-ended text (paraphrases), crowdsourced paraphrases may be noisy and incorrect (not conveying the same intent as the given task). This thesis first surveys the state-of-the-art approaches for collecting large training utterances for task-oriented bots. Next, we conduct an empirical study to identify quality issues of crowdsourced utterances (e.g., grammatical errors, semantic completeness). Moreover, we propose novel approaches for identifying unqualified crowd workers and eliminating malicious workers from crowdsourcing tasks. Particularly, we propose a novel technique to promote the diversity of crowdsourced paraphrases by dynamically generating word suggestions while crowd workers are paraphrasing a particular utterance. Moreover, we propose a novel technique to automatically translate APIs to canonical utterances. Finally, we present our platform to automatically generate bots out of API specifications. We also conduct thorough experiments to validate the proposed techniques and models

    Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models

    Full text link
    We introduce VL2NL, a Large Language Model (LLM) framework that generates rich and diverse NL datasets using only Vega-Lite specifications as input, thereby streamlining the development of Natural Language Interfaces (NLIs) for data visualization. To synthesize relevant chart semantics accurately and enhance syntactic diversity in each NL dataset, we leverage 1) a guided discovery incorporated into prompting so that LLMs can steer themselves to create faithful NL datasets in a self-directed manner; 2) a score-based paraphrasing to augment NL syntax along with four language axes. We also present a new collection of 1,981 real-world Vega-Lite specifications that have increased diversity and complexity than existing chart collections. When tested on our chart collection, VL2NL extracted chart semantics and generated L1/L2 captions with 89.4% and 76.0% accuracy, respectively. It also demonstrated generating and paraphrasing utterances and questions with greater diversity compared to the benchmarks. Last, we discuss how our NL datasets and framework can be utilized in real-world scenarios. The codes and chart collection are available at https://github.com/hyungkwonko/chart-llm.Comment: 22 pages, 5 figure

    LINGO : Visually Debiasing Natural Language Instructions to Support Task Diversity

    Full text link
    Cross-task generalization is a significant outcome that defines mastery in natural language understanding. Humans show a remarkable aptitude for this, and can solve many different types of tasks, given definitions in the form of textual instructions and a small set of examples. Recent work with pre-trained language models mimics this learning style: users can define and exemplify a task for the model to attempt as a series of natural language prompts or instructions. While prompting approaches have led to higher cross-task generalization compared to traditional supervised learning, analyzing 'bias' in the task instructions given to the model is a difficult problem, and has thus been relatively unexplored. For instance, are we truly modeling a task, or are we modeling a user's instructions? To help investigate this, we develop LINGO, a novel visual analytics interface that supports an effective, task-driven workflow to (1) help identify bias in natural language task instructions, (2) alter (or create) task instructions to reduce bias, and (3) evaluate pre-trained model performance on debiased task instructions. To robustly evaluate LINGO, we conduct a user study with both novice and expert instruction creators, over a dataset of 1,616 linguistic tasks and their natural language instructions, spanning 55 different languages. For both user groups, LINGO promotes the creation of more difficult tasks for pre-trained models, that contain higher linguistic diversity and lower instruction bias. We additionally discuss how the insights learned in developing and evaluating LINGO can aid in the design of future dashboards that aim to minimize the effort involved in prompt creation across multiple domains.Comment: 13 pages, 6 figures, Eurovis 202

    Conversational Question Answering on Heterogeneous Sources

    Get PDF
    Conversational question answering (ConvQA) tackles sequential informationneeds where contexts in follow-up questions are left implicit. Current ConvQAsystems operate over homogeneous sources of information: either a knowledgebase (KB), or a text corpus, or a collection of tables. This paper addressesthe novel issue of jointly tapping into all of these together, this wayboosting answer coverage and confidence. We present CONVINSE, an end-to-endpipeline for ConvQA over heterogeneous sources, operating in three stages: i)learning an explicit structured representation of an incoming question and itsconversational context, ii) harnessing this frame-like representation touniformly capture relevant evidences from KB, text, and tables, and iii)running a fusion-in-decoder model to generate the answer. We construct andrelease the first benchmark, ConvMix, for ConvQA over heterogeneous sources,comprising 3000 real-user conversations with 16000 questions, along with entityannotations, completed question utterances, and question paraphrases.Experiments demonstrate the viability and advantages of our method, compared tostate-of-the-art baselines.<br

    Conversational Question Answering on Heterogeneous Sources

    Get PDF
    • …
    corecore