14 research outputs found

RACE: Large-scale ReAding Comprehension Dataset From Examinations

Author: Hovy Eduard
Lai Guokun
Liu Hanxiao
Xie Qizhe
Yang Yiming
Publication date: 01/01/2017

We present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from English exams for Chinese middle and high school students aged 12 to 18, RACE consists of nearly 28,000 passages and nearly 100,000 questions generated by human experts (English instructors), and covers a variety of topics carefully designed to evaluate the students' ability in understanding and reasoning. In particular, the proportion of questions that require reasoning is much larger in RACE than in other benchmark datasets for reading comprehension, and there is a significant gap between the performance of the state-of-the-art models (43%) and the ceiling human performance (95%). We hope this new dataset can serve as a valuable resource for research and evaluation in machine comprehension. The dataset is freely available at http://www.cs.cmu.edu/~glai1/data/race/ and the code is available at https://github.com/qizhex/RACE_AR_baselines.

Comment: EMNLP 201

arXiv.org e-Print Archive

Crossref

A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest

Author: Ai Fangzhou
Fan Zhen
Gao Luyu
Lai Guokun
Yang Hongxia
Yang Yiming
Zhang Ruohong
Zhang Zheng
Zheng Chen
Publication date: 17/11/2023

Large Language Models (LLMs), despite their great power in language generation, often encounter challenges when dealing with intricate and knowledge-demanding queries in specific domains. This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources and adaptively training a chatbot with domain-specific inquiries. Our two-step approach starts from training a knowledge miner, namely LLMiner, which autonomously extracts Question-Answer pairs from relevant documents through a chain-of-thought reasoning process. Subsequently, we blend the mined QA pairs with a conversational dataset to fine-tune the LLM as a chatbot, thereby enriching its domain-specific expertise and conversational capabilities. We also developed a new evaluation benchmark which comprises four domain-specific text corpora and associated human-crafted QA pairs for testing. Our model shows remarkable performance improvement over a generally aligned LLM and surpasses domain-adapted models directly fine-tuned on the domain corpus. In particular, LLMiner achieves this with minimal human intervention, requiring only 600 seed instances, thereby providing a pathway towards self-improvement of LLMs through model-synthesized training data.

Comment: Work in progress

arXiv.org e-Print Archive

Controlling Risk of Web Question Answering

Author: Devlin Jacob
Dunn Matthew
Ferrucci David
Gal Yarin
Geifman Yonatan
Guo Chuan
Lai Guokun
Levy Omer
Malinin Andrey
Nguyen Tri
Richardson Matthew
Vinyals Oriol
Voorhees Ellen M.
Wang Shuohang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/07/2019

Web question answering (QA) has become an indispensable component in modern search systems, which can significantly improve users' search experience by providing a direct answer to users' information needs. This can be achieved by applying machine reading comprehension (MRC) models over the retrieved passages to extract answers with respect to the search query. With the development of deep learning techniques, state-of-the-art MRC performance has been achieved by recent deep methods. However, existing studies on MRC seldom address the predictive uncertainty issue, i.e., how likely an MRC model's prediction is to be wrong, leading to uncontrollable risks in real-world Web QA applications. In this work, we first conduct an in-depth investigation of the risk of Web QA. We then introduce a novel risk control framework, which consists of a qualify model for uncertainty estimation using the probe idea, and a decision model for selective output. For evaluation, we introduce risk-related metrics, rather than the traditional EM and F1 in MRC, for the evaluation of risk-aware Web QA. The empirical results over both the real-world Web QA dataset and the academic MRC benchmark collection demonstrate the effectiveness of our approach.

Comment: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

arXiv.org e-Print Archive

Crossref

Natural Questions: A Benchmark for Question Answering Research

Author: Bowman Samuel R.
Chen Danqi
Choi Eunsol
Clark Christopher
Devroye Luc
He Wei
Hearst Marti A.
Hermann Karl Moritz
Hill Felix
Jia Robin
Joshi Mandar
Lai Guokun
Mihaylov Todor
Nguyen Tri
Onishi Takeshi
Paperno Denis
Papineni Kishore
Parikh Ankur
Rajpurkar Pranav
Richardson Matthew
Williams Adina
Yang Zhilin
Yi Yang
Publication venue: 'MIT Press - Journals'

Crossref

DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension

Crossref