7,972 research outputs found

    A large annotated corpus for learning natural language inference

    Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
    Comment: To appear at EMNLP 2015. The data will be posted shortly before the conference (the week of 14 Sep) at http://nlp.stanford.edu/projects/snli
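    As an illustration of the kind of lexicalized baseline the abstract mentions, the sketch below trains a simple bag-of-words classifier on SNLI-style premise/hypothesis pairs. The file name and JSON field names follow the public snli_1.0 JSONL release; treat them as assumptions if your copy differs.

```python
import json

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def load_pairs(path):
    """Read SNLI-style JSONL: one premise/hypothesis pair per line."""
    texts, labels = [], []
    with open(path) as f:
        for line in f:
            ex = json.loads(line)
            if ex["gold_label"] == "-":  # no annotator consensus; skip
                continue
            # Lexicalized features: surface words of both sentences.
            texts.append(ex["sentence1"] + " ||| " + ex["sentence2"])
            labels.append(ex["gold_label"])  # entailment/contradiction/neutral
    return texts, labels

X, y = load_pairs("snli_1.0_train.jsonl")  # assumed local copy of the corpus
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict(["A man is sleeping. ||| A man is awake."]))
```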

    Automatic Comprehension of Customer Queries for Feedback Generation

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science, 2018.
    One major challenge in customer-driven industries is responding to large volumes of queries. In response to this business need, Frequently Asked Questions (FAQs) have been used for over four decades to provide customers with a repository of questions and associated answers. However, FAQs require some effort on the part of customers to search, especially when the FAQ repository is large and poorly indexed or structured. This becomes even more difficult when an organisation has hundreds of queries in its repository of FAQs. One way of dealing with this rigorous task is to allow customers to ask their questions in natural language, extract the meaning of the input text, and automatically provide feedback from a pool of FAQs. This is an Information Retrieval (IR) problem within Natural Language Processing (NLP). This research work presents the first application of Jumping Finite Automata (JFA), an abstract computing machine, to this IR task. The methodology involves abstracting all FAQs into a JFA and applying algorithms that map customer queries to the underlying JFA of all possible queries. A data set of FAQs from a university's Computer and Network Service (CNS) was used as a test case. A prototype chat-bot application was developed that takes customer queries in a chat, automatically maps them to an FAQ, and presents the corresponding answer to the user. This research is expected to be the first such application of JFA to comprehending customer queries.
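    The defining property of a jumping finite automaton is that it may consume input symbols in any order. Below is a minimal sketch of how that idea supports order-insensitive FAQ matching, using multiset token coverage as an illustrative stand-in for the dissertation's actual JFA construction; the FAQ entries are invented.

```python
import re
from collections import Counter

# Invented FAQ repository for illustration.
faqs = {
    "How do I reset my network password?": "Visit the CNS self-service portal.",
    "How do I connect to the campus wifi?": "Join eduroam with your staff login.",
}

def bag(text):
    """Multiset of word tokens (a crude stand-in for JFA input symbols)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def jfa_score(query, pattern):
    """Fraction of pattern tokens consumable from the query in ANY order,
    mimicking a jumping automaton's order-free reads."""
    q, p = bag(query), bag(pattern)
    return sum(min(q[w], n) for w, n in p.items()) / sum(p.values())

query = "password reset for the network, how?"
best = max(faqs, key=lambda question: jfa_score(query, question))
print(best, "->", faqs[best])
```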

    Generate-then-Retrieve: Intent-Aware FAQ Retrieval in Product Search

    Customers interacting with product search engines are increasingly formulating information-seeking queries. Frequently Asked Question (FAQ) retrieval aims to retrieve common question-answer pairs for a user query with question intent. Integrating FAQ retrieval in product search can not only empower users to make more informed purchase decisions, but also enhance user retention through efficient post-purchase support. Determining when an FAQ entry can satisfy a user's information need within product search, without disrupting their shopping experience, represents an important challenge. We propose an intent-aware FAQ retrieval system consisting of (1) an intent classifier that predicts when a user's information need can be answered by an FAQ; and (2) a reformulation model that rewrites a query into a natural question. Offline evaluation demonstrates that our approach improves Hit@1 by 13% on retrieving ground-truth FAQs, while reducing latency by 95% compared to baseline systems. These improvements are further validated by real user feedback, where 71% of displayed FAQs on top of product search results received explicit positive user feedback. Overall, our findings show promising directions for integrating FAQ retrieval into product search at scale.
    Comment: ACL 2023 Industry Track
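    A minimal sketch of the two-stage pipeline described above: an intent classifier gates whether an FAQ should be shown at all, and a reformulation model rewrites the keyword query into a natural question before retrieval. The model checkpoints, label scheme, and the faq_index.search interface are placeholders, not the paper's actual components.

```python
from transformers import pipeline

# Placeholder checkpoints: the paper does not release its models.
intent_clf = pipeline("text-classification", model="my-org/query-intent")
rewriter = pipeline("text2text-generation", model="my-org/query2question")

def maybe_show_faq(query, faq_index):
    """Show an FAQ only when the query carries question intent."""
    intent = intent_clf(query)[0]
    if intent["label"] != "QUESTION_INTENT":  # assumed label scheme
        return None  # plain shopping query: do not disrupt the results page
    # Generate-then-retrieve: rewrite keywords into a natural question first.
    question = rewriter(query, max_new_tokens=32)[0]["generated_text"]
    return faq_index.search(question, top_k=1)  # any FAQ retriever interface
```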

    A Mobile-Health Information Access System

    Patients using the Mobile-Health Information System can send SMS requests to a Frequently Asked Questions (FAQ) web server with the expectation of receiving appropriate feedback on issues that relate to their health. The accuracy of such feedback is paramount to the mobile search user. However, automating SMS-based information search and retrieval poses significant challenges because of the inherent noise in SMS communication. In this paper, an architecture is first proposed for the implementation of the retrieval process; second, an algorithm is developed for retrieving the best-ranked question-answer pair. We present an algorithm that assists in selecting the best FAQ query after the ranking of the query-answer pairs, and results are generated based on the ranking of the FAQ queries. Our algorithm gives better results in terms of average precision and recall when compared with a naïve retrieval algorithm.
    Southern Africa Telecommunication Networks and Applications Conference (SATNAC)
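    A minimal sketch of best-ranked retrieval over noisy SMS input: normalise common SMS shorthand, then rank question-answer pairs by fuzzy similarity to the cleaned query. The shorthand table and scoring function are illustrative assumptions, not the paper's algorithm.

```python
from difflib import SequenceMatcher

# Illustrative shorthand table; a real system would curate or learn this.
SMS_SHORTHAND = {"u": "you", "plz": "please", "hw": "how", "4": "for"}

def normalise(sms):
    return " ".join(SMS_SHORTHAND.get(w, w) for w in sms.lower().split())

def best_ranked(sms_query, faq_pairs):
    """Return the (question, answer) pair most similar to the cleaned query."""
    q = normalise(sms_query)
    return max(faq_pairs,
               key=lambda pair: SequenceMatcher(None, q, pair[0].lower()).ratio())

faqs = [("How do I renew my prescription?", "Reply RENEW with your patient ID."),
        ("What are the clinic opening hours?", "Mon-Fri, 08:00-16:00.")]
question, answer = best_ranked("hw do i renew my prescription plz", faqs)
print(answer)
```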

    A Comparative analysis: QA evaluation questions versus real-world queries

    This paper presents a comparative analysis of user queries to a web search engine, questions to a Q&A service (answers.com), and questions employed in question answering (QA) evaluations at TREC and CLEF. The analysis shows that user queries to search engines contain mostly content words (i.e. keywords) but lack structure words (i.e. stopwords) and capitalization. Thus, they resemble natural language input after case folding and stopword removal. In contrast, topics for QA evaluation and questions to answers.com mainly consist of fully capitalized and syntactically well-formed questions. Classification experiments using a naïve Bayes classifier show that stopwords play an important role in determining the expected answer type. A classification based on stopwords is considerably more accurate (47.5% accuracy) than a classification based on all query words (40.1% accuracy) or on content words (33.9% accuracy). To simulate user input, questions are preprocessed by case folding and stopword removal. Additional classification experiments aim at reconstructing the syntactic wh-word frame of a question, i.e. the embedding of the interrogative word. Results indicate that this part of questions can be reconstructed with moderate accuracy (25.7%), but for a classification problem with a much larger number of classes compared to classifying queries by expected answer type (2096 classes vs. 130 classes). Furthermore, eliminating stopwords can lead to multiple reconstructed questions with a different or with the opposite meaning (e.g. if negations or temporal restrictions are included). In conclusion, question reconstruction from short user queries can be seen as a new realistic evaluation challenge for QA systems.
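    The stopword-only experiment is easy to reproduce in outline: restrict the feature vocabulary to stopwords and train a naïve Bayes classifier to predict the expected answer type. The toy training pairs below are stand-ins; the paper used TREC and CLEF questions.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the TREC/CLEF questions and answer-type labels.
questions = ["Who wrote Hamlet?", "Where is the Eiffel Tower?",
             "When did the war end?", "Who painted the ceiling?"]
answer_types = ["PERSON", "LOCATION", "DATE", "PERSON"]

# vocabulary=... keeps ONLY stopwords as features, mirroring the experiment.
stopword_clf = make_pipeline(
    CountVectorizer(vocabulary=sorted(ENGLISH_STOP_WORDS)),
    MultinomialNB(),
)
stopword_clf.fit(questions, answer_types)
print(stopword_clf.predict(["Who discovered penicillin?"]))  # -> ['PERSON']
```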

    Measuring and Narrowing the Compositionality Gap in Language Models

    We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often models can correctly answer all sub-problems but not generate the overall solution, a ratio we call the compositionality gap. We evaluate this ratio by asking multi-hop questions with answers that require composing multiple facts unlikely to have been observed together during pretraining. In the GPT-3 family of models, we show that as model size increases, single-hop question answering performance improves faster than multi-hop performance does; the compositionality gap therefore does not decrease. This surprising result suggests that while more powerful models memorize and recall more factual knowledge, they show no corresponding improvement in their ability to perform this kind of compositional reasoning. We then demonstrate how elicitive prompting (such as chain of thought) narrows the compositionality gap by reasoning explicitly instead of implicitly. We present a new method, self-ask, that further improves on chain of thought. In our method, the model explicitly asks itself (and then answers) follow-up questions before answering the initial question. We finally show that self-ask's structured prompting lets us easily plug in a search engine to answer the follow-up questions, which further improves accuracy.
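    A minimal sketch of the self-ask loop as the abstract describes it: the model emits follow-up questions, each is answered externally (here by a search function the caller supplies), and the intermediate answers are appended to the prompt until a final answer appears. The llm() and search() callables and the stop-sequence convention are assumptions, not a specific API.

```python
def self_ask(question, llm, search, max_hops=4):
    """Answer a multi-hop question using the self-ask prompt structure."""
    prompt = (f"Question: {question}\n"
              "Are follow up questions needed here: Yes.\n")
    for _ in range(max_hops):
        out = llm(prompt, stop=["Intermediate answer:"])
        prompt += out
        if "So the final answer is:" in out:
            return out.split("So the final answer is:")[-1].strip()
        if "Follow up:" in out:
            # Plug the search engine in to answer the sub-question.
            sub_question = out.split("Follow up:")[-1].strip()
            prompt += f"Intermediate answer: {search(sub_question)}\n"
    # Hop budget exhausted: force a final answer.
    return llm(prompt + "So the final answer is:").strip()
```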

    EagleBot: A Chatbot Based Multi-Tier Question Answering System for Retrieving Answers From Heterogeneous Sources Using BERT

    This paper proposes to tackle question answering on a specific domain by developing a multi-tier system that uses three different types of data storage for storing answers. To test our system on the university domain, we used data extracted from the Georgia Southern University website. For faster retrieval, we divided our answer data sources into three distinct types and utilized Dialogflow's Natural Language Understanding engine for route selection. We compared different word and sentence embedding techniques for building a semantic question search engine, and BERT sentence embeddings gave us the best result; for extracting answers from a large collection of documents, we also achieved the highest accuracy using the BERT-base model. Besides the BERT-base model, we achieved competitive accuracy by using BERT embeddings on paragraph-split documents. We were also able to substantially accelerate answer retrieval time by using pre-stored embeddings.
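    A minimal sketch of the pre-stored-embedding retrieval step: encode every FAQ question once with a BERT sentence encoder, cache the matrix, and serve each query with a single encode plus cosine similarity. The checkpoint named below is a common public sentence-BERT model, not necessarily the one EagleBot used, and the FAQ entries are invented.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint

faq_questions = ["How do I register for classes?",
                 "Where can I apply for financial aid?"]
faq_answers = ["Log in to the registration portal and ...",
               "Submit the FAFSA form online and ..."]

# Pre-stored embeddings: computed once, reused for every incoming query.
faq_embeddings = model.encode(faq_questions, convert_to_tensor=True)

def answer(query):
    q_emb = model.encode(query, convert_to_tensor=True)
    best = int(util.cos_sim(q_emb, faq_embeddings).argmax())
    return faq_answers[best]

print(answer("how to sign up for a course"))
```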

    Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?

    Recent advances in natural language processing (NLP) have led to the development of large language models (LLMs) such as ChatGPT. This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes. The proposed method involves translating an English dataset into French and training a classifier on the translated data. Results show that the detectors can effectively detect ChatGPT-generated text, with a degree of robustness against basic attack techniques in in-domain settings. However, vulnerabilities are evident in out-of-domain contexts, highlighting the challenge of detecting adversarial text. The study emphasizes caution when applying in-domain testing results to a wider variety of content. We provide our translated datasets and models as open-source resources: https://gitlab.inria.fr/wantoun/robust-chatgpt-detection
    Comment: Accepted to TALN 2023
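    A minimal sketch of the training recipe the abstract outlines: machine-translate an English human-vs-ChatGPT dataset into French, then fine-tune a French encoder as a binary detector. The CamemBERT checkpoint and the two inline examples are placeholders for the paper's translated dataset.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Stand-ins for the machine-translated corpus: 0 = human, 1 = ChatGPT.
texts_fr = ["Texte traduit écrit par un humain ...",
            "Texte traduit généré par ChatGPT ..."]
labels = [0, 1]

tok = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "camembert-base", num_labels=2)

ds = Dataset.from_dict({"text": texts_fr, "label": labels})
ds = ds.map(lambda ex: tok(ex["text"], truncation=True), batched=True)

Trainer(model=model,
        args=TrainingArguments(output_dir="chatgpt-detector-fr"),
        train_dataset=ds,
        tokenizer=tok).train()  # passing the tokenizer enables dynamic padding
```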