A large annotated corpus for learning natural language inference
Understanding entailment and contradiction is fundamental to understanding
natural language, and inference about entailment and contradiction is a
valuable testing ground for the development of semantic representations.
However, machine learning research in this area has been dramatically limited
by the lack of large-scale resources. To address this, we introduce the
Stanford Natural Language Inference corpus, a new, freely available collection
of labeled sentence pairs, written by humans doing a novel grounded task based
on image captioning. At 570K pairs, it is two orders of magnitude larger than
all other resources of its type. This increase in scale allows lexicalized
classifiers to outperform some sophisticated existing entailment models, and it
allows a neural network-based model to perform competitively on natural
language inference benchmarks for the first time.

Comment: To appear at EMNLP 2015. The data will be posted shortly before the conference (the week of 14 Sep) at http://nlp.stanford.edu/projects/snli
Automatic Comprehension of Customer Queries for Feedback Generation
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg in fulfillment of the requirements for the degree of Master of Science, 2018

One major challenge in customer-driven industries is the response to large volumes of queries. In response to this business need, Frequently Asked Questions (FAQs) have been used for over four decades to provide customers with a repository of questions and associated answers. However, FAQs require some effort on the part of the customers to search, especially when the FAQ repository is large and poorly indexed or structured. This gets even more difficult when an organisation has hundreds of queries in its repository of FAQs. One way of dealing with this rigorous task is to allow customers to ask their questions in a natural language, extract the meaning of the input text, and automatically provide feedback from a pool of FAQs. This is an Information Retrieval (IR) problem in Natural Language Processing (NLP). This research work presents the first application of Jumping Finite Automata (JFA), an abstract computing machine, in performing this IR task. The methodology involves abstracting all FAQs to a JFA and applying algorithms to map customer queries to the underlying JFA of all possible queries. A data set of FAQs from a university's Computer and Network Service (CNS) was used as a test case. A prototype chat-bot application was developed that takes customer queries in a chat, automatically maps them to a FAQ, and presents the corresponding answer to the user. This research is expected to be the first such application of JFA in comprehending customer queries.
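The property a JFA contributes here is order-insensitivity: the automaton may "jump" over input symbols and consume them in any order, so acceptance depends on the multiset of symbols read rather than their sequence. A minimal sketch of that matching idea follows; the FAQ entries, stopword list, and function names are our own illustrative stand-ins, not the dissertation's implementation:

```python
# Toy FAQ repository: each entry maps a bag of keywords to an answer.
# Because a jumping finite automaton consumes input symbols in any order,
# acceptance depends only on the multiset of symbols read; we approximate
# that here with keyword-set overlap (a deliberate simplification).
FAQS = {
    "reset password": frozenset({"reset", "password"}),
    "connect wifi": frozenset({"connect", "wifi"}),
}
ANSWERS = {
    "reset password": "Visit the self-service portal to reset your password.",
    "connect wifi": "Join the campus network and sign in with your student ID.",
}

STOPWORDS = {"how", "do", "i", "my", "the", "to", "a", "can"}

def match_faq(query):
    """Map a natural-language query to the best-matching FAQ key, or None."""
    words = {w for w in query.lower().split() if w not in STOPWORDS}
    best, best_overlap = None, 0
    for key, keywords in FAQS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best, best_overlap = key, overlap
    return best

# Word order does not matter, mirroring the JFA's jumping reads.
key = match_faq("How do I reset my password")
print(ANSWERS[key])
```

Because only the bag of content words is compared, "password reset" and "reset my password" map to the same FAQ, which is the behaviour the JFA abstraction buys.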
Generate-then-Retrieve: Intent-Aware FAQ Retrieval in Product Search
Customers interacting with product search engines are increasingly
formulating information-seeking queries. Frequently Asked Question (FAQ)
retrieval aims to retrieve common question-answer pairs for a user query with
question intent. Integrating FAQ retrieval in product search can not only
empower users to make more informed purchase decisions, but also enhance user
retention through efficient post-purchase support. Determining when an FAQ
entry can satisfy a user's information need within product search, without
disrupting their shopping experience, represents an important challenge. We
propose an intent-aware FAQ retrieval system consisting of (1) an intent
classifier that predicts when a user's information need can be answered by an
FAQ; (2) a reformulation model that rewrites a query into a natural question.
Offline evaluation demonstrates that our approach improves Hit@1 by 13% on
retrieving ground-truth FAQs, while reducing latency by 95% compared to
baseline systems. These improvements are further validated by real user
feedback, where 71% of displayed FAQs on top of product search results received
explicit positive user feedback. Overall, our findings show promising
directions for integrating FAQ retrieval into product search at scale.

Comment: ACL 2023 Industry Track
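The two-stage design above can be sketched as a pipeline: an intent gate followed by a reformulation step. The cue-word classifier and template rewriter below are toy stand-ins for the paper's learned models, and all word lists and function names are our own illustrative assumptions:

```python
# Stage 1 stand-in: flag queries that look information-seeking rather than
# purely navigational/transactional. The real system uses a trained classifier.
QUESTION_CUES = {"how", "why", "can", "does", "what", "is", "warranty", "return"}

def has_question_intent(query):
    return any(w in QUESTION_CUES for w in query.lower().split())

# Stage 2 stand-in: rewrite a keyword query into a natural question so it
# better matches the question side of stored FAQ pairs.
def reformulate(query):
    q = query.strip().rstrip("?").lower()
    if q.split()[0] in QUESTION_CUES:
        return q.capitalize() + "?"
    return f"What should I know about {q}?"

query = "blender warranty"
if has_question_intent(query):
    print(reformulate(query))  # -> What should I know about blender warranty?
```

Gating retrieval behind the intent classifier is what keeps FAQ answers from disrupting ordinary product searches: queries with no question intent never reach the retrieval stage.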
A Mobile-Health Information Access System
Patients using the Mobile-Health Information System
can send SMS requests to a Frequently Asked Questions
(FAQ) web server with the expectation of receiving an appropriate
feedback on issues that relate to their health. The accuracy of
such feedback is paramount to the mobile search user. However,
automating SMS-based information search and retrieval poses
significant challenges because of the inherent noise in SMS
communication. First, this paper proposes an architecture for implementing the retrieval process; second, it develops an algorithm for retrieving the best-ranked question-answer pair. The algorithm assists in selecting the best FAQ-query after the query-answer pairs have been ranked, and results are generated from that ranking.
Our algorithm gives a better result in terms of average precision and recall when compared with the naïve retrieval algorithm.

Southern Africa Telecommunication Networks and Applications Conference (SATNAC)
A Comparative analysis: QA evaluation questions versus real-world queries
This paper presents a comparative analysis of user queries to a web search engine, questions to a Q&A service (answers.com), and questions employed in question answering (QA) evaluations at TREC and CLEF. The analysis shows that user queries to search engines contain mostly content words (i.e. keywords) but lack structure words (i.e. stopwords) and capitalization. Thus, they resemble natural language input after case folding and stopword removal. In contrast, topics for QA evaluation and questions to answers.com mainly
consist of fully capitalized and syntactically well-formed questions. Classification experiments using a naïve Bayes classifier show that stopwords play an important role in determining the expected answer type. A classification based on stopwords is considerably more accurate (47.5% accuracy) than a classification based on all query words (40.1% accuracy) or on content words (33.9% accuracy). To
simulate user input, questions are preprocessed by case folding and stopword removal. Additional classification experiments aim at reconstructing the syntactic wh-word frame of a question, i.e. the embedding of the interrogative word. Results indicate that this part of
questions can be reconstructed with moderate accuracy (25.7%), but for a classification problem with a much larger number of classes than classifying queries by expected answer type (2096 classes vs. 130 classes). Furthermore, eliminating stopwords can lead to multiple reconstructed questions with a different or opposite meaning (e.g. if negations or temporal restrictions are involved). In conclusion, question reconstruction from short user queries can be seen as a new, realistic evaluation challenge for QA systems.
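The stopword-based answer-type classification can be illustrated with a tiny multinomial naive Bayes over stopword features. The training questions, labels, and stopword list below are our own toy examples, not the paper's TREC/CLEF data; only the feature choice (stopwords, including wh-words) follows the paper:

```python
import math
from collections import Counter, defaultdict

STOPWORDS = {"who", "where", "when", "what", "how", "is", "the", "in", "of", "did"}

# Toy training data: (question, expected answer type). The paper's finding is
# that stopwords carry most of the answer-type signal; the examples are ours.
TRAIN = [
    ("who wrote hamlet", "PERSON"),
    ("who is the president of france", "PERSON"),
    ("where is the eiffel tower", "LOCATION"),
    ("where did napoleon die", "LOCATION"),
    ("when did the war end", "DATE"),
    ("when is easter", "DATE"),
]

def stopword_features(question):
    # Keep ONLY stopwords, discarding content words entirely.
    return [w for w in question.lower().split() if w in STOPWORDS]

# Train: per-class priors and per-class stopword counts.
class_counts = Counter(label for _, label in TRAIN)
word_counts = defaultdict(Counter)
for q, label in TRAIN:
    word_counts[label].update(stopword_features(q))

def classify(question):
    feats = stopword_features(question)
    def log_prob(label):
        total = sum(word_counts[label].values())
        lp = math.log(class_counts[label] / len(TRAIN))
        for f in feats:  # Laplace (add-one) smoothing over the stopword vocab
            lp += math.log((word_counts[label][f] + 1) / (total + len(STOPWORDS)))
        return lp
    return max(class_counts, key=log_prob)

print(classify("who discovered penicillin"))  # -> PERSON
```

Even though "discovered" and "penicillin" are thrown away, the wh-word alone pins down the answer type, which is the paper's point about how much signal stopwords carry.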
Measuring and Narrowing the Compositionality Gap in Language Models
We investigate the ability of language models to perform compositional
reasoning tasks where the overall solution depends on correctly composing the
answers to sub-problems. We measure how often models can correctly answer all
sub-problems but not generate the overall solution, a ratio we call the
compositionality gap. We evaluate this ratio by asking multi-hop questions with
answers that require composing multiple facts unlikely to have been observed
together during pretraining. We show that in the GPT-3 family of models, single-hop question answering performance improves faster with model size than multi-hop performance does; therefore the compositionality gap does not decrease. This surprising result suggests that while more powerful
models memorize and recall more factual knowledge, they show no corresponding
improvement in their ability to perform this kind of compositional reasoning.
We then demonstrate how elicitive prompting (such as chain of thought)
narrows the compositionality gap by reasoning explicitly instead of implicitly.
We present a new method, self-ask, that further improves on chain of thought.
In our method, the model explicitly asks itself (and then answers) follow-up
questions before answering the initial question. We finally show that
self-ask's structured prompting lets us easily plug in a search engine to
answer the follow-up questions, which additionally improves accuracy.
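The self-ask scaffold can be sketched as a prompt template: a one-shot exemplar shows the model how to ask and answer explicit follow-ups before committing to a final answer. The exemplar wording below follows the general pattern described above but is our own reconstruction, not the paper's exact prompt:

```python
# One worked exemplar demonstrating the self-ask format: the model is shown
# how to decompose a multi-hop question into follow-ups before answering.
EXEMPLAR = """\
Question: Who lived longer, Theodor Haecker or Harry Vaughan Watkins?
Are follow up questions needed here: Yes.
Follow up: How old was Theodor Haecker when he died?
Intermediate answer: Theodor Haecker was 65 years old when he died.
Follow up: How old was Harry Vaughan Watkins when he died?
Intermediate answer: Harry Vaughan Watkins was 69 years old when he died.
So the final answer is: Harry Vaughan Watkins.
"""

def self_ask_prompt(question):
    """Assemble the prompt sent to the language model for a new question."""
    return f"{EXEMPLAR}\nQuestion: {question}\nAre follow up questions needed here:"

prompt = self_ask_prompt("Who was president when the Eiffel Tower was built?")
print(prompt)
```

The explicit "Follow up:" lines are also what make the search-engine variant easy to plug in: whenever the model emits a follow-up, that line can be routed to a search API and the result pasted back as the "Intermediate answer:" before generation continues.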
EagleBot: A Chatbot Based Multi-Tier Question Answering System for Retrieving Answers From Heterogeneous Sources Using BERT
This paper proposes to tackle Question Answering on a specific domain by developing a multi-tier system that uses three different types of data storage for storing answers. To test the system on the university domain, we used data extracted from the Georgia Southern University website. For faster retrieval, we divided our answer data sources into three distinct types and utilized Dialogflow's Natural Language Understanding engine for route selection. We compared different word and sentence embedding techniques for building a semantic question search engine, and BERT sentence embeddings gave the best result; for extracting answers from a large collection of documents, we also achieved the highest accuracy using the BERT-base model. Besides the BERT-base model, we achieved competitive accuracy by applying BERT embeddings to paragraph-split documents. We were also able to substantially accelerate answer retrieval time by using pre-stored embeddings.
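The semantic question search described above reduces to nearest-neighbour retrieval over pre-stored sentence vectors. In the paper those vectors come from BERT; in the sketch below a bag-of-words counter stands in for the encoder so the retrieval logic stays runnable without model weights, and the FAQ questions are our own toy examples:

```python
import math
from collections import Counter

def embed(text):
    # Stand-in encoder: bag-of-words counts instead of BERT sentence vectors.
    return Counter(text.lower().replace("?", "").split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

FAQ_QUESTIONS = [
    "How do I apply for graduation?",
    "Where is the library located?",
    "How do I reset my campus email password?",
]
# Pre-store embeddings once, mirroring the paper's pre-computed index.
FAQ_VECTORS = [embed(q) for q in FAQ_QUESTIONS]

def retrieve(query):
    qv = embed(query)
    scores = [cosine(qv, fv) for fv in FAQ_VECTORS]
    return FAQ_QUESTIONS[scores.index(max(scores))]

print(retrieve("reset email password"))  # nearest stored question
```

Swapping `embed` for a real sentence encoder changes only the vector source; the pre-computation of `FAQ_VECTORS` is what delivers the retrieval-time speedup the abstract reports.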
Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?
Recent advances in natural language processing (NLP) have led to the
development of large language models (LLMs) such as ChatGPT. This paper
proposes a methodology for developing and evaluating ChatGPT detectors for
French text, with a focus on investigating their robustness on out-of-domain
data and against common attack schemes. The proposed method involves
translating an English dataset into French and training a classifier on the
translated data. Results show that the detectors can effectively detect
ChatGPT-generated text, with a degree of robustness against basic attack
techniques in in-domain settings. However, vulnerabilities are evident in
out-of-domain contexts, highlighting the challenge of detecting adversarial
text. The study emphasizes caution when applying in-domain testing results to a
wider variety of content. We provide our translated datasets and models as
open-source resources: https://gitlab.inria.fr/wantoun/robust-chatgpt-detection

Comment: Accepted to TALN 2023