389 research outputs found
Text messaging and retrieval techniques for a mobile health information system
Mobile phones have been identified as one of the technologies that can be used to overcome the challenges of information dissemination regarding serious diseases. Short message services, a much used function of cell phones, for example, can be turned into a major tool for accessing databases. This paper focuses on the design and development of a short message services-based information access algorithm to carefully screen information on human immunodeficiency virus/acquired immune deficiency syndrome within the context of a frequently asked questions system. However, automating the short message services-based information search and retrieval poses significant challenges because of the inherent noise in its communications. The developed algorithm was used to retrieve the best-ranked question–answer pair. Results were evaluated using three metrics: average precision, recall and computational time. The retrieval efficacy was measured and it was confirmed that there was a significant improvement in the results of the proposed algorithm when compared with similar retrieval algorithms
A Mobile-Health Information Access System
Patients using the Mobile-Health Information System
can send SMS requests to a Frequently Asked Questions
(FAQ) web server with the expectation of receiving an appropriate
feedback on issues that relate to their health. The accuracy of
such feedback is paramount to the mobile search user. However,
automating SMS-based information search and retrieval poses
significant challenges because of the inherent noise in SMS
communication. First, in this paper an architecture is proposed
for the implementation of the retrieval process, and second, an
algorithm is developed for the best-ranked question-answer pair
retrieval. We present an algorithm that assists in the selection of
the best FAQ-query after the ranking of the query-answer pair.
Results are generated based on the ranking of the FAQ-query.
Our algorithm gives a better result in terms of average precision
and recall when compared with the naıve retrieval algorithm.Southern Africa Telecommunication Networks and Applications Conference (SATNAC)Department of HE and Training approved lis
Generate-then-Retrieve: Intent-Aware FAQ Retrieval in Product Search
Customers interacting with product search engines are increasingly
formulating information-seeking queries. Frequently Asked Question (FAQ)
retrieval aims to retrieve common question-answer pairs for a user query with
question intent. Integrating FAQ retrieval in product search can not only
empower users to make more informed purchase decisions, but also enhance user
retention through efficient post-purchase support. Determining when an FAQ
entry can satisfy a user's information need within product search, without
disrupting their shopping experience, represents an important challenge. We
propose an intent-aware FAQ retrieval system consisting of (1) an intent
classifier that predicts when a user's information need can be answered by an
FAQ; (2) a reformulation model that rewrites a query into a natural question.
Offline evaluation demonstrates that our approach improves Hit@1 by 13% on
retrieving ground-truth FAQs, while reducing latency by 95% compared to
baseline systems. These improvements are further validated by real user
feedback, where 71% of displayed FAQs on top of product search results received
explicit positive user feedback. Overall, our findings show promising
directions for integrating FAQ retrieval into product search at scale.Comment: ACL 2023 Industry Trac
A semi-automated FAQ retrieval system for HIV/AIDS
This thesis describes a semi-automated FAQ retrieval system that can be queried by users through short text messages on low-end mobile phones to provide answers on HIV/AIDS related queries. First we address the issue of result presentation on low-end mobile phones by proposing an iterative interaction retrieval strategy where the user engages with the FAQ retrieval system in the question answering process. At each iteration, the system returns only one question-answer pair to the user and the iterative process terminates after the user's information need has been satisfied. Since the proposed system is iterative, this thesis attempts to reduce the number of iterations (search length) between the users and the system so that users do not abandon the search process before their information need has been satisfied. Moreover, we conducted a user study to determine the number of iterations that users are willing to tolerate before abandoning the iterative search process. We subsequently used the bad abandonment statistics from this study to develop an evaluation measure for estimating the probability that any random user will be satisfied when using our FAQ retrieval system.
In addition, we used a query log and its click-through data to address three main FAQ document collection deficiency problems in order to improve the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system. Conclusions are derived concerning whether we can reduce the rate at which users abandon their search before their information need has been satisfied by using information from previous searches to: Address the term mismatch problem between the users' SMS queries and the relevant FAQ documents in the collection; to selectively rank the FAQ document according to how often they have been previously identified as relevant by users for a particular query term; and to identify those queries that do not have a relevant FAQ document in the collection.
In particular, we proposed a novel template-based approach that uses queries from a query log for which the true relevant FAQ documents are known to enrich the FAQ documents with additional terms in order to alleviate the term mismatch problem. These terms are added as a separate field in a field-based model using two different proposed enrichment strategies, namely the Term Frequency and the Term Occurrence strategies. This thesis thoroughly investigates the effectiveness of the aforementioned FAQ document enrichment strategies using three different field-based models. Our findings suggest that we can improve the overall recall and the probability that any random user will be satisfied by enriching the FAQ documents with additional terms from queries in our query log. Moreover, our investigation suggests that it is important to use an FAQ document enrichment strategy that takes into consideration the number of times a term occurs in the query when enriching the FAQ documents. We subsequently show that our proposed enrichment approach for alleviating the term mismatch problem generalise well on other datasets.
Through the evaluation of our proposed approach for selectively ranking the FAQ documents, we show that we can improve the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system by incorporating the click popularity score of a query term t on an FAQ document d into the scoring and ranking process. Our results generalised well on a new dataset. However, when we deploy the click popularity score of a query term t on an FAQ document d on an enriched FAQ document collection, we saw a decrease in the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system.
Furthermore, we used our query log to build a binary classifier for detecting those queries that do not have a relevant FAQ document in the collection (Missing Content Queries (MCQs))). Before building such a classifier, we empirically evaluated several feature sets in order to determine the best combination of features for building a model that yields the best classification accuracy in identifying the MCQs and the non-MCQs. Using a different dataset, we show that we can improve the overall retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system by deploying a MCQs detection subsystem in our FAQ retrieval system to filter out the MCQs.
Finally, this thesis demonstrates that correcting spelling errors can help improve the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system. We tested our FAQ retrieval system with two different testing sets, one containing the original SMS queries and the other containing the SMS queries which were manually corrected for spelling errors. Our results show a significant improvement in the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system
Towards More Robust Natural Language Understanding
Natural Language Understanding (NLU) is a branch of Natural Language Processing (NLP) that uses intelligent computer software to understand texts that encode human knowledge. Recent years have witnessed notable progress across various NLU tasks with deep learning techniques, especially with pretrained language models. Besides proposing more advanced model architectures, constructing more reliable and trustworthy datasets also plays a huge role in improving NLU systems, without which it would be impossible to train a decent NLU model. It's worth noting that the human ability of understanding natural language is flexible and robust. On the contrary, most of existing NLU systems fail to achieve desirable performance on out-of-domain data or struggle on handling challenging items (e.g., inherently ambiguous items, adversarial items) in the real world. Therefore, in order to have NLU models understand human language more effectively, it is expected to prioritize the study on robust natural language understanding.
In this thesis, we deem that NLU systems are consisting of two components: NLU models and NLU datasets. As such, we argue that, to achieve robust NLU, the model architecture/training and the dataset are equally important. Specifically, we will focus on three NLU tasks to illustrate the robustness problem in different NLU tasks and our contributions (i.e., novel models and new datasets) to help achieve more robust natural language understanding. The major technical contributions of this thesis are:
(1) We study how to utilize diversity boosters (e.g., beam search & QPP) to help neural question generator synthesize diverse QA pairs, upon which a Question Answering (QA) system is trained to improve the generalization on the unseen target domain. It's worth mentioning that our proposed QPP (question phrase prediction) module, which predicts a set of valid question phrases given an answer evidence, plays an important role in improving the cross-domain generalizability for QA systems. Besides, a target-domain test set is constructed and approved by the community to help evaluate the model robustness under the cross-domain generalization setting.
(2) We investigate inherently ambiguous items in Natural Language Inference, for which annotators don't agree on the label. Ambiguous items are overlooked in the literature but often occurring in the real world. We build an ensemble model, AAs (Artificial Annotators), that simulates underlying annotation distribution to effectively identify such inherently ambiguous items. Our AAs are better at handling inherently ambiguous items since the model design captures the essence of the problem better than vanilla model architectures.
(3) We follow a standard practice to build a robust dataset for FAQ retrieval task, COUGH. In our dataset analysis, we show how COUGH better reflects the challenge of FAQ retrieval in the real situation than its counterparts. The imposed challenge will push forward the boundary of research on FAQ retrieval in real scenarios.
Moving forward, the ultimate goal for robust natural language understanding is to build NLU models which can behave humanly. That is, it's expected that robust NLU systems are capable to transfer the knowledge from training corpus to unseen documents more reliably and survive when encountering challenging items even if the system doesn't know a priori of users' inputs.No embargoAcademic Major: Computer Science and EngineeringAcademic Major: Industrial and Systems Engineerin
Short message service normalization for communication with a health information system
Philosophiae Doctor - PhDShort Message Service (SMS) is one of the most popularly used services for communication between mobile phone users. In recent times it has also been proposed as a means for information access. However, there are several challenges to be overcome in order to process an SMS, especially when it is used as a query in an information retrieval system.SMS users often tend deliberately to use compacted and grammatically incorrect writing that makes the message difficult to process with conventional information retrieval systems. To overcome this, a pre-processing step known as normalization is required. In this thesis an investigation of SMS normalization algorithms is carried out. To this end,studies have been conducted into the design of algorithms for translating and normalizing SMS text. Character-based, unsupervised and rule-based techniques are presented.
An investigation was also undertaken into the design and development of a system for information access via SMS. A specific system was designed to access information related to a Frequently Asked Questions (FAQ) database in healthcare, using a case study. This study secures SMS communication, especially for healthcare information systems. The proposed technique is to encipher the messages using the secure shell (SSH) protocol
- …