ProKnow: Process Knowledge for Safety Constrained and Explainable Question Generation for Mental Health Diagnostic Assistance
Current Virtual Mental Health Assistants (VMHAs) provide counseling and suggestive care. They refrain from patient diagnostic assistance because of a lack of training on safety-constrained and specialized clinical process knowledge (ProKnow). In this work, we define ProKnow as an ordered set of information that maps to evidence-based guidelines or categories of conceptual understanding to experts in a domain. We also introduce a new dataset of diagnostic conversations guided by safety constraints and ProKnow that healthcare professionals use (ProKnow-data). We develop a method for natural language question generation (NLG) that collects diagnostic information from the patient interactively (ProKnow-algo). We demonstrate the limitations of using state-of-the-art large-scale language models (LMs) on this dataset. ProKnow-algo models the process knowledge through explicitly modeling safety, knowledge capture, and explainability. LMs with ProKnow-algo generated 89% safer questions in the depression and anxiety domain. Further, without ProKnow-algo, the generated questions did not adhere to clinical process knowledge in ProKnow-data. In comparison, ProKnow-algo-based generations yield a 96% reduction in averaged squared rank error. The explainability of the generated questions is assessed by computing similarity with concepts in depression and anxiety knowledge bases. Overall, irrespective of the type of LMs, ProKnow-algo achieved an averaged 82% improvement over simple pre-trained LMs on safety, explainability, and process-guided question generation. We qualitatively and quantitatively evaluate the efficacy of ProKnow-algo by introducing three new evaluation metrics for safety, explainability, and process knowledge-adherence. For reproducibility, we will make ProKnow-data and the code repository of ProKnow-algo publicly available upon acceptance.
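The abstract above reports an "averaged squared rank error" but does not define it. As a minimal sketch, assuming the metric compares the position of each question in a reference (process-knowledge) ordering with its position in the generated ordering, one plausible reading is the mean of the squared position differences. The function name and the example question labels below are hypothetical and are not taken from the ProKnow code.

```python
from typing import Hashable, Sequence


def averaged_squared_rank_error(
    reference_order: Sequence[Hashable],
    generated_order: Sequence[Hashable],
) -> float:
    """Mean squared difference between an item's rank in the reference
    ordering and its rank in the generated ordering.

    Assumption: both sequences contain the same unique items. This is an
    illustrative reading of the metric, not the authors' definition.
    """
    if not reference_order:
        raise ValueError("orderings must be non-empty")
    if len(set(reference_order)) != len(reference_order):
        raise ValueError("reference_order must not contain duplicates")
    if len(generated_order) != len(reference_order) or set(generated_order) != set(reference_order):
        raise ValueError("orderings must contain the same items")

    # Position of every item in the generated ordering.
    generated_rank = {item: rank for rank, item in enumerate(generated_order)}

    # Squared displacement of each item relative to the reference ordering.
    squared_errors = [
        (rank - generated_rank[item]) ** 2
        for rank, item in enumerate(reference_order)
    ]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    reference = ["q1", "q2", "q3"]  # hypothetical guideline order
    generated = ["q3", "q2", "q1"]  # hypothetical model output order
    print(averaged_squared_rank_error(reference, generated))
```

Under this reading, a generation that follows the reference order exactly scores 0, and larger values indicate questions asked further from their guideline position.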
TDLR: Top (Semantic)-Down (Syntactic) Language Representation
Language understanding involves processing text with both the grammatical and common-sense contexts of the text fragments. The text “I went to the grocery store and brought home a car” requires both the grammatical context (syntactic) and common-sense context (semantic) to capture the oddity in the sentence. Contextualized text representations learned by Language Models (LMs) are expected to capture a variety of syntactic and semantic contexts from large amounts of training data corpora. Recent work such as ERNIE has shown that infusing the knowledge contexts, where they are available in LMs, results in significant performance gains on General Language Understanding (GLUE) benchmark tasks. However, to our knowledge, no knowledge-aware model has attempted to infuse knowledge through top-down semantics-driven syntactic processing (e.g., common-sense to grammatical) and to operate directly on the attention mechanism that LMs leverage to learn the data context. We propose a learning framework, Top-Down Language Representation (TDLR), to infuse common-sense semantics into LMs. In our implementation, we build on BERT for its rich syntactic knowledge and use the knowledge graphs ConceptNet and WordNet to infuse semantic knowledge.
Exploring the Relationship between LLM Hallucinations and Prompt Linguistic Nuances: Readability, Formality, and Concreteness
As Large Language Models (LLMs) have advanced, they have brought forth new
challenges, with one of the prominent issues being LLM hallucination. While
various mitigation techniques are emerging to address hallucination, it is
equally crucial to delve into its underlying causes. Consequently, in this
preliminary exploratory investigation, we examine how linguistic factors in
prompts, specifically readability, formality, and concreteness, influence the
occurrence of hallucinations. Our experimental results suggest that prompts
characterized by greater formality and concreteness tend to result in reduced
hallucination. However, the outcomes pertaining to readability are somewhat
inconclusive, showing a mixed pattern.
A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
As Large Language Models (LLMs) continue to advance in their ability to write
human-like text, a key challenge remains around their tendency to hallucinate,
generating content that appears factual but is ungrounded. This issue of
hallucination is arguably the biggest hindrance to safely deploying these
powerful LLMs into real-world production systems that impact people's lives.
The journey toward widespread adoption of LLMs in practical settings heavily
relies on addressing and mitigating hallucinations. Unlike traditional AI
systems focused on limited tasks, LLMs have been exposed to vast amounts of
online text data during training. While this allows them to display impressive
language fluency, it also means they are capable of extrapolating information
from the biases in training data, misinterpreting ambiguous prompts, or
modifying the information to align superficially with the input. This becomes
hugely alarming when we rely on language generation capabilities for sensitive
applications, such as summarizing medical records, financial analysis reports,
etc. This paper presents a comprehensive survey of over 32 techniques developed
to mitigate hallucination in LLMs. Notable among these are Retrieval Augmented
Generation (Lewis et al., 2021), Knowledge Retrieval (Varshney et al., 2023),
CoNLI (Lei et al., 2023), and CoVe (Dhuliawala et al., 2023). Furthermore, we
introduce a detailed taxonomy categorizing these methods based on various
parameters, such as dataset utilization, common tasks, feedback mechanisms, and
retriever types. This classification helps distinguish the diverse approaches
specifically designed to tackle hallucination issues in LLMs. Additionally, we
analyze the challenges and limitations inherent in these techniques, providing
a solid foundation for future research in addressing hallucinations and related
phenomena within the realm of LLMs.
The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations
The recent advancements in Large Language Models (LLMs) have garnered
widespread acclaim for their remarkable emerging capabilities. However, the
issue of hallucination has emerged in parallel as a by-product, posing
significant concerns. While some recent endeavors have been made to identify
and mitigate different types of hallucination, there has been a limited
emphasis on the nuanced categorization of hallucination and associated
mitigation methods. To address this gap, we offer a fine-grained discourse on
profiling hallucination based on its degree, orientation, and category, along
with offering strategies for alleviation. As such, we define two overarching
orientations of hallucination: (i) factual mirage (FM) and (ii) silver lining
(SL). To provide a more comprehensive understanding, both orientations are
further sub-categorized into intrinsic and extrinsic, with three degrees of
severity - (i) mild, (ii) moderate, and (iii) alarming. We also meticulously
categorize hallucination into six types: (i) acronym ambiguity, (ii) numeric
nuisance, (iii) generated golem, (iv) virtual voice, (v) geographic erratum,
and (vi) time wrap. Furthermore, we curate HallucInation eLiciTation (HILT), a
publicly available dataset comprising 75,000 samples generated using 15
contemporary LLMs along with human annotations for the aforementioned
categories. Finally, to establish a method for quantifying and to offer a
comparative spectrum that allows us to evaluate and rank LLMs based on their
vulnerability to producing hallucinations, we propose Hallucination
Vulnerability Index (HVI). We firmly believe that HVI holds significant value
as a tool for the wider NLP community, with the potential to serve as a rubric
in AI-related policy-making. In conclusion, we propose two solution strategies
for mitigating hallucinations.
FETILDA: An Effective Framework For Fin-tuned Embeddings For Long Financial Text Documents
Unstructured data, especially text, continues to grow rapidly in various
domains. In particular, in the financial sphere, there is a wealth of
accumulated unstructured financial data, such as the textual disclosure
documents that companies submit on a regular basis to regulatory agencies, such
as the Securities and Exchange Commission (SEC). These documents are typically
very long and tend to contain valuable soft information about a company's
performance. It is therefore of great interest to learn predictive models from
these long textual documents, especially for forecasting numerical key
performance indicators (KPIs). Whereas there has been great progress in
pre-trained language models (LMs) that learn from tremendously large corpora of
textual data, they still struggle in terms of effective representations for
long documents. Our work fills this critical need, namely how to develop better
models to extract useful information from long textual documents and learn
effective features that can leverage the soft financial and risk information
for text regression (prediction) tasks. In this paper, we propose and implement
a deep learning framework that splits long documents into chunks and utilizes
pre-trained LMs to process and aggregate the chunks into vector
representations, followed by self-attention to extract valuable document-level
features. We evaluate our model on a collection of 10-K public disclosure
reports from US banks, and another dataset of reports submitted by US
companies. Overall, our framework outperforms strong baseline methods for
textual modeling as well as a baseline regression model using only numerical
data. Our work provides better insights into how utilizing pre-trained
domain-specific and fine-tuned long-input LMs in representing long documents
can improve the quality of representation of textual data, and therefore, help
in improving predictive analyses.
Comment: 10 pages, 9 figures, 7 tables
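The pipeline described in the FETILDA abstract (split a long document into chunks, encode each chunk, then apply self-attention to obtain document-level features) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `toy_encoder` is a hypothetical stand-in for a pre-trained LM, the attention has no learned weights, and the final mean pooling is just one simple way to get a single vector. It is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def split_into_chunks(tokens: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def toy_encoder(chunk: Sequence[str]) -> Vector:
    """Hypothetical stand-in for a pre-trained LM: maps a chunk to a 3-d vector
    (token count, mean token length, fraction of purely numeric tokens)."""
    n = len(chunk)
    mean_length = sum(len(token) for token in chunk) / n
    numeric_fraction = sum(token.isdigit() for token in chunk) / n
    return [float(n), mean_length, numeric_fraction]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention_pool(chunk_vectors: Sequence[Vector]) -> Vector:
    """Scaled dot-product self-attention over chunk vectors (queries, keys and
    values are the vectors themselves), followed by mean pooling."""
    dim = len(chunk_vectors[0])
    attended = []
    for query in chunk_vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in chunk_vectors
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * vec[j] for w, vec in zip(weights, chunk_vectors)) for j in range(dim)]
        )
    return [sum(row[j] for row in attended) / len(attended) for j in range(dim)]


def embed_document(
    tokens: Sequence[str],
    chunk_size: int,
    encoder: Callable[[Sequence[str]], Vector] = toy_encoder,
) -> Vector:
    """Chunk the document, encode each chunk, and aggregate with self-attention."""
    chunks = split_into_chunks(tokens, chunk_size)
    if not chunks:
        raise ValueError("document is empty")
    return self_attention_pool([encoder(chunk) for chunk in chunks])


if __name__ == "__main__":
    document = "net income rose 12 percent while credit risk remained elevated".split()
    print(embed_document(document, chunk_size=4))
```

In a real setting the resulting document vector would feed a regression head that predicts the numerical KPI; swapping `toy_encoder` for an actual LM encoder leaves the chunking and aggregation steps unchanged.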