Embedding Mental Health Discourse for Community Recommendation
Our paper investigates the use of discourse embedding techniques to develop a
community recommendation system that focuses on mental health support groups on
social media. Social media platforms provide a means for users to anonymously
connect with communities that cater to their specific interests. However, with
the vast number of online communities available, users may face difficulties in
identifying relevant groups to address their mental health concerns. To address
this challenge, we explore the integration of discourse information from
various subreddit communities using embedding techniques to develop an
effective recommendation system. Our approach involves the use of content-based
and collaborative filtering techniques to enhance the performance of the
recommendation system. Our findings indicate that the proposed approach
outperforms the use of each technique separately and provides interpretability
in the recommendation process.
Comment: Accepted to the 4th Workshop on Computational Approaches to Discourse (CODI-2023) at ACL 2023
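The blend of content-based and collaborative filtering the abstract describes can be sketched in a few lines. This is a minimal illustration under assumptions, not the paper's implementation: the embeddings, subreddit names, collaborative scores, and blending weight `alpha` below are all hypothetical.

```python
# Hypothetical hybrid recommender sketch: blend the similarity between a
# user's discourse embedding and each community's embedding (content-based)
# with a precomputed collaborative-filtering score. All data is illustrative.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm if norm else 0.0

def hybrid_score(user_emb, community_emb, collab_score, alpha=0.5):
    """Weighted blend of content similarity and collaborative score."""
    return alpha * cosine(user_emb, community_emb) + (1 - alpha) * collab_score

def recommend(user_emb, community_embs, collab_scores, alpha=0.5, k=2):
    """Rank communities by the blended score and return the top k."""
    ranked = sorted(
        community_embs,
        key=lambda c: hybrid_score(user_emb, community_embs[c],
                                   collab_scores.get(c, 0.0), alpha),
        reverse=True,
    )
    return ranked[:k]

# Toy example: a user whose discourse embedding sits close to r/depression.
user = [0.9, 0.1, 0.0]
communities = {
    "r/depression": [1.0, 0.0, 0.0],
    "r/anxiety":    [0.0, 1.0, 0.0],
    "r/fitness":    [0.0, 0.0, 1.0],
}
collab = {"r/depression": 0.6, "r/anxiety": 0.5, "r/fitness": 0.1}
print(recommend(user, communities, collab))
```

Because both score components contribute to the ranking, each recommendation can be traced back to either embedding similarity or collaborative signal, which is one simple route to the interpretability the abstract mentions.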
Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection
Network intrusion detection (NID) systems which leverage machine learning
have been shown to have strong performance in practice when used to detect
malicious network traffic. Decision trees in particular offer a strong balance
between performance and simplicity, but interpreting them requires users of NID
systems to have background knowledge in machine learning. In addition, they
cannot provide additional outside information as to why certain features may
be important for classification.
In this work, we explore the use of large language models (LLMs) to provide
explanations and additional background knowledge for decision tree NID systems.
Further, we introduce a new human evaluation framework for decision tree
explanations, which leverages automatically generated quiz questions that
measure human evaluators' understanding of decision tree inference. Finally, we
show that LLM-generated decision tree explanations correlate highly with human
ratings of readability, quality, and use of background knowledge, while
simultaneously providing a better understanding of decision boundaries.
Comment: Accepted to the NeurIPS XAIA Workshop 2023
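One way to picture the pipeline above is to extract a tree's decision path for a sample and format it as a prompt an LLM could expand into a natural-language explanation. The tiny dict-based tree, feature names, and prompt wording below are illustrative assumptions, not the paper's actual system.

```python
# Hypothetical sketch: turn a decision-tree inference path into a prompt
# asking an LLM for a natural-language explanation with background knowledge.

# A node is either a leaf {"label": ...} or a split node with "feature",
# "threshold", "left" (<= threshold) and "right" (> threshold) children.
TREE = {
    "feature": "packet_rate", "threshold": 1000.0,
    "left": {"label": "benign"},
    "right": {
        "feature": "distinct_ports", "threshold": 50.0,
        "left": {"label": "benign"},
        "right": {"label": "port_scan"},
    },
}

def decision_path(node, sample):
    """Walk the tree for one sample, recording each comparison made."""
    steps = []
    while "label" not in node:
        feat, thr = node["feature"], node["threshold"]
        went_left = sample[feat] <= thr
        steps.append(f"{feat} = {sample[feat]} is {'<=' if went_left else '>'} {thr}")
        node = node["left"] if went_left else node["right"]
    return steps, node["label"]

def explanation_prompt(sample):
    """Format the path as a prompt for an LLM to explain in plain language."""
    steps, label = decision_path(TREE, sample)
    rules = "\n".join(f"- {s}" for s in steps)
    return (f"A decision tree classified network traffic as '{label}' because:\n"
            f"{rules}\n"
            "Explain this decision in plain language for a network operator, "
            "adding background on why these features matter.")

flow = {"packet_rate": 5000.0, "distinct_ports": 120.0}
print(explanation_prompt(flow))
```

Sending the resulting prompt to an LLM (omitted here) would yield the kind of explanation the human evaluators then assess via the quiz-question framework.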
Primary hyperparathyroidism screening with machine learning
Primary hyperparathyroidism (PHPT) is a relatively common disease, affecting about one in
every 1,000 adults. However, screening for PHPT can be difficult, meaning it often goes
undiagnosed for long periods of time. While looking at specific blood test results independently
can help indicate whether a patient has PHPT, often these blood result levels can all be within
their respective normal ranges despite the patient having PHPT. In this work, based on
real-world clinical data, we propose a novel approach to screening for PHPT with neural network
(NN) architectures, achieving over 97% accuracy with common blood values as inputs. Further,
we propose a second model achieving over 99% accuracy with additional lab test values as
inputs. Moreover, our NN reduces the false negatives of traditional PHPT screening
methods by 99%.
Thesis (B.?), Honors College
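The screening model the abstract describes — a feedforward network taking common blood values and emitting a PHPT probability — can be sketched as a single forward pass. The weights, the choice of features (z-scored calcium and PTH), and the one-hidden-layer shape below are made-up assumptions for illustration; the paper's trained architecture is not reproduced here.

```python
import math

# Illustrative forward pass of a tiny feedforward network over common blood
# values. Weights are hypothetical, not trained; inputs are assumed to be
# z-scored [calcium, PTH].

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W1, b1, w2, b2):
    """One ReLU hidden layer, then a sigmoid output as a PHPT probability."""
    h = relu([sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(W1, b1)])
    return sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)

# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
W1 = [[1.2, 0.8], [-0.5, 1.5]]
b1 = [0.0, -0.2]
w2 = [1.0, 1.3]
b2 = -1.0

elevated = forward([2.0, 2.5], W1, b1, w2, b2)  # both values well above mean
normal   = forward([0.0, 0.0], W1, b1, w2, b2)  # values near population mean
print(round(elevated, 3), round(normal, 3))
```

The point of the sketch is the shape of the computation: even when each individual lab value is within its normal range, a network can score the combination, which is how such a model could catch cases that single-threshold screening misses.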
Large Language Models are Built-in Autoregressive Search Engines
Document retrieval is a key stage of standard Web search engines. Existing
dual-encoder dense retrievers obtain representations for questions and
documents independently, allowing for only shallow interactions between them.
To overcome this limitation, recent autoregressive search engines replace the
dual-encoder architecture by directly generating identifiers for relevant
documents in the candidate pool. However, the training cost of such
autoregressive search engines rises sharply as the number of candidate
documents increases. In this paper, we find that large language models (LLMs)
can follow human instructions to directly generate URLs for document retrieval.
Surprisingly, when provided a few Query-URL pairs as in-context
demonstrations, LLMs can generate Web URLs where nearly 90% of the
corresponding documents contain correct answers to open-domain questions. In
this way, LLMs can be thought of as built-in search engines, since they have
not been explicitly trained to map questions to document identifiers.
Experiments demonstrate that our method can consistently achieve better
retrieval performance than existing retrieval approaches by a significant
margin on three open-domain question answering benchmarks, under both zero and
few-shot settings. The code for this work can be found at
https://github.com/Ziems/llm-url.
Comment: Accepted to Findings of ACL 2023
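The in-context setup the abstract describes — a few Query-URL demonstrations followed by a new question, prompting the LLM to emit a URL directly — can be sketched as simple prompt construction. The demonstration pairs below are illustrative placeholders, and actually sending the prompt to an LLM API is out of scope here.

```python
# Sketch of the few-shot Query-URL prompting setup: format demonstration
# pairs, then append the new question so the LLM's natural continuation is
# a URL. Demonstrations are hypothetical examples, not from the paper.

DEMOS = [
    ("Who wrote the novel 1984?",
     "https://en.wikipedia.org/wiki/Nineteen_Eighty-Four"),
    ("What is the capital of Australia?",
     "https://en.wikipedia.org/wiki/Canberra"),
]

def build_url_prompt(question, demos=DEMOS):
    """Format Query-URL demonstrations, then the new query with an open URL slot."""
    lines = [f"Query: {q}\nURL: {url}" for q, url in demos]
    lines.append(f"Query: {question}\nURL:")
    return "\n\n".join(lines)

print(build_url_prompt("When was the Eiffel Tower built?"))
```

Feeding this prompt to an instruction-following LLM and fetching the generated URL's document would complete the retrieval step; no explicit training to map questions to document identifiers is involved, which is the sense in which the LLM acts as a "built-in" search engine.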