Embedding Mental Health Discourse for Community Recommendation
Our paper investigates the use of discourse embedding techniques to develop a
community recommendation system that focuses on mental health support groups on
social media. Social media platforms provide a means for users to anonymously
connect with communities that cater to their specific interests. However, with
the vast number of online communities available, users may face difficulties in
identifying relevant groups to address their mental health concerns. To address
this challenge, we explore the integration of discourse information from
various subreddit communities using embedding techniques to develop an
effective recommendation system. Our approach involves the use of content-based
and collaborative filtering techniques to enhance the performance of the
recommendation system. Our findings indicate that the proposed approach
outperforms the use of each technique separately and provides interpretability
in the recommendation process.
Comment: Accepted to the 4th Workshop on Computational Approaches to Discourse (CODI-2023) at ACL 2023
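The blend of content-based and collaborative filtering the abstract describes can be sketched in a few lines. This is a minimal illustration under assumptions, not the paper's implementation: the embeddings, subreddit names, collaborative scores, and blending weight `alpha` below are all hypothetical.

```python
# Hypothetical hybrid recommender sketch: blend the similarity between a
# user's discourse embedding and each community's embedding (content-based)
# with a precomputed collaborative-filtering score. All data is illustrative.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm if norm else 0.0

def hybrid_score(user_emb, community_emb, collab_score, alpha=0.5):
    """Weighted blend of content similarity and collaborative score."""
    return alpha * cosine(user_emb, community_emb) + (1 - alpha) * collab_score

def recommend(user_emb, community_embs, collab_scores, alpha=0.5, k=2):
    """Rank communities by the blended score and return the top k."""
    ranked = sorted(
        community_embs,
        key=lambda c: hybrid_score(user_emb, community_embs[c],
                                   collab_scores.get(c, 0.0), alpha),
        reverse=True,
    )
    return ranked[:k]

# Toy example: a user whose discourse embedding sits close to r/depression.
user = [0.9, 0.1, 0.0]
communities = {
    "r/depression": [1.0, 0.0, 0.0],
    "r/anxiety":    [0.0, 1.0, 0.0],
    "r/fitness":    [0.0, 0.0, 1.0],
}
collab = {"r/depression": 0.6, "r/anxiety": 0.5, "r/fitness": 0.1}
print(recommend(user, communities, collab))
```

Because both score components contribute to the ranking, each recommendation can be traced back to either embedding similarity or collaborative signal, which is one simple route to the interpretability the abstract mentions.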
Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection
Network intrusion detection (NID) systems which leverage machine learning
have been shown to have strong performance in practice when used to detect
malicious network traffic. Decision trees in particular offer a strong balance
between performance and simplicity, but interpreting them requires users of NID
systems to have background knowledge in machine learning. In addition, they
cannot provide additional outside information as to why certain features may
be important for classification.
In this work, we explore the use of large language models (LLMs) to provide
explanations and additional background knowledge for decision tree NID systems.
Further, we introduce a new human evaluation framework for decision tree
explanations, which leverages automatically generated quiz questions that
measure human evaluators' understanding of decision tree inference. Finally, we
show that LLM-generated decision tree explanations correlate highly with human
ratings of readability, quality, and use of background knowledge, while
simultaneously providing a better understanding of decision boundaries.
Comment: Accepted to the NeurIPS XAIA Workshop 2023
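One way to picture the pipeline above is to extract a tree's decision path for a sample and format it as a prompt an LLM could expand into a natural-language explanation. The tiny dict-based tree, feature names, and prompt wording below are illustrative assumptions, not the paper's actual system.

```python
# Hypothetical sketch: turn a decision-tree inference path into a prompt
# asking an LLM for a natural-language explanation with background knowledge.

# A node is either a leaf {"label": ...} or a split node with "feature",
# "threshold", "left" (<= threshold) and "right" (> threshold) children.
TREE = {
    "feature": "packet_rate", "threshold": 1000.0,
    "left": {"label": "benign"},
    "right": {
        "feature": "distinct_ports", "threshold": 50.0,
        "left": {"label": "benign"},
        "right": {"label": "port_scan"},
    },
}

def decision_path(node, sample):
    """Walk the tree for one sample, recording each comparison made."""
    steps = []
    while "label" not in node:
        feat, thr = node["feature"], node["threshold"]
        went_left = sample[feat] <= thr
        steps.append(f"{feat} = {sample[feat]} is {'<=' if went_left else '>'} {thr}")
        node = node["left"] if went_left else node["right"]
    return steps, node["label"]

def explanation_prompt(sample):
    """Format the path as a prompt for an LLM to explain in plain language."""
    steps, label = decision_path(TREE, sample)
    rules = "\n".join(f"- {s}" for s in steps)
    return (f"A decision tree classified network traffic as '{label}' because:\n"
            f"{rules}\n"
            "Explain this decision in plain language for a network operator, "
            "adding background on why these features matter.")

flow = {"packet_rate": 5000.0, "distinct_ports": 120.0}
print(explanation_prompt(flow))
```

Sending the resulting prompt to an LLM (omitted here) would yield the kind of explanation the human evaluators then assess via the quiz-question framework.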
Primary hyperparathyroidism screening with machine learning
Primary hyperparathyroidism (PHPT) is a relatively common disease, affecting about one in
every 1,000 adults. However, screening for PHPT can be difficult, meaning it often goes
undiagnosed for long periods of time. While looking at specific blood test results independently
can help indicate whether a patient has PHPT, often these blood result levels can all be within
their respective normal ranges despite the patient having PHPT. In this work, based on
real-world clinical data, we propose a novel approach to screening for PHPT with neural network
(NN) architectures, achieving over 97% accuracy with common blood values as inputs. Further,
we propose a second model achieving over 99% accuracy with additional lab test values as
inputs. Moreover, our NN reduces the false negatives of traditional PHPT screening
methods by 99%.
Thesis (B.?), Honors College
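The screening model the abstract describes — a feedforward network taking common blood values and emitting a PHPT probability — can be sketched as a single forward pass. The weights, the choice of features (z-scored calcium and PTH), and the one-hidden-layer shape below are made-up assumptions for illustration; the paper's trained architecture is not reproduced here.

```python
import math

# Illustrative forward pass of a tiny feedforward network over common blood
# values. Weights are hypothetical, not trained; inputs are assumed to be
# z-scored [calcium, PTH].

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W1, b1, w2, b2):
    """One ReLU hidden layer, then a sigmoid output as a PHPT probability."""
    h = relu([sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(W1, b1)])
    return sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)

# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
W1 = [[1.2, 0.8], [-0.5, 1.5]]
b1 = [0.0, -0.2]
w2 = [1.0, 1.3]
b2 = -1.0

elevated = forward([2.0, 2.5], W1, b1, w2, b2)  # both values well above mean
normal   = forward([0.0, 0.0], W1, b1, w2, b2)  # values near population mean
print(round(elevated, 3), round(normal, 3))
```

The point of the sketch is the shape of the computation: even when each individual lab value is within its normal range, a network can score the combination, which is how such a model could catch cases that single-threshold screening misses.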
Large Language Models are Built-in Autoregressive Search Engines
Document retrieval is a key stage of standard Web search engines. Existing
dual-encoder dense retrievers obtain representations for questions and
documents independently, allowing for only shallow interactions between them.
To overcome this limitation, recent autoregressive search engines replace the
dual-encoder architecture by directly generating identifiers for relevant
documents in the candidate pool. However, the training cost of such
autoregressive search engines rises sharply as the number of candidate
documents increases. In this paper, we find that large language models (LLMs)
can follow human instructions to directly generate URLs for document retrieval.
Surprisingly, when provided a few Query-URL pairs as in-context
demonstrations, LLMs can generate Web URLs where nearly 90% of the
corresponding documents contain correct answers to open-domain questions. In
this way, LLMs can be thought of as built-in search engines, since they have
not been explicitly trained to map questions to document identifiers.
Experiments demonstrate that our method can consistently achieve better
retrieval performance than existing retrieval approaches by a significant
margin on three open-domain question answering benchmarks, under both zero and
few-shot settings. The code for this work can be found at
https://github.com/Ziems/llm-url.
Comment: Accepted to Findings of ACL 2023
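The in-context setup the abstract describes — a few Query-URL demonstrations followed by a new question, prompting the LLM to emit a URL directly — can be sketched as simple prompt construction. The demonstration pairs below are illustrative placeholders, and actually sending the prompt to an LLM API is out of scope here.

```python
# Sketch of the few-shot Query-URL prompting setup: format demonstration
# pairs, then append the new question so the LLM's natural continuation is
# a URL. Demonstrations are hypothetical examples, not from the paper.

DEMOS = [
    ("Who wrote the novel 1984?",
     "https://en.wikipedia.org/wiki/Nineteen_Eighty-Four"),
    ("What is the capital of Australia?",
     "https://en.wikipedia.org/wiki/Canberra"),
]

def build_url_prompt(question, demos=DEMOS):
    """Format Query-URL demonstrations, then the new query with an open URL slot."""
    lines = [f"Query: {q}\nURL: {url}" for q, url in demos]
    lines.append(f"Query: {question}\nURL:")
    return "\n\n".join(lines)

print(build_url_prompt("When was the Eiffel Tower built?"))
```

Feeding this prompt to an instruction-following LLM and fetching the generated URL's document would complete the retrieval step; no explicit training to map questions to document identifiers is involved, which is the sense in which the LLM acts as a "built-in" search engine.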