2,631 research outputs found
Intelligent Data Monitoring and Controlling System for Health Related Social Networks
Depression is a worldwide health concern. Social media has become a popular way for affected people to share their experiences in the form of posts. These experiences can be stored in a database, then extracted and analyzed to warn other people, to support the recall of drugs with side effects, and to improve treatment services for a particular disease. Depression-related social websites are therefore helpful for monitoring and learning about drugs and their side effects, and for sharing user experiences. In this paper, we propose a social media website that allows users to share their experiences of a particular disease, namely depression. We use a weighted network model to represent activity in the social network. The proposed work has three steps: monitoring user activity, followed by network clustering and module analysis. Users who like a particular post fall into one group, and those who oppose it fall into another. The stop-word technique implemented in this work helps avoid misleading communication in posts and supports efficient user interaction. Statistical analysis of these user interactions helps health networks gain knowledge about a specific disease. This approach will enable all groups to take part and will support future healthcare improvements for patients suffering from a disease.
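The grouping step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `reactions` records, the `like`/`oppose` labels, and the choice of edge weight (the number of posts on which two users reacted the same way) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical interaction log: (user, post, reaction). Not from the paper.
reactions = [
    ("ann", "p1", "like"),
    ("bob", "p1", "like"),
    ("cat", "p1", "oppose"),
    ("ann", "p2", "like"),
    ("bob", "p2", "like"),
    ("cat", "p2", "like"),
]

def group_by_reaction(records):
    """Map each post to {reaction: set of users who reacted that way}."""
    groups = defaultdict(lambda: defaultdict(set))
    for user, post, reaction in records:
        groups[post][reaction].add(user)
    return groups

def weighted_network(records):
    """Edge weight (u, v) = number of posts on which u and v reacted alike."""
    weights = defaultdict(int)
    for post_groups in group_by_reaction(records).values():
        for users in post_groups.values():
            for u, v in combinations(sorted(users), 2):
                weights[(u, v)] += 1
    return dict(weights)

groups = group_by_reaction(reactions)
network = weighted_network(reactions)
```

Here `ann` and `bob` agree on both posts, so their edge carries the largest weight; a clustering step could then operate on `network`.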
Mining social media data for biomedical signals and health-related behavior
Social media data has been increasingly used to study biomedical and
health-related phenomena. From cohort level discussions of a condition to
planetary level analyses of sentiment, social media has provided scientists
with unprecedented amounts of data to study human behavior and response
associated with a variety of health conditions and medical treatments. Here we
review recent work in mining social media for biomedical, epidemiological, and
social phenomena information relevant to the multilevel complexity of human
health. We pay particular attention to topics where social media data analysis
has shown the most progress, including pharmacovigilance, sentiment analysis
especially for mental health, and other areas. We also discuss a variety of
innovative uses of social media data for health-related applications and
important limitations in social media data access and use. Comment: To appear in the Annual Review of Biomedical Data Science
Deep Learning-Based Detection of Health Insurance Abuse Using Medical Treatment Data
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, 2020. 8. 조성준.
As global life expectancy increases, spending on healthcare grows accordingly in order to improve quality of life. However, because medical care is expensive, the bare cost of healthcare services inevitably places a great financial burden on individuals and households. In this light, many countries have devised and established their own public healthcare insurance systems to help people receive medical services at a lower price. Since reimbursements are made ex post, unethical practices arise that exploit the post-payment structure of the insurance system. The archetypes of such behavior are overdiagnosis, the act of manipulating patients' diseases, and overtreatment, prescribing unnecessary drugs for the patient. These abusive behaviors are considered one of the main sources of financial loss incurred in the healthcare system. In order to detect and prevent abuse, the national healthcare insurance hires medical professionals to manually examine whether each claim filing is medically legitimate. However, the review process is very costly and time-consuming. To address these limitations, data mining techniques have been employed to detect problematic claims or abusive providers showing an abnormal billing pattern. However, these studies used only coarse-grained information such as claim-level or provider-level data, and relying on such extracted information may degrade the model's performance.
In this thesis, we propose abuse detection methods using medical treatment data, which is the lowest-level information in a healthcare insurance claim. First, we propose a scoring model for detecting abusive providers and show that the review process with the proposed model is more efficient than with the previous model, which uses provider-level variables as inputs. We also devise evaluation metrics to quantify the efficiency of the review process. Second, we propose a method for detecting overtreatment under seasonality, which makes the model more realistic. We propose a model embodying multiple structures specific to the DRG codes selected as important for each given department, and show that the proposed method is more robust to seasonality than the previous method. Third, we propose an overtreatment detection model that accounts for heterogeneous treatment between practitioners. We propose a network-based approach in which the relationship between diseases and treatments is considered during overtreatment detection. Experimental results show that the proposed method classifies well even treatments that do not explicitly appear in the training set. From these works, we show that using treatment data allows abuse detection to be modeled at various levels: treatment, claim, and provider.
Korean abstract (translated): As life expectancy rises, spending on healthcare to improve quality of life is increasing. However, the high cost of medical services inevitably places a heavy financial burden on individuals and households. To prevent this, many countries have introduced public health insurance systems so that people can receive medical services at a reasonable price. Typically, the patient receives the service first and pays only part of the cost, and the insurer later reimburses the medical institution for the remainder. This system is sometimes exploited through improper claims, such as manipulating a patient's disease or overtreating. Such behavior is one of the main causes of financial loss in the healthcare system, and to prevent it insurers hire medical experts to check the medical legitimacy of claims one by one. However, this review process is very expensive and time-consuming. To make the review more efficient, studies have used data mining techniques to detect problematic claims or providers with abnormal billing patterns. However, these studies trained models on variables derived at the claim or provider level and did not use medical treatment data, the lowest-level data.
This thesis proposes methods for detecting improper claims using medical treatment data, the lowest-level data in a claim. First, we proposed a method for detecting providers with abnormal billing patterns. When applied to real data, it led to a more efficient review than the existing method based on provider-level variables. We also proposed evaluation measures to quantify efficiency. Second, we proposed a method for detecting overtreatment when claims are seasonal, training and evaluating models per diagnosis-related group (DRG) instead of operating them per department. When applied to real data, the proposed method was more robust to seasonality than the existing method. Third, we proposed an overtreatment detection method for settings in which physicians differ in how they treat the same patient, based on modeling the relationship between a patient's diseases and treatments as a network. Experimental results showed that the proposed method classifies well even treatment patterns that do not appear in the training data. From these studies, we confirmed that using treatment data makes it possible to detect improper claims at various levels, including treatment, claim, and provider.
Chapter 1 Introduction
Chapter 2 Detection of Abusive Providers by department with Neural Network
2.1 Background
2.2 Literature Review
2.2.1 Abnormality Detection in Healthcare Insurance with Data Mining Technique
2.2.2 Feed-Forward Neural Network
2.3 Proposed Method
2.3.1 Calculating the Likelihood of Abuse for each Treatment with Deep Neural Network
2.3.2 Calculating the Abuse Score of the Provider
2.4 Experiments
2.4.1 Data Description
2.4.2 Experimental Settings
2.4.3 Evaluation Measure (1): Relative Efficiency
2.4.4 Evaluation Measure (2): Precision at k
2.5 Results
2.5.1 Results in the test set
2.5.2 The Relationship among the Claimed Amount, the Abused Amount and the Abuse Score
2.5.3 The Relationship between the Performance of the Treatment Scoring Model and Review Efficiency
2.5.4 Treatment Scoring Model Results
2.5.5 Post-deployment Performance
2.6 Summary
Chapter 3 Detection of overtreatment by Diagnosis-related Group with Neural Network
3.1 Background
3.2 Literature review
3.2.1 Seasonality in disease
3.2.2 Diagnosis related group
3.3 Proposed method
3.3.1 Training a deep neural network model for treatment classification
3.3.2 Comparing the Performance of DRG-based Model against the department-based Model
3.4 Experiments
3.4.1 Data Description and Preprocessing
3.4.2 Performance Measures
3.4.3 Experimental Settings
3.5 Results
3.5.1 Overtreatment Detection
3.5.2 Abnormal Claim Detection
3.6 Summary
Chapter 4 Detection of overtreatment with graph embedding of disease-treatment pair
4.1 Background
4.2 Literature review
4.2.1 Graph embedding methods
4.2.2 Application of graph embedding methods to biomedical data analysis
4.2.3 Medical concept embedding methods
4.3 Proposed method
4.3.1 Network construction
4.3.2 Link Prediction between the Disease and the Treatment
4.3.3 Overtreatment Detection
4.4 Experiments
4.4.1 Data Description
4.4.2 Experimental Settings
4.5 Results
4.5.1 Network Construction
4.5.2 Link Prediction between the Disease and the Treatment
4.5.3 Overtreatment Detection
4.6 Summary
Chapter 5 Conclusion
5.1 Contribution
5.2 Future Work
Bibliography
Abstract (in Korean)
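The outline above lists precision at k as one of the evaluation measures (Section 2.4.4). As a rough illustration of that general metric, and not the thesis's own code, the Python sketch below ranks providers by a hypothetical abuse score and reports the fraction of the top k that were confirmed as abusive. The provider IDs, scores, and labels are invented.

```python
def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring items whose label is True.

    scores: dict mapping item id -> score (higher = more suspicious)
    labels: dict mapping item id -> bool (True = confirmed abusive)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank item ids by descending score and keep the top k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for item in ranked if labels[item])
    return hits / k

# Invented example data, for illustration only.
abuse_scores = {"A": 0.91, "B": 0.15, "C": 0.78, "D": 0.40}
confirmed = {"A": True, "B": False, "C": False, "D": True}

p_at_2 = precision_at_k(abuse_scores, confirmed, k=2)
```

With these invented values the top two providers are `A` and `C`, of which only `A` is confirmed, so `p_at_2` is 0.5.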
Drug repurposing using biological networks
Drug repositioning is a strategy to identify new uses for existing, approved, or research drugs that are outside the scope of their original medical indication. Drug repurposing is based on the fact that one drug can act on multiple targets or that two diseases can have molecular similarities, among others. Currently, thanks to the rapid advancement of high-performance technologies, a massive amount of biological and biomedical data is being generated. This allows the use of computational methods and models based on biological networks to develop new possibilities for drug repurposing. Here, we provide an in-depth review of the main applications of drug repositioning that have been carried out using biological network models. The goal of this review is to show the usefulness of these computational methods for predicting associations and finding candidate drugs for repositioning in new indications of certain diseases
Artificial Intelligence for Participatory Health: Applications, Impact, and Future Implications
Objective: Artificial intelligence (AI) provides people and
professionals working in the field of participatory health informatics
an opportunity to derive robust insights from a variety of online
sources. The objective of this paper is to identify current state of the
art and application areas of AI in the context of participatory health.
Methods: A search was conducted across seven databases
(PubMed, Embase, CINAHL, PsychInfo, ACM Digital Library,
IEEExplore, and SCOPUS), covering articles published since
2013. Additionally, clinical trials involving AI in participatory
health contexts registered at clinicaltrials.gov were collected and
analyzed.
Results: Twenty-two articles and 12 trials were selected for
review. The most common application of AI in participatory health was the secondary analysis of social media data:
self-reported data including patient experiences with healthcare
facilities, reports of adverse drug reactions, safety and efficacy
concerns about over-the-counter medications, and other
perspectives on medications. Other application areas included
determining which online forum threads required moderator
assistance, identifying users who were likely to drop out from
a forum, extracting terms used in an online forum to learn its
vocabulary, highlighting contextual information that is missing
from online questions and answers, and paraphrasing technical
medical terms for consumers.
Conclusions: While AI for supporting participatory health is
still in its infancy, there are a number of important research
priorities that should be considered for the advancement of the
field. Further research evaluating the impact of AI in participatory
health informatics on the psychosocial wellbeing of individuals
would help in facilitating the wider acceptance of AI into the
healthcare ecosystem
Medical Concept and Patient Representation Learning Using Deep Neural Networks and Its Application to Healthcare Problems
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2022. 8. 정교민.
Korean abstract (translated): This dissertation uses the sample cohort DB, a nationwide health insurance dataset, to propose deep neural network-based methods for learning medical concept and patient representations and for solving healthcare problems. First, we proposed a recurrent neural network model that learns patient representations from sequential patient medical records and personal profile information and predicts the likelihood of a future disease diagnosis. We introduced a structure that efficiently mixes patient information of different kinds and obtained a large performance gain. Representing the medical codes that make up a patient's records as distributed representations improved performance further. This confirmed that distributed representations of medical codes carry important temporal information, and in the following study we introduced a graph structure to strengthen that temporal information. We built a graph from the similarity between the distributed code representations and from statistical information, and used a graph neural network to obtain code representation vectors enriched with temporal and statistical information. Using the resulting code vectors, we proposed a model that detects potential adverse reaction signals of marketed drugs and showed that it can predict even cases that do not exist in current adverse reaction databases. Finally, to overcome the limitation that key information in medical records is sparse relative to their volume, we used a knowledge graph to supplement prior medical knowledge. We extracted only the part of the knowledge graph that makes up a patient's medical records to build a personalized knowledge graph, and obtained its representation vector with a graph neural network. Ultimately, the patient representation summarizing the sequential medical records was used together with the representation summarizing personalized medical knowledge for predicting future diseases and diagnoses.
This dissertation proposes deep neural network-based methods for learning medical concept and patient representations from medical claims data, and applies them to two healthcare tasks: clinical outcome prediction and post-marketing adverse drug reaction (ADR) signal detection. First, we propose SAF-RNN, a Recurrent Neural Network (RNN)-based model that learns a deep patient representation from clinical sequences and patient characteristics. The model fuses different types of patient records using feature-based gating and self-attention. We demonstrate that it effectively extracts high-level associations between two heterogeneous records, achieving state-of-the-art performance in predicting the risk of cardiovascular disease (CVD). Second, based on the observation that distributed medical code embeddings represent temporal proximity between medical codes, we introduce a graph structure to enhance the code embeddings with such temporal information. We construct a graph using the distributed code embeddings and statistical information from the claims data. We then propose Graph Neural Network (GNN)-based representation learning for post-marketing ADR detection. Our model shows competitive performance and provides valid ADR candidates. Finally, rather than using patient records alone, we utilize a knowledge graph to augment the patient representation with prior medical knowledge. Using SAF-RNN and GNN, the deep patient representation is learned from the clinical sequences and the personalized medical knowledge. It is then used to predict clinical outcomes, i.e., next diagnosis prediction and CVD risk prediction, resulting in state-of-the-art performance.
1 Introduction
2 Background
2.1 Medical Concept Embedding
2.2 Encoding Sequential Information in Clinical Records
3 Deep Patient Representation with Heterogeneous Information
3.1 Related Work
3.2 Problem Statement
3.3 Method
3.3.1 RNN-based Disease Prediction Model
3.3.2 Self-Attentive Fusion (SAF) Encoder
3.4 Dataset and Experimental Setup
3.4.1 Dataset
3.4.2 Experimental Design
3.4.3 Implementation Details
3.5 Experimental Results
3.5.1 Evaluation of CVD Prediction
3.5.2 Sensitivity Analysis
3.5.3 Ablation Studies
3.6 Further Investigation
3.6.1 Case Study: Patient-Centered Analysis
3.6.2 Data-Driven CVD Risk Factors
3.7 Conclusion
4 Graph-Enhanced Medical Concept Embedding
4.1 Related Work
4.2 Problem Statement
4.3 Method
4.3.1 Code Embedding Learning with Skip-gram Model
4.3.2 Drug-disease Graph Construction
4.3.3 A GNN-based Method for Learning Graph Structure
4.4 Dataset and Experimental Setup
4.4.1 Dataset
4.4.2 Experimental Design
4.4.3 Implementation Details
4.5 Experimental Results
4.5.1 Evaluation of ADR Detection
4.5.2 Newly-Described ADR Candidates
4.6 Conclusion
5 Knowledge-Augmented Deep Patient Representation
5.1 Related Work
5.1.1 Incorporating Prior Medical Knowledge for Clinical Outcome Prediction
5.1.2 Inductive KGC based on Subgraph Learning
5.2 Method
5.2.1 Extracting Personalized KG
5.2.2 KA-SAF: Knowledge-Augmented Self-Attentive Fusion Encoder
5.2.3 KGC as a Pre-training Task
5.2.4 Subgraph Infomax: SGI
5.3 Dataset and Experimental Setup
5.3.1 Clinical Outcome Prediction
5.3.2 Next Diagnosis Prediction
5.4 Experimental Results
5.4.1 Cardiovascular Disease Prediction
5.4.2 Next Diagnosis Prediction
5.4.3 KGC on SemMed KG
5.5 Conclusion
6 Conclusion
Abstract (In Korean)
Acknowledgement
Novel Natural Language Processing Models for Medical Terms and Symptoms Detection in Twitter
This dissertation focuses on disambiguating language use on Twitter about drug use, types of drug consumption, and drug legalization, on ontology-enhanced approaches, and on data-driven prediction analysis, by developing novel NLP models. Three technical aims comprise this work: (a) leveraging pattern recognition techniques to improve the quality and quantity of crawled Twitter posts related to drug abuse; (b) using an expert-curated, domain-specific DsOn ontology model that improves knowledge extraction in the form of drug-to-symptom and drug-to-side-effect relations; and (c) modeling the prediction of public perception of a drug's legalization and the sentiment analysis of drug consumption on Twitter. We collected 7.5 million data points from August 2015 to March 2016. This work leveraged a longstanding, multidisciplinary collaboration between researchers at the Population & Center for Interventions, Treatment, and Addictions Research (CITAR) in the Boonshoft School of Medicine and the Department of Computer Science and Engineering. In addition, we aimed to develop and deploy an innovative prediction analysis algorithm for eDrugTrends, capable of semi-automated processing of Twitter data to identify emerging trends in cannabis and synthetic cannabinoid use in the U.S. The study also included a fourth aim, a use case study based on content analysis of tweets concerning people living with HIV (PLWH) and medication patterns, and on identifying keyword trends in Twitter-based, user-generated content. This case study leveraged a multidisciplinary collaboration between researchers at the Departments of Family Medicine and Population and Public Health Sciences at Wright State University's Boonshoft School of Medicine and the Department of Computer Science and Engineering. We collected 65K data points from February 2022 to July 2022 in the U.S.-based HIV knowledge domain via the Twitter API streaming platform.
For knowledge discovery, domain knowledge plays a significant role in powering many intelligent frameworks, such as data analysis, information retrieval, and pattern recognition. Recent NLP and semantic web advances have contributed to extending the domain knowledge of medical terms. These techniques require a bag of seed terms for medical knowledge discovery. Poorly chosen initial seeds introduce irrelevant, noisy data and degrade prediction performance. The methodology of aim one, the PatRDis classifier, addresses noise and ambiguity, and that of aim two, the DsOn ontology model, provides semantic parsing and enrichment of online medical terms in order to classify data for HIV care medication engagement and symptom detection from Twitter. By applying the methodology of aims 2 and 3, we solved the challenges of ambiguity and explored more than 1500 cannabis and cannabinoid slang terms. We measured sentiment preceding the election; for example, states with high levels of positive sentiment preceding the election were engaged in enhancing their legalization status. We also used the same dataset for prediction analysis of marijuana legalization and for consumption trend analysis (Ohio public polling data). In Aim 4, we applied three experiments (ensemble learning, the RNN-LSM, and the NNBERT-CNN models) and five techniques to determine which tweets are associated with medication adherence and HIV symptoms. The long short-term memory (LSTM) model and the CNN for sentence classification produce accurate results and have recently been used in NLP tasks. CNN models use convolutional layers and maximum pooling or max-over-time pooling layers to extract higher-level features, while LSTM models can capture long-term dependencies between word sequences and are hence better suited to text classification. We propose attention-based RNN, MLP, and CNN deep learning models that capitalize on the advantages of LSTM and BERT techniques with an additional attention mechanism.
We trained the model using NNBERT to evaluate the proposed model's performance. The test results showed that the proposed models produce more accurate classification results, and BERT obtained higher recall and F1 scores than MLP or LSTM models. In addition, we developed an intelligent tool capable of automated processing of Twitter data to identify emerging trends in HIV disease, HIV symptoms, and medication adherence
Combining Heterogeneous Databases to Detect Adverse Drug Reactions
Adverse drug reactions (ADRs) impose a substantial global burden, accounting for considerable mortality, morbidity, and extra costs. In the United States, over 770,000 ADR-related injuries or deaths occur each year in hospitals, which may cost up to $5.6 million each year per hospital. Unanticipated ADRs may occur after a drug has been approved, owing to its use or prolonged use in large, diverse populations. Therefore, post-marketing surveillance of drugs is essential for generating more complete drug safety profiles and for providing a decision-making tool that helps governmental drug administration agencies take action on marketed drugs. Analysis of spontaneous reports of suspected ADRs has traditionally served as a valuable tool in pharmacovigilance. However, because of well-known limitations of spontaneous reports, observational healthcare data, such as electronic health records (EHRs) and administrative claims data, are starting to be used to complement the spontaneous reporting system. Synthesizing ADR evidence from multiple data sources has been done by human experts on an ad hoc basis. However, the amount of data from both spontaneous reporting systems (SRSs) and observational healthcare databases is growing exponentially. The revolution in the ability of machines to access, process, and mine databases makes it advantageous to develop an automatic system that obtains integrated evidence by combining them.
Towards this goal, this dissertation proposes a framework consisting of three components that generates signal scores based on data from an EHR system and from an SRS, and then integrates the two signal scores into a composite one. The first component is a data-driven, regression-based method that aims to alleviate confounding and detect ADRs based on EHRs. The results demonstrate that this component achieves comparable or slightly higher accuracy than methods trained with experts and existing automatic methods. The second component is also a data-driven, regression-based method; it aims to reduce confounding by co-medication and confounding by indication by using primary suspected, secondary suspected, and concomitant medications and indications from an SRS. This study demonstrates that it achieves comparable or slightly better accuracy than the cutting-edge Gamma Poisson Shrinkage (GPS) algorithm, which uses primary suspected medications only. The third component is a computational integration method that normalizes signal scores from each data source and integrates them into a composite signal score. The results demonstrate that the combined ADR evidence achieves better accuracy of drug-ADR detection than individual systems based on either an SRS or an EHR. Furthermore, component three is explored as a tool to assist clinical assessors in pharmacovigilance practice.
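The third component can be pictured with a small Python sketch. The abstract does not say how scores are normalized or combined, so the choices here (z-score normalization within each source, then an unweighted mean) are assumptions, as are the drug-event pairs and score values.

```python
from statistics import mean, pstdev

def zscores(scores):
    """Standardize a dict of raw signal scores to mean 0 and std 1."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    if sigma == 0:
        # All scores identical: no pair stands out in this source.
        return {pair: 0.0 for pair in scores}
    return {pair: (s - mu) / sigma for pair, s in scores.items()}

def composite(ehr_scores, srs_scores):
    """Average the normalized scores for pairs present in both sources."""
    ehr_z, srs_z = zscores(ehr_scores), zscores(srs_scores)
    shared = ehr_z.keys() & srs_z.keys()
    return {pair: (ehr_z[pair] + srs_z[pair]) / 2 for pair in shared}

# Invented scores on deliberately different scales (assumption).
ehr = {("drugA", "rash"): 2.0, ("drugB", "nausea"): 1.0, ("drugC", "rash"): 0.0}
srs = {("drugA", "rash"): 30.0, ("drugB", "nausea"): 20.0, ("drugC", "rash"): 10.0}

combined = composite(ehr, srs)
ranking = sorted(combined, key=combined.get, reverse=True)
```

Normalizing first keeps the source with the larger numeric range from dominating the composite; a weighted mean would be a natural variation if one source were considered more reliable.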
The research presented in this dissertation has produced several novel insights and provided new solutions towards the challenging problem of pharmacovigilance. The method of reducing confounding effect can be generalizable to other EHR systems and the method for integrating ADR evidence can be generalizable to include other data sources. In conclusion, this dissertation develops a method to reduce confounding effect in both EHRs and SRSs, and a combined system to synthesize evidence, which could potentially unveil drug safety profiles and novel adverse events in a timely fashion