27 research outputs found

    Deep Learning-Based Detection of Health Insurance Abuse Using Medical Treatment Data (진료 내역 데이터를 활용한 딥러닝 기반의 건강보험 남용 탐지)

    Doctoral thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, August 2020. Sungzoon Cho (조성준).

    As global life expectancy increases, spending on healthcare grows accordingly, with the aim of improving quality of life. However, because medical care is expensive, the bare cost of healthcare services inevitably places a great financial burden on individuals and households. In this light, many countries have devised and established their own public healthcare insurance systems to help people receive medical services at a lower price. Since reimbursements are made ex-post, unethical practices arise that exploit the post-payment structure of the insurance system. The archetypes of such behavior are overdiagnosis, the act of manipulating patients' diseases, and overtreatment, the act of prescribing unnecessary drugs for the patient. These abusive behaviors are considered one of the main sources of financial loss incurred in the healthcare system. In order to detect and prevent abuse, the national healthcare insurance hires medical professionals to manually examine whether each claim filing is medically legitimate. However, this review process is very costly and time-consuming. To address these limitations, data mining techniques have been employed to detect problematic claims or abusive providers showing an abnormal billing pattern. However, these studies used only coarse-grained information such as claim-level or provider-level data, and relying on such aggregated information may degrade the model's performance. In this thesis, we propose abuse detection methods using medical treatment data, which is the lowest-level information in a healthcare insurance claim. Firstly, we propose a scoring model with which abusive providers are detected, and show that the review process with the proposed model is more efficient than with the previous model, which uses provider-level variables as inputs. At the same time, we devise evaluation metrics to quantify the efficiency of the review process. Secondly, we propose a method of detecting overtreatment under seasonality, which makes the model more realistic. We propose a model embodying multiple structures specific to the DRG codes selected as important for each given department, and show that the proposed method is more robust to seasonality than the previous method. Thirdly, we propose an overtreatment detection model that accounts for heterogeneous treatment between practitioners. We propose a network-based approach through which the relationship between diseases and treatments is considered during the overtreatment detection process. Experimental results show that the proposed method classifies well even treatments that do not explicitly appear in the training set. From these works, we show that using treatment data allows modeling abuse detection at various levels: treatment, claim, and provider.

    Chapter 1 Introduction
    Chapter 2 Detection of Abusive Providers by department with Neural Network
        2.1 Background
        2.2 Literature Review
            2.2.1 Abnormality Detection in Healthcare Insurance with Datamining Technique
            2.2.2 Feed-Forward Neural Network
        2.3 Proposed Method
            2.3.1 Calculating the Likelihood of Abuse for each Treatment with Deep Neural Network
            2.3.2 Calculating the Abuse Score of the Provider
        2.4 Experiments
            2.4.1 Data Description
            2.4.2 Experimental Settings
            2.4.3 Evaluation Measure (1): Relative Efficiency
            2.4.4 Evaluation Measure (2): Precision at k
        2.5 Results
            2.5.1 Results in the test set
            2.5.2 The Relationship among the Claimed Amount, the Abused Amount and the Abuse Score
            2.5.3 The Relationship between the Performance of the Treatment Scoring Model and Review Efficiency
            2.5.4 Treatment Scoring Model Results
            2.5.5 Post-deployment Performance
        2.6 Summary
    Chapter 3 Detection of overtreatment by Diagnosis-related Group with Neural Network
        3.1 Background
        3.2 Literature review
            3.2.1 Seasonality in disease
            3.2.2 Diagnosis related group
        3.3 Proposed method
            3.3.1 Training a deep neural network model for treatment classification
            3.3.2 Comparing the Performance of DRG-based Model against the department-based Model
        3.4 Experiments
            3.4.1 Data Description and Preprocessing
            3.4.2 Performance Measures
            3.4.3 Experimental Settings
        3.5 Results
            3.5.1 Overtreatment Detection
            3.5.2 Abnormal Claim Detection
        3.6 Summary
    Chapter 4 Detection of overtreatment with graph embedding of disease-treatment pair
        4.1 Background
        4.2 Literature review
            4.2.1 Graph embedding methods
            4.2.2 Application of graph embedding methods to biomedical data analysis
            4.2.3 Medical concept embedding methods
        4.3 Proposed method
            4.3.1 Network construction
            4.3.2 Link Prediction between the Disease and the Treatment
            4.3.3 Overtreatment Detection
        4.4 Experiments
            4.4.1 Data Description
            4.4.2 Experimental Settings
        4.5 Results
            4.5.1 Network Construction
            4.5.2 Link Prediction between the Disease and the Treatment
            4.5.3 Overtreatment Detection
        4.6 Summary
    Chapter 5 Conclusion
        5.1 Contribution
        5.2 Future Work
    Bibliography
    Abstract in Korean (국문초록)
    Docto

    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II

    Machine learning for large and small data biomedical discovery

    In modern biomedicine, computation is becoming more crucial as biological data keep growing, which calls for effective computational methods that integrate the data in a meaningful way and unveil previously undiscovered biological insights. In this dissertation, we introduce a series of machine learning algorithms for biomedical discovery. Focused on protein functions in the context of systems biology, these machine learning algorithms learn representations of protein sequences, structures, and networks in both the small- and large-data scenarios. First, we present a deep learning model that learns representations of protein sequences integrating evolutionary context and helps discover protein variants with enhanced functions in protein engineering. Second, we describe a geometric deep learning model that learns representations of protein and compound structures to inform the prediction of protein-compound binding affinity. Third, we introduce a machine learning algorithm that integrates heterogeneous networks by learning compact network representations and achieves drug repurposing by predicting novel drug-target interactions. We also present new scientific discoveries enabled by these machine learning algorithms. Taken together, this dissertation demonstrates the potential of machine learning to address the small- and large-data challenges of biomedical data and to transform data into actionable insights and new discoveries.

    Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

    Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. Particular emphasis is placed on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and on their physically meaningful interpretations, which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks can perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or as a conjoint comprehensive review of the field of low-rank tensor networks and tensor decompositions. Comment: 232 page


    An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets


    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference


    Pacific Symposium on Biocomputing 2023

    The Pacific Symposium on Biocomputing (PSB) 2023 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2023 will be held on January 3-7, 2023 in Kohala Coast, Hawaii. Tutorials and workshops will be offered prior to the start of the conference.

    PSB 2023 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. It is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology.

    The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders of research in biocomputing's 'hot topics.' In this way, the meeting provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field.