2,663 research outputs found

    Mining large-scale human mobility data for long-term crime prediction

    Traditional crime prediction models based on census data are limited, as they fail to capture the complexity and dynamics of human activity. With the rise of ubiquitous computing, there is an opportunity to improve such models with data that serve as better proxies for human presence in cities. In this paper, we leverage large human mobility data to craft an extensive set of features for crime prediction, informed by theories in criminology and urban studies. We employ averaging and boosting ensemble techniques from machine learning to investigate their power in predicting yearly counts of different types of crime occurring in New York City at the census tract level. Our study shows that spatial and spatio-temporal features derived from Foursquare venues and check-ins, subway rides, and taxi rides improve on the baseline models relying on census and POI data. The proposed models achieve absolute R² values of up to 65% (on a geographical out-of-sample test set) and up to 89% (on a temporal out-of-sample test set). This shows that, alongside the residential population of an area, the ambient population there is strongly predictive of the area's crime levels. We deep-dive into the main crime categories and find that the predictive gain of the human dynamics features varies across crime types: such features bring the biggest boost in the case of grand larcenies, whereas assaults are already well predicted by the census features. Furthermore, we identify and discuss the top predictive features for the main crime categories. These results offer valuable insights for those responsible for urban policy or law enforcement.
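    The abstract evaluates its models on two kinds of held-out data, a geographical out-of-sample set (census tracts never seen in training) and a temporal out-of-sample set (a later year), and it reports R² together with averaging ensembles. The following is a minimal sketch of how such splits, an averaging ensemble, and R² could be computed. It assumes a hypothetical record layout of (tract_id, year, crime_count); the function names are illustrative, and this is not the paper's code.

```python
from statistics import mean

# Hypothetical record layout (assumption, not from the paper): (tract_id, year, crime_count)
Record = tuple[str, int, float]


def geographic_split(records: list[Record], test_tracts: set[str]):
    """Geographical out-of-sample split: whole census tracts are held out."""
    train = [r for r in records if r[0] not in test_tracts]
    test = [r for r in records if r[0] in test_tracts]
    return train, test


def temporal_split(records: list[Record], test_year: int):
    """Temporal out-of-sample split: train on earlier years, test on test_year."""
    train = [r for r in records if r[1] < test_year]
    test = [r for r in records if r[1] == test_year]
    return train, test


def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def average_ensemble(*prediction_lists: list[float]) -> list[float]:
    """Averaging ensemble: element-wise mean of several models' predictions."""
    return [mean(preds) for preds in zip(*prediction_lists)]


if __name__ == "__main__":
    records = [("A", 2014, 10.0), ("A", 2015, 12.0),
               ("B", 2014, 4.0), ("B", 2015, 5.0)]
    train, test = temporal_split(records, 2015)   # 2014 rows vs. 2015 rows
    y_true = [r[2] for r in test]                  # [12.0, 5.0]
    model_1 = [11.0, 6.0]                          # made-up predictions
    model_2 = [13.0, 4.0]
    print(r2_score(y_true, model_1))
    print(r2_score(y_true, average_ensemble(model_1, model_2)))
```

    In this toy case the two made-up models err in opposite directions, so averaging cancels their errors completely. With real models, the benefit of averaging depends on how correlated the members' errors are.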

    Advances in Cybercrime Prediction: A Survey of Machine, Deep, Transfer, and Adaptive Learning Techniques

    Cybercrime is a growing threat to organizations and individuals worldwide, with criminals using increasingly sophisticated techniques to breach security systems and steal sensitive data. In recent years, machine learning, deep learning, and transfer learning techniques have emerged as promising tools for predicting cybercrime and preventing it before it occurs. This paper aims to provide a comprehensive survey of the latest advancements in cybercrime prediction using the above-mentioned techniques, highlighting the latest research related to each approach. For this purpose, we reviewed more than 150 research articles and discussed around 50 of the most recent and relevant ones. We start the review by discussing some common methods used by cybercriminals and then focus on the latest machine learning and deep learning techniques, such as recurrent and convolutional neural networks, which have been effective in detecting anomalous behavior and identifying potential threats. We also discuss transfer learning, which allows models trained on one dataset to be adapted for use on another, and then focus on active and reinforcement learning as part of early-stage algorithmic research in cybercrime prediction. Finally, we discuss critical innovations, research gaps, and future research opportunities in cybercrime prediction. Overall, this paper presents a holistic view of cutting-edge developments in cybercrime prediction, shedding light on the strengths and limitations of each method and equipping researchers and practitioners with essential insights, publicly available datasets, and resources necessary to develop efficient cybercrime prediction systems.
    Comment: 27 pages, 6 figures, 4 tables

    Ensemble deep learning: A review

    Ensemble learning combines several individual models to obtain better generalization performance. Currently, deep learning models with multilayer processing architectures show better performance than shallow or traditional classification models. Deep ensemble learning models combine the advantages of both deep learning models and ensemble learning so that the final model has better generalization performance. This paper reviews state-of-the-art deep ensemble models and hence serves as an extensive summary for researchers. The ensemble models are broadly categorized into bagging, boosting and stacking; negative-correlation-based deep ensemble models; explicit/implicit ensembles; homogeneous/heterogeneous ensembles; decision fusion strategies; and unsupervised, semi-supervised, reinforcement learning, online/incremental, and multilabel-based deep ensemble models. Applications of deep ensemble models in different domains are also briefly discussed. Finally, we conclude this paper with some future recommendations and research directions.
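    Decision fusion strategies are one of the categories the review lists. The following is a minimal sketch of two common fusion rules applied to the outputs of already-trained ensemble members: hard majority voting over predicted labels, and soft voting, which is an optionally weighted average of class-probability vectors. The member outputs are made-up numbers and the function names are illustrative. This is not a method taken from the review.

```python
from collections import Counter
from typing import List, Optional


def argmax(values: List[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def majority_vote(labels: List[str]) -> str:
    """Hard voting: most frequent predicted label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(prob_vectors: List[List[float]],
              weights: Optional[List[float]] = None) -> List[float]:
    """Soft voting: (weighted) average of the members' class-probability vectors."""
    if weights is None:
        weights = [1.0] * len(prob_vectors)
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    return [sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
            for c in range(n_classes)]


if __name__ == "__main__":
    classes = ["cat", "dog"]
    # Made-up softmax outputs of three ensemble members for a single input
    member_probs = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
    hard = majority_vote([classes[argmax(p)] for p in member_probs])
    soft = classes[argmax(soft_vote(member_probs))]
    print(hard, soft)  # prints: dog cat
```

    The example shows that the choice of fusion rule matters. Two lukewarm "dog" votes outvote one confident "cat" under hard voting. Soft voting keeps the confidence information and picks "cat".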

    Deep Learning-Based Detection of Health Insurance Abuse Using Medical Treatment Data

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, August 2020. Advisor: 조성준 (Sungzoon Cho).
    As global life expectancy increases, spending on healthcare grows accordingly in order to improve quality of life. However, because medical care is expensive, the bare cost of healthcare services inevitably places a great financial burden on individuals and households. In this light, many countries have devised and established their own public healthcare insurance systems to help people receive medical services at a lower price. Since reimbursements are made ex post, unethical practices arise that exploit the post-payment structure of the insurance system. The archetypes of such behavior are overdiagnosis, the act of manipulating patients' diseases, and overtreatment, the prescription of unnecessary drugs to the patient. These abusive behaviors are considered one of the main sources of financial loss in the healthcare system. To detect and prevent abuse, the national healthcare insurance hires medical professionals to manually examine whether each claim is medically legitimate. However, this review process is very costly and time-consuming. To address these limitations, data mining techniques have been employed to detect problematic claims or abusive providers showing abnormal billing patterns. However, these studies used only coarse-grained information such as claim-level or provider-level data, and such coarse extracted information may degrade the model's performance. In this thesis, we propose abuse detection methods using medical treatment data, the lowest-level information in a healthcare insurance claim. First, we propose a scoring model for detecting abusive providers and show that the review process with the proposed model is more efficient than with the previous model, which uses provider-level variables as input. At the same time, we devise evaluation metrics to quantify the efficiency of the review process. Second, we propose a method of detecting overtreatment under seasonality, which makes the model more realistic. The proposed model embodies multiple structures specific to the DRG codes selected as important for each department, and we show that it is more robust to seasonality than the previous method. Third, we propose an overtreatment detection model that accounts for heterogeneous treatment practices among practitioners. This network-based approach considers the relationship between diseases and treatments during overtreatment detection. Experimental results show that the proposed method classifies well even treatments that do not explicitly appear in the training set. From these works, we show that using treatment data allows abuse detection to be modeled at various levels: treatment, claim, and provider.
    As life expectancy increases, the amount people spend on healthcare to improve their quality of life is also increasing. However, expensive medical services inevitably place a large financial burden on individuals and households. To prevent this, many countries have introduced public health insurance systems so that people can receive medical services at a reasonable price. Typically, the patient first receives the service and pays only part of the cost. The insurer then reimburses the remaining amount to the medical institution after the fact. However, this system is sometimes exploited through improper claims, such as manipulating patients' diseases or overtreatment. Such practices are one of the main causes of financial loss in the healthcare system. To prevent them, insurers hire medical experts to check the medical legitimacy of each claim one by one. However, this review process is very expensive and time-consuming. To make the review more efficient, prior studies have used data mining techniques to detect problematic claims or providers with abnormal billing patterns. However, these studies trained models on claim-level or provider-level variables derived from the data and did not make use of treatment data, the lowest-level unit of data. This thesis proposes methods that detect improper claims using treatment data, the lowest-level unit of data in a claim. First, we propose a method for detecting medical service providers with abnormal billing patterns. Applied to real data, it enabled more efficient reviews than the existing method based on provider-level variables. We also propose an evaluation measure to quantify this efficiency. Second, we propose a method for detecting overtreatment when claims exhibit seasonality. Instead of operating models per medical department, it trains and evaluates models per diagnosis-related group (DRG). Applied to real data, the proposed method proved more robust to seasonality than the existing method. Third, we propose a method for detecting overtreatment in settings where different doctors treat the same patient with different treatment patterns. It models the relationship between patients' diseases and treatments as a network. Experimental results showed that the proposed method classifies well even treatment patterns that do not appear in the training data. Together, these studies confirm that using treatment data makes it possible to detect improper claims at various levels, including the treatment, the claim, and the medical service provider.
    Chapter 1 Introduction
    Chapter 2 Detection of Abusive Providers by department with Neural Network
      2.1 Background
      2.2 Literature Review
        2.2.1 Abnormality Detection in Healthcare Insurance with Datamining Technique
        2.2.2 Feed-Forward Neural Network
      2.3 Proposed Method
        2.3.1 Calculating the Likelihood of Abuse for each Treatment with Deep Neural Network
        2.3.2 Calculating the Abuse Score of the Provider
      2.4 Experiments
        2.4.1 Data Description
        2.4.2 Experimental Settings
        2.4.3 Evaluation Measure (1): Relative Efficiency
        2.4.4 Evaluation Measure (2): Precision at k
      2.5 Results
        2.5.1 Results in the test set
        2.5.2 The Relationship among the Claimed Amount, the Abused Amount and the Abuse Score
        2.5.3 The Relationship between the Performance of the Treatment Scoring Model and Review Efficiency
        2.5.4 Treatment Scoring Model Results
        2.5.5 Post-deployment Performance
      2.6 Summary
    Chapter 3 Detection of overtreatment by Diagnosis-related Group with Neural Network
      3.1 Background
      3.2 Literature review
        3.2.1 Seasonality in disease
        3.2.2 Diagnosis related group
      3.3 Proposed method
        3.3.1 Training a deep neural network model for treatment classification
        3.3.2 Comparing the Performance of DRG-based Model against the department-based Model
      3.4 Experiments
        3.4.1 Data Description and Preprocessing
        3.4.2 Performance Measures
        3.4.3 Experimental Settings
      3.5 Results
        3.5.1 Overtreatment Detection
        3.5.2 Abnormal Claim Detection
      3.6 Summary
    Chapter 4 Detection of overtreatment with graph embedding of disease-treatment pair
      4.1 Background
      4.2 Literature review
        4.2.1 Graph embedding methods
        4.2.2 Application of graph embedding methods to biomedical data analysis
        4.2.3 Medical concept embedding methods
      4.3 Proposed method
        4.3.1 Network construction
        4.3.2 Link Prediction between the Disease and the Treatment
        4.3.3 Overtreatment Detection
      4.4 Experiments
        4.4.1 Data Description
        4.4.2 Experimental Settings
      4.5 Results
        4.5.1 Network Construction
        4.5.2 Link Prediction between the Disease and the Treatment
        4.5.3 Overtreatment Detection
      4.6 Summary
    Chapter 5 Conclusion
      5.1 Contribution
      5.2 Future Work
    Bibliography
    Abstract (in Korean)
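    The thesis outline describes three pieces for provider screening: a deep neural network that gives each treatment a likelihood of abuse, an abuse score for each provider built from those likelihoods, and precision at k as an evaluation measure. The abstract does not give the aggregation formula. This minimal sketch therefore assumes a hypothetical rule in which a provider's score is the claimed-amount-weighted mean of its treatments' abuse probabilities. The treatment probabilities are taken as given inputs, standing in for the network's outputs. All names and numbers are illustrative and are not the thesis's code.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical treatment-level record: (provider_id, claimed_amount, abuse_probability).
# abuse_probability stands in for the output of a trained treatment scoring model.
Treatment = tuple[str, float, float]


def provider_scores(treatments: list[Treatment]) -> dict[str, float]:
    """ASSUMED aggregation: claimed-amount-weighted mean abuse probability per provider."""
    weighted = defaultdict(float)
    amounts = defaultdict(float)
    for provider, amount, prob in treatments:
        weighted[provider] += amount * prob
        amounts[provider] += amount
    return {p: weighted[p] / amounts[p] for p in amounts if amounts[p] > 0}


def precision_at_k(scores: dict[str, float], abusive: set[str], k: int) -> float:
    """Fraction of the k highest-scoring providers that reviewers confirmed as abusive."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(p in abusive for p in ranked) / k


if __name__ == "__main__":
    treatments = [
        ("P1", 100.0, 0.9), ("P1", 300.0, 0.1),  # P1 -> (90 + 30) / 400 = 0.3
        ("P2", 200.0, 0.8), ("P2", 200.0, 0.6),  # P2 -> (160 + 120) / 400 = 0.7
        ("P3", 50.0, 0.2),                       # P3 -> 0.2
    ]
    scores = provider_scores(treatments)
    confirmed = {"P2", "P3"}  # made-up review outcomes
    for k in (1, 2, 3):
        print(k, precision_at_k(scores, confirmed, k))
```

    Ranking providers by score and checking precision at k matches the screening use case, because reviewers can audit only the top of the list. Weighting by claimed amount is one plausible choice among many, and the thesis may aggregate differently.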