328 research outputs found

    Outlier Detection in Inpatient Claims Using DBSCAN and K-Means

    Get PDF
    Health insurance helps people to obtain quality and affordable health services. The claim billing process is manually input code to the system, this can lack of errors and be suspected of being fraudulent. Claims suspected of fraud are traced manually to find incorrect inputs. The increasing volume of claims causes a decrease in the accuracy of tracing claims suspected of fraud and consumes time and energy. As an effort to prevent and reduce the occurrence of fraud, this study aims to determine the pattern of data on the occurrence of fraud based on the formation of data groupings. Data was prepared by combining claims for inpatient bills and patient bills from hospitals in 2020. Two methods were used in this study to form clusters, DBSCAN and KMeans. To find out the outliers in the cluster, Local Outlier Factor (LOF) was added. The results from experiments show that both methods can detect outlier data and distribute outlier data in the formed cluster. Variable that high effect becomes data outlier is the length of stay, claims code, and condition of patient when discharged from the hospital. Accuracy K-Means is 0.391, 0.003 higher than DBSCAN, which is 0.389

    MapReduce-iterative support vector machine classifier: novel fraud detection systems in healthcare insurance industry

    Get PDF
    Fraud in healthcare insurance claims is one of the significant research challenges that affect the growth of the healthcare services. The healthcare frauds are happening through subscribers, companies and the providers. The development of a decision support is to automate the claim data from service provider and to offset the patient’s challenges. In this paper, a novel hybridized big data and statistical machine learning technique, named MapReduce based iterative support vector machine (MR-ISVM) that provide a set of sophisticated steps for the automatic detection of fraudulent claims in the health insurance databases. The experimental results have proven that the MR-ISVM classifier outperforms better in classification and detection than other support vector machine (SVM) kernel classifiers. From the results, a positive impact seen in declining the computational time on processing the healthcare insurance claims without compromising the classification accuracy is achieved. The proposed MR-ISVM classifier achieves 87.73% accuracy than the linear (75.3%) and radial basis function (79.98%)

    Business Analytics Using Predictive Algorithms

    Get PDF
    In today's data-driven business landscape, organizations strive to extract actionable insights and make informed decisions using their vast data. Business analytics, combining data analysis, statistical modeling, and predictive algorithms, is crucial for transforming raw data into meaningful information. However, there are gaps in the field, such as limited industry focus, algorithm comparison, and data quality challenges. This work aims to address these gaps by demonstrating how predictive algorithms can be applied across business domains for pattern identification, trend forecasting, and accurate predictions. The report focuses on sales forecasting and topic modeling, comparing the performance of various algorithms including Linear Regression, Random Forest Regression, XGBoost, LSTMs, and ARIMA. It emphasizes the importance of data preprocessing, feature selection, and model evaluation for reliable sales forecasts, while utilizing S-BERT, UMAP, and HDBScan unsupervised algorithms for extracting valuable insights from unstructured textual data

    Unveiling the frontiers of deep learning: innovations shaping diverse domains

    Full text link
    Deep learning (DL) enables the development of computer models that are capable of learning, visualizing, optimizing, refining, and predicting data. In recent years, DL has been applied in a range of fields, including audio-visual data processing, agriculture, transportation prediction, natural language, biomedicine, disaster management, bioinformatics, drug design, genomics, face recognition, and ecology. To explore the current state of deep learning, it is necessary to investigate the latest developments and applications of deep learning in these disciplines. However, the literature is lacking in exploring the applications of deep learning in all potential sectors. This paper thus extensively investigates the potential applications of deep learning across all major fields of study as well as the associated benefits and challenges. As evidenced in the literature, DL exhibits accuracy in prediction and analysis, makes it a powerful computational tool, and has the ability to articulate itself and optimize, making it effective in processing data with no prior training. Given its independence from training data, deep learning necessitates massive amounts of data for effective analysis and processing, much like data volume. To handle the challenge of compiling huge amounts of medical, scientific, healthcare, and environmental data for use in deep learning, gated architectures like LSTMs and GRUs can be utilized. For multimodal learning, shared neurons in the neural network for all activities and specialized neurons for particular tasks are necessary.Comment: 64 pages, 3 figures, 3 table

    Cost-Sensitive Learning-based Methods for Imbalanced Classification Problems with Applications

    Get PDF
    Analysis and predictive modeling of massive datasets is an extremely significant problem that arises in many practical applications. The task of predictive modeling becomes even more challenging when data are imperfect or uncertain. The real data are frequently affected by outliers, uncertain labels, and uneven distribution of classes (imbalanced data). Such uncertainties create bias and make predictive modeling an even more difficult task. In the present work, we introduce a cost-sensitive learning method (CSL) to deal with the classification of imperfect data. Typically, most traditional approaches for classification demonstrate poor performance in an environment with imperfect data. We propose the use of CSL with Support Vector Machine, which is a well-known data mining algorithm. The results reveal that the proposed algorithm produces more accurate classifiers and is more robust with respect to imperfect data. Furthermore, we explore the best performance measures to tackle imperfect data along with addressing real problems in quality control and business analytics

    Novel neural approaches to data topology analysis and telemedicine

    Get PDF
    1noL'abstract è presente nell'allegato / the abstract is in the attachmentopen676. INGEGNERIA ELETTRICAnoopenRandazzo, Vincenz

    Forensic science in combat of human trafficking

    Get PDF
    Although Forensic Science has become a crucial part of the investigation of many types of crime, the low number of scientific publications on the usage of Forensic Science to eliminate Human Trafficking or to speed up crime investigation, has given rise to the idea of conducting research on the role of Forensic Science in the investigation of Human Trafficking cases. The following literature review aims at judging the current importance of Forensic Science in solving and preventing Human Trafficking cases, at gathering ideas for the introduction of novel techniques and at identifying gaps of research within this field. For this purpose, a wider view, also addressing socio-economic topics, was applied

    Unsupervised Machine Learning for Networking:Techniques, Applications and Research Challenges

    Get PDF
    While machine learning and artificial intelligence have long been applied in networking research, the bulk of such works has focused on supervised learning. Recently, there has been a rising trend of employing unsupervised machine learning using unstructured raw network data to improve network performance and provide services such as traffic engineering, anomaly detection, Internet traffic classification, and quality of service optimization. The interest in applying unsupervised learning techniques in networking emerges from their great success in other fields such as computer vision, natural language processing, speech recognition, and optimal control (e.g., for developing autonomous self-driving cars). Unsupervised learning is interesting since it can unconstrain us from the need of labeled data and manual handcrafted feature engineering thereby facilitating flexible, general, and automated methods of machine learning. The focus of this survey paper is to provide an overview of the applications of unsupervised learning in the domain of networking. We provide a comprehensive survey highlighting the recent advancements in unsupervised learning techniques and describe their applications in various learning tasks in the context of networking. We also provide a discussion on future directions and open research issues, while also identifying potential pitfalls. While a few survey papers focusing on the applications of machine learning in networking have previously been published, a survey of similar scope and breadth is missing in literature. Through this paper, we advance the state of knowledge by carefully synthesizing the insights from these survey papers while also providing contemporary coverage of recent advances

    Robust and Adversarial Data Mining

    Get PDF
    In the domain of data mining and machine learning, researchers have made significant contributions in developing algorithms handling clustering and classification problems. We develop algorithms under assumptions that are not met by previous works. (i) In adversarial learning, which is the study of machine learning techniques deployed in non-benign environments. We design an algorithm to show how a classifier should be designed to be robust against sparse adversarial attacks. Our main insight is that sparse feature attacks are best defended by designing classifiers which use L1 regularizers. (ii) The different properties between L1 (Lasso) and L2 (Tikhonov or Ridge) regularization has been studied extensively. However, given a data set, principle to follow in terms of choosing the suitable regularizer is yet to be developed. We use mathematical properties of the two regularization methods followed by detailed experimentation to understand their impact based on four characteristics. (iii) The identification of anomalies is an inherent component of knowledge discovery. In lots of cases, the number of features of a data set can be traced to a much smaller set of features. We claim that algorithms applied in a latent space are more robust. This can lead to more accurate results, and potentially provide a natural medium to explain and describe outliers. (iv) We also apply data mining techniques on health care industry. In a lot cases, health insurance companies cover unnecessary costs carried out by healthcare providers. The potential adversarial behaviours of surgeon physicians are addressed. We describe a specific con- text of private healthcare in Australia and describe our social network based approach (applied to health insurance claims) to understand the nature of collaboration among doctors treating hospital inpatients and explore the impact of collaboration on cost and quality of care. (v) We further develop models that predict the behaviours of orthopaedic surgeons in regard to surgery type and use of prosthetic device. An important feature of these models is that they can not only predict the behaviours of surgeons but also provide explanation for the predictions
    • …
    corecore