
    An Examination of the Smote and Other Smote-based Techniques That Use Synthetic Data to Oversample the Minority Class in the Context of Credit-Card Fraud Classification

    This research project investigates sampling techniques that generate synthetic data to oversample the minority class as a means of handling the imbalanced distribution between the non-fraudulent (majority) and fraudulent (minority) classes in a credit-card fraud dataset. Its purpose is to assess the effectiveness of these techniques in fraud detection, a highly imbalanced and cost-sensitive problem. Machine learning tasks that require learning from highly imbalanced datasets are difficult because many traditional learning algorithms are not designed to cope with large differences between class sizes. For that reason, various methods have been developed to tackle this problem. Oversampling and undersampling are examples of techniques that address class imbalance through sampling. This paper evaluates oversampling techniques that use synthetic data to balance the minority class. The idea of using synthetic data to compensate for the minority class was first proposed by Chawla et al. (2002), and the technique is known as the Synthetic Minority Over-Sampling Technique (SMOTE). Other techniques were subsequently developed from it. This paper evaluates SMOTE along with other popular SMOTE-based extensions of the original technique.
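The core idea behind SMOTE, as it is commonly described, is to create each synthetic minority sample by interpolating between a minority point and one of its nearest minority-class neighbours. The following is a minimal sketch of that idea under stated assumptions, not the implementation evaluated in the paper: the function name `smote_sketch`, its parameters, and the brute-force neighbour search are all illustrative choices.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic points from minority samples X_min.

    Illustrative only: assumes numeric features and at least two
    minority samples; uses a brute-force nearest-neighbour search.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other points

    # Pairwise Euclidean distances between minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base point and one of its k neighbours for each new sample.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment base -> neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region already occupied by the minority class rather than duplicating existing rows.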

    An Evolutionary Neural Network Approach for Slopes Stability Assessment

    A major current challenge for developed and developing countries alike is how to keep large-scale transportation infrastructure networks operational under all conditions. Network extensions and budgetary constraints on maintenance are among the main factors that make transportation network management a non-trivial task. In addition, the large number of parameters affecting the stability of engineered slopes makes their assessment even more complex and difficult. Aiming to support more efficient management of such an important element of modern society, a first attempt was made in this paper to develop a classification system for rock and soil cuttings, as well as embankments, based on visual features and using soft computing algorithms. The results achieved, although interesting, have some important limitations for use as auxiliary tools in transportation network management. Accordingly, we carried out new experiments combining modern optimization and soft computing algorithms. One of the main challenges is selecting the best set of input features for a feedforward neural network for earthwork hazard category (EHC) identification; we applied a genetic algorithm (GA) for this purpose. Another challenge is the asymmetric distribution of the data, since good conditions are typically much more common than bad ones. To address it, three training sampling approaches were explored: no resampling, the synthetic minority oversampling technique (SMOTE), and oversampling. Some relevant observations were drawn from the optimization process, namely which variables are most frequently selected for EHC identification. After finding the most efficient models, a detailed sensitivity analysis was applied to them, allowing us to measure the relative importance of each attribute in EHC identification.
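The abstract does not say how its plain "oversampling" approach was implemented. One common reading is random oversampling, where minority rows are duplicated at random until every class matches the largest one. The sketch below illustrates that reading only; the function name `random_oversample` and its parameters are hypothetical.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until all classes are equally frequent.

    Illustrative assumption: 'oversampling' means sampling with
    replacement from each smaller class up to the majority count.
    """
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()

    parts = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Draw the missing number of rows (zero for the majority class).
        extra = rng.choice(idx, size=target - count, replace=True)
        parts.append(np.concatenate([idx, extra]))

    keep = np.concatenate(parts)
    return X[keep], y[keep]
```

Resampling of this kind is normally applied to the training split only, so that evaluation data keeps its original class distribution.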

    Predicting type of delivery by identification of obstetric risk factors through data mining

    In maternity care, a quick decision has to be made about the most suitable delivery type for the current patient. Physicians follow guidelines to support that decision; however, those practice recommendations are limited and underused. In recent years, caesarean delivery has been performed in over 28% of pregnancies, and other operative techniques for specific problems have also been used excessively. This study identifies obstetric and pregnancy factors that can be used to predict the most appropriate delivery technique, through the induction of data mining models using real data gathered in the perinatal and maternal care unit of Centro Hospitalar of Oporto (CHP). Predicting the type of birth aims at high-quality services and at increased safety and effectiveness of specific practices, helping to guide maternity care decisions and facilitate optimal outcomes for mother and child. In this work it was possible to obtain good results, achieving sensitivity and specificity values of 90.11% and 80.05%, respectively, and providing the CHP with a model capable of correctly identifying caesarean sections and vaginal deliveries.
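For reference, sensitivity and specificity are computed from confusion-matrix counts as TP / (TP + FN) and TN / (TN + FP). The short sketch below shows the arithmetic with hypothetical counts; they are not the study's data, and the abstract does not state which delivery type was treated as the positive class.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual positives that were found
    specificity = tn / (tn + fp)  # share of actual negatives that were found
    return sensitivity, specificity

# Hypothetical counts for illustration only, not taken from the study.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```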

    ADGym: Design Choices for Deep Anomaly Detection

    Deep learning (DL) techniques have recently found success in anomaly detection (AD) across various fields such as finance, medical services, and cloud computing. However, most of the current research tends to view deep AD algorithms as a whole, without dissecting the contributions of individual design choices like loss functions and network architectures. This view tends to diminish the value of preliminary steps like data preprocessing, as more attention is given to newly designed loss functions, network architectures, and learning paradigms. In this paper, we aim to bridge this gap by asking two key questions: (i) Which design choices in deep AD methods are crucial for detecting anomalies? (ii) How can we automatically select the optimal design choices for a given AD dataset, instead of relying on generic, pre-existing solutions? To address these questions, we introduce ADGym, a platform specifically crafted for comprehensive evaluation and automatic selection of AD design elements in deep methods. Our extensive experiments reveal that relying solely on existing leading methods is not sufficient. In contrast, models developed using ADGym significantly surpass current state-of-the-art techniques. Comment: NeurIPS 2023. The first three authors contribute equally. Code available at https://github.com/Minqi824/ADGy

    CCNN: An Artificial Intelligent based Classifier to Credit Card Fraud Detection System with Optimized Cognitive Learning Model

    Digital transactions now play a vital role in money transfer. Statistical reports from the last five years show the growth of internet money transactions, especially via credit card and the unified payments interface. Meanwhile, banking threats are increasing and digital transaction fraud rates are also growing significantly. Data engineering techniques provide strong support for detecting credit card forgery in online and offline transactions. Data processing issues in credit card fraud detection (CCFD) and prevention arise for two major reasons: first, the classification rate of legitimate and fraudulent uses changes frequently; second, the values in fraud detection datasets are highly asymmetric. This research work compares the performance of various existing classifiers with our proposed cognitive convolutional neural network (CCNN) classifier. The existing classifiers are Logistic Regression (LR), K-nearest neighbor (KNN), Decision Tree (DT) and Support Vector Machine (SVM). These models face challenges of low performance and high complexity because of low hit rate and accuracy. We introduce a cognitive learning-based CCNN classifier methodology with artificial intelligence to achieve maximum accuracy and minimal complexity. The experimental analysis uses a dataset of credit card transactions obtained from cardholders in a specific region, containing 284500 transactions and their various features. The unstructured and non-dimensional data in this dataset are converted into structured data with the help of oversampling and undersampling methods. Performance analysis shows that the proposed CCNN classifier model provides a significant improvement in accuracy, specificity, sensitivity and hit rate. The results are shown in comparison. After cross-validation, the accuracy of the CCNN classification model for fraudulent transaction detection reached 99% when using the over-sampling model.