
    Combining similarity in time and space for training set formation under concept drift

    Concept drift is a challenge in supervised learning for sequential data. It describes the phenomenon in which data distributions change over time. In such cases, classifier accuracy benefits from selectively sampling the training data. We develop a method for training set selection that is particularly relevant when the expected drift is gradual. At each time step, training instances are selected based on their distance to the target instance, using a distance function that combines similarity in space and in time. The method determines the optimal training set size online at every time step using cross validation. It is a wrapper approach, so different base classifiers can be plugged in. The proposed method achieves the best accuracy in its peer group on both real and artificial drifting data, and its complexity is reasonable for field applications.
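    The selection loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the distance function, so `combined_distance` assumes a hypothetical linear weighting `alpha` between Euclidean feature distance and elapsed time steps; the base classifier is a nearest-centroid rule chosen only for brevity; and the "optimal size" is picked by leave-one-out cross validation over a fixed list of candidate sizes. All function names are hypothetical.

```python
import math
from collections import defaultdict


def combined_distance(x_i, t_i, x_q, t_q, alpha=0.5):
    # Hypothetical weighting: alpha trades off feature-space vs. time distance.
    return alpha * math.dist(x_i, x_q) + (1 - alpha) * abs(t_q - t_i)


def nearest_centroid(train, x_q):
    # Base classifier plugged into the wrapper; any train/predict function works.
    sums, counts = {}, defaultdict(int)
    for x, y in train:
        s = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[y] += 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}
    return min(centroids, key=lambda y: math.dist(centroids[y], x_q))


def loo_accuracy(subset, classifier):
    # Leave-one-out cross validation on a candidate training set.
    if len(subset) < 2:
        return 0.0
    hits = 0
    for i, (x, y) in enumerate(subset):
        rest = subset[:i] + subset[i + 1:]
        hits += classifier(rest, x) == y
    return hits / len(subset)


def select_and_predict(history, x_q, t_q, sizes, classifier=nearest_centroid, alpha=0.5):
    """history: list of (features, label, time). Returns (prediction, chosen size)."""
    # Rank past instances by combined space-time distance to the target.
    ranked = sorted(history, key=lambda h: combined_distance(h[0], h[2], x_q, t_q, alpha))
    best_k, best_acc = None, -1.0
    for k in sizes:
        subset = [(x, y) for x, y, _ in ranked[:k]]
        acc = loo_accuracy(subset, classifier)
        if acc > best_acc:  # keep the smallest size among ties
            best_k, best_acc = k, acc
    train = [(x, y) for x, y, _ in ranked[:best_k]]
    return classifier(train, x_q), best_k


# Hypothetical toy stream: the labels swap after t = 10 (a drift).
history = [
    ([0.1], "a", 0), ([0.9], "b", 1), ([0.2], "a", 2), ([0.8], "b", 3),
    ([0.1], "b", 10), ([0.9], "a", 11), ([0.2], "b", 12), ([0.8], "a", 13),
]
print(select_and_predict(history, [0.15], 14, sizes=[4, 8]))  # ('b', 4)
```

    In this toy run the cross validation prefers the four most recent instances, which follow the post-drift concept, over the full history, whose two concepts cancel each other out.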

    Ensemble Learning for fraud detection in Online Payment System: Fraud Detection in Online Payment System

    The imbalance problem in fraud detection systems refers to the unequal distribution of fraud and non-fraud cases in the data used to train machine learning models, which makes fraudulent activity difficult to detect accurately. As a general rule, fraud occurs much less frequently than legitimate activity, so the resulting dataset is highly imbalanced. Machine learning algorithms trained on such data may become biased towards the majority class (non-fraud cases) and fail to detect fraud. In these situations a model can achieve high overall accuracy but low recall for the minority class (fraud cases), meaning that many fraudulent instances are misclassified as non-fraud and go undetected. In this study, the Synthetic Minority Over-sampling Technique (SMOTE) is used to balance the dataset, and three machine learning algorithms are used to classify it: decision trees (DT), enhanced logistic regression (ELR), and Naive Bayes (NB). A majority-voting mechanism combines the DT, NB, and ELR models into an ensemble, and the performance of the resulting model is analyzed. The ensemble of these machine learning algorithms outperformed the other algorithms in terms of accuracy (98.62%), F1 score (95.21%), precision (98.02%), and recall (96.75%).
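    The three ingredients of this pipeline (oversampling, majority voting, and the accuracy-versus-recall gap) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: `smote` is a simplified, hypothetical implementation of SMOTE's interpolation step (k = 3 neighbours and the fixed seed are arbitrary), `majority_vote` accepts label lists from any three classifiers because the abstract does not specify how ELR is "enhanced", and `binary_metrics` treats label 1 as fraud. All names and data are hypothetical.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: create n_new synthetic minority points.

    Each point lies on the segment between a random minority sample and one
    of its k nearest minority neighbours (Euclidean distance).
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def majority_vote(*predictions):
    """Combine per-sample labels from several classifiers by plurality vote."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, F1) for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical 2-D minority (fraud) samples.
fraud = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
print(smote(fraud, n_new=3))

# Labels from three hypothetical classifiers for the same four transactions.
print(majority_vote([1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 0]))  # [1, 0, 1, 0]

# Always predicting "non-fraud" on a 990/10 split: high accuracy, zero recall.
print(binary_metrics([0] * 990 + [1] * 10, [0] * 1000))  # (0.99, 0.0, 0.0, 0.0)
```

    The last line reproduces the failure mode described in the abstract: a model that never flags fraud still scores 99% accuracy while its recall on the fraud class is zero. With three voters and binary labels, the plurality vote can never tie.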