
    A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering

    Imbalanced data classification is a major challenge in data mining and machine learning, and oversampling algorithms are a widespread technique for re-sampling imbalanced data. To address the tendency of existing oversampling methods to introduce noise points and to generate overlapping instances, in this paper we propose a novel oversampling method based on density peaks clustering. First, the density peaks clustering algorithm is used to cluster the minority instances while screening out outlier points. Second, sampling weights are assigned according to the size of the resulting sub-clusters, and new instances are synthesized by interpolating between cluster cores and other instances of the same sub-cluster. Finally, comparative experiments are conducted on both artificial data and KEEL datasets. The experiments validate the feasibility and effectiveness of the algorithm and show that it improves classification accuracy on imbalanced data.
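    The interpolation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sub-clusters are assumed to be precomputed (with outliers already screened out), the cluster core is approximated by the centroid, and weighting smaller sub-clusters more heavily is an assumed choice, since the abstract only states that weights depend on sub-cluster size.

    ```python
    import numpy as np

    def interpolate_oversample(clusters, n_new, seed=None):
        """Synthesize minority instances by interpolating between each
        sub-cluster's core and other members of the same sub-cluster.

        clusters: list of 2-D arrays, one per minority sub-cluster
                  (outliers assumed already screened out).
        n_new:    total number of synthetic instances to generate.
        """
        rng = np.random.default_rng(seed)
        sizes = np.array([len(c) for c in clusters], dtype=float)
        # Assumed weighting: favor smaller sub-clusters so sparse
        # minority regions get reinforced.
        inv = 1.0 / sizes
        quotas = rng.multinomial(n_new, inv / inv.sum())
        synthetic = []
        for cluster, quota in zip(clusters, quotas):
            core = cluster.mean(axis=0)  # centroid as a stand-in for the core
            for _ in range(quota):
                member = cluster[rng.integers(len(cluster))]
                gap = rng.random()       # interpolation factor in [0, 1)
                synthetic.append(core + gap * (member - core))
        return np.asarray(synthetic)
    ```

    Each synthetic point lies on the segment between the core and a same-cluster member, which keeps new instances inside their sub-cluster and avoids generating points in the overlap region between classes.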

    Adaptive Driving Style Classification through Transfer Learning with Synthetic Oversampling

    Driving style classification depends not only on objective measures such as vehicle speed or acceleration, but is also highly subjective, as drivers come with their own definitions. From our perspective, the successful implementation of driving style classification in real-world applications requires an adaptive approach that is tuned to each driver individually. In this work, we propose a transfer learning framework for driving style classification in which we use a previously developed rule-based algorithm to initialize the neural network weights and train on limited data. To ensure robust training, we applied various state-of-the-art machine learning methods. First, we performed heuristic-based feature engineering to enhance generalized feature building in the first layer. We then calibrated our network so that its output can be used as a probabilistic metric and predictions are only given above a predefined neural network confidence. To increase the robustness of the transfer learning in early increments, we used a synthetic oversampling technique. We then performed a holistic hyperparameter optimization in the form of a random grid search, which incorporated the entire learning framework from pretraining to incremental adaptation. The final algorithm was evaluated on the data of predefined synthetic drivers. Our results showed that, by integrating these various methods, high system-level performance and robustness were achieved with as few as three new training and validation samples in each increment.
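    The confidence-gated prediction step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the network is taken to output calibrated logits, the softmax probability serves as the confidence metric, and the 0.8 threshold is a hypothetical example value, not one from the paper.

    ```python
    import numpy as np

    def predict_with_confidence(logits, threshold=0.8):
        """Return a class index only when the calibrated softmax
        probability exceeds a predefined confidence threshold;
        otherwise abstain by returning None.
        """
        z = logits - np.max(logits)          # shift for numerical stability
        probs = np.exp(z) / np.exp(z).sum()  # softmax over classes
        best = int(np.argmax(probs))
        return best if probs[best] >= threshold else None
    ```

    Abstaining on low-confidence inputs lets the adaptive framework fall back to the rule-based initializer (or defer to later increments) instead of committing to an unreliable driving-style label.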