14 research outputs found

    A review on classification of imbalanced data for wireless sensor networks

    Get PDF
    © The Author(s) 2020. Classification of imbalanced data is a vastly explored issue of the last and present decade and still keeps the same importance because data are an essential term today and it becomes crucial when data are distributed into several classes. The term imbalance refers to uneven distribution of data into classes that severely affects the performance of traditional classifiers, that is, classifiers become biased toward the class having larger amount of data. The data generated from wireless sensor networks will have several imbalances. This review article is a decent analysis of imbalance issue for wireless sensor networks and other application domains, which will help the community to understand WHAT, WHY, and WHEN of imbalance in data and its remedies

    Oversampling Sintetis Berbasis Kopula untuk Model Klasifikasi dengan Data yang Tidak Seimbang

    Get PDF
    A machine learning classification model for detecting abnormality is usually developed using imbalanced data where the number of abnormal instances is significantly smaller than the normal ones. Since the data is imbalanced, the learning process is dominated by normal instances, and the resulting model may be biased. The most common method for coping with this problem is synthetic oversampling. Most synthetic oversampling techniques are distance-based, usually based on the k-Nearest Neighbor method. Patterns in data can be based on distance or correlation. This research proposes a synthetic oversampling technique that is based on correlations in the form of the joint probability distribution of the data. The joint probability distribution is represented using a Gaussian copula, while the marginal distribution uses three alternatives distribution: the Pearson distribution system, empirical distribution, and the Metalog distribution system. This proposed technique is compared with several commonly used synthetic oversampling techniques in a case study of credit card default prediction. The classification model uses the k-Nearest Neighbor and is validated using the k-fold cross-validation. We found that the classification model using the proposed oversampling method with the Metalog marginal distribution has the greatest total accuracy.  Model klasifikasi berbasis pembelajaran mesin untuk mendeteksi anomali biasanya didasarkan pada data dengan proporsi yang tidak seimbang. Proporsi data anomali biasanya jauh lebih kecil dibandingkan proporsi data non anomali. Ketidakseimbangan data menyebabkan model klasifikasi lebih banyak melakukan pembelajaran dengan data non anomali sehingga model bisa bias. Salah satu metode yang banyak digunakan untuk mengatasi masalah ini adalah oversampling sintetis. Oversampling sintetis umumnya didasarkan pada jarak dan didominasi metode berbasis k-Nearest Neighbor. Secara umum, pola data bisa berdasarkan jarak atau hubungan korelasional. Penelitian ini bertujuan menawarkan metode oversampling sintetis berdasarkan hubungan korelasional dalam bentuk distribusi probabilitas bersama dari data aslinya. Distribusi probabilitas bersama direpresentasikan dengan kopula Gaussian, sedangkan distribusi probabilitas marjinalnya direpresentasikan menggunakan tiga alternatf distribusi, yaitu sistem distribusi Pearson, distribusi empiris, dan sistem distribusi Metalog. Metode ini dibandingkan dengan beberapa metode oversampling lain yang umum digunakan untuk data yang tidak seimbang. Implementasi dilakukan dalam masalah kredit macet nasabah kartu kredit di suatu bank dengan metode klasifikasi k-Nearest Neighbor dengan ukuran kinerja akurasi total dengan metode validasi k-fold cross validation. Didapati bahwa model klasifikasi dengan metode oversampling usulan menggunakan distribusi marjinal Metalog memiliki akurasi total tertinggi

    Blended Multi-Modal Deep ConvNet Features for Diabetic Retinopathy Severity Prediction

    Full text link
    Diabetic Retinopathy (DR) is one of the major causes of visual impairment and blindness across the world. It is usually found in patients who suffer from diabetes for a long period. The major focus of this work is to derive optimal representation of retinal images that further helps to improve the performance of DR recognition models. To extract optimal representation, features extracted from multiple pre-trained ConvNet models are blended using proposed multi-modal fusion module. These final representations are used to train a Deep Neural Network (DNN) used for DR identification and severity level prediction. As each ConvNet extracts different features, fusing them using 1D pooling and cross pooling leads to better representation than using features extracted from a single ConvNet. Experimental studies on benchmark Kaggle APTOS 2019 contest dataset reveals that the model trained on proposed blended feature representations is superior to the existing methods. In addition, we notice that cross average pooling based fusion of features from Xception and VGG16 is the most appropriate for DR recognition. With the proposed model, we achieve an accuracy of 97.41%, and a kappa statistic of 94.82 for DR identification and an accuracy of 81.7% and a kappa statistic of 71.1% for severity level prediction. Another interesting observation is that DNN with dropout at input layer converges more quickly when trained using blended features, compared to the same model trained using uni-modal deep features.Comment: 18 pages, 8 figures, published in Electronics MDPI journa

    Deep Sensing: Inertial and Ambient Sensing for Activity Context Recognition using Deep Convolutional Neural Networks

    Get PDF
    With the widespread use of embedded sensing capabilities of mobile devices, there has been unprecedented development of context-aware solutions. This allows the proliferation of various intelligent applications, such as those for remote health and lifestyle monitoring, intelligent personalized services, etc. However, activity context recognition based on multivariate time series signals obtained from mobile devices in unconstrained conditions is naturally prone to imbalance class problems. This means that recognition models tend to predict classes with the majority number of samples whilst ignoring classes with the least number of samples, resulting in poor generalization. To address this problem, we propose augmentation of the time series signals from inertial sensors with signals from ambient sensing to train deep convolutional neural network (DCNNs) models. DCNNs provide the characteristics that capture local dependency and scale invariance of these combined sensor signals. Consequently, we developed a DCNN model using only inertial sensor signals and then developed another model that combined signals from both inertial and ambient sensors aiming to investigate the class imbalance problem by improving the performance of the recognition model. Evaluation and analysis of the proposed system using data with imbalanced classes show that the system achieved better recognition accuracy when data from inertial sensors are combined with those from ambient sensors, such as environmental noise level and illumination, with an overall improvement of 5.3% accuracy

    Deep Sensing: Inertial and Ambient Sensing for Activity Context Recognition using Deep Convolutional Neural Networks

    Get PDF
    With the widespread use of embedded sensing capabilities of mobile devices, there has been unprecedented development of context-aware solutions. This allows the proliferation of various intelligent applications, such as those for remote health and lifestyle monitoring, intelligent personalized services, etc. However, activity context recognition based on multivariate time series signals obtained from mobile devices in unconstrained conditions is naturally prone to imbalance class problems. This means that recognition models tend to predict classes with the majority number of samples whilst ignoring classes with the least number of samples, resulting in poor generalization. To address this problem, we propose augmentation of the time series signals from inertial sensors with signals from ambient sensing to train deep convolutional neural network (DCNNs) models. DCNNs provide the characteristics that capture local dependency and scale invariance of these combined sensor signals. Consequently, we developed a DCNN model using only inertial sensor signals and then developed another model that combined signals from both inertial and ambient sensors aiming to investigate the class imbalance problem by improving the performance of the recognition model. Evaluation and analysis of the proposed system using data with imbalanced classes show that the system achieved better recognition accuracy when data from inertial sensors are combined with those from ambient sensors, such as environmental noise level and illumination, with an overall improvement of 5.3% accuracy

    Survey on highly imbalanced multi-class data

    Get PDF
    Machine learning technology has a massive impact on society because it offers solutions to solve many complicated problems like classification, clustering analysis, and predictions, especially during the COVID-19 pandemic. Data distribution in machine learning has been an essential aspect in providing unbiased solutions. From the earliest literatures published on highly imbalanced data until recently, machine learning research has focused mostly on binary classification data problems. Research on highly imbalanced multi-class data is still greatly unexplored when the need for better analysis and predictions in handling Big Data is required. This study focuses on reviews related to the models or techniques in handling highly imbalanced multi-class data, along with their strengths and weaknesses and related domains. Furthermore, the paper uses the statistical method to explore a case study with a severely imbalanced dataset. This article aims to (1) understand the trend of highly imbalanced multi-class data through analysis of related literatures; (2) analyze the previous and current methods of handling highly imbalanced multi-class data; (3) construct a framework of highly imbalanced multi-class data. The chosen highly imbalanced multi-class dataset analysis will also be performed and adapted to the current methods or techniques in machine learning, followed by discussions on open challenges and the future direction of highly imbalanced multi-class data. Finally, for highly imbalanced multi-class data, this paper presents a novel framework. We hope this research can provide insights on the potential development of better methods or techniques to handle and manipulate highly imbalanced multi-class data
    corecore