157 research outputs found

    Hellinger Distance Trees for Imbalanced Streams

    Classifiers trained on data sets possessing an imbalanced class distribution are known to exhibit poor generalisation performance. This is known as the imbalanced learning problem. The problem becomes particularly acute when we consider incremental classifiers operating on imbalanced data streams, especially when the learning objective is rare class identification. Because accuracy may give a misleading impression of performance on imbalanced data, existing stream classifiers based on accuracy can perform poorly on the minority class of imbalanced streams, resulting in low minority class recall rates. In this paper we address this deficiency by proposing the use of the Hellinger distance measure as a very fast decision tree split criterion. We demonstrate that by using Hellinger distance, a statistically significant improvement in recall rates on imbalanced data streams can be achieved, with an acceptable increase in the false positive rate. Comment: 6 Pages, 2 figures, to be published in Proceedings 22nd International Conference on Pattern Recognition (ICPR) 201
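
    The abstract does not give the formula for the split criterion, so the following is only a minimal sketch under an assumption: it uses the standard discrete Hellinger distance between the two classes' distributions over the branches of a candidate split. The function name and the count-based inputs are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import math


def hellinger_split_score(pos_counts, neg_counts):
    """Hellinger distance between the branch distributions of two classes.

    pos_counts[j] and neg_counts[j] are the numbers of minority and majority
    examples that a candidate split sends to branch j (hypothetical inputs).
    The score ranges from 0 (no separation) to sqrt(2) (perfect separation).
    """
    n_pos = sum(pos_counts)
    n_neg = sum(neg_counts)
    if n_pos == 0 or n_neg == 0:
        # The distance is undefined when a class is absent; treat as no separation.
        return 0.0
    total = 0.0
    for p, n in zip(pos_counts, neg_counts):
        # Each class is normalised by its own size, so class priors cancel out.
        total += (math.sqrt(p / n_pos) - math.sqrt(n / n_neg)) ** 2
    return math.sqrt(total)


# Scaling the majority class by 100 leaves the score unchanged.
balanced = hellinger_split_score([8, 2], [2, 8])
skewed = hellinger_split_score([8, 2], [200, 800])
```

    Because each class is normalised by its own total, the score depends only on how each class is distributed across branches, not on the class ratio, which is the property that makes this kind of criterion attractive for imbalanced data.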

    A hybrid algorithm to improve the accuracy of support vector machines on skewed data-sets

    Over the past few years, it has been shown that the generalization power of Support Vector Machines (SVM) falls dramatically on imbalanced data-sets. In this paper, we propose a new method to improve the accuracy of SVM on imbalanced data-sets. To this end, we first use undersampling and SVM to obtain the initial support vectors (SVs) and a sketch of the hyperplane. These support vectors help to generate new artificial instances, which serve as the initial population of a genetic algorithm. The genetic algorithm improves the population of artificial instances from one generation to the next and eliminates instances that produce noise in the hyperplane. Finally, the generated and evolved data are added to the original data-set to reduce the imbalance and improve the generalization ability of the SVM on skewed data-sets.

    Epileptic Seizure Detection in EEGs by Using Random Tree Forest, Naïve Bayes and KNN Classification

    Epilepsy is a disease that affects the nervous system. To detect epilepsy, it is necessary to analyze the results of an EEG test. In this study, we compared the naïve Bayes, random tree forest and K-nearest neighbour (KNN) classification algorithms for detecting epilepsy. The raw EEG data were pre-processed before feature extraction. We then trained the three classifiers and, as the last step, validated the trained models. To compare the three classifiers, we calculated accuracy, sensitivity, specificity, and precision. KNN achieved the best accuracy (92.7%), ahead of random tree forest (86.6%) and naïve Bayes (55.6%). KNN also gave the best precision (82.5%), compared with random tree forest (68.2%) and naïve Bayes (25.3%). For sensitivity, however, naïve Bayes was best at 80.3%, compared with 73.2% for KNN and 42.2% for random tree forest. For specificity, KNN reached 96.7%, followed by random tree forest at 95.9% and naïve Bayes at 50.4%. Naïve Bayes trained in 0.166030 sec and random tree forest in 2.4094 sec, while KNN was the slowest to train at 4.789 sec. Overall, KNN gives better classification performance than naïve Bayes and random tree forest.
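
    The four measures compared above all follow from the counts of a binary confusion matrix. A minimal sketch, assuming the seizure class is treated as the positive class; the function name and the example counts are hypothetical and are not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and precision from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. A ratio with a zero denominator is reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall on the positive class
        "specificity": ratio(tn, tn + fp),   # recall on the negative class
        "precision": ratio(tp, tp + fp),     # share of positive calls that are right
    }


# Hypothetical counts for illustration only.
example = binary_metrics(tp=30, fp=10, tn=50, fn=10)
```

    Reading the measures side by side explains how a classifier can have the highest sensitivity and still the lowest accuracy: it flags many positives, so it misses few true cases but raises many false alarms.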

    Combine vector quantization and support vector machine for imbalanced datasets

    For extremely imbalanced, high-dimensional datasets, standard machine learning techniques tend to be overwhelmed by the large classes. This paper rebalances skewed datasets by compressing the majority class. The proposed approach, VQ-SVM, combines Vector Quantization and Support Vector Machine to rebalance datasets without significant information loss. Issues such as distortion and support vectors are discussed to address the trade-off between information loss and undersampling. Experiments compare VQ-SVM and standard SVM on several imbalanced datasets with varied imbalance ratios, and the results show that VQ-SVM outperforms SVM, especially on extremely imbalanced large datasets. IFIP International Conference on Artificial Intelligence in Theory and Practice - Integration of AI with other Technologies. Red de Universidades con Carreras en Informática (RedUNCI
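
    The abstract does not say which quantizer VQ-SVM uses, so the following sketch is an assumption: it compresses the majority class with plain k-means (Lloyd's algorithm), one common form of vector quantization. The function name, parameters and sample points are hypothetical. The resulting codebook would replace the majority class before training an SVM, a step not shown here.

```python
import random


def quantize_majority(points, k, iters=20, seed=0):
    """Compress majority-class points into k codebook vectors using k-means.

    points is a list of equal-length tuples of floats. Returns a list of k
    centroid tuples that stand in for the original majority-class examples.
    """
    rng = random.Random(seed)
    codebook = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest codebook vector (squared Euclidean).
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, codebook[i])),
            )
            clusters[nearest].append(p)
        # Move each codebook vector to the mean of its cluster; keep it if empty.
        codebook = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else codebook[i]
            for i, cl in enumerate(clusters)
        ]
    return codebook


# Eight hypothetical majority-class points compressed to two representatives.
majority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
compressed = sorted(quantize_majority(majority, k=2))
```

    The choice of k sets the trade-off the abstract mentions: a smaller codebook rebalances the classes more aggressively but increases distortion, meaning more information about the majority class is lost.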

    A post-processing strategy for SVM learning from unbalanced data

    Standard learning algorithms may perform poorly when learning from unbalanced datasets. Based on Fisher’s discriminant analysis, a post-processing strategy is introduced to deal with datasets that have a significant imbalance in the data distribution. A new bias is defined, which reduces skew towards the minority class. Empirical results from experiments with a learned SVM model on twelve UCI datasets indicate that the proposed solution improves on the original SVM, and also on the results reported for z-SVM, in terms of g-mean and sensitivity. Peer Reviewed. Postprint (author’s final draft
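
    To illustrate why adjusting the bias of an already trained SVM can help, the sketch below adds an offset to a set of decision scores and measures the g-mean, the geometric mean of sensitivity and specificity. This is not the paper's Fisher-based rule, which the abstract does not spell out: the offset here is simply supplied by hand, and the function name, scores and labels are hypothetical.

```python
import math


def g_mean_with_bias(scores, labels, bias=0.0):
    """G-mean of the classifier sign(score + bias).

    scores are raw decision values; labels use 1 for the minority (positive)
    class and 0 for the majority class. Returns sqrt(sensitivity * specificity).
    """
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted_positive = (s + bias) > 0
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical scores from a model skewed towards the majority class.
scores = [-1.2, -0.4, -0.3, 0.5, -0.8, -1.5, -2.0, -0.9]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
before = g_mean_with_bias(scores, labels, bias=0.0)
after = g_mean_with_bias(scores, labels, bias=0.6)
```

    In this toy example, shifting the bias recovers minority examples that sat just on the wrong side of the boundary, raising sensitivity without changing specificity, so the g-mean increases.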

    Equity Forecast: Predicting Long Term Stock Price Movement using Machine Learning

    Abstract. Long-term investment is one of the major investment strategies. However, calculating the intrinsic value of a company and evaluating its shares for long-term investment is not easy, since analysts have to consider a large number of financial indicators and evaluate them correctly. So far, machines have provided little help in predicting the direction of a company’s value over longer periods. In this paper we present a machine-learning-aided approach to evaluating an equity’s future price over the long term. Our method correctly predicts whether a company’s value will be 10% higher after one year in 76.5% of cases. Keywords. Machine learning, Long term investment, Equity, Stock price prediction. JEL. H54, D92, E20

    Bridging the Gap: Simultaneous Fine Tuning for Data Re-Balancing

    There are many real-world classification problems wherein the issue of data imbalance (the case when a data set contains substantially more samples for one/many classes than the rest) is unavoidable. While under-sampling the problematic classes is a common solution, this is not a compelling option when the large data class is itself diverse and/or the limited data class is especially small. We suggest a strategy based on recent work concerning limited data problems which utilizes a supplemental set of images with similar properties to the limited data class to aid in the training of a neural network. We show results for our model against other typical methods on a real-world synthetic aperture sonar data set. Code can be found at github.com/JohnMcKay/dataImbalance. Comment: Submitted to IGARSS 2018, 4 Pages, 8 Figure