51,952 research outputs found

    Oversampling for Imbalanced Learning Based on K-Means and SMOTE

    Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning, as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE oversampling, which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive experiments with 71 datasets show that training data oversampled with the proposed method improves classification results. Moreover, k-means SMOTE consistently outperforms other popular oversampling methods. An implementation is made available in the Python programming language. Comment: 19 pages, 8 figures
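The clustering-plus-interpolation idea described in this abstract can be sketched compactly. The code below is an illustrative simplification, not the authors' released implementation (the imbalanced-learn library ships a production `KMeansSMOTE` class): the naive Lloyd's loop, the minority-share filter threshold `irt`, and all parameter names are assumptions chosen for exposition.

```python
import numpy as np

def kmeans_smote(X, y, minority_label, k=2, n_new=10, irt=0.5, seed=0):
    """Sketch of the k-means SMOTE idea: cluster the data, keep only
    clusters dominated by the minority class, and interpolate (SMOTE-style)
    between minority points inside those clusters."""
    rng = np.random.default_rng(seed)
    # --- naive Lloyd's k-means; a few iterations suffice for a sketch ---
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(10):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    synthetic = []
    for c in range(k):
        mask = labels == c
        cluster_y = y[mask]
        if len(cluster_y) < 2:
            continue
        # filter step: only oversample "safe" clusters with a high minority share
        if (cluster_y == minority_label).mean() < irt:
            continue
        pts = X[mask][cluster_y == minority_label]
        if len(pts) < 2:
            continue
        # SMOTE step: interpolate between random minority pairs in the cluster
        for _ in range(n_new):
            i, j = rng.choice(len(pts), 2, replace=False)
            lam = rng.random()
            synthetic.append(pts[i] + lam * (pts[j] - pts[i]))
    return np.array(synthetic)
```

Skipping clusters dominated by the majority class is what keeps the interpolation away from noisy border regions, which is the noise-avoidance property the abstract emphasizes.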

    An empirical evaluation of imbalanced data strategies from a practitioner's point of view

    This research tested the following well-known strategies for dealing with binary imbalanced data on 82 different real-life data sets (sampled to imbalance rates of 5%, 3%, 1%, and 0.1%): class weight, SMOTE, Underbagging, and a baseline (just the base classifier). As base classifiers we used SVM with RBF kernel, random forests, and gradient boosting machines, and we measured the quality of the resulting classifier using 6 different metrics (area under the curve, accuracy, F-measure, G-mean, Matthews correlation coefficient, and balanced accuracy). The best strategy strongly depends on the metric used to measure the quality of the classifier. For AUC and accuracy, class weight and the baseline perform better; for F-measure and MCC, SMOTE performs better; and for G-mean and balanced accuracy, underbagging performs better.
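The study's central finding, that the winning strategy depends on the evaluation metric, is easy to see from the metric definitions alone. The helper below is an illustrative sketch (the function name and the degenerate example are mine, not the paper's): a classifier that always predicts the majority class on a test set with 1% positives scores 0.99 accuracy while F-measure, G-mean, and MCC all collapse to zero.

```python
import math

def imbalance_metrics(tp, fp, fn, tn):
    """Standard definitions of the six metrics used in the study,
    computed from a binary confusion matrix."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0   # sensitivity / recall
    spec = tn / (tn + fp) if tn + fp else 0.0  # specificity
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    gmean = math.sqrt(rec * spec)
    bal_acc = (rec + spec) / 2
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"accuracy": acc, "f1": f1, "gmean": gmean,
            "balanced_accuracy": bal_acc, "mcc": mcc}

# A classifier that ignores a 1%-minority class looks excellent on
# accuracy but collapses on the imbalance-aware metrics:
majority_only = imbalance_metrics(tp=0, fp=0, fn=10, tn=990)
```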

    On the almost sure convergence of adaptive allocation procedures

    In this paper, we provide some general convergence results for adaptive designs for treatment comparison, both in the absence and presence of covariates. In particular, we demonstrate the almost sure convergence of the treatment allocation proportion for a vast class of adaptive procedures, including designs that have not been formally investigated but mainly explored through simulations, such as Atkinson's optimum biased coin design, Pocock and Simon's minimization method, and some of its generalizations. Even though the large majority of the proposals in the literature rely on continuous allocation rules, our results allow us to prove, via a unique mathematical framework, the convergence of adaptive allocation methods based on both continuous and discontinuous randomization functions. Although several examples of earlier works are included in order to enhance the applicability, our approach provides substantial insight for future suggestions, especially in the absence of a prefixed target and for designs characterized by sequences of allocation rules. Comment: Published at http://dx.doi.org/10.3150/13-BEJ591 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
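Among the discontinuous randomization functions that convergence results of this kind cover, Efron's classic biased coin design is perhaps the simplest to simulate. The sketch below is only an illustration of the convergence phenomenon, not code from the paper (the function and parameter names are mine): the under-represented arm is favored with probability p, and the allocation proportion for arm A converges to 1/2.

```python
import random

def efron_biased_coin(n, p=2/3, seed=0):
    """Simulate Efron's biased coin design for n patients over two arms.
    The allocation rule is discontinuous in the current imbalance:
    assign the lagging arm with probability p, a fair coin when tied."""
    rng = random.Random(seed)
    n_a = 0
    for t in range(n):
        diff = 2 * n_a - t           # imbalance: (arm A count) - (arm B count)
        if diff == 0:
            prob_a = 0.5             # tied: fair coin
        elif diff < 0:
            prob_a = p               # A is behind: favor A
        else:
            prob_a = 1 - p           # A is ahead: favor B
        n_a += rng.random() < prob_a
    return n_a / n                   # allocation proportion for arm A
```

Because the imbalance behaves like a random walk with drift toward zero, the proportion approaches the balanced target 1/2 almost surely, which is the kind of limit the paper establishes for a much wider class of rules.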

    STEM Rebalance: A Novel Approach for Tackling Imbalanced Datasets using SMOTE, Edited Nearest Neighbour, and Mixup

    Imbalanced datasets in medical imaging are characterized by skewed class proportions and scarcity of abnormal cases. When trained using such data, models tend to assign higher probabilities to normal cases, leading to biased performance. Common oversampling techniques such as SMOTE rely on local information and can introduce marginalization issues. This paper investigates the potential of using Mixup augmentation, which combines two training examples along with their corresponding labels to generate new data points as a generic vicinal distribution. To this end, we propose STEM, which combines SMOTE-ENN and Mixup at the instance level. This integration enables us to effectively leverage the entire distribution of minority classes, thereby mitigating both between-class and within-class imbalances. We focus on the breast cancer problem, where imbalanced datasets are prevalent. The results demonstrate the effectiveness of STEM, which achieves AUC values of 0.96 and 0.99 on the Digital Database for Screening Mammography and Wisconsin Breast Cancer (Diagnostic) datasets, respectively. Moreover, this method shows promising potential when applied with an ensemble of machine learning (ML) classifiers. Comment: 7 pages, 4 figures, International Conference on Intelligent Computer Communication and Processing
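The Mixup step that STEM combines with SMOTE-ENN admits a very short sketch. The code below shows only the generic Mixup operation described in the abstract, convex-combining two examples and their labels; the function signature and the Beta(α, α) sampling of the mixing weight follow the standard Mixup formulation and are not necessarily the exact instance-level variant used in STEM.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup augmentation: draw lam ~ Beta(alpha, alpha) and return the
    convex combination of two examples and of their (one-hot) labels."""
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    x_mix = lam * x1 + (1 - lam) * x2
    y_mix = lam * y1 + (1 - lam) * y2
    return x_mix, y_mix
```

Because the mixed label is a convex combination of valid label vectors, it remains a valid probability vector, which is what lets the augmented points act as samples from a vicinal distribution rather than hard-labeled duplicates.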

    How to sustain entrepreneurial performance during the current financial crisis

    In a debt-ridden society that badly needs to grow economically, policies controlling the flows of economic accounts (revenues and expenditures) should be consistent with an efficient “asset and liability management”. The extra money obtained from immediate sales of idle or low-productive government properties can boost economic growth if lent to innovative entrepreneurial firms.

    Head of the Class: A Quality Teacher in Every Pennsylvania Classroom

    "Head of the Class: A Quality Teacher in Every Pennsylvania Classroom" makes recommendations for how state policy can increase and support Pennsylvania's supply of qualified teachers. The report emphasizes that quality teaching is key to student achievement and that the state must act to ensure the presence of a qualified teacher in every Pennsylvania classroom at all times.

    Travel for Transformation: Embracing a Counter-Hegemonic Approach to Transformative Learning in Study Abroad

    This article reviews literature from 2006-2016 on study abroad (and other forms of travel) to investigate frameworks that create the best opportunities for transformative learning within study-abroad experiences. According to the literature reviewed, in order to be considered travel for transformation, the travel experience must respect the values and knowledge of the host culture, acknowledge the presence of differences in privilege among study-abroad participants, and utilize environmentally sustainable practices. In addition, the duration, purpose of travel, and degree of immersion play a significant role in perspective transformation. A benefit repeated across the articles is that study abroad is better positioned for transformative learning than the traditional classroom environment because it situates the student in a new context where the place, culture, people, and hopefully the language are “other.” While almost all of the literature reviewed for this article included cautions to avoid essentializing and exploiting the host culture, very little could be found on the possible negative outcomes for participants—and especially the host culture—when students from the United States study in other contexts. Therefore, the author recommends that future research investigate the possibility of study abroad as exploitation of both the host culture and the participants of the study-abroad program.