Oversampling for Imbalanced Learning Based on K-Means and SMOTE
Learning from class-imbalanced data continues to be a common and challenging
problem in supervised learning as standard classification algorithms are
designed to handle balanced class distributions. While different strategies
exist to tackle this problem, methods which generate artificial data to achieve
a balanced class distribution are more versatile than modifications to the
classification algorithm. Such techniques, called oversamplers, modify the
training data, allowing any classifier to be used with class-imbalanced
datasets. Many algorithms have been proposed for this task, but most are
complex and tend to generate unnecessary noise. This work presents a simple and
effective oversampling method based on k-means clustering and SMOTE
oversampling, which avoids the generation of noise and effectively overcomes
imbalances between and within classes. Empirical results of extensive
experiments with 71 datasets show that training data oversampled with the
proposed method improves classification results. Moreover, k-means SMOTE
consistently outperforms other popular oversampling methods. An implementation
is made available in the Python programming language.

Comment: 19 pages, 8 figures
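The cluster-then-oversample idea from the abstract can be sketched compactly. The following is a minimal, stdlib-only illustration, not the authors' released implementation (a maintained version is available in the imbalanced-learn library as KMeansSMOTE); the tiny k-means and the random-pair interpolation are simplifications of the full method, which uses k-nearest neighbours within clusters.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Very small k-means on points given as plain lists; illustration only."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # recompute centers; keep the old center if a cluster went empty
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return clusters

def kmeans_smote(minority, n_new, k=2, seed=0):
    """Cluster the minority class, then SMOTE-interpolate only inside clusters.

    Restricting interpolation to within-cluster pairs is what (per the
    abstract) avoids generating noisy samples between distant minority
    regions. Full SMOTE's k-nearest-neighbour step is simplified here to
    random within-cluster pairs.
    """
    rng = random.Random(seed)
    clusters = [c for c in kmeans(minority, k, seed=seed) if len(c) >= 2]
    synthetic = []
    for _ in range(n_new):
        cl = rng.choice(clusters)
        a, b = rng.sample(cl, 2)
        t = rng.random()
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic
```

Because every synthetic point is a convex combination of two points from the same cluster, it lies inside that cluster's convex hull, which is the noise-avoidance property the abstract emphasizes.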
An empirical evaluation of imbalanced data strategies from a practitioner's point of view
This research tested the following well-known strategies to deal with binary
imbalanced data on 82 different real-life data sets (sampled to imbalance rates
of 5%, 3%, 1%, and 0.1%): class weight, SMOTE, Underbagging, and a baseline
(just the base classifier). As base classifiers we used SVM with an RBF kernel,
random forests, and gradient boosting machines, and we measured the quality of
the resulting classifiers using six different metrics (area under the curve,
accuracy, F-measure, G-mean, Matthews correlation coefficient, and balanced
accuracy). The best strategy strongly depends on the metric used to measure the
quality of the classifier: for AUC and accuracy, class weight and the baseline
perform better; for F-measure and MCC, SMOTE performs better; and for G-mean
and balanced accuracy, Underbagging performs better.
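The metric-dependence of that finding is easy to see from the definitions themselves. A self-contained sketch computing the study's six metrics from a binary confusion matrix (names as in the abstract; the example counts below are illustrative, not from the paper):

```python
import math

def imbalance_metrics(tp, fp, fn, tn):
    """Binary-classification metrics that respond differently to imbalance.

    Accuracy rewards majority-class predictions; G-mean and balanced
    accuracy weight both classes equally; MCC and F-measure also account
    for precision on the (rare) positive class.
    """
    sens = tp / (tp + fn)                 # recall on the positive class
    spec = tn / (tn + fp)                 # recall on the negative class
    prec = tp / (tp + fp)                 # precision on the positive class
    acc = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * prec * sens / (prec + sens)
    g_mean = math.sqrt(sens * spec)
    bal_acc = (sens + spec) / 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": acc, "f1": f1, "g_mean": g_mean,
            "balanced_accuracy": bal_acc, "mcc": mcc}
```

For a 1%-imbalanced set where a classifier catches only half the positives (tp=5, fn=5, fp=10, tn=980), accuracy is still 0.985 while balanced accuracy drops to about 0.74, which is exactly why the best resampling strategy depends on the metric chosen.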
On the almost sure convergence of adaptive allocation procedures
In this paper, we provide some general convergence results for adaptive
designs for treatment comparison, both in the absence and presence of
covariates. In particular, we demonstrate the almost sure convergence of the
treatment allocation proportion for a vast class of adaptive procedures, also
including designs that have not been formally investigated but mainly explored
through simulations, such as Atkinson's optimum biased coin design, Pocock and
Simon's minimization method and some of its generalizations. Although the large
majority of the proposals in the literature rely on continuous allocation
rules, our results allow us to prove, within a single mathematical framework,
the convergence of adaptive allocation methods based on both continuous and
discontinuous randomization functions. Several examples from earlier works are
included in order to enhance applicability, and our approach provides
substantial insight for future proposals, especially in the absence of a
pre-specified target and for designs characterized by sequences of allocation rules.

Comment: Published at http://dx.doi.org/10.3150/13-BEJ591 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
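The kind of convergence the abstract studies can be seen in a toy simulation. The sketch below implements an Efron-style biased coin (a simpler relative of the biased coin and minimization designs named above, chosen here only for illustration): each new subject is assigned to the currently under-represented arm with probability p > 1/2, and the allocation proportion N_A(n)/n converges almost surely to 1/2.

```python
import random

def biased_coin_allocation(n, p=2/3, seed=0):
    """Simulate an Efron-style biased coin design for two treatments A and B.

    The next subject goes to the lagging arm with probability p > 1/2;
    ties are broken by a fair coin. Returns the final proportion N_A/n,
    which converges to the balanced target 1/2.
    """
    rng = random.Random(seed)
    n_a = 0
    for i in range(n):
        diff = 2 * n_a - i          # imbalance N_A - N_B before subject i
        if diff == 0:
            assign_a = rng.random() < 0.5
        else:
            # favour the under-represented treatment with probability p
            assign_a = rng.random() < (p if diff < 0 else 1 - p)
        n_a += assign_a
    return n_a / n
```

Note that the allocation rule is discontinuous in the imbalance (it jumps at diff = 0), which is precisely the class of randomization functions the abstract says earlier continuous-rule frameworks could not handle.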
STEM Rebalance: A Novel Approach for Tackling Imbalanced Datasets using SMOTE, Edited Nearest Neighbour, and Mixup
Imbalanced datasets in medical imaging are characterized by skewed class
proportions and scarcity of abnormal cases. When trained using such data,
models tend to assign higher probabilities to normal cases, leading to biased
performance. Common oversampling techniques such as SMOTE rely on local
information and can introduce marginalization issues. This paper investigates
the potential of Mixup augmentation, which combines two training examples
along with their corresponding labels to generate new data points from a generic
vicinal distribution. To this end, we propose STEM, which combines SMOTE-ENN
and Mixup at the instance level. This integration enables us to effectively
leverage the entire distribution of minority classes, thereby mitigating both
between-class and within-class imbalances. We focus on the breast cancer
problem, where imbalanced datasets are prevalent. The results demonstrate the
effectiveness of STEM, which achieves AUC values of 0.96 and 0.99 in the
Digital Database for Screening Mammography and Wisconsin Breast Cancer
(Diagnostics) datasets, respectively. Moreover, this method shows promising
potential when applied with an ensemble of machine learning (ML) classifiers.

Comment: 7 pages, 4 figures, International Conference on Intelligent Computer
Communication and Processing
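The Mixup step that STEM layers on top of SMOTE-ENN is a one-line convex combination. A minimal sketch of that step alone (the SMOTE-ENN cleaning stage is omitted; the Beta(alpha, alpha) mixing weight follows the original Mixup formulation):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """One Mixup sample: convex combination of two examples and their labels.

    Features are numeric lists and labels are one-hot lists; the mixing
    weight lam is drawn from Beta(alpha, alpha). STEM (per the abstract)
    applies this at the instance level after SMOTE-ENN; only the Mixup
    step is shown here.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y
```

Because labels are mixed with the same weight as features, each synthetic point carries a soft label, which is how Mixup defines the "vicinal distribution" the abstract refers to.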
How to sustain entrepreneurial performance during the current financial crisis
In a debt-ridden society that badly needs to grow economically, policies controlling the flows of economic accounts (revenues and expenditures) should be consistent with an efficient "asset and liability management". The extra money obtained from immediate sales of idle or low-productive government properties can boost economic growth if lent to innovative entrepreneurial firms
Head of the Class: A Quality Teacher in Every Pennsylvania Classroom
"Head of the Class: A Quality Teacher in Every Pennsylvania Classroom" makes recommendations for how state policy can increase and support Pennsylvania's supply of qualified teachers. The report emphasizes that quality teaching is key to student achievement and that the state must act to ensure the presence of a qualified teacher in every Pennsylvania classroom at all times
Travel for Transformation: Embracing a Counter-Hegemonic Approach to Transformative Learning in Study Abroad
This article reviews literature from 2006-2016 on study abroad (and other forms of travel) to investigate frameworks that create the best plausible opportunities for transformative learning within study-abroad experiences. According to the literature reviewed, in order to be considered travel for transformation, the travel experience must respect the values and knowledge of the host culture, acknowledge the presence of differences in privilege among study-abroad participants, and utilize environmentally sustainable practices. In addition, the duration, purpose of travel, and degree of immersion play a significant role in perspective transformation. A benefit repeatedly cited in the articles is that study abroad is better positioned for transformative learning than the traditional classroom environment because it situates the student in a new context where the place, culture, people, and hopefully the language are "other." While almost all of the literature reviewed for this article included cautions to avoid essentializing and exploiting the host culture, very little could be found on the possible negative outcomes to participants, and especially the host culture, when students from the United States study in other contexts. Therefore, the author recommends that future research investigate the possibility of study abroad as exploitation of both the host culture and the participants of the study-abroad program