Oversampling for Imbalanced Learning Based on K-Means and SMOTE
Learning from class-imbalanced data continues to be a common and challenging
problem in supervised learning as standard classification algorithms are
designed to handle balanced class distributions. While different strategies
exist to tackle this problem, methods which generate artificial data to achieve
a balanced class distribution are more versatile than modifications to the
classification algorithm. Such techniques, called oversamplers, modify the
training data, allowing any classifier to be used with class-imbalanced
datasets. Many algorithms have been proposed for this task, but most are
complex and tend to generate unnecessary noise. This work presents a simple and
effective oversampling method based on k-means clustering and SMOTE
oversampling, which avoids the generation of noise and effectively overcomes
imbalances between and within classes. Empirical results of extensive
experiments with 71 datasets show that training data oversampled with the
proposed method improves classification results. Moreover, k-means SMOTE
consistently outperforms other popular oversampling methods. An implementation
is made available in the Python programming language.

Comment: 19 pages, 8 figures
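The cluster-then-oversample idea from the abstract can be sketched compactly. The following is a minimal, stdlib-only illustration, not the authors' released implementation (a maintained version is available in the imbalanced-learn library as KMeansSMOTE); the tiny k-means and the random-pair interpolation are simplifications of the full method, which uses k-nearest neighbours within clusters.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Very small k-means on points given as plain lists; illustration only."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # recompute centers; keep the old center if a cluster went empty
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return clusters

def kmeans_smote(minority, n_new, k=2, seed=0):
    """Cluster the minority class, then SMOTE-interpolate only inside clusters.

    Restricting interpolation to within-cluster pairs is what (per the
    abstract) avoids generating noisy samples between distant minority
    regions. Full SMOTE's k-nearest-neighbour step is simplified here to
    random within-cluster pairs.
    """
    rng = random.Random(seed)
    clusters = [c for c in kmeans(minority, k, seed=seed) if len(c) >= 2]
    synthetic = []
    for _ in range(n_new):
        cl = rng.choice(clusters)
        a, b = rng.sample(cl, 2)
        t = rng.random()
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic
```

Because every synthetic point is a convex combination of two points from the same cluster, it lies inside that cluster's convex hull, which is the noise-avoidance property the abstract emphasizes.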
An empirical evaluation of imbalanced data strategies from a practitioner's point of view
This research tested the following well-known strategies to deal with binary
imbalanced data on 82 different real-life data sets (sampled to imbalance rates
of 5%, 3%, 1%, and 0.1%): class weight, SMOTE, Underbagging, and a baseline
(just the base classifier). As base classifiers we used SVM with an RBF kernel,
random forests, and gradient boosting machines, and we measured the quality of
the resulting classifiers using six different metrics (area under the curve,
accuracy, F-measure, G-mean, Matthews correlation coefficient, and balanced
accuracy). The best strategy strongly depends on the metric used to measure the
quality of the classifier: for AUC and accuracy, class weight and the baseline
perform better; for F-measure and MCC, SMOTE performs better; and for G-mean
and balanced accuracy, Underbagging performs better.
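The metric-dependence of that finding is easy to see from the definitions themselves. A self-contained sketch computing the study's six metrics from a binary confusion matrix (names as in the abstract; the example counts below are illustrative, not from the paper):

```python
import math

def imbalance_metrics(tp, fp, fn, tn):
    """Binary-classification metrics that respond differently to imbalance.

    Accuracy rewards majority-class predictions; G-mean and balanced
    accuracy weight both classes equally; MCC and F-measure also account
    for precision on the (rare) positive class.
    """
    sens = tp / (tp + fn)                 # recall on the positive class
    spec = tn / (tn + fp)                 # recall on the negative class
    prec = tp / (tp + fp)                 # precision on the positive class
    acc = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * prec * sens / (prec + sens)
    g_mean = math.sqrt(sens * spec)
    bal_acc = (sens + spec) / 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": acc, "f1": f1, "g_mean": g_mean,
            "balanced_accuracy": bal_acc, "mcc": mcc}
```

For a 1%-imbalanced set where a classifier catches only half the positives (tp=5, fn=5, fp=10, tn=980), accuracy is still 0.985 while balanced accuracy drops to about 0.74, which is exactly why the best resampling strategy depends on the metric chosen.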
On the almost sure convergence of adaptive allocation procedures
In this paper, we provide some general convergence results for adaptive
designs for treatment comparison, both in the absence and presence of
covariates. In particular, we demonstrate the almost sure convergence of the
treatment allocation proportion for a vast class of adaptive procedures, also
including designs that have not been formally investigated but mainly explored
through simulations, such as Atkinson's optimum biased coin design, Pocock and
Simon's minimization method and some of its generalizations. Although the large
majority of the proposals in the literature rely on continuous allocation
rules, our results allow us to prove, within a single mathematical framework,
the convergence of adaptive allocation methods based on both continuous and
discontinuous randomization functions. Several examples from earlier works are
included in order to enhance applicability, and our approach provides
substantial insight for future proposals, especially in the absence of a
pre-specified target and for designs characterized by sequences of allocation rules.

Comment: Published at http://dx.doi.org/10.3150/13-BEJ591 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
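The kind of convergence the abstract studies can be seen in a toy simulation. The sketch below implements an Efron-style biased coin (a simpler relative of the biased coin and minimization designs named above, chosen here only for illustration): each new subject is assigned to the currently under-represented arm with probability p > 1/2, and the allocation proportion N_A(n)/n converges almost surely to 1/2.

```python
import random

def biased_coin_allocation(n, p=2/3, seed=0):
    """Simulate an Efron-style biased coin design for two treatments A and B.

    The next subject goes to the lagging arm with probability p > 1/2;
    ties are broken by a fair coin. Returns the final proportion N_A/n,
    which converges to the balanced target 1/2.
    """
    rng = random.Random(seed)
    n_a = 0
    for i in range(n):
        diff = 2 * n_a - i          # imbalance N_A - N_B before subject i
        if diff == 0:
            assign_a = rng.random() < 0.5
        else:
            # favour the under-represented treatment with probability p
            assign_a = rng.random() < (p if diff < 0 else 1 - p)
        n_a += assign_a
    return n_a / n
```

Note that the allocation rule is discontinuous in the imbalance (it jumps at diff = 0), which is precisely the class of randomization functions the abstract says earlier continuous-rule frameworks could not handle.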
STEM Rebalance: A Novel Approach for Tackling Imbalanced Datasets using SMOTE, Edited Nearest Neighbour, and Mixup
Imbalanced datasets in medical imaging are characterized by skewed class
proportions and scarcity of abnormal cases. When trained using such data,
models tend to assign higher probabilities to normal cases, leading to biased
performance. Common oversampling techniques such as SMOTE rely on local
information and can introduce marginalization issues. This paper investigates
the potential of Mixup augmentation, which combines two training examples
along with their corresponding labels to generate new data points from a generic
vicinal distribution. To this end, we propose STEM, which combines SMOTE-ENN
and Mixup at the instance level. This integration enables us to effectively
leverage the entire distribution of minority classes, thereby mitigating both
between-class and within-class imbalances. We focus on the breast cancer
problem, where imbalanced datasets are prevalent. The results demonstrate the
effectiveness of STEM, which achieves AUC values of 0.96 and 0.99 in the
Digital Database for Screening Mammography and Wisconsin Breast Cancer
(Diagnostics) datasets, respectively. Moreover, this method shows promising
potential when applied with an ensemble of machine learning (ML) classifiers.

Comment: 7 pages, 4 figures, International Conference on Intelligent Computer
Communication and Processing
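The Mixup step that STEM layers on top of SMOTE-ENN is a one-line convex combination. A minimal sketch of that step alone (the SMOTE-ENN cleaning stage is omitted; the Beta(alpha, alpha) mixing weight follows the original Mixup formulation):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """One Mixup sample: convex combination of two examples and their labels.

    Features are numeric lists and labels are one-hot lists; the mixing
    weight lam is drawn from Beta(alpha, alpha). STEM (per the abstract)
    applies this at the instance level after SMOTE-ENN; only the Mixup
    step is shown here.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y
```

Because labels are mixed with the same weight as features, each synthetic point carries a soft label, which is how Mixup defines the "vicinal distribution" the abstract refers to.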
How to sustain entrepreneurial performance during the current financial crisis
In a debt-ridden society that badly needs to grow economically, policies controlling the flows of economic accounts (revenues and expenditures) should be consistent with an efficient "asset and liability management". The extra money obtained from immediate sales of idle or low-productive government properties can boost economic growth if lent to innovative entrepreneurial firms
Head of the Class: A Quality Teacher in Every Pennsylvania Classroom
"Head of the Class: A Quality Teacher in Every Pennsylvania Classroom" makes recommendations for how state policy can increase and support Pennsylvania's supply of qualified teachers. The report emphasizes that quality teaching is key to student achievement and that the state must act to ensure the presence of a qualified teacher in every Pennsylvania classroom at all times
Travel for Transformation: Embracing a Counter-Hegemonic Approach to Transformative Learning in Study Abroad
This article reviews literature from 2006-2016 on study abroad (and other forms of travel) to investigate frameworks that create the best plausible opportunities for transformative learning within study-abroad experiences. According to the literature reviewed, in order to be considered travel for transformation, the travel experience must respect the values and knowledge of the host culture, acknowledge the presence of differences in privilege among study-abroad participants, and utilize environmentally sustainable practices. In addition, the duration, purpose of travel, and degree of immersion play a significant role in perspective transformation. A benefit repeatedly cited in the articles is that study abroad is better positioned for transformative learning than the traditional classroom environment because it situates the student in a new context where the place, culture, people, and hopefully the language are "other." While almost all of the literature reviewed for this article included cautions to avoid essentializing and exploiting the host culture, very little could be found on the possible negative outcomes to participants, and especially the host culture, when students from the United States study in other contexts. Therefore, the author recommends that future research investigate the possibility of study abroad as exploitation of both the host culture and the participants of the study-abroad program