4,732 research outputs found
Oversampling for Imbalanced Learning Based on K-Means and SMOTE
Learning from class-imbalanced data continues to be a common and challenging
problem in supervised learning as standard classification algorithms are
designed to handle balanced class distributions. While different strategies
exist to tackle this problem, methods which generate artificial data to achieve
a balanced class distribution are more versatile than modifications to the
classification algorithm. Such techniques, called oversamplers, modify the
training data, allowing any classifier to be used with class-imbalanced
datasets. Many algorithms have been proposed for this task, but most are
complex and tend to generate unnecessary noise. This work presents a simple and
effective oversampling method based on k-means clustering and SMOTE
oversampling, which avoids the generation of noise and effectively overcomes
imbalances between and within classes. Empirical results of extensive
experiments with 71 datasets show that training data oversampled with the
proposed method improves classification results. Moreover, k-means SMOTE
consistently outperforms other popular oversampling methods. An implementation
is made available in the python programming language.Comment: 19 pages, 8 figure
Capturing "attrition intensifying" structural traits from didactic interaction sequences of MOOC learners
This work is an attempt to discover hidden structural configurations in
learning activity sequences of students in Massive Open Online Courses (MOOCs).
Leveraging combined representations of video clickstream interactions and forum
activities, we seek to fundamentally understand traits that are predictive of
decreasing engagement over time. Grounded in the interdisciplinary field of
network science, we follow a graph based approach to successfully extract
indicators of active and passive MOOC participation that reflect persistence
and regularity in the overall interaction footprint. Using these rich
educational semantics, we focus on the problem of predicting student attrition,
one of the major highlights of MOOC literature in the recent years. Our results
indicate an improvement over a baseline ngram based approach in capturing
"attrition intensifying" features from the learning activities that MOOC
learners engage in. Implications for some compelling future research are
discussed.Comment: "Shared Task" submission for EMNLP 2014 Workshop on Modeling Large
Scale Social Interaction in Massively Open Online Course
- …