Search CORE

4,732 research outputs found

Detecting fraud: Utilizing new technology to advance the audit profession

Author: Stanton Gabriella
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 01/04/2012

Oversampling for Imbalanced Learning Based on K-Means and SMOTE

Author: Bacao Fernando
Douzas Georgios
Last Felix
Publication venue: Elsevier BV
Publication date: 12/12/2017

Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning, as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods that generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE oversampling, which avoids the generation of noise and effectively overcomes imbalances both between and within classes. Empirical results of extensive experiments with 71 datasets show that training data oversampled with the proposed method improves classification results. Moreover, k-means SMOTE consistently outperforms other popular oversampling methods. An implementation is available in the Python programming language.
Comment: 19 pages, 8 figures
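The abstract describes the method only at a high level. As a rough illustration, and not the authors' published implementation, the following stdlib-only Python sketch combines the two ingredients it names: k-means clustering of all training points, followed by SMOTE interpolation restricted to clusters in which the minority class dominates. The filtering threshold `min_minority_frac`, the simple proportional allocation of new samples across clusters, and all function names are hypothetical simplifications chosen for this sketch.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 50,
           rng: Optional[random.Random] = None) -> List[int]:
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    if rng is None:
        rng = random.Random(0)
    centers = list(rng.sample(points, k))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: _dist2(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def smote_sample(minority: List[Point], n_new: int, k_neighbors: int,
                 rng: random.Random) -> List[Point]:
    """SMOTE: interpolate between a minority point and one of its k nearest minority neighbours."""
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: _dist2(base, p))[:k_neighbors]
        nb = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> nb
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def kmeans_smote(X: List[Point], y: List[int], minority_label: int,
                 n_clusters: int = 2, n_new: int = 10, k_neighbors: int = 3,
                 min_minority_frac: float = 0.5, seed: int = 0):
    """Hypothetical simplified k-means SMOTE; returns (new_points, new_labels)."""
    rng = random.Random(seed)
    labels = kmeans(X, n_clusters, rng=rng)
    # Filtering step (hypothetical threshold): keep clusters where the minority
    # class makes up at least min_minority_frac and has >= 2 points to interpolate.
    eligible: List[List[Point]] = []
    for c in range(n_clusters):
        members = [(x, t) for x, t, lab in zip(X, y, labels) if lab == c]
        minority = [x for x, t in members if t == minority_label]
        if len(minority) >= 2 and len(minority) / len(members) >= min_minority_frac:
            eligible.append(minority)
    if not eligible:
        return [], []
    # Allocation step (simplification): shares proportional to minority count,
    # with any rounding remainder given to the first eligible cluster.
    total = sum(len(m) for m in eligible)
    shares = [n_new * len(m) // total for m in eligible]
    shares[0] += n_new - sum(shares)
    new_points: List[Point] = []
    for m, share in zip(eligible, shares):
        new_points += smote_sample(m, share, k_neighbors, rng)
    return new_points, [minority_label] * len(new_points)


# Toy data: a 4-point minority blob near the origin, 6 majority points near (5, 5).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.05),
     (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1), (4.8, 4.8), (5.05, 5.0)]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
new_X, new_y = kmeans_smote(X, y, minority_label=1, n_clusters=2, n_new=6)
print(len(new_X))  # prints 6
```

In this sketch, interpolating only inside minority-dominated clusters keeps synthetic points within the minority blob instead of drawing segments across majority regions, which illustrates the noise-avoidance idea the abstract refers to; plain SMOTE would interpolate over the whole minority class regardless of where majority points lie.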

arXiv.org e-Print Archive

Repositório da Universidade Nova de Lisboa

ANALYZING BIG DATA WITH DECISION TREES

Author: Leong Lok Kei
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2014

SJSU ScholarWorks

Capturing "attrition intensifying" structural traits from didactic interaction sequences of MOOC learners

Author: Dillenbourg Pierre
Jermann Patrick
Li Nan
Sinha Tanmay
Publication date: 01/01/2014

This work is an attempt to discover hidden structural configurations in the learning activity sequences of students in Massive Open Online Courses (MOOCs). Leveraging combined representations of video clickstream interactions and forum activities, we seek to fundamentally understand traits that are predictive of decreasing engagement over time. Grounded in the interdisciplinary field of network science, we follow a graph-based approach to extract indicators of active and passive MOOC participation that reflect persistence and regularity in the overall interaction footprint. Using these rich educational semantics, we focus on the problem of predicting student attrition, one of the major topics of the MOOC literature in recent years. Our results indicate an improvement over an n-gram-based baseline in capturing "attrition intensifying" features from the learning activities that MOOC learners engage in. Implications for compelling future research are discussed.
Comment: "Shared Task" submission for the EMNLP 2014 Workshop on Modeling Large Scale Social Interaction in Massively Open Online Courses

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref