6,370 research outputs found
Oversampling for Imbalanced Learning Based on K-Means and SMOTE
Learning from class-imbalanced data continues to be a common and challenging
problem in supervised learning as standard classification algorithms are
designed to handle balanced class distributions. While different strategies
exist to tackle this problem, methods which generate artificial data to achieve
a balanced class distribution are more versatile than modifications to the
classification algorithm. Such techniques, called oversamplers, modify the
training data, allowing any classifier to be used with class-imbalanced
datasets. Many algorithms have been proposed for this task, but most are
complex and tend to generate unnecessary noise. This work presents a simple and
effective oversampling method based on k-means clustering and SMOTE
oversampling, which avoids the generation of noise and effectively overcomes
imbalances between and within classes. Empirical results of extensive
experiments with 71 datasets show that training data oversampled with the
proposed method improves classification results. Moreover, k-means SMOTE
consistently outperforms other popular oversampling methods. An implementation
is made available in the Python programming language.
Comment: 19 pages, 8 figures
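The idea in the abstract above (cluster first, then interpolate only inside minority-dominated clusters) can be illustrated with a rough sketch. This is not the authors' implementation. The function names `kmeans` and `kmeans_smote_sketch`, the "more than half minority" filter, the even spread of new samples over clusters, and the random-pair interpolation (instead of nearest-neighbour SMOTE) are all simplifying assumptions made here for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (skip empty clusters).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def kmeans_smote_sketch(X, y, minority=1, k=4, n_new=50, seed=0):
    """Add n_new synthetic minority samples, drawn only inside 'safe' clusters."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X, k, seed=seed)
    # Assumption: a cluster is 'safe' if minority points are its majority.
    pools = []
    for j in range(k):
        in_cluster = labels == j
        minority_pts = X[in_cluster & (y == minority)]
        if len(minority_pts) >= 2 and len(minority_pts) > in_cluster.sum() / 2:
            pools.append(minority_pts)
    if not pools:
        return X, y  # no safe cluster found, leave the data unchanged
    synthetic = []
    for i in range(n_new):
        pts = pools[i % len(pools)]  # spread new samples evenly over clusters
        a, b = pts[rng.choice(len(pts), size=2, replace=False)]
        # SMOTE-style step: a random point on the segment between a and b.
        synthetic.append(a + rng.random() * (b - a))
    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(n_new, minority)])
    return X_new, y_new


# Toy data: 200 majority points near (0, 0), 20 minority points near (6, 6).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(6.0, 0.3, size=(20, 2))])
y = np.array([0] * 200 + [1] * 20)
X_new, y_new = kmeans_smote_sketch(X, y, minority=1, k=2, n_new=30)
```

Because every synthetic point is a convex combination of two minority points from the same cluster, it cannot land inside a region dominated by the majority class, which is the noise-avoidance property the abstract emphasizes.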
A Comprehensive Survey on Rare Event Prediction
Rare event prediction involves identifying and forecasting events with a low
probability using machine learning and data analysis. Due to the imbalanced
data distributions, where the frequency of common events vastly outweighs that
of rare events, it requires using specialized methods within each step of the
machine learning pipeline, i.e., from data processing to algorithms to
evaluation protocols. Predicting the occurrences of rare events is important
for real-world applications, such as Industry 4.0, and is an active research
area in statistical and machine learning. This paper comprehensively reviews
the current approaches for rare event prediction along four dimensions: rare
event data, data processing, algorithmic approaches, and evaluation approaches.
Specifically, we consider 73 datasets from different modalities (i.e.,
numerical, image, text, and audio), four major categories of data processing,
five major algorithmic groupings, and two broader evaluation approaches. This
paper aims to identify gaps in the current literature and highlight the
challenges of predicting rare events. It also suggests potential research
directions, which can help guide practitioners and researchers.
Comment: 44 pages
Simulation-Based Data Augmentation for the Quality Inspection of Structural Adhesive with Deep Learning
UIDB/00066/2020 POCI-01-0247-FEDER-034072
The advent of Industry 4.0 has shown the tremendous transformative potential of combining artificial intelligence, cyber-physical systems and Internet of Things concepts in industrial settings. Despite this, data availability is still a major roadblock for the successful adoption of data-driven solutions, particularly concerning deep learning approaches in manufacturing. Specifically in the quality control domain, annotated defect data can often be costly, time-consuming and inefficient to obtain, potentially compromising the viability of deep learning approaches due to data scarcity. In this context, we propose a novel method for generating annotated synthetic training data for automated quality inspections of structural adhesive applications, validated in an industrial cell for automotive parts. Our approach greatly reduces the cost of training deep learning models for this task, while simultaneously improving their performance in a scarce manufacturing data context with imbalanced training sets by 3.1% ([email protected]). Additional results can be seen at https://ricardosperes.github.io/simulation-synth-adhesive/.
Deep learning in remote sensing: a review
Standing at the paradigm shift towards data-intensive science, machine
learning techniques are becoming increasingly important. In particular, as a
major breakthrough in the field, deep learning has proven to be an extremely
powerful tool in many fields. Shall we embrace deep learning as the key to all?
Or, should we resist a 'black-box' solution? There are controversial opinions
in the remote sensing community. In this article, we analyze the challenges of
using deep learning for remote sensing data analysis, review the recent
advances, and provide resources to make deep learning in remote sensing
ridiculously simple to start with. More importantly, we encourage remote sensing
scientists to bring their expertise into deep learning, and use it as an
implicit general model to tackle unprecedented large-scale influential
challenges, such as climate change and urbanization.
Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine
The Role of Synthetic Data in Improving Supervised Learning Methods: The Case of Land Use/Land Cover Classification
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management
In remote sensing, Land Use/Land Cover (LULC) maps constitute important assets for
various applications, promoting environmental sustainability and good resource management.
However, their production continues to be a challenging task. There are various factors
that contribute towards the difficulty of generating accurate, timely updated LULC maps,
whether via automatic or photo-interpreted LULC mapping. Data preprocessing, being a
crucial step for any Machine Learning task, is particularly important in the remote sensing
domain due to the overwhelming amount of raw, unlabeled data continuously gathered
from multiple remote sensing missions. However, a significant part of the state-of-the-art
focuses on scenarios with full access to labeled training data with relatively balanced class
distributions. This thesis focuses on the challenges found in automatic LULC classification
tasks, specifically in data preprocessing tasks. We focus on the development of novel
Active Learning (AL) and imbalanced learning techniques, to improve ML performance in
situations with limited training data and/or the existence of rare classes. We also show
that many of the contributions presented are not only successful in remote sensing problems,
but also in various other multidisciplinary classification problems. The work presented
in this thesis used open access datasets to test the contributions made in imbalanced
learning and AL. All the data pulling, preprocessing and experiments are made available at
https://github.com/joaopfonseca/publications. The algorithmic implementations are made
available in the Python package ml-research at https://github.com/joaopfonseca/ml-research
Towards Generalizable Morph Attack Detection with Consistency Regularization
Though recent studies have made significant progress in morph attack
detection by virtue of deep neural networks, they often fail to generalize well
to unseen morph attacks. With numerous morph attacks emerging frequently,
generalizable morph attack detection has gained significant attention. This
paper focuses on enhancing the generalization capability of morph attack
detection from the perspective of consistency regularization. Consistency
regularization operates under the premise that generalizable morph attack
detection should output consistent predictions irrespective of the possible
variations that may occur in the input space. In this work, to reach this
objective, two simple yet effective morph-wise augmentations are proposed to
explore a wide space of realistic morph transformations in our consistency
regularization. Then, the model is regularized to learn consistently at the
logit as well as embedding levels across a wide range of morph-wise augmented
images. The proposed consistency regularization aligns the abstraction in the
hidden layers of our model across the morph attack images which are generated
from diverse domains in the wild. Experimental results demonstrate the superior
generalization and robustness performance of our proposed method compared to
the state-of-the-art studies.
Comment: Accepted to the IEEE International Joint Conference on Biometrics
(IJCB), 202
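As a rough illustration of what consistency at the logit and embedding levels could look like as a training objective, the sketch below adds two penalty terms to a standard cross-entropy loss. The abstract does not give the loss in closed form, so the squared-error penalties, the weight `lam`, and the function name `consistency_loss` are assumptions made here, not the paper's formulation.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def consistency_loss(logits_a, logits_b, emb_a, emb_b, labels, lam=1.0):
    """Cross-entropy on view A plus logit- and embedding-level consistency.

    logits_a / emb_a come from one augmented view of a batch,
    logits_b / emb_b from a second augmented view of the same batch.
    """
    probs = softmax(logits_a)
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()
    # Assumed penalties: mean squared difference between the two views.
    logit_term = np.mean((logits_a - logits_b) ** 2)
    emb_term = np.mean((emb_a - emb_b) ** 2)
    return ce + lam * (logit_term + emb_term)


# Toy batch: 4 samples, 2 classes, 8-dimensional embeddings.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 2))
emb = rng.normal(size=(4, 8))
labels = np.array([0, 1, 1, 0])

same = consistency_loss(logits, logits, emb, emb, labels)
shifted = consistency_loss(logits, logits + 0.5, emb, emb + 0.5, labels)
```

When both views produce identical outputs the penalty terms vanish and only the classification loss remains; any disagreement between the views raises the loss, which pushes the model towards predictions that are stable under augmentation.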