A post-processing strategy for SVM learning from unbalanced data
Available at: https://upcommons.upc.edu/handle/2117/12531
Standard learning algorithms may perform poorly when learning from unbalanced datasets. Based on Fisher's discriminant analysis, a post-processing strategy is introduced to deal with datasets that have a significant imbalance in the data distribution. A new bias is defined, which reduces skew towards the minority class. Empirical results from experiments with a learned SVM model on twelve UCI datasets indicate that the proposed solution improves on the original SVM, and also on the results reported when using a z-SVM, in terms of g-mean and sensitivity.
Spanish Ministry of Science and Technology TIN2009-14378-C02-0
Peer Reviewed. Postprint (author's final draft)
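The abstract does not state how the new bias is computed. As a minimal sketch, assuming the minority class is labelled +1 and that the trained SVM's decision values f(x) = w·x + b are already available, one Fisher-style choice is to move the zero threshold to the point t that lies the same number of class standard deviations from both class means of the decision values, t = (sd_neg * mu_pos + sd_pos * mu_neg) / (sd_pos + sd_neg), which is equivalent to using b - t as the new bias. This threshold rule and all function names are hypothetical illustrations, not the paper's formula; sensitivity and g-mean are included because they are the metrics the abstract reports.

```python
from math import sqrt
from statistics import mean, pstdev


def fisher_threshold(scores, labels):
    """Hypothetical Fisher-style threshold on SVM decision values.

    Returns t such that (mu_pos - t) / sd_pos == (t - mu_neg) / sd_neg, i.e. the
    point the same number of class standard deviations from each class mean.
    Labels are +1 (minority) and -1 (majority).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    mu_pos, mu_neg = mean(pos), mean(neg)
    sd_pos, sd_neg = pstdev(pos), pstdev(neg)
    if sd_pos + sd_neg == 0:  # degenerate case: fall back to the midpoint
        return (mu_pos + mu_neg) / 2.0
    return (sd_neg * mu_pos + sd_pos * mu_neg) / (sd_pos + sd_neg)


def predict(scores, threshold):
    """Predict +1 when f(x) - t >= 0, i.e. use b - t as the new bias."""
    return [1 if s - threshold >= 0 else -1 for s in scores]


def sensitivity_and_gmean(pred, labels):
    """Sensitivity (minority recall) and g-mean = sqrt(sensitivity * specificity)."""
    tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(pred, labels) if p == -1 and y == 1)
    tn = sum(1 for p, y in zip(pred, labels) if p == -1 and y == -1)
    fp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == -1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Toy decision values: the minority class (+1) sits mostly below 0.
    scores = [-3.0, -2.0, -1.0, -3.0, -2.0, -1.0, -0.6, -0.2, 0.2]
    labels = [-1, -1, -1, -1, -1, -1, 1, 1, 1]
    t = fisher_threshold(scores, labels)
    print(round(t, 3))  # -0.714
    print(sensitivity_and_gmean(predict(scores, 0.0), labels))
    print(sensitivity_and_gmean(predict(scores, t), labels))  # (1.0, 1.0)
```

On these toy values the default threshold of 0 recovers only one of the three minority points, while the shifted threshold of about -0.71 recovers all three without misclassifying any majority point.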
Modelling of Sound Events with Hidden Imbalances Based on Clustering and Separate Sub-Dictionary Learning
This paper proposes an effective way of modelling sound event spectra with a
hidden data-size imbalance, for improved Acoustic Event Detection (AED). The
proposed method models each event as an aggregated representation of a few
latent factors, while conventional approaches try to find acoustic elements
directly from the event spectra. In the method, all the latent factors across
all events are assigned comparable importance and complexity to overcome the
hidden imbalance of data-sizes in event spectra. To extract latent factors in
each event, the proposed method employs clustering, performs non-negative
matrix factorization on each latent factor, and learns its acoustic elements as
a sub-dictionary. Separate sub-dictionary learning effectively models the
acoustic elements with limited data-sizes and avoids over-fitting due to hidden
imbalances in training data. For the task of polyphonic sound event detection
from the DCASE 2013 challenge, an AED based on the proposed modelling achieves a
detection F-measure of 46.5%, a significant improvement of more than 19%
compared to the existing state-of-the-art methods.
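The clustering-then-separate-NMF idea can be sketched with NumPy. This is a minimal illustration, not the authors' implementation: it assumes a non-negative magnitude spectrogram `V` (frequency bins × frames) for one event, splits the frames into latent factors with plain k-means (farthest-point initialisation), and fits Euclidean multiplicative-update NMF (Lee and Seung) to each cluster with the same number of bases, so a rarely occurring factor receives as many atoms as a frequent one. The cluster count, rank, iteration counts, and function names are hypothetical.

```python
import numpy as np


def kmeans(frames, k, iters=50):
    """k-means on spectral frames (rows) with farthest-point initialisation."""
    centers = [frames[0]]
    for _ in range(1, k):
        # Next center: the frame farthest from all centers chosen so far.
        d = np.min([((frames - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(frames[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = frames[labels == j].mean(axis=0)
    return labels


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimising ||V - WH||_F^2."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def event_dictionary(V, n_factors, bases_per_factor):
    """Cluster one event's frames, learn an equal-sized sub-dictionary per
    cluster (hypothetical sizes), and stack the unit-norm atoms column-wise."""
    labels = kmeans(V.T, n_factors)
    subdicts = []
    for j in range(n_factors):
        Vj = V[:, labels == j]
        if Vj.shape[1] == 0:
            continue
        Wj, _ = nmf(Vj, bases_per_factor)
        subdicts.append(Wj / (np.linalg.norm(Wj, axis=0, keepdims=True) + 1e-12))
    return np.hstack(subdicts)


if __name__ == "__main__":
    # Toy event: shape `a` occupies 90 frames, shape `b` only 10.
    a = np.array([1.0, 0.8, 0.1, 0.0, 0.0, 0.0])
    b = np.array([0.0, 0.0, 0.0, 0.1, 0.9, 1.0])
    V = np.hstack([np.outer(a, np.linspace(0.5, 1.5, 90)),
                   np.outer(b, np.linspace(0.5, 1.5, 10))])
    D = event_dictionary(V, n_factors=2, bases_per_factor=2)
    print(D.shape)  # (6, 4)
```

In the toy event, the 90 frames of one spectral shape and the 10 frames of the other fall into separate clusters, and each shape receives its own two unit-norm atoms.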
Spectral-spatial classification of hyperspectral images: three tricks and a new supervised learning setting
Spectral-spatial classification of hyperspectral images has been the subject
of many studies in recent years. In the presence of only very few labeled
pixels, this task becomes challenging. In this paper we address the following
two research questions: 1) Can a simple neural network with just a single
hidden layer achieve state of the art performance in the presence of few
labeled pixels? 2) How is the performance of hyperspectral image classification
methods affected when using disjoint train and test sets? We give a positive
answer to the first question by using three tricks within a very basic shallow
Convolutional Neural Network (CNN) architecture: a tailored loss function, and
smooth- and label-based data augmentation. The tailored loss function enforces
that neighborhood wavelengths have similar contributions to the features
generated during training. A new label-based technique proposed here favors the
selection of pixels in smaller classes, which is beneficial in the presence of
very few labeled pixels and skewed class distributions. To address the second
question, we introduce a new sampling procedure to generate disjoint train and
test sets. The train set is used to obtain the CNN model, which is then
applied to pixels in the test set to estimate their labels. We assess the
efficacy of the simple neural network method on five publicly available
hyperspectral images. On these images, our method significantly outperforms
the considered baselines. Notably, with just 1% of labeled pixels per class,
our method achieves an accuracy ranging from 86.42% (challenging dataset) to
99.52% (easy dataset). Furthermore, we show that the
simple neural network method improves over other baselines in the new
challenging supervised setting. Our analysis substantiates the highly
beneficial effect of using the entire image (i.e., both train and test data) for
constructing a model.
Comment: Remote Sensing 201
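The abstract names the tailored loss and the label-based selection without giving formulas. The sketch below shows one plausible reading under stated assumptions: the neighbouring-wavelength constraint is modelled as a squared first-difference penalty on first-layer spectral weights, added to the classification loss with a hypothetical weight `lam`; and label-based selection draws pixels with probability inversely proportional to their class size. Both rules and all function names are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def spectral_smoothness_penalty(weights):
    """Sum of squared differences between weights of adjacent wavelengths.

    weights: array of shape (n_filters, n_bands); small values mean that
    neighbouring bands contribute similarly to the learned features.
    """
    return float(np.sum(np.diff(weights, axis=1) ** 2))


def tailored_loss(cross_entropy, weights, lam=0.1):
    """Hypothetical total loss: data term plus lam times the smoothness penalty."""
    return cross_entropy + lam * spectral_smoothness_penalty(weights)


def label_based_sampling_probs(labels):
    """Per-pixel probability inversely proportional to class size, so every
    class is drawn equally often in expectation."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    size = dict(zip(classes.tolist(), counts.tolist()))
    w = np.array([1.0 / size[c] for c in labels.tolist()])
    return w / w.sum()


def sample_for_augmentation(labels, n, seed=0):
    """Draw n pixel indices (with replacement) favouring smaller classes."""
    rng = np.random.default_rng(seed)
    p = label_based_sampling_probs(labels)
    return rng.choice(len(p), size=n, replace=True, p=p)


if __name__ == "__main__":
    W = np.array([[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]])
    print(spectral_smoothness_penalty(W))  # 2.0
    print(label_based_sampling_probs([0, 0, 0, 1]))
```

With class sizes 3 and 1, each pixel of the small class is drawn with probability 1/2 and each pixel of the large class with probability 1/6, so both classes are sampled about equally often.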
Transportation in Social Media: an automatic classifier for travel-related tweets
In recent years, researchers in the field of intelligent transportation
systems have made several efforts to extract valuable information from social
media streams. However, collecting domain-specific data from social media
is a challenging task that demands appropriate and robust classification methods.
In this work we focus on exploring geo-located tweets in order to create a
travel-related tweet classifier using a combination of bag-of-words and word
embeddings. The resulting classification makes it possible to identify
interesting spatio-temporal relations in São Paulo and Rio de Janeiro
- …
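A minimal sketch of combining bag-of-words with word embeddings, assuming a fixed vocabulary and a tiny hypothetical embedding table in place of pretrained vectors: each tweet becomes its word counts concatenated with the mean embedding of its known words. The abstract does not specify the classifier, so a nearest-centroid rule is swapped in purely for illustration; the embedding values, toy tweets, and function names are all hypothetical.

```python
from math import sqrt

# Hypothetical 2-d word embeddings; a real system would load pretrained vectors.
EMBEDDINGS = {
    "bus": [0.9, 0.1], "metro": [0.8, 0.2], "traffic": [0.85, 0.05],
    "train": [0.9, 0.15], "pizza": [0.1, 0.9], "movie": [0.05, 0.8],
}


def tokenize(text):
    """Lower-case whitespace tokens with surrounding punctuation removed."""
    tokens = (t.strip(".,!?#@").lower() for t in text.split())
    return [t for t in tokens if t]


def features(text, vocab):
    """Concatenate bag-of-words counts with the mean word embedding."""
    toks = tokenize(text)
    bow = [float(toks.count(w)) for w in vocab]
    vecs = [EMBEDDINGS[t] for t in toks if t in EMBEDDINGS]
    dim = len(next(iter(EMBEDDINGS.values())))
    mean_emb = [sum(v[i] for v in vecs) / len(vecs) if vecs else 0.0
                for i in range(dim)]
    return bow + mean_emb


def train_centroids(texts, labels, vocab):
    """Nearest-centroid training: average feature vector per label."""
    sums, counts = {}, {}
    for text, y in zip(texts, labels):
        f = features(text, vocab)
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, x in enumerate(f):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in s] for y, s in sums.items()}


def classify(text, centroids, vocab):
    """Return the label whose centroid is closest in Euclidean distance."""
    f = features(text, vocab)

    def dist(c):
        return sqrt(sum((a - b) ** 2 for a, b in zip(f, c)))

    return min(centroids, key=lambda y: dist(centroids[y]))


if __name__ == "__main__":
    vocab = ["bus", "metro", "traffic", "train", "pizza", "movie", "late", "great"]
    texts = ["bus is late again", "traffic on the way to the metro",
             "great pizza tonight", "watching a movie"]
    labels = ["travel", "travel", "other", "other"]
    cents = train_centroids(texts, labels, vocab)
    print(classify("stuck in traffic, train late", cents, vocab))  # travel
    print(classify("best pizza and a movie", cents, vocab))  # other
```

The embedding half of the feature vector lets words outside the training tweets (such as "train" here) still pull a tweet towards the travel centroid, which is the usual motivation for pairing embeddings with bag-of-words counts.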