Search CORE

4,114 research outputs found

A post-processing strategy for SVM learning from unbalanced data

Authors: Angulo Bahón Cecilio; González Abril Luis; Núñez Castro Haydemar
Publication venue: Ciaco
Publication date: 01/01/2011

Available at: https://upcommons.upc.edu/handle/2117/12531

Standard learning algorithms may perform poorly when learning from unbalanced datasets. Based on Fisher's discriminant analysis, a post-processing strategy is introduced to deal with datasets that have a significant imbalance in the data distribution. A new bias is defined, which reduces skew towards the minority class. Empirical results from experiments for a learned SVM model on twelve UCI datasets indicate that the proposed solution improves on the original SVM, and also on the results reported for a z-SVM, in terms of g-mean and sensitivity.

Spanish Ministry of Science and Technology TIN2009-14378-C02-0

idUS. Depósito de Investigación Universidad de Sevilla
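
The abstract above reports results in terms of sensitivity and g-mean after changing the bias of a trained SVM. As a rough sketch of how those two metrics react to a bias change, the snippet below shifts the decision threshold of some precomputed decision scores. The scores, the labels, and the `bias_shift` value are hypothetical toy data, and the shift is an arbitrary constant, not the Fisher-based bias defined in the paper.

```python
import math


def sensitivity_and_gmean(y_true, y_pred, positive=1):
    """Return (sensitivity, g-mean) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    spec = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return sens, math.sqrt(sens * spec)        # g-mean = sqrt(sens * spec)


def predict_with_bias(scores, bias_shift=0.0):
    """Label a sample +1 when its shifted decision score is positive."""
    return [1 if s + bias_shift > 0 else -1 for s in scores]


# Hypothetical toy data: 3 minority (+1) and 7 majority (-1) samples.
y_true = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
scores = [0.4, -0.2, -0.3, -0.9, -1.1, -0.8, -1.5, -0.6, -0.5, -1.2]

before = sensitivity_and_gmean(y_true, predict_with_bias(scores))
after = sensitivity_and_gmean(y_true, predict_with_bias(scores, bias_shift=0.35))
print("original bias:", before)
print("shifted bias: ", after)
```

Because g-mean is the geometric mean of sensitivity and specificity, it only rises when the gain on the minority class is not paid for by a comparable loss on the majority class.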

A post-processing strategy for SVM learning from unbalanced data

Authors: Angulo Bahón Cecilio; González Abril Luis; Núñez Castro Haydemar
Publication date: 01/01/2011

Standard learning algorithms may perform poorly when learning from unbalanced datasets. Based on Fisher's discriminant analysis, a post-processing strategy is introduced to deal with datasets that have a significant imbalance in the data distribution. A new bias is defined, which reduces skew towards the minority class. Empirical results from experiments for a learned SVM model on twelve UCI datasets indicate that the proposed solution improves on the original SVM, and also on the results reported for a z-SVM, in terms of g-mean and sensitivity.

Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC

idUS. Depósito de Investigación Universidad de Sevilla

Modelling of Sound Events with Hidden Imbalances Based on Clustering and Separate Sub-Dictionary Learning

Authors: Komatsu Tatsuya; Kondo Reishi; Narisetty Chaitanya
Publication date: 04/04/2019

This paper proposes an effective modelling of sound event spectra with a hidden data-size imbalance, for improved Acoustic Event Detection (AED). The proposed method models each event as an aggregated representation of a few latent factors, whereas conventional approaches try to find acoustic elements directly from the event spectra. In the method, all the latent factors across all events are assigned comparable importance and complexity to overcome the hidden imbalance of data sizes in event spectra. To extract the latent factors in each event, the proposed method employs clustering, applies non-negative matrix factorization to each latent factor, and learns its acoustic elements as a sub-dictionary. Separate sub-dictionary learning effectively models the acoustic elements with limited data sizes and avoids over-fitting due to hidden imbalances in the training data. For the task of polyphonic sound event detection from the DCASE 2013 challenge, an AED based on the proposed modelling achieves a detection F-measure of 46.5%, a significant improvement of more than 19% compared to the existing state-of-the-art methods.

arXiv.org e-Print Archive

Crossref
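
The abstract above describes clustering event spectra and then learning one sub-dictionary per cluster with non-negative matrix factorization. The sketch below illustrates only that second step under strong assumptions: the cluster assignment is given as a hypothetical dict of toy spectra, frames are stored as rows, and the factorization uses the standard multiplicative updates for the squared-error objective. The function names and parameters are illustrative and are not taken from the paper.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Squared reconstruction error ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for vrow, rrow in zip(V, WH) for v, r in zip(vrow, rrow))


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor V (frames x bins) into W (frames x rank) and H (rank x bins)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Multiplicative update for H: H *= (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # Multiplicative update for W: W *= (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def learn_sub_dictionaries(clusters, rank):
    """Run a separate NMF per cluster and keep each H as that cluster's sub-dictionary."""
    return {label: nmf(frames, rank)[1] for label, frames in clusters.items()}


# Hypothetical clusters of non-negative spectra (2 frames x 3 bins each).
clusters = {
    "a": [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0]],
    "b": [[0.0, 3.0, 1.0], [0.0, 6.0, 2.0]],
}
sub_dicts = learn_sub_dictionaries(clusters, rank=1)
for label, H in sub_dicts.items():
    print(label, H)
```

Giving every cluster the same `rank` is one simple way to assign the latent factors comparable complexity regardless of how many frames each cluster contains.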

Spectral-spatial classification of hyperspectral images: three tricks and a new supervised learning setting

Authors: Acquarelli Jacopo; Buydens Lutgarde M. C.; Marchiori Elena; Tran Thanh; van Laarhoven Twan
Publication venue: MDPI AG
Publication date: 01/01/2018

Spectral-spatial classification of hyperspectral images has been the subject of many studies in recent years. In the presence of only very few labeled pixels, this task becomes challenging. In this paper we address the following two research questions: 1) Can a simple neural network with just a single hidden layer achieve state-of-the-art performance in the presence of few labeled pixels? 2) How is the performance of hyperspectral image classification methods affected when using disjoint train and test sets? We give a positive answer to the first question by using three tricks within a very basic shallow Convolutional Neural Network (CNN) architecture: a tailored loss function, and smooth- and label-based data augmentation. The tailored loss function enforces that neighboring wavelengths have similar contributions to the features generated during training. A new label-based technique proposed here favors selection of pixels in smaller classes, which is beneficial in the presence of very few labeled pixels and skewed class distributions. To address the second question, we introduce a new sampling procedure to generate disjoint train and test sets. The train set is then used to obtain the CNN model, which is applied to pixels in the test set to estimate their labels. We assess the efficacy of the simple neural network method on five publicly available hyperspectral images. On these images our method significantly outperforms the considered baselines. Notably, with just 1% of labeled pixels per class, on these datasets our method achieves an accuracy that ranges from 86.42% (challenging dataset) to 99.52% (easy dataset). Furthermore, we show that the simple neural network method improves over other baselines in the new challenging supervised setting. Our analysis substantiates the highly beneficial effect of using the entire image (that is, both train and test data) for constructing a model.

Comment: Remote Sensing 201

arXiv.org e-Print Archive

Open University of the Netherlands Research Portal

Multidisciplinary Digital Publishing Institute

Radboud Repository
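
The abstract above mentions a label-based technique that favors selecting pixels from smaller classes. The abstract does not specify the procedure, so the sketch below substitutes a common stand-in, inverse-frequency sampling, in which every class receives the same total sampling weight. The function names and the toy label list are hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each sample by 1 / (size of its class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def sample_indices(labels, k, seed=0):
    """Draw k sample indices with replacement, favoring small classes."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)


# Hypothetical skewed labels: 9 pixels of class "soil", 1 of class "road".
labels = ["soil"] * 9 + ["road"]
picked = sample_indices(labels, k=2000)
print(Counter(labels[i] for i in picked))
```

With these weights each class is drawn with roughly equal probability, so the single "road" pixel appears in about half of the draws even though it makes up a tenth of the data.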

Transportation in Social Media: an automatic classifier for travel-related tweets

Authors: Pasquali Arian; Pereira João; Rossetti Rosaldo; Saleiro Pedro
Publication date: 15/06/2017

In recent years, researchers in the field of intelligent transportation systems have made several efforts to extract valuable information from social media streams. However, collecting domain-specific data from any social media is a challenging task that demands appropriate and robust classification methods. In this work we focus on exploring geo-located tweets in order to create a travel-related tweet classifier using a combination of bag-of-words and word embeddings. The resulting classification makes it possible to identify interesting spatio-temporal relations in São Paulo and Rio de Janeiro.

arXiv.org e-Print Archive

Crossref
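
The abstract above combines bag-of-words and word-embedding features for tweet classification. One simple way to combine them, shown here purely as an assumption, is to concatenate a word-count vector with the mean of the word vectors. The vocabulary, the two-dimensional embeddings, and the function names are hypothetical toy values, not the features used in the paper.

```python
def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    return [float(tokens.count(word)) for word in vocabulary]


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of all tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def combined_features(text, vocabulary, embeddings, dim):
    """Concatenate bag-of-words counts with the mean word embedding."""
    tokens = text.lower().split()
    return bag_of_words(tokens, vocabulary) + mean_embedding(tokens, embeddings, dim)


# Hypothetical vocabulary and 2-dimensional embeddings.
vocabulary = ["bus", "train"]
embeddings = {"bus": [1.0, 0.0], "delayed": [0.0, 1.0]}
features = combined_features("Bus delayed bus", vocabulary, embeddings, dim=2)
print(features)
```

The resulting vector can be passed to any standard classifier; the count part captures exact keywords, while the embedding part can still carry signal for words outside the vocabulary.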