5,804 research outputs found
Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation
Feature selection (FS) has become an indispensable task in dealing with
today's highly complex pattern recognition problems with massive number of
features. In this study, we propose a new wrapper approach for FS based on
binary simultaneous perturbation stochastic approximation (BSPSA). This
pseudo-gradient descent stochastic algorithm starts with an initial feature
vector and moves toward the optimal feature vector via successive iterations.
In each iteration, the current feature vector's individual components are
perturbed simultaneously by random offsets from a qualified probability
distribution. We present computational experiments on datasets with numbers of
features ranging from a few dozens to thousands using three widely-used
classifiers as wrappers: nearest neighbor, decision tree, and linear support
vector machine. We compare our methodology against the full set of features as
well as a binary genetic algorithm and sequential FS methods using
cross-validated classification error rate and AUC as the performance criteria.
Our results indicate that features selected by BSPSA compare favorably to
alternative methods in general and BSPSA can yield superior feature sets for
datasets with tens of thousands of features by examining an extremely small
fraction of the solution space. We are not aware of any other wrapper FS
methods that are computationally feasible with good convergence properties for
such large datasets.Comment: This is the Istanbul Sehir University Technical Report
#SHR-ISE-2016.01. A short version of this report has been accepted for
publication at Pattern Recognition Letter
Features for the classification and clustering of music in symbolic format
Tese de mestrado, Engenharia Informática, Universidade de Lisboa, Faculdade de Ciências, 2008Este documento descreve o trabalho realizado no âmbito da disciplina de Projecto em Engenharia Informática do Mestrado em Engenharia Informática da Faculdade de Ciências da Universidade de Lisboa. Recuperação de Informação Musical é, hoje em dia, um ramo altamente activo de investigação e desenvolvimento na área de ciência da computação, e incide em diversos tópicos, incluindo a classificação musical por géneros. O trabalho apresentado centra-se na Classificação de Pistas e de Géneros de música armazenada usando o formato MIDI. Para resolver o problema da classificação de pistas MIDI, extraimos um conjunto de descritores que são usados para treinar um classificador implementado através de uma técnica de Máquinas de Aprendizagem, Redes Neuronais, com base nas notas, e durações destas, que descrevem cada faixa. As faixas são classificadas em seis categorias: Melody (Melodia), Harmony (Harmonia), Bass (Baixo) e Drums (Bateria). Para caracterizar o conteúdo musical de cada faixa, um vector de descritores numérico, normalmente conhecido como ”shallow structure description”, é extraído. Em seguida, eles são utilizados no classificador — Neural Network — que foi implementado no ambiente Matlab. Na Classificação por Géneros, duas propostas foram usadas: Modelação de Linguagem, na qual uma matriz de transição de probabilidades é criada para cada tipo de pista midi (Melodia, Harmonia, Baixo e Bateria) e também para cada género; e Redes Neuronais, em que um vector de descritores numéricos é extraído de cada pista, e é processado num Classificador baseado numa Rede Neuronal. Seis Colectâneas de Musica no formato Midi, de seis géneros diferentes, Blues, Country, Jazz, Metal, Punk e Rock, foram formadas para efectuar as experiências. Estes géneros foram escolhidos por partilharem os mesmos instrumentos, na sua maioria, como por exemplo, baixo, bateria, piano ou guitarra. Estes géneros também partilham algumas características entre si, para que a classificação não seja trivial, e para que a robustez dos classificadores seja testada. As experiências de Classificação de Pistas Midi, nas quais foram testados, numa primeira abordagem, todos os descritores, e numa segunda abordagem, os melhores descritores, mostrando que o uso de todos os descritores é uma abordagem errada, uma vez que existem descritores que confundem o classificador. Provou-se que a melhor maneira, neste contexto, de se classificar estas faixas MIDI é utilizar descritores cuidadosamente seleccionados. As experiências de Classificação por Géneros, mostraram que os Classificadores por Instrumentos (Single-Instrument) obtiveram os melhores resultados. Quatro géneros, Jazz, Country, Metal e Punk, obtiveram resultados de classificação com sucesso acima dos 80%
O trabalho futuro inclui: algoritmos genéticos para a selecção de melhores descritores; estruturar pistas e musicas; fundir todos os classificadores desenvolvidos num único classificador.This document describes the work carried out under the discipline of Computing Engineering Project of the Computer Engineering Master, Sciences Faculty of the Lisbon University. Music Information Retrieval is, nowadays, a highly active branch of research and development in the computer science field, and focuses several topics, including music genre classification. The work presented in this paper focus on Track and Genre Classification of music stored using MIDI format, To address the problem of MIDI track classification, we extract a set of descriptors that are used to train a classifier implemented by a Neural Network, based on the pitch levels and durations that describe each track. Tracks are classified into four classes: Melody, Harmony, Bass and Drums. In order to characterize the musical content from each track, a vector of numeric descriptors, normally known as shallow structure description, is extracted. Then they are used as inputs for the classifier which was implemented in the Matlab environment. In the Genre Classification task, two approaches are used: Language Modeling, in which a transition probabilities matrix is created for each type of track (Melody, Harmony, Bass and Drums) and also for each genre; and an approach based on Neural Networks, where a vector of numeric descriptors is extracted from each track (Melody, Harmony, Bass and Drums) and fed to a Neural Network Classifier. Six MIDI Music Corpora were assembled for the experiments, from six different genres, Blues, Country, Jazz, Metal, Punk and Rock. These genres were selected because all of them have the same base instruments, such as bass, drums, piano or guitar. Also, the genres chosen share some characteristics between them, so that the classification isn’t trivial, and tests the classifiers robustness. Track Classification experiments using all descriptors and best descriptors were made, showing that using all descriptors is a wrong approach, as there are descriptors which confuse the classifier. Using carefully selected descriptors proved to be the best way to classify these MIDI tracks. Genre Classification experiments showed that the Single-Instrument Classifiers achieved the best results. Four genres achieved higher than 80% success rates: Jazz, Country, Metal and Punk. Future work includes: genetic algorithms; structurize tracks and songs; merge all presented classifiers into one full Automatic Genre Classification System
Automated design of robust discriminant analysis classifier for foot pressure lesions using kinematic data
In the recent years, the use of motion tracking systems for acquisition of functional biomechanical gait data, has received increasing interest due to the richness and accuracy of the measured kinematic information. However, costs frequently restrict the number of subjects employed, and this makes the dimensionality of the collected data far higher than the available samples. This paper applies discriminant analysis algorithms to the classification of patients with different types of foot lesions, in order to establish an association between foot motion and lesion formation. With primary attention to small sample size situations, we compare different types of Bayesian classifiers and evaluate their performance with various dimensionality reduction techniques for feature extraction, as well as search methods for selection of raw kinematic variables. Finally, we propose a novel integrated method which fine-tunes the classifier parameters and selects the most relevant kinematic variables simultaneously. Performance comparisons are using robust resampling techniques such as Bootstrapand k-fold cross-validation. Results from experimentations with lesion subjects suffering from pathological plantar hyperkeratosis, show that the proposed method can lead tocorrect classification rates with less than 10% of the original features
Recommended from our members
Recursive Percentage based Hybrid Pattern Training for Supervised Learning
Supervised learning algorithms, often used to find the I/O relationship in data, have the tendency to be trapped in local optima as opposed to the desirable global optima. In this paper, we discuss the RPHP learning algorithm. The algorithm uses Real Coded Genetic Algorithm based global and local searches to find a set of pseudo global optimal solutions. Each pseudo global optimum is a local optimal solution from the point of view of all the patterns but globally optimal from the point of view of a subset of patterns. Together with RPHP, a Kth nearest neighbor algorithm is used as a second level pattern distributor to solve a test pattern. We also show theoretically the condition under which finding several pseudo global optimal solutions requires a shorter training time than finding a single global optimal solution. As the difficulty of curve fitting problems is easily estimated, we verify the capability of the RPHP algorithm against them and compare the RPHP algorithm with three counterparts to show the benefits of hybrid learning and active recursive subset selection. The RPHP shows a clear superiority in performance. We conclude our paper by identifying possible loopholes in the RPHP algorithm and proposing possible solutions
New approach for Arabic named entity recognition on social media based on feature selection using genetic algorithm
Many features can be extracted from the massive volume of data in different types that are available nowadays on social media. The growing demand for multimedia applications was an essential factor in this regard, particularly in the case of text data. Often, using the full feature set for each of these activities can be time-consuming and can also negatively impact performance. It is challenging to find a subset of features that are useful for a given task due to a large number of features. In this paper, we employed a feature selection approach using the genetic algorithm to identify the optimized feature set. Afterward, the best combination of the optimal feature set is used to identify and classify the Arabic named entities (NEs) based on support vector. Experimental results show that our system reaches a state-of-the-art performance of the Arab NER on social media and significantly outperforms the previous systems
- …