
    Abordagens multivariadas para seleção de variáveis com vistas à classificação e predição de propriedades de amostras

    Variable selection is an important step in data analysis, since it identifies the most informative subsets of variables for building accurate classification and prediction models. In addition, variable selection eases the interpretation and analysis of the resulting models, potentially reducing the computational time needed to build them and the cost and time required to obtain samples. In this context, this thesis proposes novel variable selection approaches for classifying samples and predicting their properties. The approaches are presented in three papers, each aimed at improving classification and prediction accuracy in a different application area. The first paper integrates variable importance indices with a hierarchical classification scheme to categorize sparkling wine samples according to their country of origin. The second paper proposes a variable importance index based on the Lambert-Beer law, combined with an iterative forward selection procedure, to choose the most informative variables for PLS prediction. The third paper uses clustering of spectral variables together with a variable importance index to select the variables that yield the most consistent prediction models. In all three papers, the proposed methods outperformed traditional methods from the literature for identifying the most informative variables.
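
    The forward, importance-ranked selection described for the second paper can be sketched as follows, assuming scikit-learn's PLSRegression and cross_val_score. The |regression coefficient| ranking used here is only a stand-in for the thesis's Lambert-Beer-based importance index, and the function name and parameters are illustrative, not the author's implementation.

```python
# A minimal sketch of importance-ranked forward selection for PLS prediction.
# NOTE: the thesis derives its importance index from the Lambert-Beer law;
# here |PLS regression coefficient| is used only as a stand-in ranking.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

def forward_pls_selection(X, y, n_components=2, cv=5):
    # Fit a full-spectrum model once to obtain a per-variable importance proxy.
    full = PLSRegression(n_components=n_components).fit(X, y)
    order = np.argsort(-np.abs(full.coef_.ravel()))  # most "important" first

    best_rmse, best_subset = np.inf, order[:n_components]
    # Forward step: add variables in importance order and keep the subset
    # with the lowest cross-validated RMSE.
    for k in range(n_components, len(order) + 1):
        subset = order[:k]
        scores = cross_val_score(
            PLSRegression(n_components=n_components), X[:, subset], y,
            cv=cv, scoring="neg_root_mean_squared_error")
        rmse = -scores.mean()
        if rmse < best_rmse:
            best_rmse, best_subset = rmse, subset
    return best_subset, best_rmse
```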

    GAdaboost: Accelerating adaboost feature selection with genetic algorithms

    Throughout recent years, Machine Learning has attracted attention due to the abundance of data, and devising techniques to reduce data dimensionality has been an ongoing effort. Object detection is one of the Machine Learning tasks that suffers from this drawback. For example, one of the most famous object detection frameworks, the Viola-Jones Rapid Object Detector, suffers from a lengthy training process due to the vast search space, which can exceed 160,000 features for a 24x24 image. The Viola-Jones detector also uses Adaboost, a brute-force method that must pass over the set of all possible features in order to train the classifiers. Consequently, ways of reducing the full feature set to a smaller representative one, eliminating features that carry irrelevant information, have been devised. The most commonly used technique for this is Feature Selection, with its three categories: filters, wrappers, and embedded methods. Feature Selection has proven successful in providing fast and accurate classifiers. Wrapper methods harness the power of evolutionary computing, most commonly Genetic Algorithms, to find a representative set of features, mainly because Genetic Algorithms can find adequate solutions efficiently. In this thesis we propose GAdaboost: a Genetic Algorithm to accelerate the training procedure of the Viola-Jones Rapid Object Detector through Feature Selection. Specifically, we propose to limit the Adaboost search to a subset of the huge feature space, while evolving this subset with a Genetic Algorithm. Experiments demonstrate that our proposed GAdaboost is up to 3.7 times faster than Adaboost. We also show that the price of this speedup is a small decrease in detection accuracy (3% and 4%) when tested on the FDDB benchmark face detection set and Caltech Web Faces, respectively.
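
    A toy illustration of the core idea follows, assuming scikit-learn's AdaBoostClassifier as the wrapped learner; population size, operators, and rates are placeholders rather than the thesis's settings, and the fitness is plain cross-validated accuracy instead of the full detector training pipeline.

```python
# A toy sketch of the GAdaboost idea: a genetic algorithm evolves a binary
# feature mask, and AdaBoost is trained only on the masked features instead
# of the full feature space. All GA settings below are placeholders.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    if mask.sum() == 0:
        return 0.0
    clf = AdaBoostClassifier(n_estimators=50)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

def gadaboost_select(X, y, subset_size=50, pop_size=20, generations=10):
    n = X.shape[1]
    # Initialise each individual with `subset_size` randomly chosen features
    # (the size is not strictly enforced after crossover/mutation).
    pop = [np.zeros(n, dtype=int) for _ in range(pop_size)]
    for ind in pop:
        ind[rng.choice(n, subset_size, replace=False)] = 1
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        # Binary tournament selection.
        parents = [pop[max(rng.choice(pop_size, 2), key=lambda i: scores[i])]
                   for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = rng.integers(1, n)                 # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n) < 0.01              # bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        pop = parents[: pop_size - len(children)] + children
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[int(scores.argmax())]                 # best feature mask found
```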

    Feature selection for face recognition based on multi-objective evolutionary wrappers

    Feature selection is a key issue in pattern recognition, especially when prior knowledge of the most discriminant features is not available. Moreover, in order to perform the classification task with reduced complexity and acceptable performance, features that are irrelevant, redundant, or noisy are usually excluded from the problem representation. This work presents a multi-objective wrapper, based on genetic algorithms, to select the most relevant set of features for face recognition tasks. The proposed strategy explores the space of multiple feasible selections in order to minimize the cardinality of the feature subset while maximizing its discriminative capacity. Experimental results show that, in comparison with other state-of-the-art approaches, the proposed approach improves classification performance while reducing the representation dimensionality.
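
    A bare-bones sketch of the two-objective wrapper evaluation is shown below, assuming a k-NN classifier as the wrapped learner; the evolutionary search itself (the paper uses genetic algorithms) is replaced by random candidate masks purely to illustrate the Pareto filtering of (subset size, error) pairs.

```python
# Each candidate feature mask is scored on (subset size, classification
# error), both to be minimised, and only Pareto non-dominated masks are kept.
# Random masks stand in for the genetic search used in the paper.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def objectives(mask, X, y):
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()
    return int(mask.sum()), 1.0 - acc      # (cardinality, error)

def pareto_front(candidates, X, y):
    scored = [(objectives(m, X, y), m) for m in candidates if m.any()]
    front = []
    for (size_i, err_i), m in scored:
        dominated = any(size_j <= size_i and err_j <= err_i and
                        (size_j < size_i or err_j < err_i)
                        for (size_j, err_j), _ in scored)
        if not dominated:
            front.append(((size_i, err_i), m))
    return front

# Usage (X, y are the face descriptors and labels):
#   rng = np.random.default_rng(0)
#   candidates = [rng.random(X.shape[1]) < 0.3 for _ in range(50)]
#   for (size, err), mask in pareto_front(candidates, X, y):
#       print(size, err)
```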

    A Survey on Evolutionary Computation Approaches to Feature Selection

    Feature selection is an important task in data mining and machine learning to reduce the dimensionality of the data and increase the performance of an algorithm, such as a classification algorithm. However, feature selection is a challenging task due mainly to the large search space. A variety of methods have been applied to solve feature selection problems, where evolutionary computation (EC) techniques have recently gained much attention and shown some success. However, there are no comprehensive guidelines on the strengths and weaknesses of alternative approaches. This leads to a disjointed and fragmented field with ultimately lost opportunities for improving performance and successful applications. This paper presents a comprehensive survey of the state-of-the-art work on EC for feature selection, which identifies the contributions of these different algorithms. In addition, current issues and challenges are also discussed to identify promising areas for future research.

    Individual and ensemble functional link neural networks for data classification

    This study investigated the Functional Link Neural Network (FLNN) for solving data classification problems. FLNN-based models were developed using evolutionary methods as well as ensemble methods. The outcomes of experiments covering benchmark classification problems positively demonstrated the efficacy of the proposed models for data classification.
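
    A minimal sketch of the functional link idea follows, assuming a trigonometric expansion and a plain logistic-regression readout; the evolutionary training and ensemble construction reported in the study are not reproduced, and the expansion order is arbitrary.

```python
# Each input feature is augmented with trigonometric basis terms and fed to a
# flat linear classifier (no hidden layer), which is the functional link idea.
import numpy as np
from sklearn.linear_model import LogisticRegression

def functional_link_expand(X, order=2):
    """Augment X with sin/cos expansions of each feature up to `order`."""
    parts = [X]
    for k in range(1, order + 1):
        parts.append(np.sin(k * np.pi * X))
        parts.append(np.cos(k * np.pi * X))
    return np.hstack(parts)

# Usage:
#   clf = LogisticRegression(max_iter=1000).fit(functional_link_expand(X_train), y_train)
#   acc = clf.score(functional_link_expand(X_test), y_test)
```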