11 research outputs found

    Hybrid feature selection method based on particle swarm optimization and adaptive local search method

    Machine learning has been studied extensively, with data classification the most widely researched subject. Prediction accuracy depends on the data provided to the classification algorithm, yet using a large amount of data may incur costs, especially in data collection and preprocessing. Studies on feature selection have mainly aimed to establish techniques that reduce the number of features (attributes) used in classification while retaining the data that yield accurate predictions. Hence, this article proposes a particle swarm optimization (PSO) algorithm for selecting the ideal set of features. PSO has proven superior in different domains at exploring the search space, whereas local search algorithms are good at exploiting search regions. Thus, we propose a hybridized PSO algorithm with an adaptive local search technique that operates based on the current PSO search state and is used for accepting the candidate solution. This combination balances local intensification with global diversification of the search process. As a result, the suggested algorithm surpasses the original PSO algorithm and other comparable approaches in terms of performance.

    A New Quadratic Binary Harris Hawk Optimization For Feature Selection

    Harris hawk optimization (HHO) is a recently proposed metaheuristic algorithm that has proven effective in several challenging optimization tasks. However, the original HHO was developed to solve continuous optimization problems, not problems with binary variables. This paper proposes a binary version of HHO (BHHO) to solve the feature selection problem in classification tasks. The proposed BHHO is equipped with an S-shaped or V-shaped transfer function to convert each continuous variable into a binary one. Moreover, another variant of HHO, namely quadratic binary Harris hawk optimization (QBHHO), is proposed to enhance the performance of BHHO. In this study, twenty-two datasets collected from the UCI machine learning repository are used to validate the performance of the proposed algorithms. A comparative study evaluates the effectiveness of QBHHO against other feature selection algorithms such as binary differential evolution (BDE), genetic algorithm (GA), binary multi-verse optimizer (BMVO), binary flower pollination algorithm (BFPA), and binary salp swarm algorithm (BSSA). The experimental results show the superiority of the proposed QBHHO over the other algorithms in terms of classification performance, feature size, and fitness values.
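
    The abstract above does not say which S-shaped or V-shaped transfer functions are used. As a rough illustration only, the sketch below assumes two forms that are common in binary metaheuristics: the logistic sigmoid as the S-shaped function and |tanh(x)| as the V-shaped one. The function names (`s_shaped`, `v_shaped`, `binarize_s`, `binarize_v`) are hypothetical and do not come from the paper.

```python
import math
import random


def s_shaped(x: float) -> float:
    """Logistic sigmoid (assumed S-shaped form): maps a real value into (0, 1)."""
    x = max(-60.0, min(60.0, x))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-x))


def v_shaped(x: float) -> float:
    """|tanh(x)| (assumed V-shaped form): maps a real value into [0, 1)."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-shaped rule: each bit is set to 1 with probability s_shaped(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """V-shaped rule: each current bit is flipped with probability v_shaped(x)."""
    return [
        1 - bit if rng.random() < v_shaped(x) else bit
        for x, bit in zip(position, current_bits)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    continuous_position = [-2.0, 0.0, 2.0]  # made-up continuous solution
    print(binarize_s(continuous_position, rng))
    print(binarize_v(continuous_position, [1, 0, 1], rng))
```

    Under these assumptions the two rules behave differently: the S-shaped rule resamples every bit from a probability, while the V-shaped rule keeps the current bit unless the continuous value is large in magnitude.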

    A wrapper approach for feature selection based on Bat Algorithm and Optimum-Path Forest

    Besides optimizing classifier predictive performance and addressing the curse of dimensionality, feature selection techniques help keep a classification model as simple as possible. In this paper, we present a wrapper feature selection approach based on the Bat Algorithm (BA) and Optimum-Path Forest (OPF), in which we model feature selection as a binary optimization problem guided by BA, using the OPF accuracy over a validation set as the fitness function to be maximized. Moreover, we present a methodology to better estimate the quality of the reduced feature set. Experiments conducted over six public datasets demonstrated that the proposed approach yields feature sets that are more compact by a statistically significant margin and, in some cases, can indeed improve classification effectiveness.
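
    To make the wrapper idea concrete, the sketch below scores a binary feature mask by the validation accuracy of a classifier trained on the selected features only. It is a sketch under stated assumptions, not the paper's method: a plain 1-nearest-neighbour classifier stands in for OPF, the Bat Algorithm search loop is omitted, and every name (`wrapper_fitness`, `nn_predict`, `select`) and all of the toy data are hypothetical.

```python
def select(row, mask):
    """Keep only the features whose mask bit is 1."""
    return [value for value, keep in zip(row, mask) if keep]


def nn_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction (stand-in for OPF, squared Euclidean distance)."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]


def wrapper_fitness(mask, train_X, train_y, val_X, val_y):
    """Fitness of a feature mask: accuracy on the validation set."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    reduced_train = [select(row, mask) for row in train_X]
    correct = sum(
        nn_predict(reduced_train, train_y, select(x, mask)) == y
        for x, y in zip(val_X, val_y)
    )
    return correct / len(val_y)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is misleading noise.
    train_X = [[0.0, 5.0], [1.0, -5.0], [0.1, -5.0], [0.9, 5.0]]
    train_y = [0, 1, 0, 1]
    val_X = [[0.05, -5.0], [0.95, 5.0]]
    val_y = [0, 1]
    for mask in ([1, 0], [0, 1], [1, 1]):
        print(mask, wrapper_fitness(mask, train_X, train_y, val_X, val_y))
```

    A metaheuristic such as BA would call a function like `wrapper_fitness` once per candidate mask, which is why wrapper methods tend to be costlier than filter methods: each evaluation involves training and testing a classifier.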

    An enhanced binary bat and Markov clustering algorithms to improve event detection for heterogeneous news text documents

    Event Detection (ED) identifies events from various types of data. Building an ED model for news text documents greatly helps decision-makers in various disciplines improve their strategies. However, identifying and summarizing events from such data is a non-trivial task due to the large volume of published heterogeneous news text documents. Such documents create a high-dimensional feature space that degrades the overall performance of the baseline methods in the ED model. To address this problem, this research presents an enhanced ED model that includes improved methods for the crucial phases of the ED model: Feature Selection (FS), ED, and summarization. This work focuses on the FS problem by automatically detecting events through a novel wrapper FS method based on the Adapted Binary Bat Algorithm (ABBA) and the Adapted Markov Clustering Algorithm (AMCL), termed ABBA-AMCL. These adaptive techniques were developed to overcome premature convergence in BBA and the fast convergence rate in MCL. Furthermore, this study proposes four summarizing methods to generate informative summaries. The enhanced ED model was tested on 10 benchmark datasets and 2 Facebook news datasets. The effectiveness of ABBA-AMCL was compared to 8 FS methods based on meta-heuristic algorithms and 6 graph-based ED methods. The empirical and statistical results showed that ABBA-AMCL surpassed the other methods on most datasets. The key representative features demonstrated that the ABBA-AMCL method successfully detects real-world events from the Facebook news datasets, with 0.96 Precision and 1 Recall for dataset 11, and 1 Precision and 0.76 Recall for dataset 12. To conclude, the novel ABBA-AMCL presented in this research has bridged the research gap and resolved the curse of the high-dimensional feature space for heterogeneous news text documents. Hence, the enhanced ED model can organize news documents into distinct events and provide policymakers with valuable information for decision making.

    Active learning with applications to the diagnosis of parasites (Aprendizado ativo com aplicações ao diagnóstico de parasitos)

    Advisors: Alexandre Xavier Falcão, Pedro Jussieu de Rezende. Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Image datasets have grown large with the fast advances and variety of imaging technologies, demanding urgent solutions for information processing, organization, and retrieval. Processing here aims to annotate an image by assigning it a label that represents its semantic content. Annotation is crucial for the effective organization and retrieval of the information related to the images. However, manual annotation is infeasible for large datasets, and successful automatic annotation by a pattern classifier strongly depends on the quality of a much smaller training set. Active learning techniques have been proposed to select representative training samples from the large dataset together with a label suggestion, which the expert can either confirm or correct. Nevertheless, these techniques very often ignore the need for interactive response times during the active learning process. Therefore, this PhD thesis presents active learning methods that can reduce and/or organize the large dataset such that sample selection does not require reprocessing it entirely at every learning iteration. Moreover, selection can be interrupted as soon as the desired number of samples from the reduced and organized dataset is identified. These methods show increasing progress, first with data reduction only, and then with subsequent organization of the reduced dataset. However, the thesis also addresses a real problem, the diagnosis of parasites, in which the existence of a diverse class (i.e., the impurity class), with a much larger size and samples that are similar to some types of parasites, makes data reduction considerably less effective. The problem is finally circumvented with a different type of data organization, which still allows interactive response times and yields a better and more robust active learning approach for the diagnosis of parasites. The methods have been extensively assessed with different types of unsupervised and supervised classifiers, using datasets from distinct applications and baseline approaches that rely on random sample selection and/or reprocess the entire dataset at each learning iteration. Finally, the thesis demonstrates that further improvements are obtained with semi-supervised learning.
    Degree: Doctorate, Computer Science (Doctor of Computer Science).