1,702 research outputs found

    Feature Selection for Classification with QAOA

    Feature selection is of great importance in Machine Learning, where it can be used to reduce the dimensionality of classification, ranking and prediction problems. The removal of redundant and noisy features can improve both the accuracy and scalability of the trained models. However, feature selection is a computationally expensive task with a solution space that grows combinatorially. In this work, we consider in particular a quadratic feature selection problem that can be tackled with the Quantum Approximate Optimization Algorithm (QAOA), already employed in combinatorial optimization. First, we represent the feature selection problem with the QUBO formulation, which is then mapped to an Ising spin Hamiltonian. Then we apply QAOA with the goal of finding the ground state of this Hamiltonian, which corresponds to the optimal selection of features. In our experiments, we consider seven different real-world datasets with dimensionality up to 21 and run QAOA on both a quantum simulator and, for small datasets, the 7-qubit IBM (ibm-perth) quantum computer. We use the set of selected features to train a classification model and evaluate its accuracy. Our analysis shows that it is possible to tackle the feature selection problem with QAOA and that currently available quantum devices can be used effectively. Future studies could test a wider range of classification models as well as improve the effectiveness of QAOA by exploring better performing optimizers for its classical step.
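
The QUBO-to-Ising step described in the abstract above can be illustrated with a small classical sketch. It is not the authors' formulation: the QUBO used here (diagonal entries reward a hypothetical per-feature relevance score, off-diagonal entries penalise a hypothetical pairwise redundancy score, weighted by an assumed `alpha`) is only one common way to write a quadratic feature selection objective, and the ground state is found by brute force instead of QAOA. The mapping itself uses the standard substitution x_i = (1 - s_i) / 2 with s_i in {-1, +1}.

```python
import itertools

def build_qubo(relevance, redundancy, alpha=1.0):
    """Build a symmetric QUBO matrix Q for: minimise sum_ij Q[i][j] * x_i * x_j.

    Assumed (illustrative) objective: Q[i][i] = -relevance[i] rewards picking
    a relevant feature; Q[i][j] = alpha * redundancy(i, j) penalises picking
    two redundant features together.
    """
    n = len(relevance)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                Q[i][j] = -float(relevance[i])
            else:
                # Symmetrise so that the Ising mapping below is valid.
                Q[i][j] = alpha * 0.5 * (redundancy[i][j] + redundancy[j][i])
    return Q

def qubo_energy(Q, x):
    """Energy of a binary selection vector x (x_i = 1 means feature i is kept)."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def qubo_to_ising(Q):
    """Map a symmetric QUBO to Ising couplings J, fields h and a constant offset.

    With x_i = (1 - s_i) / 2 the energy becomes
    sum_ij J[i][j] * s_i * s_j + sum_i h[i] * s_i + offset.
    """
    n = len(Q)
    J = [[Q[i][j] / 4.0 if i != j else 0.0 for j in range(n)] for i in range(n)]
    h = [-sum(Q[i]) / 2.0 for i in range(n)]
    offset = sum(map(sum, Q)) / 4.0 + sum(Q[i][i] for i in range(n)) / 4.0
    return J, h, offset

def ising_energy(J, h, offset, s):
    """Energy of a spin configuration s with entries in {-1, +1}."""
    n = len(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return pair + sum(h[i] * s[i] for i in range(n)) + offset

def brute_force_minimum(Q):
    """Exhaustively search all 2^n selections (a classical stand-in for QAOA)."""
    n = len(Q)
    candidates = itertools.product((0, 1), repeat=n)
    return min(((bits, qubo_energy(Q, bits)) for bits in candidates),
               key=lambda pair: pair[1])

if __name__ == "__main__":
    # Toy scores (made up): features 0 and 1 are both relevant but redundant.
    relevance = [0.9, 0.8, 0.1]
    redundancy = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.1, 0.1, 0.0]]
    Q = build_qubo(relevance, redundancy)
    print(brute_force_minimum(Q))
```

On this toy input the redundancy penalty leads the minimiser to keep only one of the two strongly correlated features. Exhaustive search is only feasible for small n, which is the regime (up to 21 features) the abstract reports.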

    Feature Selection for Classification under Anonymity Constraint

    Over the last decade, the proliferation of online platforms and their increasing adoption by billions of users have greatly heightened users' privacy risk. In fact, security researchers have shown that sparse microdata containing information about a user's online activities, although anonymous, can still be used to disclose the identity of the user by cross-referencing the data with other data sources. To preserve user privacy, existing works propose several methods (k-anonymity, l-diversity, differential privacy) that ensure a dataset meant to be shared or published carries a small identity disclosure risk. However, the majority of these methods modify the data in isolation, without considering their utility in subsequent knowledge discovery tasks, which makes these datasets less informative. In this work, we consider labeled data that are generally used for classification, and propose two methods for feature selection considering two goals: first, on the reduced feature set the data has small disclosure risk, and second, the utility of the data is preserved for performing a classification task. Experimental results on various real-world datasets show that the method is effective and useful in practice.
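
As a rough illustration of the first goal in the abstract above (small disclosure risk on the reduced feature set), the sketch below checks plain k-anonymity for a candidate feature subset: every combination of values on the selected features must occur in at least k rows. This is only the textbook definition of one of the privacy notions the abstract names; the function name, the row layout and the toy data are assumptions, and the paper's own two methods are not reproduced here.

```python
from collections import Counter

def is_k_anonymous(rows, feature_idx, k):
    """Return True if each value combination on feature_idx occurs >= k times.

    rows:        sequence of records (tuples or lists of attribute values)
    feature_idx: indices of the selected (kept) features
    k:           minimum group size required
    """
    # Project every record onto the selected features and count the groups.
    groups = Counter(tuple(row[i] for i in feature_idx) for row in rows)
    return all(count >= k for count in groups.values())

if __name__ == "__main__":
    # Toy records (made up): (city, age, hobby)
    rows = [
        ("a", 30, "x"),
        ("a", 30, "y"),
        ("b", 40, "x"),
        ("b", 40, "z"),
    ]
    print(is_k_anonymous(rows, (0, 1), k=2))     # keep city and age only
    print(is_k_anonymous(rows, (0, 1, 2), k=2))  # also keep hobby
```

A feature selection procedure with an anonymity constraint could use such a check to discard candidate subsets before scoring the remaining ones for classification utility.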

    Feature Selection for Classification with Artificial Bee Colony Programming

    Feature selection and classification are among the most widely applied machine learning processes. Feature selection aims to find useful properties containing class information by eliminating noisy and unnecessary features in the data sets, which makes the classifiers' task easier. Classification is used to distribute data among the various classes defined on the resulting feature set. In this chapter, artificial bee colony programming (ABCP) is proposed and applied to feature selection for classification problems on four different data sets. The best models are obtained by using the sensitivity fitness function defined according to the total number of classes in the data sets and are compared with the models obtained by genetic programming (GP). The results of the experiments show that the proposed technique is accurate and efficient when compared with GP in terms of critical feature selection and classification accuracy on well-known benchmark problems.

    Filter-GA Based Approach to Feature Selection for Classification

    This paper presents a new approach to selecting a reduced number of features in databases. Every database has a given number of features, but some of these features can be redundant or even harmful and can confuse the classification process. The proposed method applies a filter attribute measure and a binary-coded Genetic Algorithm to select a small subset of features. The importance of these features is judged by applying the K-nearest neighbor (KNN) method of classification. The best reduced subset of features, the one with high classification accuracy on the given databases, is adopted. The classification accuracy obtained by the proposed method is compared with that recently reported in publications on twenty-eight databases. The proposed method performs satisfactorily on these databases and achieves higher classification accuracy with a smaller number of features.

    Self-adaptive MOEA feature selection for classification of bankruptcy prediction data

    Article ID 314728. Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance for creditors and investors in evaluating the likelihood of bankruptcy. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, estimating the credit risks associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have been shown to be an excellent tool for dealing with complex problems in finance and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of using the self-adaptation of the classifier. This work was partially supported by the Portuguese Foundation for Science and Technology under Grant PEst-C/CTM/LA0025/2011 (Strategic Project-LA 25-2011-2012) and by the Spanish Ministerio de Ciencia e Innovacion, under the project "Gestion de movilidad efficiente y sostenible, MOVES" with Grant Reference TIN2011-28336.
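
The two objectives named in the abstract above (fewer features, higher classifier quality) can be made concrete with a minimal Pareto-dominance sketch. It shows only the non-dominated filtering that any multiobjective approach relies on; the candidate representation, the function names and the accuracy values are assumptions for illustration, and the self-adaptive evolutionary algorithm of the paper is not implemented.

```python
from typing import List, Tuple

# A candidate is (selected feature indices, accuracy of a classifier on them).
Candidate = Tuple[Tuple[int, ...], float]

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and better on one.

    Objectives: minimise the number of features, maximise accuracy.
    """
    size_a, acc_a = len(a[0]), a[1]
    size_b, acc_b = len(b[0]), b[1]
    no_worse = size_a <= size_b and acc_a >= acc_b
    strictly_better = size_a < size_b or acc_a > acc_b
    return no_worse and strictly_better

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

if __name__ == "__main__":
    # Made-up accuracies for a handful of feature subsets.
    candidates = [
        ((0,), 0.70),
        ((0, 1), 0.80),
        ((0, 2), 0.75),
        ((0, 1, 2), 0.80),
        ((1, 2, 3), 0.85),
    ]
    for subset, accuracy in pareto_front(candidates):
        print(subset, accuracy)
```

The front keeps the trade-off solutions: a subset survives only if no other subset is both no larger and no less accurate, with at least one strict improvement.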

    ALGORITHM COMPARISON AND FEATURE SELECTION FOR CLASSIFICATION OF BROILER CHICKEN HARVEST

    Broiler chickens are the result of superior breeds that produce a lot of meat. In practice, however, many breeders experience crop failure, which has a serious impact on the economy and can also affect farmer quality, resulting in sanctions. The value of the performance index produced at harvest indicates the success rate of harvesting broiler chickens. Broiler harvest data can therefore be used to classify harvest outcomes with a data mining approach. This study follows the CRISP-DM (Cross Industry Standard Process for Data Mining) method. It compares three classification algorithms to determine the best algorithm and three feature selection methods to determine the best method for improving algorithm performance. According to the findings of this study, the Random Forest algorithm is the best algorithm for classifying harvest data, with an accuracy rate of 89.14 percent. The best way to improve the algorithm's performance is to use the Backward Elimination method, which can increase the accuracy by 7.53 percentage points. As a result, the Random Forest + Backward Elimination algorithm yields an accuracy value of 96.67 percent. According to this study, the factors that influence crop yield increase are FCR, number of harvests, and body weight.

    Feature selection for classification of nucleic acid sequences
