
    An Effective Algorithm for Correlation Attribute Subset Selection by Using Genetic Algorithm Based On Naive Bayes Classifier

    In recent years, the application of feature selection methods to various datasets has greatly increased. Feature selection (also known as subset selection) is an important topic in data mining, especially for high-dimensional datasets: it is the process, commonly used in machine learning, of choosing a subset of the available features for use by a learning algorithm, eliminating features with little or no predictive information. The challenging task is to obtain an optimal subset of relevant, non-redundant features that gives an optimal solution without increasing the complexity of the modeling task. By removing irrelevant, redundant and noisy features, feature selection addresses the high-dimensionality problem and focuses learning algorithms on the most useful aspects of the data, making the learning task faster and more accurate. A data warehouse is designed to consolidate and maintain all features that are relevant to the analysis processes.
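    The abstract above describes a genetic-algorithm wrapper around a Naive Bayes classifier. The sketch below illustrates that pattern under stated assumptions: scikit-learn's GaussianNB, a stock dataset, and cross-validated accuracy as the GA fitness standing in for the paper's correlation-based merit; the population size, mutation rate, and generation count are illustrative, not the paper's settings.

```python
# Minimal GA wrapper for feature subset selection scored by Naive Bayes.
# Dataset and GA settings are illustrative; scikit-learn is assumed.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    # Score a subset: mean CV accuracy of Naive Bayes on the kept columns.
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()

pop = rng.random((20, n_features)) < 0.5          # random initial subsets
for gen in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]  # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, n_features)         # single-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.02      # bit-flip mutation
        children.append(child ^ flip)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best), "accuracy:", fitness(best))
```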

    Application of Global Optimization Methods for Feature Selection and Machine Learning

    The feature selection process constitutes a commonly encountered problem of global combinatorial optimization: it reduces the number of features by removing irrelevant and redundant data. This paper proposes a novel immune clonal genetic algorithm, built on the immune clonal algorithm, to solve the feature selection problem. The proposed algorithm has stronger exploration and exploitation abilities owing to clonal selection theory, and each antibody in the search space specifies a subset of the possible features. Experimental results show that the proposed algorithm simplifies the feature selection process effectively and obtains higher classification accuracy than other feature selection algorithms.
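    A minimal sketch of the clonal-selection loop the abstract describes, on a toy synthetic dataset: each antibody is a bit vector encoding a feature subset; fitter antibodies receive more clones and lighter mutation, as clonal selection theory prescribes. The affinity function (a correlation filter with a size penalty) and all parameters are illustrative stand-ins, not the paper's.

```python
# Sketch of clonal selection over feature-subset antibodies (bit vectors).
import numpy as np

rng = np.random.default_rng(1)
n_features = 30
X = rng.normal(size=(200, n_features))
y = (X[:, :3].sum(axis=1) > 0).astype(int)        # only first 3 features matter

def affinity(mask):
    # Correlation-style filter: mean |corr(feature, label)| minus a size penalty.
    if not mask.any():
        return 0.0
    corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in np.flatnonzero(mask)]
    return np.mean(corrs) - 0.01 * mask.sum()

pop = rng.random((15, n_features)) < 0.5
for gen in range(30):
    scores = np.array([affinity(m) for m in pop])
    order = np.argsort(scores)[::-1]
    clones = []
    for rank, idx in enumerate(order):
        n_clones = max(1, 8 - rank)               # more clones for fitter antibodies
        for _ in range(n_clones):
            rate = 0.02 * (rank + 1)              # hypermutation grows with rank
            clones.append(pop[idx] ^ (rng.random(n_features) < rate))
    clones = np.array(clones)
    scores = np.array([affinity(m) for m in clones])
    pop = clones[np.argsort(scores)[::-1][:15]]   # keep the best 15 antibodies

print("selected features:", np.flatnonzero(pop[0]))
```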

    Embedding Feature Selection for Large-scale Hierarchical Classification

    Large-scale Hierarchical Classification (HC) involves datasets consisting of thousands of classes and millions of training instances with high-dimensional features, posing several big data challenges. Feature selection, which aims to select a subset of discriminant features, is an effective strategy for dealing with the large-scale HC problem: it speeds up training, reduces prediction time and minimizes memory requirements by compressing the total size of the learned model's weight vectors. The majority of studies have also shown feature selection to be successful in improving classification accuracy by removing irrelevant features. In this work, we investigate various filter-based feature selection methods for dimensionality reduction to solve the large-scale HC problem. Our experimental evaluation on text and image datasets with varying distributions of features, classes and instances shows up to 3x speed-up on massive datasets and up to 45% lower memory requirements for storing the weight vectors of the learned model, without any significant loss in classification accuracy (and an improvement for some datasets). Source code: https://cs.gmu.edu/~mlbio/featureselection. Comment: IEEE International Conference on Big Data (IEEE BigData 2016).
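    A hedged sketch of the filter-then-train pipeline the abstract describes, assuming scikit-learn: a chi-square filter keeps the top-k features of a sparse text matrix, so each per-class weight vector shrinks from vocabulary size to k, which is where the reported memory savings come from. The dataset, the choice of filter, and k are illustrative; the paper compares several filter methods.

```python
# Filter-based reduction before training per-class linear models.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

data = fetch_20newsgroups(subset="train")
X = TfidfVectorizer().fit_transform(data.data)    # high-dimensional sparse text
selector = SelectKBest(chi2, k=5000).fit(X, data.target)
X_small = selector.transform(X)                   # compressed feature space

# Weight vectors now have 5000 entries per class instead of |vocabulary|.
clf = LogisticRegression(max_iter=200).fit(X_small, data.target)
print("kept", X_small.shape[1], "of", X.shape[1], "features")
```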

    Efficient Feature Subset Selection Algorithm for High Dimensional Data

    Feature selection solves the dimensionality problem by removing irrelevant and redundant features, but existing feature selection algorithms take considerable time to obtain a feature subset on high-dimensional data. This paper proposes a feature selection algorithm based on information gain measures for high-dimensional data, termed IFSA (Information gain based Feature Selection Algorithm), to produce an optimal feature subset in efficient time and improve the computational performance of learning algorithms. The IFSA algorithm works in two stages: first, it applies a filter to the dataset; second, it produces a small feature subset using the information gain measure. Extensive experiments compare the proposed algorithm with other methods using two different classifiers (Naive Bayes and IBk) on microarray and text datasets. The results demonstrate that IFSA not only produces a compact feature subset in efficient time but also improves classifier performance.
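    A minimal implementation of the information-gain ranking at the heart of the second stage, for discrete-valued features; the synthetic data and the choice of top-k are illustrative, and the paper's initial filtering stage is not reproduced.

```python
# Information-gain ranking, then top-k selection.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def info_gain(feature, labels):
    # IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v)
    cond = 0.0
    for v in np.unique(feature):
        sel = feature == v
        cond += sel.mean() * entropy(labels[sel])
    return entropy(labels) - cond

rng = np.random.default_rng(2)
X = rng.integers(0, 3, size=(500, 40))            # discrete-valued features
y = (X[:, 0] + X[:, 1] > 2).astype(int)           # labels depend on features 0, 1

gains = np.array([info_gain(X[:, j], y) for j in range(X.shape[1])])
top_k = np.argsort(gains)[::-1][:5]
print("top features by information gain:", top_k)
```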

    Performing Feature Selection with ACO

    The main aim of feature selection (FS) is to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. In real-world problems FS is essential due to the abundance of noisy, irrelevant or misleading features; however, current methods are inadequate at finding optimal reductions. This chapter presents a feature selection mechanism based on Ant Colony Optimization in an attempt to combat this. The method is then applied to the problem of finding optimal feature subsets in the fuzzy-rough data reduction process, on two very different and challenging tasks, namely web classification and complex systems monitoring.
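    A toy sketch of ACO-style subset search as the chapter outlines it: pheromone on each feature biases which features ants include, and the best trail of each iteration is reinforced after evaporation. The quality function below is an illustrative stand-in for the fuzzy-rough dependency measure the chapter actually uses, and all parameters are invented for the example.

```python
# Toy ant-colony search over feature subsets.
import numpy as np

rng = np.random.default_rng(3)
n_features, n_ants, n_iters = 20, 10, 25
relevant = {0, 3, 7}                               # ground truth for the toy score

def quality(subset):
    # Reward covering the relevant features, penalize subset size.
    hits = len(relevant & set(subset))
    return hits / len(relevant) - 0.02 * len(subset)

pheromone = np.ones(n_features)
for it in range(n_iters):
    trails = []
    for _ in range(n_ants):
        prob = pheromone / pheromone.sum()
        size = rng.integers(2, 8)
        subset = rng.choice(n_features, size=size, replace=False, p=prob)
        trails.append((quality(subset), subset))
    pheromone *= 0.9                               # evaporation
    best_q, best_s = max(trails, key=lambda t: t[0])
    pheromone[best_s] += max(best_q, 0.0)          # reinforce the best trail

print("high-pheromone features:", np.argsort(pheromone)[::-1][:5])
```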

    Predictive based hybrid ranker to yield significant features in writer identification

    The contribution of writer identification (WI) to personal identification among biometric traits is well known, because it is more easily accessible, cheaper, more reliable and more widely accepted than methods based on DNA, iris or fingerprint. However, the production of high-dimensional datasets has resulted in too many irrelevant or redundant features; these unnecessary features enlarge the search space and degrade identification performance. The main problem is to identify the most significant features and select the best subset of features that can precisely predict the authors. This study therefore proposes the hybridization of GRA Feature Ranking and Feature Subset Selection (GRAFeSS) to build subsets of the highest-ranking features, and develops a discretization model combined with the hybrid method (Dis-GRAFeSS) to improve classification accuracy. Experimental results showed that the methods improved authorship identification accuracy when feature ranking is combined with invariant discretization, by substantially reducing redundant features.
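    Of the GRAFeSS pipeline, only the grey relational analysis (GRA) ranking stage lends itself to a compact sketch: each feature is scored by its grey relational grade against the class vector, and the top-ranked features form the candidate subset. The data below are synthetic and zeta = 0.5 is the customary distinguishing coefficient; the subset-selection and discretization stages of the paper are not reproduced here.

```python
# Grey relational grade ranking of features against the class vector.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 25))
y = (X[:, 4] - X[:, 9] > 0).astype(float)

def normalize(v):
    return (v - v.min()) / (v.max() - v.min() + 1e-12)

def grey_grade(feature, target, zeta=0.5):
    # Grey relational coefficient per sample, averaged into a grade.
    delta = np.abs(normalize(target) - normalize(feature))
    coeff = (delta.min() + zeta * delta.max()) / (delta + zeta * delta.max())
    return coeff.mean()

grades = np.array([grey_grade(X[:, j], y) for j in range(X.shape[1])])
ranking = np.argsort(grades)[::-1]
print("top-ranked features:", ranking[:5])
```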

    Angle Modulated Artificial Bee Colony Algorithms for Feature Selection

    Optimal feature subset selection is an important and difficult task for pattern classification, data mining, and machine intelligence applications. The objective of feature subset selection is to eliminate irrelevant and noisy features in order to select an optimal feature subset and increase accuracy. A large number of features in a dataset increases computational complexity, leading to performance degradation. In this paper, to overcome this problem, an angle modulation technique is used to reduce the feature subset selection problem to a four-dimensional continuous optimization problem instead of representing it as a high-dimensional bit vector. To demonstrate the effectiveness of this problem representation and the efficiency of the proposed method, six variants of the Artificial Bee Colony (ABC) algorithm employ angle modulation for feature selection. Experimental results on six high-dimensional datasets show that the angle modulated ABC algorithms improved classification accuracy with smaller feature subsets.
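    The angle-modulation idea is concrete enough to sketch: the optimizer searches over just four real parameters (a, b, c, d) of a generating function, and sampling that function at each feature index yields the selection bits. In the sketch below, random search stands in for the paper's six ABC variants, and the fitness is an illustrative stand-in for classifier accuracy.

```python
# Angle modulation: a 4-D continuous encoding of an n-bit feature mask.
import numpy as np

n_features = 50

def bits_from_params(a, b, c, d):
    # Standard angle-modulation generating function:
    #   g(x) = sin(2*pi*(x - a) * b * cos(2*pi*(x - a) * c)) + d
    x = np.arange(n_features)
    g = np.sin(2 * np.pi * (x - a) * b * np.cos(2 * np.pi * (x - a) * c)) + d
    return g > 0                                  # bit j is 1 where g is positive

def fitness(params):
    mask = bits_from_params(*params)
    # Toy objective: prefer subsets containing features 0-4, and small subsets.
    return mask[:5].sum() - 0.05 * mask.sum()

rng = np.random.default_rng(5)
best_p, best_f = None, -np.inf
for _ in range(2000):                             # random search over (a, b, c, d)
    p = rng.uniform(-1, 1, size=4)
    f = fitness(p)
    if f > best_f:
        best_p, best_f = p, f

print("best (a, b, c, d):", np.round(best_p, 3))
print("selected features:", np.flatnonzero(bits_from_params(*best_p)))
```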