    Clustering based Feature Selection from High Dimensional Data

    Data mining techniques have been widely applied to extract knowledge from large databases. Data mining searches for relationships and global patterns that are hidden within large volumes of data. Feature selection picks the most useful features from a given data set and thereby reduces its dimensionality. Here, a graph clustering method is used for feature selection: from each cluster, the features that are most relevant to the target class and independent of the other features are selected. The resulting feature subsets are passed to various supervised learning algorithms to increase learning accuracy and to identify the best feature subset. A clustering approach can make feature selection both efficient and effective. Judged by efficiency, in terms of time complexity, and by effectiveness, in terms of the quality of the selected data, useful features can be selected from big data. DOI: 10.17762/ijritcc2321-8169.15061
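The idea of clustering features and keeping one relevant representative per cluster can be illustrated with a minimal sketch. The abstract does not specify the graph construction or the relevance measure, so everything here is an assumption: absolute Pearson correlation stands in for both relevance and redundancy, clusters are the connected components of a thresholded correlation graph, and the names `select_by_clustering` and `threshold` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_by_clustering(features, target, threshold=0.8):
    """Keep the most target-relevant feature from each cluster of correlated features.

    features: dict mapping a feature name to its list of values.
    threshold: assumed cut-off above which two features count as redundant.
    """
    names = list(features)
    # Build an undirected graph linking strongly correlated (redundant) features.
    adjacency = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(features[a], features[b])) >= threshold:
                adjacency[a].add(b)
                adjacency[b].add(a)

    seen, selected = set(), []
    for start in names:
        if start in seen:
            continue
        # Depth-first search collects one connected component (one cluster).
        stack, cluster = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            cluster.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # The representative is the cluster member most correlated with the target.
        selected.append(max(cluster, key=lambda n: abs(pearson(features[n], target))))
    return selected


features = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],   # nearly a rescaled copy of "a"
    "c": [1, -1, 0, -1, 1],  # unrelated to "a" and "b"
}
target = [1, 2, 3, 4, 5]
chosen = select_by_clustering(features, target)
```

In this toy data, `a` and `b` fall into one cluster and `c` forms its own, so one feature per cluster survives. A real implementation would likely use a measure suited to categorical data and a more careful clustering step.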

    Tackling Ant Colony Optimization Meta-Heuristic as Search Method in Feature Subset Selection Based on Correlation or Consistency Measures

    This paper introduces the use of an ant colony optimization (ACO) algorithm, called Ant System, as a search method in two well-known feature subset selection methods based on correlation or consistency measures: CFS (Correlation-based Feature Selection) and CNS (Consistency-based Feature Selection). ACO guides the search using a heuristic evaluator. Empirical results on twelve real-world classification problems are reported. Statistical tests reveal that InfoGain is a very suitable heuristic for the CFS or CNS feature subset selection methods when ACO acts as the search method. InfoGain is shown to be the significantly better heuristic across a range of classifiers. For most of the problems, the results achieved by ACO-based feature subset selection with a suitable heuristic evaluator are better than those obtained with CFS or CNS combined with Best First search. Funding: MICYT TIN2007-68084-C02-02, MICYT TIN2011-28956-C02-02, Junta de Andalucía P11-TIC-752.
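For readers unfamiliar with CFS, the quantity that a search method such as ACO or Best First tries to maximize is a subset merit that rewards correlation with the class and penalizes correlation among the features. The sketch below uses the commonly cited form, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. Using absolute Pearson correlation as the correlation measure is an assumption made here for simplicity, and the function names are hypothetical; the abstract does not say which measure the paper uses.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cfs_merit(subset, target):
    """CFS-style merit of a feature subset (a list of value lists)."""
    k = len(subset)
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(f, target)) for f in subset) / k
    if k == 1:
        return r_cf
    # Mean absolute correlation over all distinct feature pairs.
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(abs(pearson(subset[i], subset[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


target = [1, 2, 3, 4, 5]
relevant = [1, 2, 3, 4, 5]
redundant = [2, 4, 6, 8, 10]   # perfectly correlated with `relevant`
noise = [1, -1, 0, -1, 1]      # uncorrelated with the target

merit_single = cfs_merit([relevant], target)
merit_redundant = cfs_merit([relevant, redundant], target)
merit_noisy = cfs_merit([relevant, noise], target)
```

Adding the redundant copy leaves the merit unchanged, while adding the irrelevant feature lowers it, which is the behaviour a subset search exploits.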

    Data Cleansing Meets Feature Selection: A Supervised Machine Learning Approach

    This paper presents a novel procedure that applies, in sequence, two data preparation techniques of a different nature: data cleansing and feature selection. For the former, we experimented with a partial removal of outliers via the inter-quartile range; for the latter, we chose relevant attributes with two widespread feature subset selectors, CFS (Correlation-based Feature Selection) and CNS (Consistency-based Feature Selection), which are founded on correlation and consistency measures, respectively. Empirical results are outlined on seven difficult binary and multi-class data sets, that is, data sets with a test error rate of at least 10%, measured by accuracy, with C4.5 or 1-nearest neighbour classifiers without any prior data pre-processing. Non-parametric statistical tests show that combining the two data preparation strategies, using a correlation measure for feature selection with the C4.5 algorithm, is significantly better in terms of the ROC measure than applying the data cleansing approach alone. In addition, a relatively weak learner such as PART achieved promising results with the new proposal based on a consistency measure and is able to compete with the best configuration of C4.5. In summary, under the new approach and for the ROC measure, the PART classifier with a consistency metric behaves slightly better than C4.5 with a correlation measure. Funding: MICYT TIN2007-68084-C02-02, MICYT TIN2011-28956-C02-02, Junta de Andalucía P11-TIC-752.
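The data cleansing step can be pictured with a small sketch of inter-quartile range filtering. It assumes the conventional fences Q1 − 1.5·IQR and Q3 + 1.5·IQR and drops every row that has any value outside them; the abstract speaks of a partial removal of outliers without giving the policy, so the multiplier `k`, the drop-all rule, and the function names are assumptions rather than the paper's procedure.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR for one attribute."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outlier_rows(rows, k=1.5):
    """Keep only the rows whose every attribute lies inside that attribute's fences."""
    columns = list(zip(*rows))  # transpose rows into per-attribute columns
    bounds = [iqr_bounds(column, k) for column in columns]
    return [
        row
        for row in rows
        if all(low <= value <= high for value, (low, high) in zip(row, bounds))
    ]


rows = [
    (1.0, 10.0),
    (2.0, 11.0),
    (3.0, 12.0),
    (4.0, 13.0),
    (100.0, 14.0),  # first attribute is far outside the other values
]
cleaned = remove_outlier_rows(rows)
```

Feature selection would then run on `cleaned`, matching the order described above: cleansing first, attribute selection second.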


    Currently, a large number of feature selection (FS) methods are in use, and they are attracting growing interest among researchers. Some methods are, of course, used more frequently than others. The article describes the basics of selection-based algorithms. FS methods fall into three categories: filter, wrapper, and embedded methods. Particular attention was paid to finding examples of applications of the described methods in the diagnosis of skin melanoma.