Clustering based Feature Selection from High Dimensional Data
Data mining techniques have been widely applied to extract knowledge from large databases. Data mining searches for relationships and global patterns that lie ‘hidden’ in large volumes of data. Feature selection reduces dimensionality by selecting the most useful features from a given data set. Here, a graph clustering method is used for feature selection: from the resulting clusters, the features that are most relevant to the target class and independent of each other are selected. The resulting feature subset is then given to various supervised learning algorithms to increase learning accuracy and identify the best feature subset. A clustering approach can make feature selection both efficient and effective: judged by efficiency (time complexity) and effectiveness (data quality), useful features can be selected from big data.
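A minimal sketch of clustering-based feature selection is given below. It is not the paper's algorithm. As assumptions, it measures both relevance and redundancy with absolute Pearson correlation, drops features below a hypothetical `min_relevance`, links two features when their correlation reaches a hypothetical `redundancy_threshold`, treats each connected component of that feature graph as a cluster, and keeps the most class-relevant feature of each cluster. All names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def cluster_feature_selection(X, y, min_relevance=0.1, redundancy_threshold=0.6):
    """Return sorted indices of one representative feature per cluster.

    X is a list of samples (rows); y holds numeric class labels.
    """
    cols = list(zip(*X))
    relevance = [abs(pearson(c, y)) for c in cols]
    # 1. Drop features that are nearly irrelevant to the class.
    kept = [i for i, r in enumerate(relevance) if r >= min_relevance]
    # 2. Feature graph: connect two features whose |correlation| is high.
    adjacency = {
        i: [j for j in kept
            if j != i and abs(pearson(cols[i], cols[j])) >= redundancy_threshold]
        for i in kept
    }
    # 3. Clusters are the connected components, found by depth-first search.
    seen, selected = set(), []
    for start in kept:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # 4. Keep the most class-relevant feature of each cluster.
        selected.append(max(component, key=lambda i: relevance[i]))
    return sorted(selected)


# Toy data: feature 1 is redundant with feature 0; feature 2 is only weakly related.
X = [[0, 0, 1], [0, 0, 0], [0, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 0]]
y = [0, 0, 0, 1, 1, 1]
selected = cluster_feature_selection(X, y)
print(selected)  # [0, 2]
```

Features 0 and 1 end up in one cluster, from which the more relevant feature 0 is kept, while feature 2 forms its own cluster. Pearson correlation only captures linear dependence; an information-theoretic measure would suit discrete features better.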
DOI: 10.17762/ijritcc2321-8169.15061
Tackling Ant Colony Optimization Meta-Heuristic as Search Method in Feature Subset Selection Based on Correlation or Consistency Measures
This paper introduces the use of an ant colony optimization (ACO) algorithm, called Ant System, as a search method in two well-known feature subset selection methods based on correlation or consistency measures: CFS (Correlation-based Feature Selection) and CNS (Consistency-based Feature Selection). ACO guides the search using a heuristic evaluator. Empirical results on twelve real-world classification problems are reported. Statistical tests reveal that InfoGain is a very suitable heuristic for the CFS and CNS feature subset selection methods when ACO acts as the search method, and that InfoGain is significantly better than the other heuristics across a range of classifiers. For most of the problems, ACO-based feature subset selection with a suitable heuristic evaluator achieves better results than CFS or CNS combined with Best First search.
Funding: MICYT TIN2007-68084-C02-02; MICYT TIN2011-28956-C02-02; Junta de Andalucía P11-TIC-752
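The Python sketch below illustrates the ingredients named in the abstract under simplifying assumptions: InfoGain as the heuristic desirability of each feature, the CFS merit (with symmetric uncertainty as the feature-class and feature-feature correlation) as the subset evaluator, and a basic Ant System loop in which each ant adds features by roulette-wheel selection over `tau**alpha * eta**beta` until the merit stops improving. The stopping rule, pheromone update and parameter values (`n_ants`, `n_iter`, `alpha`, `beta`, `rho`) are hypothetical choices, not the paper's configuration, and CNS is not shown.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(feature, target):
    """InfoGain = H(target) - H(target | feature) for discrete data."""
    n = len(target)
    conditional = 0.0
    for value, count in Counter(feature).items():
        labels = [t for f, t in zip(feature, target) if f == value]
        conditional += (count / n) * entropy(labels)
    return entropy(target) - conditional


def symmetric_uncertainty(x, y):
    """SU = 2 * IG / (H(x) + H(y)), in [0, 1]."""
    denom = entropy(x) + entropy(y)
    return 0.0 if denom == 0 else 2.0 * info_gain(x, y) / denom


def cfs_merit(subset, columns, target):
    """CFS merit: k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    k = len(subset)
    r_cf = sum(symmetric_uncertainty(columns[i], target) for i in subset) / k
    pairs = [(i, j) for i in subset for j in subset if i < j]
    r_ff = (sum(symmetric_uncertainty(columns[i], columns[j]) for i, j in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def ant_system_search(columns, target, n_ants=5, n_iter=10,
                      alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Hypothetical simplified Ant System over feature subsets."""
    rng = random.Random(seed)
    n_features = len(columns)
    # Heuristic: InfoGain per feature, with a small floor so that zero-gain
    # features keep a non-zero selection probability.
    eta = [info_gain(col, target) + 1e-6 for col in columns]
    tau = [1.0] * n_features  # pheromone trail per feature
    best_subset, best_merit = set(), -1.0
    for _ in range(n_iter):
        solutions = []
        for _ in range(n_ants):
            subset, merit = set(), 0.0
            while len(subset) < n_features:
                candidates = [i for i in range(n_features) if i not in subset]
                weights = [tau[i] ** alpha * eta[i] ** beta for i in candidates]
                choice = rng.choices(candidates, weights=weights, k=1)[0]
                new_merit = cfs_merit(subset | {choice}, columns, target)
                if new_merit <= merit:
                    break  # stop once adding a feature no longer helps
                subset, merit = subset | {choice}, new_merit
            solutions.append((subset, merit))
            if merit > best_merit:
                best_subset, best_merit = subset, merit
        # Evaporate, then let each ant deposit pheromone proportional to its merit.
        tau = [(1 - rho) * t for t in tau]
        for subset, merit in solutions:
            for i in subset:
                tau[i] += merit
    return best_subset, best_merit


# Toy discrete data: f0 predicts the class, f1 duplicates f0, f2 is uninformative.
target = [0, 0, 1, 1, 0, 1, 0, 1]
columns = [target[:], target[:], [0, 1, 0, 1, 1, 0, 0, 1]]
best_subset, best_merit = ant_system_search(columns, target)
print(best_subset, round(best_merit, 3))
```

Because feature 1 duplicates feature 0, the CFS merit of the pair {0, 1} is no higher than that of either feature alone, so the search settles on a single copy of the relevant feature.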
Data Cleansing Meets Feature Selection: A Supervised Machine Learning Approach
This paper presents a novel procedure that applies sequentially two data preparation techniques of a different nature: data cleansing and feature selection. For the former, we experimented with a partial removal of outliers via the interquartile range; for the latter, we chose relevant attributes with two widespread feature subset selectors, CFS (Correlation-based Feature Selection) and CNS (Consistency-based Feature Selection), which are founded on correlation and consistency measures, respectively. Empirical results are outlined on seven difficult binary and multi-class data sets, that is, data sets on which C4.5 or 1-nearest neighbour classifiers, without any kind of prior data pre-processing, have a test error rate of at least 10% in terms of accuracy. Non-parametric statistical tests show that combining the two data preparation strategies, using a correlation measure for feature selection with the C4.5 algorithm, is significantly better, as measured by ROC, than applying the data cleansing approach alone. Last but not least, a weak learner such as PART achieved promising results with the new proposal based on a consistency measure and is able to compete with the best configuration of C4.5. To sum up, under the new approach and measured by ROC, the PART classifier with a consistency metric behaves slightly better than C4.5 with a correlation measure.
Funding: MICYT TIN2007-68084-C02-02; MICYT TIN2011-28956-C02-02; Junta de Andalucía P11-TIC-752
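A minimal Python sketch of the data-cleansing step, assuming Tukey's fences (Q1 - k*IQR and Q3 + k*IQR, with the common default k = 1.5) and quartiles computed with `statistics.quantiles(..., method="inclusive")`. The abstract does not say what makes the removal "partial"; here, restricting the hypothetical `columns_to_check` argument or raising `k` stands in for that, as an assumption. The cleaned rows would then be passed to a feature subset selector such as CFS or CNS.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) Tukey fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_iqr_outliers(rows, labels, columns_to_check, k=1.5):
    """Drop rows whose value in any checked column lies outside its fences.

    Checking only some columns (or using a larger k) is one assumed way to
    make the removal partial; the abstract does not specify the criterion.
    """
    bounds = {c: iqr_bounds([r[c] for r in rows], k) for c in columns_to_check}
    kept_rows, kept_labels = [], []
    for row, label in zip(rows, labels):
        if all(lo <= row[c] <= hi for c, (lo, hi) in bounds.items()):
            kept_rows.append(row)
            kept_labels.append(label)
    return kept_rows, kept_labels


rows = [[1.0, 10], [2.0, 11], [3.0, 12], [4.0, 13], [5.0, 14],
        [6.0, 15], [7.0, 16], [8.0, 17], [100.0, 18]]
labels = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]
# Step 1 of the sequential procedure: cleanse. Step 2 (feature selection)
# would run on clean_rows / clean_labels.
clean_rows, clean_labels = remove_iqr_outliers(rows, labels, columns_to_check=[0])
print(len(clean_rows))
```

Other quartile conventions produce slightly different fences, so results may not match a particular tool exactly.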
Review of Feature Selection Methods Used in Melanoma Diagnosis
Currently, a large number of feature selection methods are in use, and they are attracting growing interest among researchers, although some methods are naturally used more frequently than others. The article describes the basic principles of selection-based algorithms. Feature selection (FS) methods fall into three categories: filter methods, wrapper methods and embedded methods. Particular attention was paid to finding examples of applications of the described methods in the diagnosis of skin melanoma.
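To make the filter/wrapper distinction concrete, here is a hypothetical Python sketch. The filter ranks features by absolute Pearson correlation with the class, without consulting any classifier; the wrapper performs greedy forward selection, scoring each candidate subset by the leave-one-out accuracy of a 1-nearest-neighbour classifier. Embedded methods, which select features as part of model training, are not shown. Both scoring choices are illustrative assumptions, not methods taken from the reviewed article.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def filter_rank(X, y):
    """Filter: rank features by a classifier-independent score."""
    cols = list(zip(*X))
    scores = [abs(pearson(c, y)) for c in cols]
    return sorted(range(len(cols)), key=lambda i: scores[i], reverse=True)


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-nearest-neighbour using only `features`."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in features))
        correct += int(y[nearest] == y[i])
    return correct / len(X)


def wrapper_forward(X, y):
    """Wrapper: greedy forward selection scored by the classifier itself."""
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        # Score every one-feature extension of the current subset.
        scored = [(loo_1nn_accuracy(X, y, selected + [f]), f) for f in sorted(remaining)]
        score, f = max(scored, key=lambda t: t[0])
        if score <= best:
            break  # no extension improves accuracy
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.9, 2.0], [1.0, 6.0], [1.1, 3.0]]
y = [0, 0, 0, 1, 1, 1]
print(filter_rank(X, y), wrapper_forward(X, y))
```

The filter is cheap because it never trains a model, while the wrapper evaluates the classifier for every candidate subset, which is costlier but tailored to that classifier.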
- …