    Clustering based feature selection using Partitioning Around Medoids (PAM)

    High-dimensional data contains a large number of features and therefore demands immense computational resources, in both space and time. Several studies indicate that not all features of high-dimensional data are relevant to the classification result, so dimensionality reduction is necessary, both to make computation feasible and to improve classifier performance. Several dimensionality reduction techniques have been proposed, spanning feature selection and feature extraction. Sequential forward selection and backward selection are greedy feature selection approaches; heuristic approaches are also applied, using the Genetic Algorithm, PSO, and the Forest Optimization Algorithm. PCA is the best-known feature extraction method; others include multidimensional scaling and linear discriminant analysis. In this work, a different approach is applied: cluster-analysis-based feature selection using Partitioning Around Medoids (PAM) clustering. Our experimental results show that the classification accuracy obtained when the feature vectors' medoids represent the original dataset is high, above 80%.
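The idea of medoid-based feature selection can be sketched as follows: treat each feature (column of the data matrix) as a point, cluster those points with a k-medoids procedure, and keep only the medoid features. This is a minimal PAM-style sketch, not the paper's implementation; all names and the toy data are illustrative.

```python
# Minimal PAM-style k-medoids over FEATURE vectors (columns), keeping the
# medoid columns as the selected features. Toy data; names are illustrative.
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(points, k, n_iter=20, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(n_iter):
        # Assign each point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: euclidean(p, points[m]))
            clusters[nearest].append(i)
        # Update: each cluster's new medoid minimises total in-cluster distance.
        new_medoids = []
        for members in clusters.values():
            best = min(members, key=lambda c: sum(
                euclidean(points[c], points[j]) for j in members))
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)

# Rows = samples, columns = features. Columns 0/1 are near-duplicates,
# as are columns 2/3, so two medoids suffice to represent the data.
X = [[1.0, 1.1, 5.0, 5.2],
     [2.0, 2.1, 6.0, 6.1],
     [3.0, 2.9, 7.0, 7.2]]
features = list(zip(*X))            # feature vectors (one per column)
selected = k_medoids(features, k=2)
print(selected)                     # → [0, 2]
```

One medoid is kept per group of correlated features, which is the sense in which the medoids "represent the original dataset".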

    Similarity Based Entropy on Feature Selection for High Dimensional Data Classification

    The curse of dimensionality is a major problem in most classification tasks. Feature transformation and feature selection, as feature reduction methods, can be applied to overcome it. Despite its good performance, feature transformation is not easily interpretable because the physical meaning of the original features cannot be retrieved. Feature selection, by contrast, uses a simple computational process to remove unwanted features and makes the data easier to visualize and understand. We propose a new feature selection method that uses similarity-based entropy to handle the high-dimensional data problem. On six high-dimensional datasets, we computed the similarity between each feature vector and the class vector, then used the maximum similarity to calculate an entropy value for each feature. The selected features are those whose entropy exceeds the mean entropy over all features. A fuzzy k-NN classifier was implemented to evaluate the selected features. The experimental results show that the proposed method handles the high-dimensional data problem with an average accuracy of 80.5%.
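The paper's similarity-based entropy measure is more involved than can be shown here, but the selection rule itself is simple: score each feature against the class vector and keep the features scoring above the mean. The sketch below uses absolute cosine similarity as a stand-in score (an assumption, not the paper's exact measure).

```python
# Sketch of "keep features scoring above the mean" selection. The score here
# is |cosine similarity| between each feature column and the class vector —
# a stand-in for the paper's similarity-based entropy, chosen for brevity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

X = [[1, 0, 3],
     [2, 0, 1],
     [3, 1, 2],
     [4, 1, 0]]
y = [0, 0, 1, 1]                     # class vector

columns = list(zip(*X))              # feature vectors (one per column)
scores = [abs(cosine(col, y)) for col in columns]
mean_score = sum(scores) / len(scores)
selected = [j for j, s in enumerate(scores) if s > mean_score]
print(selected)                      # → [0, 1]
```

Feature 1 matches the class vector exactly and feature 0 tracks it closely, so both score above the mean; the weakly related feature 2 is dropped.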

    Efficient Feature Subset Selection Algorithm for High Dimensional Data

    Feature selection addresses the dimensionality problem by removing irrelevant and redundant features, but existing feature selection algorithms take considerable time to obtain a feature subset for high-dimensional data. This paper proposes IFSA (Information gain based Feature Selection Algorithm), a feature selection algorithm for high-dimensional data that produces an optimal feature subset efficiently and improves the computational performance of learning algorithms. IFSA works in two stages: first, a filter is applied to the dataset; second, a small feature subset is produced using the information gain measure. Extensive experiments compare the proposed algorithm with other methods using two classifiers (Naive Bayes and IBk) on microarray and text datasets. The results demonstrate that IFSA not only produces a highly selective feature subset efficiently but also improves classifier performance.
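The information gain measure at the core of IFSA-style filters is the reduction in class entropy obtained by conditioning on a feature. A minimal sketch for discrete features (function names are illustrative, not from the paper):

```python
# Information gain IG(f) = H(y) - H(y | f) for a discrete feature f.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    n = len(labels)
    conditional = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

y = [0, 0, 1, 1]
f_good = [0, 0, 1, 1]   # perfectly predictive of y → IG = H(y) = 1 bit
f_bad  = [0, 1, 0, 1]   # independent of y          → IG = 0
print(information_gain(f_good, y), information_gain(f_bad, y))  # → 1.0 0.0
```

A filter then ranks features by this score and keeps the top-scoring subset.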

    A new approach for feature extraction from functional MR images

    Functional MR images consist of very high-dimensional data containing thousands of voxels, even for a single subject, so data reduction is indispensable for classifying these three-dimensional images. In this study, the first step of data reduction applied first-level statistical analysis to the fMRI data, producing brain maps of each subject for feature extraction; in the second step, feature selection was applied to the brain maps. In the feature selection method commonly used in fMRI classification studies, known as the active method, the intensity values of all brain voxels are ranked from high to low and some of these features are presented to the classifier; however, the location information of the voxels is lost. This study presents a new feature extraction method for fMRI classification: active voxels are used as features by treating the three-dimensional brain maps slice by slice. Because functional MR images form large datasets, the selected features were further reduced by Principal Component Analysis before the voxel intensity values were presented to the classifiers. A classification accuracy of 83.9% was obtained with a kNN classifier using the proposed slice-based feature extraction method, showing that slice-based feature extraction improves classification.
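The final PCA reduction step described above can be sketched with a standard SVD-based projection. The shapes and data below are toy values standing in for selected voxel intensities, not the study's actual pipeline.

```python
# Minimal PCA via SVD: center the features, then project onto the leading
# principal directions. Toy data stands in for selected voxel intensities.
import numpy as np

def pca_reduce(X, n_components):
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # scores in the reduced space

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))                     # 10 "subjects", 50 voxel features
Z = pca_reduce(X, 3)
print(Z.shape)                                    # → (10, 3)
```

The reduced matrix `Z` is what would be handed to the kNN classifier in place of the raw voxel intensities.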

    Selection of online Features and its Application

    Online feature selection is an important concept in data mining. Batch learning is the most commonly used setting for feature selection, but online learning is a more efficient and scalable machine learning method. Most existing studies of online learning assume the learner can access the data for all features; accessing all of it becomes a problem with high-dimensional data. To avoid this limitation, the proposed system allows an online learner to operate a classifier over a fixed, small number of features. The key challenge of online feature selection (OFS) is then how to make accurate predictions using only this small number of active features. We develop novel online feature selection algorithms that handle a variety of OFS tasks, in both supervised and semi-supervised settings with labeled and unlabeled data, for full and partial inputs. The approach thus provides an efficient, scalable way for users to access data online.
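The "fixed, small number of features" constraint can be illustrated with a perceptron-style learner that truncates its weight vector to a budget of B entries after every update, so at most B features are ever active. This is a generic budgeted-OFS sketch, not the paper's algorithm.

```python
# Budgeted online feature selection sketch: a mistake-driven perceptron that,
# after each update, zeroes all but the B largest-magnitude weights.
def truncate(w, B):
    keep = set(sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:B])
    return [w[j] if j in keep else 0.0 for j in range(len(w))]

def online_fs(stream, d, B, lr=1.0):
    w = [0.0] * d
    for x, y in stream:                                    # y in {-1, +1}
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:  # mistake
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            w = truncate(w, B)                             # enforce the budget
    return w

stream = [([1.0, 0.0, 0.1], 1), ([0.0, 1.0, 0.1], -1)] * 3
w = online_fs(stream, d=3, B=2)
print(w)  # → [1.0, -1.0, 0.0]: at most B=2 features stay active
```

The truncation is what keeps the classifier's memory footprint fixed regardless of the ambient dimensionality.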

    On feature selection protocols for very low-sample-size data

    High-dimensional data with very few instances are typical in many application domains. Selecting a highly discriminative subset of the original features is often the main interest of the end user. The widely used feature selection protocol for this type of data consists of two steps. First, features are selected from the data (possibly through cross-validation), and, second, a cross-validation protocol is applied to test a classifier using the selected features. The selected feature set and the testing accuracy are then returned to the user. For lack of a better option, the same low-sample-size dataset is used in both steps. Questioning the validity of this protocol, we carried out an experiment using 24 high-dimensional datasets, three feature selection methods, and five classifier models. We found that the accuracy returned by the above protocol is heavily biased, and we therefore propose an alternative protocol that avoids the contamination by including both steps in a single cross-validation loop. Statistical tests verify that the classification accuracy returned by the proper protocol is significantly closer to the true accuracy (estimated from an independent testing set) than that returned by the currently favoured protocol. Funding: project RPG-2015-188 funded by The Leverhulme Trust, UK, and project TIN2015-67534-P (MINECO/FEDER, UE) funded by the Ministerio de Economía y Competitividad of the Spanish Government and the European Union FEDER fund.
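The "single cross-validation loop" protocol can be sketched generically: feature selection runs inside each fold, seeing only that fold's training data, so the held-out instances never influence which features are chosen. The selector and classifier below are toy stand-ins (largest class-mean gap, nearest centroid), not the paper's models.

```python
# Single-loop protocol: feature selection INSIDE each cross-validation fold.
def cross_val_protocol(X, y, k, select, fit, predict):
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]
    correct = 0
    for test_idx in folds:
        train_idx = [i for i in range(n) if i not in set(test_idx)]
        Xtr = [X[i] for i in train_idx]
        ytr = [y[i] for i in train_idx]
        feats = select(Xtr, ytr)          # selection sees training data only
        model = fit([[x[j] for j in feats] for x in Xtr], ytr)
        for i in test_idx:
            correct += predict(model, [X[i][j] for j in feats]) == y[i]
    return correct / n

def select(X, y):
    # Toy selector: keep the one feature with the largest class-mean gap.
    def gap(j):
        a = [x[j] for x, l in zip(X, y) if l == 0]
        b = [x[j] for x, l in zip(X, y) if l == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return [max(range(len(X[0])), key=gap)]

def fit(X, y):
    # Toy nearest-centroid classifier: store one centroid per class.
    d = len(X[0])
    c0 = [sum(x[j] for x, l in zip(X, y) if l == 0) / y.count(0) for j in range(d)]
    c1 = [sum(x[j] for x, l in zip(X, y) if l == 1) / y.count(1) for j in range(d)]
    return c0, c1

def predict(model, x):
    c0, c1 = model
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 0 if d0 <= d1 else 1

# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 3.0], [0.1, 1.0], [0.2, 2.0], [1.0, 2.0], [1.1, 3.0], [1.2, 1.0]]
y = [0, 0, 0, 1, 1, 1]
acc = cross_val_protocol(X, y, k=3, select=select, fit=fit, predict=predict)
print(acc)  # → 1.0
```

The biased two-step protocol would instead call `select(X, y)` once on the full dataset before cross-validating, letting information from the eventual test folds leak into the chosen feature set.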