
    Unsupervised Feature Selection Based on Self-configuration Approaches using Multidimensional Scaling

    Researchers often collect many features so that no principal information is lost. However, a large number of features can cause problems: irrelevant or repetitive features reduce the validity of analysis results. Feature selection is one solution. Feature selection methods are divided into two types, supervised and unsupervised. Supervised feature selection can only be carried out on labeled data, whereas unsupervised feature selection has three approaches: correlation, configuration, and variance. This study proposes an unsupervised feature selection method that combines correlation and configuration using multidimensional scaling (MDS). The proposed algorithm, MDS-Clustering, uses hierarchical and non-hierarchical clustering. The result of MDS-Clustering is compared with existing feature selection methods under three schemes, in which 75%, 50%, and 25% of the features are selected. The datasets used in this study come from the UCI repository. Validity is assessed by the goodness-of-fit of the proximity matrix (GoFP) and the accuracy of a classification algorithm. The comparison results show that the proposed feature selection is worth recommending as a new approach to the feature selection process; moreover, on certain data, the algorithm can outperform existing feature selection methods.
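    The abstract does not give code, but the combined correlation-and-configuration idea can be sketched roughly as follows: embed the features (not the samples) with MDS on a correlation-based distance matrix, cluster the embedded features, and keep one representative per cluster. This is a loose illustration, not the authors' exact MDS-Clustering algorithm; the wine dataset and the 25% retention rate are placeholders.

```python
# Rough sketch of correlation + configuration feature selection:
# MDS-embed the features using correlation distances, cluster the
# embedded features, keep one representative feature per cluster.
# Illustrative only -- not the paper's exact MDS-Clustering method.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.manifold import MDS
from sklearn.cluster import AgglomerativeClustering

X, _ = load_wine(return_X_y=True)       # any UCI-style numeric dataset
n_keep = max(1, X.shape[1] // 4)        # e.g. the 25% selection scheme

# Correlation-based distance between features: d = 1 - |r|
dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))

# Configuration step: place each feature as a point in 2-D via MDS
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)

# Cluster the feature configuration, one cluster per feature to keep
labels = AgglomerativeClustering(n_clusters=n_keep).fit_predict(coords)

# Representative per cluster: the feature closest to its cluster centroid
selected = []
for c in range(n_keep):
    members = np.where(labels == c)[0]
    centroid = coords[members].mean(axis=0)
    selected.append(members[np.argmin(
        np.linalg.norm(coords[members] - centroid, axis=1))])
X_reduced = X[:, sorted(selected)]
print("kept features:", sorted(selected))
```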

    What are the Best Hierarchical Descriptors for Complex Networks?

    This work reviews several hierarchical measurements of the topology of complex networks and then applies feature selection concepts and methods to quantify the relative importance of each measurement for discriminating between four representative theoretical network models, namely Erdős–Rényi, Barabási–Albert, Watts–Strogatz, and a geographical type of network. The results confirm that the four models can be well separated by using a combination of measurements. In addition, the relative contribution of each considered feature to the overall discrimination of the models was quantified in terms of its weight in the canonical projection into two dimensions, with the traditional clustering coefficient, hierarchical clustering coefficient, and neighborhood clustering coefficient proving particularly effective. Interestingly, the average shortest path length and hierarchical node degrees contributed little to the separation of the four network models.
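    As a rough illustration of the discrimination task (with a few simple topological measurements standing in for the paper's hierarchical measurement set), one can generate the three classical model families with networkx and project them into two dimensions with a canonical (linear discriminant) projection:

```python
# Sketch: separate Erdos-Renyi, Barabasi-Albert and Watts-Strogatz
# graphs using a few simple topological features and a canonical (LDA)
# projection to 2-D. The paper's hierarchical measurements are richer;
# these three features are illustrative stand-ins.
import numpy as np
import networkx as nx
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def measure(g):
    # restrict to the largest connected component so that the average
    # shortest path length is well defined
    g = g.subgraph(max(nx.connected_components(g), key=len))
    return [nx.average_clustering(g),
            nx.average_shortest_path_length(g),
            float(np.std([d for _, d in g.degree()]))]

makers = {"ER": lambda s: nx.erdos_renyi_graph(100, 0.06, seed=s),
          "BA": lambda s: nx.barabasi_albert_graph(100, 3, seed=s),
          "WS": lambda s: nx.watts_strogatz_graph(100, 6, 0.1, seed=s)}
X, y = [], []
for name, make in makers.items():
    for s in range(30):
        X.append(measure(make(s)))
        y.append(name)

# Canonical projection: with three classes, LDA yields two discriminant
# axes whose weights quantify each feature's contribution
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
print("per-feature weights of the two canonical axes:\n",
      lda.scalings_[:, :2])
```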

    Towards Clustering of Mobile and Smartwatch Accelerometer Data for Physical Activity Recognition

    Mobile and wearable devices can now sense human activity ubiquitously and unobtrusively thanks to advances in miniaturization and sensing. However, outstanding issues remain around the energy restrictions of these devices when processing large sets of data. This paper presents an approach that uses feature selection to refine the clustering of accelerometer data for detecting physical activity. Feature selection also reduces the computational burden of processing large data sets: energy and resource use decrease because the clustering algorithms process less data. Raw accelerometer data, obtained from smartphones and smartwatches, were preprocessed to extract both time- and frequency-domain features. Principal component analysis feature selection (PCAFS) and correlation feature selection (CFS) were used to remove redundant features. The reduced feature sets were then evaluated against three widely used clustering algorithms: hierarchical clustering analysis (HCA), k-means, and density-based spatial clustering of applications with noise (DBSCAN). Using the reduced feature sets resulted in improved separability, reduced uncertainty, and improved efficiency compared with the baseline, which used all features. Overall, CFS in conjunction with HCA produced the highest Dunn index results, 9.7001 for the phone features and 5.1438 for the watch features, an improvement over the baseline. This comparative study of feature selection and clustering with these specific algorithms has not been performed previously and provides an optimistic and usable approach to recognizing activities using either a smartphone or a smartwatch.
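    The pipeline can be sketched as below. The synthetic windows, the correlation-threshold filter standing in for CFS, and the hand-rolled Dunn index are all assumptions for illustration; CFS proper and the exact feature set are described in the paper.

```python
# Sketch of the feature-extraction -> feature-selection -> clustering
# pipeline. Synthetic accelerometer windows stand in for real data, a
# simple correlation filter stands in for CFS, and the Dunn index is
# hand-rolled (it is not in scikit-learn).
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
windows = rng.normal(size=(200, 128))     # 200 windows of 128 samples

def features(w):
    spec = np.abs(np.fft.rfft(w))
    return [w.mean(), w.std(), w.min(), w.max(),     # time domain
            spec.mean(), spec.argmax(), spec.std()]  # frequency domain

X = np.array([features(w) for w in windows])

# Correlation filter (stand-in for CFS): drop one feature of any pair
# with |r| > 0.9
corr = np.abs(np.corrcoef(X, rowvar=False))
drop = {j for i in range(corr.shape[0])
          for j in range(i + 1, corr.shape[0]) if corr[i, j] > 0.9}
X_sel = X[:, [j for j in range(X.shape[1]) if j not in drop]]

labels = AgglomerativeClustering(n_clusters=4).fit_predict(X_sel)  # HCA

def dunn_index(X, labels):
    # min inter-cluster distance / max intra-cluster diameter
    D = pairwise_distances(X)
    ks = np.unique(labels)
    intra = max(D[np.ix_(labels == k, labels == k)].max() for k in ks)
    inter = min(D[np.ix_(labels == a, labels == b)].min()
                for a in ks for b in ks if a < b)
    return inter / intra

print("Dunn index:", dunn_index(X_sel, labels))
```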

    Complete agglomerative hierarchy document’s clustering based on fuzzy luhn’s gibbs latent dirichlet allocation

    Agglomerative hierarchical clustering is a bottom-up clustering method in which the distances between documents can be obtained by extracting feature values with a topic-based latent Dirichlet allocation method. To reduce the number of features, term selection can be done using Luhn's idea. Together, these methods can build better document clusters, yet little research has discussed this. Therefore, in this research, the term-weighting calculation uses Luhn's idea to select terms by defining upper and lower cut-offs, and then extracts term features using Gibbs-sampling latent Dirichlet allocation combined with term frequency and the fuzzy Sugeno method. The feature values serve as the distances between documents, which are clustered with single-, complete-, and average-link algorithms. The evaluations show little difference between feature extraction with and without the lower cut-off. However, topic determination for each term based on term frequency and the fuzzy Sugeno method is better than the Tsukamoto method at finding relevant documents. Using the lower cut-off and fuzzy Sugeno Gibbs latent Dirichlet allocation for complete agglomerative hierarchical clustering yields consistent metric values. This clustering method is suggested as a better method for clustering documents more relevantly to the gold standard.
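    A minimal sketch of the pipeline follows: Luhn-style upper and lower cut-offs (approximated by document-frequency thresholds), topic-model features, then single-, complete-, and average-link clustering. Note that scikit-learn's LDA uses variational inference rather than the paper's Gibbs sampler, and the fuzzy Sugeno topic-assignment step is omitted; both substitutions are assumptions for illustration.

```python
# Sketch: Luhn-style term cut-offs via document-frequency thresholds,
# topic-model document features, then hierarchical clustering with
# single/complete/average linkage. Variational LDA stands in for the
# paper's Gibbs sampler; the fuzzy Sugeno step is not reproduced.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stock markets fell sharply", "investors sold shares today"]

# Upper/lower cut-offs a la Luhn: drop very common and very rare terms
vec = CountVectorizer(max_df=0.9, min_df=1)
counts = vec.fit_transform(docs)

# Topic proportions as document features
theta = LatentDirichletAllocation(n_components=2,
                                  random_state=0).fit_transform(counts)

# Document distances from topic features, then agglomerative clustering
dists = pdist(theta, metric="cosine")
for method in ("single", "complete", "average"):
    tree = linkage(dists, method=method)
    print(method, fcluster(tree, t=2, criterion="maxclust"))
```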

    Partitional Clustering

    People live in a world full of data. Humans collect data from many measurements and observations in their daily work. Sorting these numerous data is important and necessary for analysis, reasoning, and decision-making. For this reason, clustering has been used in many areas and has become very important in recent years. The appropriate feature selection and way of classifying data into subsets can change from dataset to dataset, and various clustering methods have emerged as a result. Hierarchical clustering, partitional clustering, artificial system clustering, kernel-based clustering, and sequential data clustering constitute different clustering strategies. This chapter examines some popular partitional clustering techniques and algorithms. Partitional clustering assigns a set of data points to k clusters through an iterative process: a predefined criterion function (J) assigns each datum to one of the k sets, and clustering is carried out by maximizing or minimizing the value of this criterion function over the k sets, as sketched below. The chapter starts with the criterion function for the clustering process, and applications are presented for each algorithm.
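    The criterion-function formulation can be made concrete with its most common instance, k-means, where J is the sum of squared distances of each datum to its assigned cluster centre and Lloyd's iterations monotonically decrease J. The two-blob toy data are a placeholder.

```python
# Minimal k-means: the criterion function J is the sum of squared
# distances from each datum to its assigned cluster centre; Lloyd's
# iterations (assign, then update) monotonically decrease J.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: nearest centre for each datum
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: each centre becomes the mean of its members
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    # final assignment and criterion value
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    J = ((X - centres[labels]) ** 2).sum()
    return labels, centres, J

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, size=(50, 2)) for m in (0.0, 5.0)])
labels, centres, J = kmeans(X, k=2)
print("criterion J =", J)
```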

    Feature selection by multi-objective optimization: application to network anomaly detection by hierarchical self-organizing maps.

    Feature selection is an important and active issue in clustering and classification problems. Choosing an adequate feature subset allows dimensionality reduction of a dataset, which decreases the computational complexity of classification and improves classifier performance by avoiding redundant or irrelevant features. Although feature selection can be formally defined as an optimisation problem with only one objective (the classification accuracy obtained with the selected feature subset), several multi-objective approaches to this problem have been proposed in recent years. These either select features that improve not only the classification accuracy but also the generalisation capability, in the case of supervised classifiers, or counterbalance the bias toward lower or higher numbers of features exhibited by some methods used to validate the clustering/classification, in the case of unsupervised classifiers. The main contribution of this paper is a multi-objective approach to feature selection and its application to an unsupervised clustering procedure based on Growing Hierarchical Self-Organizing Maps (GHSOM), which includes a new method for unit labelling and efficient determination of the winning unit. In the network anomaly detection problem considered here, this multi-objective approach makes it possible not only to differentiate between normal and anomalous traffic but also to distinguish among different anomalies. The efficiency of the proposals has been evaluated on the well-known DARPA/NSL-KDD datasets, which contain extracted features and labeled attacks from around 2 million connections. The feature sets selected in the experiments provide detection rates of up to 99.8% for normal traffic and up to 99.6% for anomalous traffic, as well as accuracy values of up to 99.12%. This work has been funded by FEDER funds and the Ministerio de Ciencia e Innovación of the Spanish Government under Project No. TIN2012-32039.
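    The bi-objective idea (maximise accuracy, minimise the number of selected features) can be sketched with a Pareto scan over candidate subsets. A plain k-NN classifier stands in for the paper's GHSOM, and random subset sampling stands in for a proper multi-objective optimiser; both are illustrative assumptions, as is the dataset.

```python
# Sketch of bi-objective feature selection: keep the feature subsets
# that are Pareto-optimal in (higher cross-validated accuracy, fewer
# features). k-NN and random subset sampling are stand-ins for the
# paper's GHSOM and multi-objective optimiser.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

candidates = []
for _ in range(60):                      # random candidate subsets
    mask = rng.random(X.shape[1]) < 0.3
    if not mask.any():
        continue
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y,
                          cv=3).mean()
    candidates.append((acc, int(mask.sum()), mask))

# Pareto filter: discard subsets dominated on both objectives
pareto = [c for c in candidates
          if not any(o[0] >= c[0] and o[1] <= c[1] and o is not c
                     for o in candidates)]
for acc, n, _ in sorted(pareto, key=lambda t: -t[0]):
    print(f"accuracy={acc:.3f} with {n} features")
```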

    A supervised clustering approach for fMRI-based inference of brain states

    We propose a method that combines signals from many brain regions observed in functional Magnetic Resonance Imaging (fMRI) to predict the subject's behavior during a scanning session. Such predictions suffer from the huge number of brain regions sampled on the voxel grid of standard fMRI data sets: the curse of dimensionality. Dimensionality reduction is thus needed, but it is often performed using a univariate feature selection procedure that handles neither the spatial structure of the images nor the multivariate nature of the signal. By introducing a hierarchical clustering of the brain volume that incorporates connectivity constraints, we reduce the span of the possible spatial configurations to a single tree of nested regions tailored to the signal. We then prune the tree in a supervised setting (hence the name supervised clustering) in order to extract a parcellation (a division of the volume) such that parcel-based signal averages best predict the target information. Dimensionality reduction is thus achieved by feature agglomeration, and the constructed features provide a multi-scale representation of the signal. Comparisons with reference methods on both simulated and real data show that our approach yields higher prediction accuracy than standard voxel-based approaches. Moreover, the method infers an explicit weighting of the regions involved in the regression or classification task.
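    The connectivity-constrained feature-agglomeration step described here is available directly in scikit-learn; a minimal sketch follows, with synthetic data and a fixed number of parcels. The paper's supervised pruning of the cluster tree is its own contribution and is not reproduced.

```python
# Sketch of the feature-agglomeration step: Ward clustering of voxels
# under grid-connectivity constraints, then parcel-averaged signals
# fed to a classifier. Synthetic data and a fixed parcel count stand
# in for the paper's supervised tree pruning.
import numpy as np
from sklearn.feature_extraction.image import grid_to_graph
from sklearn.cluster import FeatureAgglomeration
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

n_x, n_y, n_z = 8, 8, 8                       # toy "brain volume" grid
rng = np.random.default_rng(0)
X = rng.normal(size=(100, n_x * n_y * n_z))   # 100 scans x voxels
y = rng.integers(0, 2, size=100)              # behavioural target

# Only spatially adjacent voxels may be merged
connectivity = grid_to_graph(n_x, n_y, n_z)

agglo = FeatureAgglomeration(n_clusters=50, connectivity=connectivity)
X_parcels = agglo.fit_transform(X)            # parcel-averaged signals

print("voxels -> parcels:", X.shape[1], "->", X_parcels.shape[1])
print("CV accuracy:", cross_val_score(LogisticRegression(),
                                      X_parcels, y, cv=5).mean())
```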

    ENHANCEMENT OF DECISION TREE METHOD BASED ON HIERARCHICAL CLUSTERING AND DISPERSION RATIO

    Classification using a decision tree is a classification method that includes a feature-selection process. Decision tree classification using information gain has a disadvantage when the dataset has attributes that are unique for each record and an imbalanced class distribution. The data used for decision tree classification are of two types, numerical and nominal; numerical data undergo a discretization process to obtain data intervals. The weakness of the information gain method can be reduced by using a dispersion ratio method that depends not on the class distribution but on the frequency distribution. Numeric data are discretized using hierarchical clustering to obtain balanced data clusters. The data used in this study were taken from the UCI machine learning repository and contain both numeric and nominal types. The research has two stages: first, the numeric data are discretized using hierarchical clustering with three methods, namely single link, complete link, and average link; second, the discretization results are merged back, trees are built with attribute splitting using the dispersion ratio, and the result is evaluated with 7-fold cross-validation. The results show that discretizing the data with hierarchical clustering can increase prediction accuracy by 14.6% compared with undiscretized data. Attribute splitting with the dispersion ratio on data discretized by hierarchical clustering can increase prediction accuracy by 6.51%.
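    The discretization step can be sketched as below: cluster each numeric attribute's values with average-link hierarchical clustering and replace them with interval labels before growing the tree. The dispersion-ratio splitting criterion is the paper's contribution and scikit-learn trees cannot swap in a custom criterion, so the default gain-based splitting is used here as a stand-in; the iris dataset is a placeholder.

```python
# Sketch of hierarchical-clustering discretization: bin each numeric
# attribute by average-link clustering of its values, then grow a
# decision tree on the binned data. The paper's dispersion-ratio
# splitting criterion is not reproduced (scikit-learn trees use their
# built-in criteria).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

def discretise(col, n_bins=3):
    # average-link clustering of the 1-D values into n_bins intervals
    labels = AgglomerativeClustering(
        n_clusters=n_bins, linkage="average").fit_predict(
        col.reshape(-1, 1))
    # relabel clusters by ascending mean so bins form ordered intervals
    order = np.argsort([col[labels == k].mean() for k in range(n_bins)])
    return np.argsort(order)[labels]

X_disc = np.column_stack([discretise(X[:, j]) for j in range(X.shape[1])])

tree = DecisionTreeClassifier(random_state=0)
print("raw:        ", cross_val_score(tree, X, y, cv=7).mean())
print("discretised:", cross_val_score(tree, X_disc, y, cv=7).mean())
```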