
    Performance Study Of Uncertainty Based Feature Selection Method On Detection Of Chronic Kidney Disease With SVM Classification

    Chronic Kidney Disease (CKD) is a disorder that impairs kidney function. Early signs of CKD are very difficult to detect until patients have lost 25% of their kidney function. Therefore, early detection and effective treatment are needed to reduce the mortality rate of CKD sufferers. In this study, the authors classify the CKD dataset using the Support Vector Machine (SVM) method to obtain accurate diagnostic results. The authors compare feature selection methods to find the best feature candidates for improving the classification result. The testing process compares the Symmetrical Uncertainty (SU) and Multivariate Symmetrical Uncertainty (MSU) feature selection methods, with SVM as the classification method. Several experimental scenarios were carried out using the SU and MSU feature selection methods on the CKD dataset. The results show that the MSU feature selection method with an 80%:20% data split produces nine important features with an accuracy of 0.9, a sensitivity of 0.84, and a specificity of 1.0; on the ROC graph, the MSU curve shows a true positive rate higher than the false positive rate. Classification using the MSU feature selection method is therefore better than classification using the SU feature selection method, at 90% accuracy
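As a rough illustration of the Symmetrical Uncertainty score mentioned in this abstract, the sketch below uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) for discrete variables. The function names and the toy data are assumptions made for this example; they are not taken from the paper, and the multivariate variant (MSU) is not covered.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


# Hypothetical toy data: one feature copies the label, the other is unrelated.
label = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]

su_informative = symmetrical_uncertainty(informative, label)
su_uninformative = symmetrical_uncertainty(uninformative, label)
```

A feature that fully determines the label scores 1 and a statistically independent one scores 0, so ranking features by this score against the class label is one simple way to pick candidates before training a classifier.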

    Predictive based hybrid ranker to yield significant features in writer identification

    Writer identification (WI) is a recognized contribution to personal identification among biometric traits because it is easily accessible, cheaper, more reliable, and more acceptable than other methods such as DNA-, iris-, and fingerprint-based personal identification. However, the production of high-dimensional datasets has resulted in too many irrelevant or redundant features. These unnecessary features increase the size of the search space and decrease identification performance. The main problem is to identify the most significant features and select the best subset of features that can precisely predict the authors. Therefore, this study proposed the hybridization of GRA Features Ranking and Feature Subset Selection (GRAFeSS) to develop the best subsets of the highest-ranking features, and developed a discretization model with the hybrid method (Dis-GRAFeSS) to improve classification accuracy. Experimental results showed that the methods improved the accuracy of authorship identification based on feature ranking and invariant discretization by substantially reducing redundant features

    High-Dimensional Software Engineering Data and Feature Selection

    Software metrics collected during project development play a critical role in software quality assurance. A software practitioner is very keen on learning which software metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of metrics. Minimal attention has been devoted to finding the minimum set of software metrics that has the same predictive capability as a larger set of metrics; we strive to answer that question in this paper. We present a comprehensive comparison between seven commonly used filter-based feature ranking techniques (FRT) and our proposed hybrid feature selection (HFS) technique. Our case study consists of a very high-dimensional (42 software attributes) software measurement data set obtained from a large telecommunications system. The empirical analysis indicates that HFS performs better than FRT; however, the Kolmogorov-Smirnov feature ranking technique demonstrates competitive performance. For the telecommunications system, it is found that only 10% of the software attributes are sufficient for effective software quality prediction
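To make the idea of a Kolmogorov-Smirnov feature ranker concrete, the sketch below scores each feature by the two-sample KS statistic between its values in the two classes, that is, the largest gap between the two empirical distribution functions. This is one common way to build such a ranker and is an assumption here; the abstract does not spell out the paper's exact procedure. The function names and the toy data are hypothetical.

```python
def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    gap = 0.0
    for p in sorted(set(a) | set(b)):
        cdf_a = sum(v <= p for v in a) / len(a)
        cdf_b = sum(v <= p for v in b) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def rank_features_by_ks(rows, labels):
    """Return feature indices, best first, for binary labels 0 and 1."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((ks_statistic(pos, neg), j))
    # Higher KS means the two class distributions are further apart.
    return [j for _, j in sorted(scores, reverse=True)]


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
rows = [(1.0, 5.0), (2.0, 1.0), (8.0, 4.0), (9.0, 2.0)]
labels = [0, 0, 1, 1]
ranking = rank_features_by_ks(rows, labels)
```

Keeping only the top few indices of `ranking` gives a filter-style reduction in the spirit of the abstract's finding that a small fraction of attributes can suffice.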

    Clustering based Feature Selection from High Dimensional Data

    Data mining techniques have been widely applied to extract knowledge from large databases. Data mining searches for relationships and global patterns that are hidden in the sheer volume of data in large databases. Feature selection involves selecting the most useful features from a given data set and reduces dimensionality. A graph clustering method is used for feature selection. Features that are most relevant to the target class and independent of the others are selected from each cluster. The resulting feature subset is given to various supervised learning algorithms to increase learning accuracy and obtain the best feature subset. Feature selection can be made efficient and effective using a clustering approach. Based on the criteria of efficiency in terms of time complexity and effectiveness in terms of quality of data, useful features can be selected from big data. DOI: 10.17762/ijritcc2321-8169.15061

    A Review of Fault Diagnosing Methods in Power Transmission Systems

    Transient stability is important in power systems. Disturbances like faults need to be segregated to restore transient stability. A comprehensive review of fault diagnosing methods in the power transmission system is presented in this paper. Typically, voltage and current samples are deployed for analysis. Three tasks (fault detection, classification, and location) are presented separately to convey a more logical and comprehensive understanding of the concepts. Feature extraction and transformations with dimensionality reduction methods are discussed. Fault classification and location techniques largely use artificial intelligence (AI) and signal processing methods. After the discussion of overall methods and concepts, advancements and future aspects are discussed. Generalized strengths and weaknesses of different AI and machine learning-based algorithms are assessed. A comparison of different fault detection, classification, and location methods is also presented, considering features, inputs, complexity, system used, and results. This paper may serve as a guideline for researchers to understand different methods and techniques in this field

    Deep fusion of multi-channel neurophysiological signal for emotion recognition and monitoring

    How to fuse multi-channel neurophysiological signals for emotion recognition is emerging as a hot research topic in the Computational Psychophysiology community. Nevertheless, prior approaches based on feature engineering require extracting various features related to domain knowledge at a high time cost. Moreover, traditional fusion methods cannot fully utilise correlation information between different channels and frequency components. In this paper, we design a hybrid deep learning model in which a Convolutional Neural Network (CNN) is utilised for extracting task-related features and for mining inter-channel and inter-frequency correlation, and a Recurrent Neural Network (RNN) is concatenated for integrating contextual information from the frame cube sequence. Experiments are carried out in a trial-level emotion recognition task on the DEAP benchmarking dataset. Experimental results demonstrate that the proposed framework outperforms the classical methods on both emotional dimensions, Valence and Arousal

    Feature selection for high dimensional imbalanced class data using harmony search

    Misclassification costs of minority class data in real-world applications can be very high. This is a challenging problem, especially when the data is also high in dimensionality, because of increased overfitting and lower model interpretability. Feature selection has recently become a popular way to address this problem by identifying features that best predict a minority class. This paper introduces a novel feature selection method called SYMON, which uses symmetrical uncertainty and harmony search. Unlike existing methods, SYMON uses symmetrical uncertainty to weigh features with respect to their dependency on class labels. This helps to identify powerful features for retrieving the least frequent class labels. SYMON also uses harmony search to formulate the feature selection phase as an optimisation problem to select the best possible combination of features. The proposed algorithm is able to deal with situations where a set of features have the same weight by incorporating two vector tuning operations embedded in the harmony search process. In this paper, SYMON is compared against various benchmark feature selection algorithms that were developed to address the same issue. Our empirical evaluation on different micro-array data sets using G-Mean and AUC measures confirms that SYMON is a comparable or better solution than current benchmarks

    A Study of Spam E-mail classification using Feature Selection package

    Feature selection (FS) is the technique of selecting a subset of relevant features for building learning models. FS algorithms typically fall into two categories: feature ranking and subset selection. Feature ranking ranks the features by a metric and eliminates all features that do not achieve an adequate score. Subset selection searches the set of possible features for the optimal subset. Many FS algorithms have been proposed. This paper presents a new FS technique which is guided by the Fselector package. The Fselector package implements a novel FS algorithm which is devoted to the feature ranking and feature subset selection of high-dimensional data. This package provides functions for selecting attributes from a given dataset. Attribute subset selection is the process of identifying and removing as much of the irrelevant and redundant information as possible. The R package provides a convenient interface to the algorithm. This paper investigates the effectiveness of twelve commonly used FS methods on a spam data set. One basic, popular approach is the filter, which selects the subset of features as a preprocessing step independent of the chosen classifier (a support vector machine classifier). The algorithm is designed as a wrapper around five classification algorithms. A short description of the algorithm and the performance measures of its classification are presented with the spam data set
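The two FS categories described in this abstract can be contrasted with a small sketch. It is written in Python for illustration only and does not reproduce the R package's interface; the function names, the scores, and the toy evaluation function with its redundancy penalty are all assumptions made for this example.

```python
from itertools import combinations


def rank_and_filter(scores, threshold):
    """Feature ranking: sort by score and drop features below the threshold."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [f for f in ranked if scores[f] >= threshold]


def best_subset(features, evaluate):
    """Subset selection: exhaustively search all non-empty subsets."""
    best, best_score = (), float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            score = evaluate(subset)
            if score > best_score:
                best, best_score = subset, score
    return best


# Hypothetical per-feature scores; "a" and "b" are assumed to be redundant.
scores = {"a": 0.9, "b": 0.8, "c": 0.1}


def evaluate(subset):
    """Toy merit: sum of scores, penalised when both redundant features appear."""
    total = sum(scores[f] for f in subset)
    if "a" in subset and "b" in subset:
        total -= 0.85
    return total


kept_by_ranking = rank_and_filter(scores, threshold=0.5)
kept_by_subset = best_subset(list(scores), evaluate)
```

Ranking keeps both high-scoring but redundant features, while the subset search can trade one of them for a weaker, complementary feature. The exhaustive search grows exponentially with the number of features, so practical tools rely on heuristic search instead.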