15,012 research outputs found

    A new multi-objective wrapper method for feature selection – Accuracy and stability analysis for BCI

    Feature selection is an important step in building classifiers for high-dimensional data problems, such as EEG classification for BCI applications. This paper proposes a new wrapper method for feature selection based on a multi-objective evolutionary algorithm, in which the representation of the individuals (potential solutions), the breeding operators, and the objective functions have been carefully designed to select a small subset of features with good generalization capability, in an attempt to avoid the over-fitting problems that wrapper methods usually suffer from. A novel feature ranking procedure is also proposed to analyze the stability of the wrapper method. Four different classification schemes have been applied within the wrapper method to evaluate its accuracy and stability for feature selection on a real motor imagery dataset. Experimental results show that the wrapper method is able to obtain very small subsets of features that are quite stable and also achieve high classification accuracy, regardless of the classifier used. Funding: Project TIN2015-67020-P (Spanish “Ministerio de Economía y Competitividad”) and European Regional Development Funds (ERDF).
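As a rough illustration of the multi-objective idea in this abstract, the sketch below keeps only the non-dominated candidate subsets when each candidate is scored by two objectives to be minimized. The two objectives (validation error and number of selected features), the names `dominates` and `pareto_front`, and the sample scores are assumptions made for illustration; the abstract does not specify the paper's actual objective functions or operators.

```python
from typing import List, Tuple

# (validation error, number of selected features); both are minimized.
# These two objectives are an assumption made for illustration.
Candidate = Tuple[float, int]

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical scores for five candidate feature subsets.
candidates = [(0.20, 3), (0.15, 5), (0.15, 8), (0.30, 2), (0.25, 4)]
front = pareto_front(candidates)
print(front)
```

An evolutionary algorithm would apply this kind of filter to each generation, so the final result is a set of trade-off subsets rather than a single winner.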

    Intelligent artificial ants based feature extraction from wavelet packet coefficients for biomedical signal classification

    In this paper, a new feature extraction method that uses ant colony optimization to select the wavelet packet transform (WPT) best basis is presented and applied to the classification of biomedical signals. The new algorithm, termed Intelligent Artificial Ants (IAA), searches the wavelet packet tree for subsets of features that interact well together to produce high classification accuracies. While traversing the WPT tree, the IAA takes into account the existing correlation between features, thus avoiding information redundancy. The IAA method is a mixture of filter and wrapper approaches to feature subset selection. The pheromone that the ants lay down is updated by means of an estimate of the information content of a single feature or feature subset. The significance of the subsets selected by the ants is measured using a linear discriminant analysis (LDA) classifier. The IAA method is tested on one of the most important biosignal-driven applications, the Brain Computer Interface (BCI) problem, with 56 EEG channels. Experimental results indicate the significance of the proposed method, which achieves a maximum accuracy of 83%. ©2008 IEEE

    Wrapper Feature Selection for Electrocardiogram Signal Feature Extraction Using Empirical Mode Decomposition

    The heart is a hollow, muscular organ that pumps blood through the blood vessels by repeated rhythmic contractions. The electrical activity of the heart muscle produces a signal called the electrocardiogram (ECG): a representation of the heart muscle's electrical impulse activity over a period of time, recorded by an electrocardiograph connected to the body through a non-invasive procedure. Physicians use ECG recordings to determine a patient's cardiac condition. This final project uses Empirical Mode Decomposition (EMD) together with Wrapper Feature Selection (WFS). The basic concept of EMD is to identify the appropriate time scales that reveal the physical characteristics of the signal and then to decompose the signal into Intrinsic Mode Functions (IMFs). The computation subtracts a mean signal from the signal under analysis and repeats this step until a stable signal is obtained. Features are then selected with WFS. A wrapper is a type of feature selection that evaluates feature subsets by the predictive accuracy of a pattern classifier, estimated with statistical resampling or cross-validation. Classification uses K-Nearest Neighbor, a supervised method that classifies a new object by its distance to the nearest data points (neighbours). The system designed with Wrapper Feature Selection and K-Nearest Neighbor classifies heart conditions (Normal Sinus Rhythm, Congestive Heart Failure, and Atrial Fibrillation) from electrocardiogram signals with an accuracy of up to 84%. Without Wrapper Feature Selection, using K-Nearest Neighbor alone, the best accuracy reaches only 70%.
    Keywords: Heart Disease, Electrocardiogram, Empirical Mode Decomposition, Wrapper Feature Selection, K-Nearest Neighbor
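The wrapper idea described in this abstract (score each feature subset by the cross-validated accuracy of the classifier itself) can be sketched as greedy forward selection around a small K-Nearest Neighbor classifier. This is a minimal sketch under stated assumptions: the greedy search strategy, leave-one-out resampling, `k=1`, the function names, and the toy data are all illustrative choices, since the abstract does not say which search or resampling scheme the project used.

```python
from collections import Counter
from typing import List, Sequence, Tuple

def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[str],
                x: Sequence[float], features: List[int], k: int = 1) -> str:
    """Majority vote among the k nearest training rows, using only `features`."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X: List[List[float]], y: List[str],
                 features: List[int], k: int = 1) -> float:
    """Leave-one-out accuracy of k-NN restricted to the given feature subset."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], features, k) == y[i]:
            correct += 1
    return correct / len(X)

def forward_select(X: List[List[float]], y: List[str],
                   k: int = 1) -> Tuple[List[int], float]:
    """Greedily add the feature that most improves leave-one-out accuracy."""
    selected: List[int] = []
    best = 0.0
    while True:
        remaining = [f for f in range(len(X[0])) if f not in selected]
        if not remaining:
            break
        acc, f = max((loo_accuracy(X, y, selected + [f], k), -f) for f in remaining)
        if acc <= best:  # stop when no remaining feature helps
            break
        selected.append(-f)
        best = acc
    return selected, best

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 1.2], [1.1, 5.1], [1.2, 8.8]]
y = ["normal", "normal", "normal", "afib", "afib", "afib"]
selected, best_acc = forward_select(X, y)
print(selected, best_acc)
```

Because every candidate subset is scored by retraining and re-evaluating the classifier, the cost grows quickly with the number of features, which is the usual trade-off of wrapper methods.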

    Integration of feature subset selection methods for sentiment analysis

    Finding an optimal feature subset in a real-world domain is one of the main challenges in sentiment analysis. The complexity of optimal feature subset selection grows exponentially with the number of features, which leads to the well-known problems of analysing and organizing data in high-dimensional spaces. To overcome this problem, this study attempted to enhance feature subset selection in high-dimensional data by removing irrelevant and redundant features using filter and wrapper approaches. Initially, a filter method based on the dispersion of samples in feature space, known as the mutual standard deviation method, was developed to minimize intra-class distances and maximize inter-class distances. Filter-based methods have some advantages: they scale easily to high-dimensional datasets and are computationally simple and fast. However, they depend only on the feature selection space and ignore the hypothesis model space. Hence, the next step of this study developed a new feature ranking approach by integrating various filter methods, using both ordinal-based and frequency-based integration. Finally, a hybrid harmony search strategy was developed to enhance feature subset selection and to overcome the problem of ignoring the dependency of feature selection on the classifier. A search strategy on feature space that integrates filter and wrapper approaches was thus introduced to find a semantic relationship between the model selections and the subsets of searched features. Comparative experiments were performed on five sentiment datasets, namely movie, music, book, electronics, and kitchen reviews. A sizeable performance improvement was noted: the proposed integration-based feature subset selection method yielded 98.32% accuracy in sentiment classification using POS-based features on movie reviews.
Finally, a statistical test based on accuracy showed significant differences between the proposed methods and the baseline methods in almost all comparisons under k-fold cross-validation. The findings of the study show that the mutual standard deviation and integration-based feature subset selection methods outperform the baseline methods in terms of accuracy.
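One plausible reading of the ordinal-based and frequency-based integration mentioned above is sketched below: the ordinal variant orders features by their summed rank positions across several filter rankings, and the frequency variant counts how often a feature appears in the top `k` of each ranking. Both definitions, the function names, and the sample rankings are assumptions for illustration; the abstract does not give the study's exact formulas.

```python
from collections import Counter
from typing import Dict, List

def ordinal_integration(rankings: List[List[str]]) -> List[str]:
    """Order features by summed rank position (lower is better).

    Assumes every ranking contains the same set of features.
    """
    totals: Dict[str, int] = {}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            totals[feature] = totals.get(feature, 0) + position
    return sorted(totals, key=lambda f: (totals[f], f))

def frequency_integration(rankings: List[List[str]], top_k: int) -> List[str]:
    """Order features by how many rankings place them in their top_k."""
    counts = Counter(f for ranking in rankings for f in ranking[:top_k])
    return sorted(counts, key=lambda f: (-counts[f], f))

# Hypothetical output of three different filter methods on four terms.
rankings = [
    ["good", "bad", "plot", "the"],
    ["bad", "good", "the", "plot"],
    ["good", "plot", "bad", "the"],
]
by_rank = ordinal_integration(rankings)
by_freq = frequency_integration(rankings, top_k=2)
print(by_rank, by_freq)
```

Ties are broken alphabetically here only to keep the output deterministic.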

    Short Text Classification Using An Enhanced Term Weighting Scheme And Filter-Wrapper Feature Selection

    Social networks and their usage in everyday life have caused an explosion in the number of short electronic documents. Social networks, such as Twitter, are common mechanisms through which people share information, and the use of social media data in applications is steadily increasing. Redundancy and noise are common problems in social media and in other applications that rely on short text. In addition, the shortness and high sparsity of short text lead to poor classification performance. A powerful short-text classification method can therefore significantly improve the efficiency of many applications. This research aims to investigate and develop solutions for feature discrimination and selection in short-text classification. For feature discrimination, we introduce a term weighting approach, simple supervised weight (SW), which considers the special nature of short text in terms of term strength and distribution. To address the drawbacks of applying existing feature selection methods to short text, this thesis proposes a filter-wrapper feature selection approach. In the first stage, we propose an adaptive filter-based feature selection method, derived from the odds ratio method, to reduce the dimensionality of the feature space. In the second stage, the grey wolf optimization (GWO) algorithm, a new heuristic search algorithm, uses SVM accuracy as the fitness function to find the optimal feature subset.
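For context on the filter stage, the standard odds ratio score for a term compares the odds of seeing the term in the positive class with the odds of seeing it in the negative class. The sketch below computes the log odds ratio from document frequencies. It shows the plain odds ratio only, not the adaptive variant the thesis proposes, and the smoothing constants, function name, and toy documents are assumptions for illustration.

```python
import math
from typing import Dict, List, Set

def log_odds_ratio(docs: List[Set[str]], labels: List[bool], term: str) -> float:
    """log[ p(t|pos) (1 - p(t|neg)) / ((1 - p(t|pos)) p(t|neg)) ]."""
    pos = [d for d, is_pos in zip(docs, labels) if is_pos]
    neg = [d for d, is_pos in zip(docs, labels) if not is_pos]
    # Smoothing (an assumption) keeps both probabilities strictly inside (0, 1).
    p_pos = (sum(term in d for d in pos) + 0.5) / (len(pos) + 1.0)
    p_neg = (sum(term in d for d in neg) + 0.5) / (len(neg) + 1.0)
    return math.log(p_pos * (1.0 - p_neg) / ((1.0 - p_pos) * p_neg))

# Hypothetical short texts, already tokenized into sets of terms.
docs = [{"great", "phone"}, {"great", "day"}, {"bad", "phone"}, {"bad", "day"}]
labels = [True, True, False, False]

vocabulary = sorted(set().union(*docs))
scores: Dict[str, float] = {t: log_odds_ratio(docs, labels, t) for t in vocabulary}
ranked = sorted(vocabulary, key=lambda t: -scores[t])
print(ranked[0], scores)
```

Terms that occur equally often in both classes score zero, so a filter would keep the terms with the largest absolute scores and pass them on to the wrapper stage.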

    Particle Swarm Optimisation for Feature Selection in Classification

    Classification problems often have a large number of features, but not all of them are useful for classification. Irrelevant and redundant features may even reduce the classification accuracy. Feature selection is a process of selecting a subset of relevant features, which can decrease the dimensionality, shorten the running time, and/or improve the classification accuracy. There are two types of feature selection approaches, i.e. wrapper and filter approaches. Their main difference is that wrappers use a classification algorithm to evaluate the goodness of the features during the feature selection process while filters are independent of any classification algorithm. Feature selection is a difficult task because of feature interactions and the large search space. Existing feature selection methods suffer from different problems, such as stagnation in local optima and high computational cost. Evolutionary computation (EC) techniques are well-known global search algorithms. Particle swarm optimisation (PSO) is an EC technique that is computationally less expensive and can converge faster than other methods. PSO has been successfully applied to many areas, but its potential for feature selection has not been fully investigated. The overall goal of this thesis is to investigate and improve the capability of PSO for feature selection to select a smaller number of features and achieve similar or better classification performance than using all features. This thesis investigates the use of PSO for both wrapper and filter, and for both single objective and multi-objective feature selection, and also investigates the differences between wrappers and filters. This thesis proposes a new PSO based wrapper, single objective feature selection approach by developing new initialisation and updating mechanisms. 
The results show that by considering the number of features in the initialisation and updating procedures, the new algorithm can improve the classification performance, reduce the number of features and decrease computational time. This thesis develops the first PSO based wrapper multi-objective feature selection approach, which aims to maximise the classification accuracy and simultaneously minimise the number of features. The results show that the proposed multi-objective algorithm can obtain more and better feature subsets than single objective algorithms, and outperform other well-known EC based multi-objective feature selection algorithms. This thesis develops a filter, single objective feature selection approach based on PSO and information theory. Two measures are proposed to evaluate the relevance of the selected features based on each pair of features and a group of features, respectively. The results show that PSO and information based algorithms can successfully address feature selection tasks. The group based method achieves higher classification accuracies, but the pair based method is faster and selects smaller feature subsets. This thesis proposes the first PSO based multi-objective filter feature selection approach using information based measures. This work is also the first to use two other well-known multi-objective EC algorithms in filter feature selection, which also serve as baselines for the PSO based approach. The results show that the PSO based multi-objective filter approach can successfully address feature selection problems, outperform single objective filter algorithms and achieve better classification performance than other multi-objective algorithms. This thesis investigates the difference between wrapper and filter approaches in terms of the classification performance and computational time, and also examines the generality of wrappers.
The results show that wrappers generally achieve similar or better classification performance than filters, and do not always need longer computational time than filters. The results also show that wrappers built with simple classification algorithms can generalise to other classification algorithms.
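A common way to apply continuous PSO to feature selection is sketched below: each particle holds one value in [0, 1] per feature, a threshold decodes the position into a feature subset, and a weighted fitness trades classification error against subset size. This is a minimal sketch, not the thesis's algorithm: the threshold of 0.6, the weight `alpha`, the inertia and acceleration constants, and all function names are assumptions for illustration, and the new initialisation and updating mechanisms the thesis proposes are not reproduced here.

```python
import random
from typing import List, Tuple

def decode(position: List[float], threshold: float = 0.6) -> List[int]:
    """Indices of features whose position value exceeds the threshold."""
    return [i for i, value in enumerate(position) if value > threshold]

def fitness(error_rate: float, n_selected: int, n_total: int,
            alpha: float = 0.9) -> float:
    """Weighted sum to minimize: classification error plus a subset-size penalty."""
    return alpha * error_rate + (1.0 - alpha) * n_selected / n_total

def update_particle(position: List[float], velocity: List[float],
                    personal_best: List[float], global_best: List[float],
                    rng: random.Random, w: float = 0.7298,
                    c1: float = 1.49618, c2: float = 1.49618
                    ) -> Tuple[List[float], List[float]]:
    """Standard PSO velocity and position update, clamped to [0, 1]."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        v_next = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        new_velocity.append(v_next)
        new_position.append(min(1.0, max(0.0, x + v_next)))
    return new_position, new_velocity

rng = random.Random(0)
mask = decode([0.9, 0.1, 0.7, 0.6])          # features 0 and 2 are selected
score = fitness(error_rate=0.1, n_selected=len(mask), n_total=4)
moved, _ = update_particle([0.2, 0.8], [0.0, 0.0], [0.9, 0.1], [0.7, 0.3], rng)
print(mask, score, moved)
```

In a wrapper setting, `error_rate` would come from training and evaluating a classifier on the decoded subset; in a filter setting it would be replaced by an information-based measure.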

    Feature selection using mutual information in network intrusion detection system

    University of Technology Sydney. Faculty of Engineering and Information Technology. Network technologies have made significant progress in development, while the security issues alongside these technologies have not been well addressed. Current research on network security mainly focuses on developing preventative measures, such as security policies and secure communication protocols. Meanwhile, attempts have been made to protect computer systems and networks against malicious behaviours by deploying Intrusion Detection Systems (IDSs). The collaboration of IDSs and preventative measures can provide a safe and secure communication environment. Intrusion detection systems are now an essential complement to the security infrastructure of most organisations. However, current IDSs suffer from three significant issues that severely restrict their utility and performance: a large number of false alarms, a very high volume of network traffic, and the classification problem that arises when class labels are not available. In this thesis, these three issues are addressed and efficient intrusion detection systems are developed that are effective in detecting a wide variety of attacks while producing very few false alarms at low computational cost. The principal contribution is the efficient and effective use of mutual information, which offers a solid theoretical framework for quantifying the amount of information that two random variables share with each other. The goal of this thesis is to develop an IDS that is accurate in detecting attacks and fast enough to make real-time decisions. First, a nonlinear correlation coefficient-based similarity measure is used to help extract both linear and nonlinear correlations between network traffic records. This measure is based on mutual information. The extracted information is used to develop an IDS to detect malicious network behaviours.
However, current network traffic data, which consist of a great number of traffic patterns, create a serious challenge for IDSs. To address this issue, two feature selection methods are proposed, a filter-based feature selection algorithm and a hybrid feature selection algorithm, and added to our current IDS for supervised classification. These methods select a subset of features from the original feature set, and the selected subset is used to build our IDS and enhance the detection performance. The filter-based feature selection algorithm, named Flexible Mutual Information Feature Selection (FMIFS), uses the theoretical analyses of mutual information as evaluation criteria to measure the relevance between the input features and the output classes. To eliminate the redundancy among selected features, FMIFS introduces a new criterion to estimate the redundancy of the currently selected features with respect to the previously selected subset of features. The hybrid feature selection algorithm is a combination of filter and wrapper algorithms. The filter method searches for the best subset of features using mutual information as a measure of relevance between the input features and the output class. The wrapper method is used to further refine the subset selected in the previous phase and to select the optimal subset of features that can produce better accuracy. In addition to the supervised feature selection methods, the research is extended to unsupervised feature selection, and two methods, an Extended Laplacian score (EL) and a Modified Laplacian score (ML), are proposed that can select features in unsupervised scenarios. More specifically, each of EL and ML consists of two main phases. In the first phase, the Laplacian score algorithm is applied to rank the features by evaluating the power of locality preservation for each feature in the initial data.
In the second phase, a new redundancy penalization technique uses mutual information to remove the redundancy among the selected features. The final output of these algorithms is then used to build the detection model. The proposed IDSs are then tested on three publicly available datasets: KDD Cup 99, NSL-KDD and Kyoto. Experimental results confirm the effectiveness and feasibility of the proposed solutions in terms of detection accuracy, false alarm rate, computational complexity and the capability of utilising unlabelled data. The unsupervised feature selection methods have been further tested on five more well-known datasets from the UCI Machine Learning Repository. These additional datasets are frequently used in the literature to evaluate the performance of feature selection methods. Furthermore, they have different sample sizes and various numbers of features, so they provide a more challenging and comprehensive test of feature selection algorithms. The experimental results show that ML performs better than EL and four other state-of-the-art methods (including the Variance score algorithm and the Laplacian score algorithm) in terms of classification accuracy.
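The quantity underlying the relevance and redundancy measures in this abstract is the mutual information between two random variables, I(X; Y) = Σ p(x, y) log2[p(x, y) / (p(x) p(y))]. The sketch below estimates it from paired samples of discrete variables. It is a plain empirical estimate only; the FMIFS selection criterion and the Laplacian-score penalization are not reproduced, and the function name and toy traffic records are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information in bits between two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y).
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

# Hypothetical traffic records: one feature mirrors the label, one is unrelated.
labels = ["attack", "attack", "normal", "normal"]
flag_feature = [1, 1, 0, 0]
port_feature = [1, 0, 1, 0]

relevance_flag = mutual_information(flag_feature, labels)
relevance_port = mutual_information(port_feature, labels)
print(relevance_flag, relevance_port)
```

A filter method would rank features by this relevance to the class label and apply the same function to pairs of features to estimate redundancy among those already selected.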