
    Infinite Latent Feature Selection: A Probabilistic Latent Graph-Based Ranking Approach

    Feature selection is playing an increasingly significant role in many computer vision applications, ranging from object recognition to visual object tracking. However, most recent feature selection solutions are not robust across different and heterogeneous sets of data. In this paper, we address this issue by proposing a robust probabilistic latent graph-based feature selection algorithm that performs the ranking step while considering all the possible subsets of features, as paths on a graph, bypassing the combinatorial problem analytically. An appealing characteristic of the approach is that it aims to discover an abstraction behind low-level sensory data, that is, relevancy. Relevancy is modelled as a latent variable in a PLSA-inspired generative process that allows the investigation of the importance of a feature when injected into an arbitrary set of cues. The proposed method has been tested on ten diverse benchmarks and compared against eleven state-of-the-art feature selection methods. Results show that the proposed approach attains the highest performance levels across many different scenarios and difficulties, thereby confirming its strong robustness while setting a new state of the art in the feature selection domain. Comment: Accepted at the IEEE International Conference on Computer Vision (ICCV), 2017, Venice. Preprint cop

    Data mining of many-attribute data : investigating the interaction between feature selection strategy and statistical features of datasets

    In many datasets, there is a very large number of attributes (e.g. many thousands). Such datasets can cause many problems for machine learning methods. Various feature selection (FS) strategies have been developed to address these problems. The idea of an FS strategy is to reduce the number of features in a dataset (e.g. from many thousands to a few hundred) so that machine learning and/or statistical analysis can be done much more quickly and effectively. Obviously, FS strategies attempt to select the features that are most important, considering the machine learning task to be done. The work presented in this dissertation concerns the comparison between several popular feature selection strategies, and, in particular, investigation of the interaction between feature selection strategy and simple statistical features of the dataset. The basic hypothesis, not investigated before, is that the correct choice of FS strategy for a particular dataset should be based on a simple (at least) statistical analysis of the dataset. First, we examined the performance of several strategies on a selection of datasets. Strategies examined were: four widely-used FS strategies (Correlation, Relief F, Evolutionary Algorithm, no-feature-selection), several feature bias (FB) strategies (in which the machine learning method considers all features, but makes use of bias values suggested by the FB strategy), and also combinations of FS and FB strategies. The results showed us that FB methods displayed strong capability on some datasets and that combined strategies were also often successful. Examining these results, we noted that patterns of performance were not immediately understandable. This led to the above hypothesis (one of the main contributions of the thesis) that statistical features of the dataset are an important consideration when choosing an FS strategy. We then investigated this hypothesis with several further experiments. 
Analysis of the results revealed that a simple statistical feature of a dataset, which can be easily pre-calculated, has a clear relationship with the performance of certain FS methods, and a similar relationship with differences in performance between certain pairs of FS strategies. In particular, Correlation-based FS (CFS) is a very widely-used FS technique based on the basic hypothesis that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. By analysing the outcome of several FS strategies on different artificial datasets, the experiments suggest that CFS is never the best choice for poorly correlated data. Finally, considering several methods, we suggest tentative guidelines for choosing an FS strategy based on simply calculated measures of the dataset (Silang Luo, PHD-06-2009)
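
The CFS hypothesis stated in this abstract (features highly correlated with the class, yet uncorrelated with each other) is commonly scored with a "merit" heuristic. The sketch below is illustrative only: it assumes the widely cited merit formula k·r_cf / sqrt(k + k(k−1)·r_ff) and uses absolute Pearson correlation as the association measure, neither of which is confirmed by the abstract. The function names `pearson` and `cfs_merit` are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cfs_merit(features, target):
    """Merit of a feature subset: rewards class correlation, penalises redundancy.

    `features` is a list of feature columns; `target` is the class column.
    Assumption: absolute Pearson correlation stands in for the association measure.
    """
    k = len(features)
    # Mean feature-class correlation.
    r_cf = mean(abs(pearson(f, target)) for f in features)
    if k == 1:
        return r_cf
    # Mean pairwise feature-feature correlation.
    r_ff = mean(abs(pearson(a, b)) for a, b in combinations(features, 2))
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)
```

Under this heuristic, adding an exact duplicate of a feature leaves the merit unchanged, while adding a feature uncorrelated with the class lowers it.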

    Exploring EEG Features in Cross-Subject Emotion Recognition

    Recognizing cross-subject emotions based on brain imaging data, e.g., EEG, has always been difficult due to the poor generalizability of features across subjects. Thus, systematically exploring the ability of different EEG features to identify emotional information across subjects is crucial. Prior related work has explored this question based only on one or two kinds of features, and different findings and conclusions have been presented. In this work, we aim at a more comprehensive investigation of this question with a wider range of feature types, including 18 kinds of linear and non-linear EEG features. The effectiveness of these features was examined on two publicly accessible datasets, namely, the dataset for emotion analysis using physiological signals (DEAP) and the SJTU emotion EEG dataset (SEED). We adopted the support vector machine (SVM) approach and the "leave-one-subject-out" verification strategy to evaluate recognition performance. Using automatic feature selection methods, the highest mean recognition accuracies of 59.06% (AUC = 0.605) on the DEAP dataset and 83.33% (AUC = 0.904) on the SEED dataset were reached. Furthermore, using manually operated feature selection on the SEED dataset, we explored the importance of different EEG features in cross-subject emotion recognition from multiple perspectives, including different channels, brain regions, rhythms, and feature types. For example, we found that the Hjorth parameter of mobility in the beta rhythm achieved the best mean recognition accuracy compared to the other features. Through a pilot correlation analysis, we further examined the highly correlated features, for a better understanding of the implications hidden in those features that allow for differentiating cross-subject emotions. Various remarkable observations have been made. The results of this paper validate the possibility of exploring robust EEG features in cross-subject emotion recognition
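
For readers unfamiliar with the Hjorth mobility parameter mentioned in this abstract, a minimal sketch follows. It uses the standard definition, the square root of the variance of the signal's first derivative divided by the variance of the signal, and assumes the derivative is approximated by the first difference of the samples. Band-pass filtering to the beta rhythm is outside the sketch, and the function name `hjorth_mobility` is hypothetical, not taken from the paper.

```python
from math import sqrt
from statistics import pvariance


def hjorth_mobility(signal):
    """Hjorth mobility: sqrt(var(first difference) / var(signal)).

    `signal` is a sequence of at least two samples from one channel.
    Returns 0.0 for a constant signal to avoid division by zero.
    """
    # First difference as a discrete stand-in for the time derivative.
    diff = [b - a for a, b in zip(signal, signal[1:])]
    var_signal = pvariance(signal)
    if var_signal == 0:
        return 0.0
    return sqrt(pvariance(diff) / var_signal)
```

Mobility grows with the dominant frequency of the signal, so a faster oscillation sampled at the same rate yields a larger value than a slower one.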

    Pattern Classification Using an Olfactory Model with PCA Feature Selection in Electronic Noses: Study and Application

    Biologically-inspired models and algorithms are considered promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues in developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model as the dimension of the input feature vector (outer factor) and the number of its parallel channels (inner factor) increase. The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets, one of three classes of wine derived from different cultivars and one of five classes of green tea derived from five different provinces of China, were used for experiments. In the former case the results showed that the average correct classification rate increased as more principal components were put into the feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We concluded that 6∼8 channels of the model, with principal component feature vector values of at least 90% cumulative variance, are adequate for a classification task of 3∼5 pattern classes, considering the trade-off between time consumption and classification rate
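
The 90% cumulative variance criterion in this abstract can be illustrated with a short sketch. It assumes the PCA eigenvalues (per-component variances) have already been computed elsewhere; the function name `components_for_variance` and the example values are hypothetical and not from the paper.

```python
def components_for_variance(eigenvalues, threshold=0.90):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold`.

    `eigenvalues` are non-negative PCA variances in any order.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues: cumulative ratios are 0.50, 0.80, 0.95, 1.00.
n_components = components_for_variance([5.0, 3.0, 1.5, 0.5])
```

With these made-up values the first three components are needed, since two components explain only 80% of the variance.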

    Feature selection in detection of adverse drug reactions from the Health Improvement Network (THIN) database

    Adverse drug reactions (ADRs) are a widely recognized public health concern and one of the most common reasons for withdrawing drugs from the market. Prescription event monitoring (PEM) is an important approach to detecting adverse drug reactions. The main problem with this method is how to automatically extract the medical events or side effects from the high-throughput medical events that are collected in day-to-day clinical practice. In this study we propose a novel concept, the feature matrix, to detect ADRs. The feature matrix, which is extracted from big medical data in The Health Improvement Network (THIN) database, is created to characterize the medical events of patients who take drugs. The feature matrix provides a foundation for handling irregular and big medical data. Feature selection methods are then applied to the feature matrix to detect the significant features. Finally, the ADRs can be located based on the significant features. The experiments are carried out on three drugs: Atorvastatin, Alendronate, and Metoclopramide. Major side effects for each drug are detected and better performance is achieved compared to other computerized methods. Because the detected ADRs are based on computerized methods, further investigation is needed. Comment: International Journal of Information Technology and Computer Science (IJITCS), in print, 201

    A review on feature extraction and feature selection for handwritten character recognition

    The development of handwritten character recognition (HCR) is an interesting area in pattern recognition. An HCR system consists of a number of stages: preprocessing, feature extraction, and classification, followed by the actual recognition. It is generally agreed that one of the main factors influencing performance in HCR is the selection of an appropriate set of features for representing input samples. This paper provides a review of these advances. In HCR, the choice of feature set is a central issue, since the aim is to choose the relevant features that yield the minimum classification error. To address this issue and maximize classification performance, many techniques have been proposed for reducing the dimensionality of the feature space in which data have to be processed. These techniques, generally denoted as feature reduction, may be divided into two main categories, called feature extraction and feature selection. A large number of research papers and reports have already been published on this topic. In this paper we provide an overview of some of the methods and approaches for feature extraction and selection. We investigate and analyze feature extraction and selection approaches in order to identify the current trend. Throughout this paper also, the review of metaheuristic harmony search algorithm (HSA) has provide

    Radiomics in the characterization of lipid-poor adrenal adenomas at unenhanced CT: time to look beyond usual density metrics

    Objectives: In this study, we developed a radiomic signature for the classification of benign lipid-poor adenomas, which may potentially help clinicians limit the number of unnecessary investigations in clinical practice. Indeterminate adrenal lesions of benign and malignant nature may exhibit different values of key radiomics features. Methods: Patients who had available histopathology reports and a non-contrast-enhanced CT scan were included in the study. Radiomics feature extraction was done after the adrenal lesions were contoured. The primary feature selection and prediction performance scores were calculated using the least absolute shrinkage and selection operator (LASSO). To eliminate redundancy, the best-performing features were further examined using the Pearson correlation coefficient, and new predictive models were created. Results: This investigation covered 50 lesions in 48 patients. After LASSO-based radiomics feature selection, the test dataset’s 30 iterations of logistic regression models produced an average performance of 0.72. The model with the best performance, made up of 13 radiomics features, had an AUC of 0.99 in the training phase and 1.00 in the test phase. The number of features was lowered to 5 after performing Pearson’s correlation to prevent overfitting. The final radiomic signature was used to train a number of machine learning classifiers, with an average AUC of 0.93. Conclusions: Including more radiomics features in the identification of adenomas may improve the accuracy of NECT and reduce the need for additional imaging procedures and clinical workup, according to this and other recent radiomics studies that have clear points of contact with current clinical practice. Clinical relevance statement: The study developed a radiomic signature using unenhanced CT scans for classifying lipid-poor adenomas that scored a final accuracy of 93%, potentially reducing unnecessary investigations. 
Key Points: • Radiomics has potential for differentiating lipid-poor adenomas and avoiding unnecessary further investigations. • Quadratic mean, strength, maximum 3D diameter, volume density, and area density are promising predictors for adenomas. • Radiomics models reach high performance with average AUC of 0.95 in the training phase and 0.72 in the test phase
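
The redundancy-elimination step described in this abstract (examining the best-performing features with the Pearson correlation coefficient) can be sketched as a greedy filter. This is only an illustration under stated assumptions: the 0.9 cut-off, the greedy keep-first order, the function names `pearson` and `prune_correlated`, and the toy data are all hypothetical; the abstract does not state how the authors applied the coefficient.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def prune_correlated(ranked_names, columns, max_abs_r=0.9):
    """Walk features in ranked order (best first) and keep a feature only if
    its absolute correlation with every already-kept feature is below `max_abs_r`.
    """
    kept = []
    for name in ranked_names:
        if all(abs(pearson(columns[name], columns[k])) < max_abs_r for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" is nearly a scaled copy of "a", "c" is only weakly related.
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.1],
    "c": [5, 1, 4, 2, 3],
}
selected = prune_correlated(["a", "b", "c"], columns)
```

In this toy case the filter keeps "a" and "c" and drops "b", because "b" duplicates information already carried by the higher-ranked "a".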