
    Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation

    Feature selection (FS) has become an indispensable task in dealing with today's highly complex pattern recognition problems with a massive number of features. In this study, we propose a new wrapper approach for FS based on binary simultaneous perturbation stochastic approximation (BSPSA). This pseudo-gradient descent stochastic algorithm starts with an initial feature vector and moves toward the optimal feature vector via successive iterations. In each iteration, the current feature vector's individual components are perturbed simultaneously by random offsets from a qualified probability distribution. We present computational experiments on datasets with numbers of features ranging from a few dozen to thousands using three widely used classifiers as wrappers: nearest neighbor, decision tree, and linear support vector machine. We compare our methodology against the full set of features as well as a binary genetic algorithm and sequential FS methods using cross-validated classification error rate and AUC as the performance criteria. Our results indicate that features selected by BSPSA compare favorably to alternative methods in general and that BSPSA can yield superior feature sets for datasets with tens of thousands of features by examining an extremely small fraction of the solution space. We are not aware of any other wrapper FS methods that are computationally feasible with good convergence properties for such large datasets. Comment: This is the Istanbul Sehir University Technical Report #SHR-ISE-2016.01. A short version of this report has been accepted for publication at Pattern Recognition Letters.
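The iteration described in this abstract (perturb all components at once, evaluate two candidate feature vectors, step along the estimated pseudo-gradient) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the gain schedule, the default values of `a` and `c`, the rounding threshold, and the `toy_loss` function (a stand-in for a cross-validated classifier error) are all assumptions made for the example.

```python
import random

def to_mask(w):
    # Round the continuous vector to a 0/1 feature mask (threshold is an assumption).
    return [1 if x >= 0.5 else 0 for x in w]

def clip(x):
    # Keep each component inside [0, 1].
    return min(1.0, max(0.0, x))

def bspsa(loss, n_features, iterations=200, a=0.1, c=0.05, seed=0):
    """Sketch of binary SPSA: search for a 0/1 mask with low loss(mask)."""
    rng = random.Random(seed)
    w = [0.5] * n_features                    # continuous surrogate of the mask
    best_mask = to_mask(w)
    best_loss = loss(best_mask)
    for k in range(iterations):
        gain = a / (k + 1) ** 0.602           # illustrative decaying step size
        # Perturb every component simultaneously with a random +/-1 offset.
        delta = [rng.choice((-1, 1)) for _ in range(n_features)]
        plus = to_mask([clip(x + c * d) for x, d in zip(w, delta)])
        minus = to_mask([clip(x - c * d) for x, d in zip(w, delta)])
        loss_plus, loss_minus = loss(plus), loss(minus)
        # Remember the best mask evaluated so far.
        for cand, val in ((plus, loss_plus), (minus, loss_minus)):
            if val < best_loss:
                best_mask, best_loss = cand, val
        # Two-evaluation gradient estimate; 1/delta_i equals delta_i for +/-1.
        scale = (loss_plus - loss_minus) / (2 * c)
        w = [clip(x - gain * scale * d) for x, d in zip(w, delta)]
    return best_mask, best_loss

# Hypothetical stand-in for a wrapper loss such as cross-validated error.
RELEVANT = {0, 2, 5}

def toy_loss(mask):
    missed = sum(1 for i in RELEVANT if mask[i] == 0)
    extra = sum(1 for i, m in enumerate(mask) if m == 1 and i not in RELEVANT)
    return missed + 0.1 * extra

mask, value = bspsa(toy_loss, n_features=8)
print(mask, value)
```

Each iteration costs only two loss evaluations regardless of the number of features, which is why this family of methods examines a small fraction of the solution space. In a real wrapper setting, `toy_loss` would be replaced by training and validating a classifier on the selected columns.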

    Fuzzy rough and evolutionary approaches to instance selection


    A Comparative Analysis of EEG-based Stress Detection Utilizing Machine Learning and Deep Learning Classifiers with a Critical Literature Review

    Background: Mental stress is considered a major contributor to various psychological and physical diseases. Socio-economic issues, competition in the workplace and among students, and high expectations are its major causes. If not treated properly and in time, stress can progress to dangerous stages, leading to conditions such as depression, heart attack, and suicide. Stress therefore needs to be recognized and managed before it ruins a person's health, which has motivated researchers to explore stress detection techniques based on advanced machine learning and deep learning. Methodology: Techniques used for stress detection are surveyed. The stages of detection, including pre-processing, feature extraction, and classification, are explored and critically reviewed. The electroencephalogram (EEG) is the main signal considered in this study. After reviewing the state-of-the-art methods, a typical methodology is implemented in which feature extraction is done using principal component analysis (PCA), ICA, and the discrete cosine transform (DCT). After feature extraction, several state-of-the-art machine learning classifiers are employed, including support vector machine (SVM), K-nearest neighbor (KNN), NB, and CT. In addition to these classifiers, a typical deep-learning classifier is used. The dataset used for the study is the Database for Emotion Analysis using Physiological Signals (DEAP). Results: The performance measures considered are precision, recall, F1-score, and accuracy. PCA with KNN, CT, SVM, and NB gives accuracies of 65.7534%, 58.9041%, 61.6438%, and 57.5342%, respectively. With ICA as the feature extractor, the accuracies are 58.9041%, 61.64384%, 57.5342%, and 54.79452% for KNN, CT, SVM, and NB, respectively. With DCT as the feature extractor, the accuracies are 56.16438%, 50.6849%, 54.7945%, and 45.2055% for KNN, CT, SVM, and NB, respectively. A conventional DCNN classifier gives an accuracy of 76%, with precision, recall, and F1-score of 0.66, 0.77, and 0.64, respectively. Conclusion: For EEG-based stress detection, different state-of-the-art machine learning and deep learning methods are used along with the feature extractors PCA, ICA, and DCT. Results show that the deep learning classifier gives an overall accuracy of 76%, a significant improvement over the classical machine learning techniques, whose accuracies include PCA+KNN (65.75%), DCT+KNN (56.16%), and ICA+CT (61.64%).
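The four performance measures named in this abstract can be computed from confusion-matrix counts as in the sketch below. The function name and the counts are hypothetical and are not taken from the paper; the sketch treats a single positive class, whereas the abstract does not say how its precision, recall, and F1 values were averaged.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score, and accuracy for one positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=6)
print(metrics)
```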

    Multivariate NIR studies of seed-water interaction in Scots Pine Seeds (Pinus sylvestris L.)

    This thesis describes seed-water interaction using near infrared (NIR) spectroscopy, multivariate regression models and Scots pine seeds. The presented research covers classification of seed viability, prediction of seed moisture content, selection of NIR wavelengths and interpretation of seed-water interaction modelled and analysed by principal component analysis, ordinary least squares (OLS), partial least squares (PLS), bi-orthogonal least squares (BPLS) and genetic algorithms. The potential of using multivariate NIR calibration models for seed classification was demonstrated using filled viable and non-viable seeds, which could be separated with an accuracy of 98-99%. It was also shown that multivariate NIR calibration models gave low errors (0.7% and 1.9%) in prediction of seed moisture content for bulk seed and single seeds, respectively, using either NIR reflectance or transmittance spectroscopy. Genetic algorithms selected three to eight wavelength bands in the NIR region, and these narrow bands gave about the same prediction of seed moisture content (0.6% and 1.7%) as using the whole NIR interval in the PLS regression models. The selected regions were simulated as NIR filters in OLS regression, resulting in predictions of the same quality (0.7% and 2.1%). This finding opens possibilities to apply NIR sensors in fast and simple spectrometers for the determination of seed moisture content. Near infrared (NIR) radiation interacts with overtones of vibrating bonds in polar molecules. The resulting spectra contain chemical and physical information. This offers good possibilities to measure seed-water interactions, but also to interpret processes within seeds. It is shown that seed-water interaction involves both transitions and changes mainly in covalent bonds of O-H, C-H, C=O and N-H emanating from ongoing physiological processes like seed respiration and protein metabolism. I propose that BPLS analysis, which has orthonormal loadings and orthogonal scores and gives the same predictions as conventional PLS regression, should be used as a standard to harmonise the interpretation of NIR spectra.

    Radiomics to predict response to neoadjuvant chemotherapy in rectal cancer: influence of simultaneous feature selection and classifier optimization

    According to the guidelines, patients with locally advanced colorectal cancer undergo neoadjuvant chemotherapy. However, a response to therapy is achieved in only up to 30% of cases. Therefore, it would be important to predict response to therapy before treatment. In this study, we demonstrated that the simultaneous optimization of feature subset and classifier parameters on different imaging datasets (T2w, DWI and PET) could improve classification performance. On a dataset of 51 patients (21 responders, 30 non-responders), we obtained accuracies of 90%, 84% and 76% using three optimized SVM classifiers fed with selected features from PET, T2w and ADC images, respectively.