34,822 research outputs found
Real-valued feature selection for process approximation and prediction
The selection of features for classification, clustering and approximation is an important task in pattern recognition, data mining and soft computing. For real-valued features, this contribution shows how feature selection for a high number of features can be implemented using mutual in-formation. Especially, the common problem for mutual information computation of computing joint probabilities for many dimensions using only a few samples is treated by using the Rènyi mutual information of order two as computational base. For this, the Grassberger-Takens corre-lation integral is used which was developed for estimating probability densities in chaos theory. Additionally, an adaptive procedure for computing the hypercube size is introduced and for real world applications, the treatment of missing values is included. The computation procedure is accelerated by exploiting the ranking of the set of real feature values especially for the example of time series. As example, a small blackbox-glassbox example shows how the relevant features and their time lags are determined in the time series even if the input feature time series determine nonlinearly the output. A more realistic example from chemical industry shows that this enables a better ap-proximation of the input-output mapping than the best neural network approach developed for an international contest. By the computationally efficient implementation, mutual information becomes an attractive tool for feature selection even for a high number of real-valued features
Wavelet feature extraction and genetic algorithm for biomarker detection in colorectal cancer data
Biomarkers which predict patient’s survival can play an important role in medical diagnosis and
treatment. How to select the significant biomarkers from hundreds of protein markers is a key step in
survival analysis. In this paper a novel method is proposed to detect the prognostic biomarkers ofsurvival in colorectal cancer patients using wavelet analysis, genetic algorithm, and Bayes classifier. One dimensional discrete wavelet transform (DWT) is normally used to reduce the dimensionality of biomedical data. In this study one dimensional continuous wavelet transform (CWT) was proposed to extract the features of colorectal cancer data. One dimensional CWT has no ability to reduce
dimensionality of data, but captures the missing features of DWT, and is complementary part of DWT. Genetic algorithm was performed on extracted wavelet coefficients to select the optimized features, using Bayes classifier to build its fitness function. The corresponding protein markers were
located based on the position of optimized features. Kaplan-Meier curve and Cox regression model 2 were used to evaluate the performance of selected biomarkers. Experiments were conducted on colorectal cancer dataset and several significant biomarkers were detected. A new protein biomarker CD46 was found to significantly associate with survival time
Automated design of robust discriminant analysis classifier for foot pressure lesions using kinematic data
In the recent years, the use of motion tracking systems for acquisition of functional biomechanical gait data, has received increasing interest due to the richness and accuracy of the measured kinematic information. However, costs frequently restrict the number of subjects employed, and this makes the dimensionality of the collected data far higher than the available samples. This paper applies discriminant analysis algorithms to the classification of patients with different types of foot lesions, in order to establish an association between foot motion and lesion formation. With primary attention to small sample size situations, we compare different types of Bayesian classifiers and evaluate their performance with various dimensionality reduction techniques for feature extraction, as well as search methods for selection of raw kinematic variables. Finally, we propose a novel integrated method which fine-tunes the classifier parameters and selects the most relevant kinematic variables simultaneously. Performance comparisons are using robust resampling techniques such as Bootstrapand k-fold cross-validation. Results from experimentations with lesion subjects suffering from pathological plantar hyperkeratosis, show that the proposed method can lead tocorrect classification rates with less than 10% of the original features
Multi-objective particle swarm optimization for channel selection in brain-computer interfaces
This paper presents a novel application of a multi-objective particle swarm optimization (MOPSO) method to solve the problem of effective channel selection for Brain-Computer Interface (BCI) systems. The proposed method is tested on 6 subjects and compared to another search based method, Sequential Floating Forward Search (SFFS). The results demonstrate the effectiveness of MOPSO in selecting a fewer number of channels with insignificant sacrifice in accuracy, which is very important to build robust online BCI systems
Covariate Analysis for View-point Independent Gait Recognition
Many studies have shown that gait can be deployed as a biometric. Few of these have addressed the effects of view-point and covariate factors on the recognition process. We describe the first analysis which combines view-point invariance for gait recognition which is based on a model-based pose estimation approach from a single un-calibrated camera. A set of experiments are carried out to explore how such factors including clothing, carrying conditions and view-point can affect the identification process using gait. Based on a covariate-based probe dataset of over 270 samples, a recognition rate of 73.4% is achieved using the KNN classifier. This confirms that people identification using dynamic gait features is still perceivable with better recognition rate even under the different covariate factors. As such, this is an important step in translating research from the laboratory to a surveillance environment
Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors
This paper presents a low-overhead optimizer for the ubiquitous sparse
matrix-vector multiplication (SpMV) kernel. Architectural diversity among
different processors together with structural diversity among different sparse
matrices lead to bottleneck diversity. This justifies an SpMV optimizer that is
both matrix- and architecture-adaptive through runtime specialization. To this
direction, we present an approach that first identifies the performance
bottlenecks of SpMV for a given sparse matrix on the target platform either
through profiling or by matrix property inspection, and then selects suitable
optimizations to tackle those bottlenecks. Our optimization pool is based on
the widely used Compressed Sparse Row (CSR) sparse matrix storage format and
has low preprocessing overheads, making our overall approach practical even in
cases where fast decision making and optimization setup is required. We
evaluate our optimizer on three x86-based computing platforms and demonstrate
that it is able to distinguish and appropriately optimize SpMV for the majority
of matrices in a representative test suite, leading to significant speedups
over the CSR and Inspector-Executor CSR SpMV kernels available in the latest
release of the Intel MKL library.Comment: 10 pages, 7 figures, ICPP 201
Generating Interpretable Fuzzy Controllers using Particle Swarm Optimization and Genetic Programming
Autonomously training interpretable control strategies, called policies,
using pre-existing plant trajectory data is of great interest in industrial
applications. Fuzzy controllers have been used in industry for decades as
interpretable and efficient system controllers. In this study, we introduce a
fuzzy genetic programming (GP) approach called fuzzy GP reinforcement learning
(FGPRL) that can select the relevant state features, determine the size of the
required fuzzy rule set, and automatically adjust all the controller parameters
simultaneously. Each GP individual's fitness is computed using model-based
batch reinforcement learning (RL), which first trains a model using available
system samples and subsequently performs Monte Carlo rollouts to predict each
policy candidate's performance. We compare FGPRL to an extended version of a
related method called fuzzy particle swarm reinforcement learning (FPSRL),
which uses swarm intelligence to tune the fuzzy policy parameters. Experiments
using an industrial benchmark show that FGPRL is able to autonomously learn
interpretable fuzzy policies with high control performance.Comment: Accepted at Genetic and Evolutionary Computation Conference 2018
(GECCO '18
Two-stage hybrid feature selection algorithms for diagnosing erythemato-squamous diseases
This paper proposes two-stage hybrid feature selection algorithms to build the stable and efficient diagnostic models where a new accuracy measure is introduced to assess the models. The two-stage hybrid algorithms adopt Support Vector Machines (SVM) as a classification tool, and the extended Sequential Forward Search (SFS), Sequential Forward Floating Search (SFFS), and Sequential Backward Floating Search (SBFS), respectively, as search strategies, and the generalized F-score (GF) to evaluate the importance of each feature. The new accuracy measure is used as the criterion to evaluated the performance of a temporary SVM to direct the feature selection algorithms. These hybrid methods combine the advantages of filters and wrappers to select the optimal feature subset from the original feature set to build the stable and efficient classifiers. To get the stable, statistical and optimal classifiers, we conduct 10-fold cross validation experiments in the first stage; then we merge the 10 selected feature subsets of the 10-cross validation experiments, respectively, as the new full feature set to do feature selection in the second stage for each algorithm. We repeat the each hybrid feature selection algorithm in the second stage on the one fold that has got the best result in the first stage. Experimental results show that our proposed two-stage hybrid feature selection algorithms can construct efficient diagnostic models which have got better accuracy than that built by the corresponding hybrid feature selection algorithms without the second stage feature selection procedures. Furthermore our methods have got better classification accuracy when compared with the available algorithms for diagnosing erythemato-squamous diseases
- …