Two-stage hybrid feature selection algorithms for diagnosing erythemato-squamous diseases
This paper proposes two-stage hybrid feature selection algorithms for building stable and efficient diagnostic models, and introduces a new accuracy measure to assess those models. The two-stage hybrid algorithms adopt Support Vector Machines (SVM) as the classification tool; the extended Sequential Forward Search (SFS), Sequential Forward Floating Search (SFFS), and Sequential Backward Floating Search (SBFS), respectively, as search strategies; and the generalized F-score (GF) to evaluate the importance of each feature. The new accuracy measure is used as the criterion to evaluate the performance of a temporary SVM that directs the feature selection algorithms. These hybrid methods combine the advantages of filters and wrappers to select the optimal feature subset from the original feature set and so build stable and efficient classifiers. To obtain stable, statistically reliable and optimal classifiers, we conduct 10-fold cross-validation experiments in the first stage; then, for each algorithm, we merge the 10 feature subsets selected in the 10-fold cross-validation experiments into a new full feature set on which feature selection is performed in the second stage. We repeat each hybrid feature selection algorithm in the second stage on the fold that achieved the best result in the first stage. Experimental results show that the proposed two-stage hybrid feature selection algorithms construct efficient diagnostic models with better accuracy than those built by the corresponding hybrid feature selection algorithms without the second-stage feature selection procedure. Furthermore, our methods achieve better classification accuracy than the available algorithms for diagnosing erythemato-squamous diseases
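As a rough illustration of the filter step mentioned in this abstract, the sketch below ranks features with a multi-class F-score. The formula used here (squared distance of each class mean from the overall mean, divided by the sum of within-class sample variances) is an assumption about what "generalized F-score" means; the paper's exact definition may differ. The function name `generalized_f_score` and the toy data are hypothetical.

```python
from collections import defaultdict


def generalized_f_score(X, y):
    """Return one score per feature column of X (a list of rows), given labels y.

    Assumed definition: between-class scatter of the class means divided by the
    sum of within-class sample variances. Each class needs at least two samples.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    scores = []
    for i in range(len(X[0])):
        overall_mean = sum(row[i] for row in X) / len(X)
        between = 0.0
        within = 0.0
        for rows in by_class.values():
            values = [row[i] for row in rows]
            class_mean = sum(values) / len(values)
            between += (class_mean - overall_mean) ** 2
            # Unbiased sample variance of feature i inside this class.
            within += sum((v - class_mean) ** 2 for v in values) / (len(values) - 1)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Toy data: feature 0 separates the three classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0], [5.0, 5.0], [5.2, 3.0]]
y = ["a", "a", "b", "b", "c", "c"]
scores = generalized_f_score(X, y)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In a hybrid scheme of the kind described, a ranking like this would typically decide the order in which a wrapper search tries candidate features; the classifier and accuracy measure used by the authors are not reproduced here.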
A New Search Algorithm for Feature Selection in Hyperspectral Remote Sensing Images
A new suboptimal search strategy suitable for feature selection in very high-dimensional remote-sensing images (e.g. those acquired by hyperspectral sensors) is proposed. Each solution of the feature selection problem is represented as a binary string that indicates which features are selected and which are disregarded. In turn, each binary string corresponds to a point of a multidimensional binary space. Given a criterion function to evaluate the effectiveness of a selected solution, the proposed strategy is based on the search for constrained local extremes of such a function in the above-defined binary space. In particular, two different algorithms are presented that explore the space of solutions in different ways. These algorithms are compared with the classical sequential forward selection and sequential forward floating selection suboptimal techniques, using hyperspectral remote-sensing images (acquired by the AVIRIS sensor) as a data set. Experimental results point out the effectiveness of both algorithms, which can be regarded as valid alternatives to classical methods, as they allow interesting tradeoffs between the qualities of selected feature subsets and computational cost
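The binary-string representation described in this abstract can be sketched as follows. This is a minimal hill-climbing illustration, not the authors' algorithms: it assumes the constraint is a fixed number of selected features, defines a neighbour as a swap of one selected and one unselected feature, and uses a hypothetical additive `criterion` in place of a real class-separability measure.

```python
def local_search(start, criterion):
    """Climb to a local maximum of `criterion` over binary strings.

    Each move swaps one selected feature (1) with one unselected feature (0),
    so the number of selected features stays constant throughout the search.
    """
    current = list(start)
    best = criterion(current)
    while True:
        best_neighbour = None
        for i, bit_i in enumerate(current):
            if bit_i != 1:
                continue
            for j, bit_j in enumerate(current):
                if bit_j != 0:
                    continue
                candidate = current[:]
                candidate[i], candidate[j] = 0, 1
                value = criterion(candidate)
                if value > best:
                    best, best_neighbour = value, candidate
        if best_neighbour is None:
            # No swap improves the criterion: a constrained local maximum.
            return current, best
        current = best_neighbour


# Hypothetical per-feature usefulness; a real criterion would be computed from data.
weights = [0.9, 0.1, 0.8, 0.3, 0.7]


def criterion(mask):
    return sum(w for w, bit in zip(weights, mask) if bit)


mask, value = local_search([0, 1, 0, 1, 0], criterion)
```

With this toy criterion the search starts from features 1 and 3 and ends with features 0 and 2 selected; with a non-additive criterion the result would depend on the starting point, which is why such strategies are suboptimal.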
Predictive modelling benchmark of nitrate Vulnerable Zones at a regional scale based on Machine learning and remote sensing
Nitrate leaching losses from arable lands into groundwater were a main driver in designating Nitrate Vulnerable Zones (NVZs) according to the Nitrates Directive, with a view to enhancing their water quality. Despite this, developing common strategies for effective water quality control in these areas remains a challenge in the European Union. This paper evaluates the performance of the Random Forest (RF) machine learning algorithm combined with Feature Selection (FS) techniques in predicting nitrate pollution in NVZs groundwater bodies in different periods and using updated environmental features in Andalusia, Spain. A set of forty-four features extrinsic to groundwater bodies were used as environmental predictors, with an aim to make this methodology exportable to other regions. Phenological features obtained through remote-sensing techniques were included to measure the dynamics of agricultural activity. In addition, other dynamic features derived from weather and livestock effluents were included to analyse seasonal and interannual changes in nitrate pollution. Three feature stacks and two nitrate databases were used in the predictive modelling: Period 1 (2009), with 321 nitrate samples for training; Period 2 (2010), with 282 nitrate samples for validation and initial spatial prediction; and Period 3 (2017), to assess the changes in the probability of groundwater nitrate content exceeding 50 mg/L. Random Forest as a wrapper with four sequential search methods was considered: sequential backward selection (SBS), sequential forward selection (SFS), sequential forward floating selection (SFFS) and sequential backward floating selection (SBFS). 
From among all the Feature Selection methods applied, Random Forest with SFS had the best performance (overall accuracy = 0.891 and six predictor features) and linked the highest probability of nitrate pollution with three dynamic features: the Normalized Difference Vegetation Index (NDVI) base level, NDVI value for the end of the growing season and accumulated manure production of livestock farms; and three static features: slope, sediment depositional areas and valley depth
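Sequential forward selection, the search method that performed best in this abstract, can be sketched generically as below. The `score` callable stands in for a wrapper evaluation such as the cross-validated accuracy of a Random Forest; the toy `score` and the `gains` table here are hypothetical and exist only to make the example runnable.

```python
def sequential_forward_selection(n_features, score, max_features=None):
    """Greedily add the feature that most improves score(subset).

    Stops when no remaining feature improves the score or when
    `max_features` features have been selected.
    """
    selected = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    limit = max_features or n_features
    while remaining and len(selected) < limit:
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Hypothetical stand-in for a wrapper score: per-feature gain minus a size penalty.
gains = {0: 0.30, 1: 0.05, 2: 0.25, 3: 0.02}


def score(subset):
    return sum(gains[f] for f in subset) - 0.06 * len(subset)


selected, final_score = sequential_forward_selection(4, score)
```

The backward and floating variants named in the abstract differ in the direction of the search and in allowing previously chosen features to be dropped again; they are not shown here.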
Computer science approach to the stellar fabric of violent starforming regions in AGN
In order to analyse the large numbers of Seyfert galaxy spectra available at
present, we are testing new techniques to derive their physical parameters
quickly and accurately.
We present an experiment on such a new technique to segregate old and young
stellar populations in galactic spectra using machine learning methods. We used
an ensemble of classifiers; each classifier in the ensemble specializes in
young or old populations and was trained with locally weighted regression and
tested using ten-fold cross-validation. Since the relevant information
concentrates in certain regions of the spectra we used the method of sequential
floating backward selection offline for feature selection.
Very interestingly, the application to Seyfert galaxies proved that this
technique is very insensitive to the dilution by the Active Galactic Nucleus
(AGN) continuum. Comparing with exhaustive search we concluded that both
methods are similar in terms of accuracy but the machine learning method is
faster by about two orders of magnitude.
Comment: 4 pages, 1 figure, Contribution to IAU Symp. 222, The interplay among
Black Holes, Stars and ISM in Galactic Nuclei, Gramado, Brazil, 200
Search Strategies for Binary Feature Selection for a Naive Bayes Classifier
We compare in this paper several feature selection methods for the Naive
Bayes Classifier (NBC) when the data under study are described by a large
number of redundant binary indicators. Wrapper approaches guided by the NBC
estimation of the classification error probability out-perform filter
approaches while retaining a reasonable computational cost
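A wrapper criterion of the kind this abstract describes can be sketched as a leave-one-out error estimate of a naive Bayes classifier restricted to a candidate subset of binary features. The Bernoulli model with Laplace smoothing, the function name `nb_loo_error` and the toy data are assumptions made for illustration; the abstract does not specify the error estimate used.

```python
import math


def nb_loo_error(X, y, features):
    """Leave-one-out error of a Bernoulli naive Bayes using only `features`."""
    errors = 0
    for held in range(len(X)):
        train = [(x, c) for k, (x, c) in enumerate(zip(X, y)) if k != held]
        best_class, best_logp = None, float("-inf")
        for c in sorted({label for _, label in train}):
            rows = [x for x, label in train if label == c]
            logp = math.log(len(rows) / len(train))
            for f in features:
                # Laplace-smoothed estimate of P(x_f = 1 | class c).
                p1 = (sum(r[f] for r in rows) + 1) / (len(rows) + 2)
                logp += math.log(p1 if X[held][f] == 1 else 1 - p1)
            if logp > best_logp:
                best_class, best_logp = c, logp
        errors += best_class != y[held]
    return errors / len(X)


# Toy data: feature 0 is informative, feature 1 duplicates it, feature 2 is noise.
X = [[0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 1],
     [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# One wrapper step: pick the single feature with the lowest estimated error.
best_single = min(range(3), key=lambda f: nb_loo_error(X, y, [f]))
```

A search strategy would call `nb_loo_error` on each candidate subset. Because feature 1 is redundant with feature 0, adding it cannot lower the error further, which is the situation where a wrapper has an advantage over a filter that scores features one at a time.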
Automated design of robust discriminant analysis classifier for foot pressure lesions using kinematic data
In recent years, the use of motion tracking systems for the acquisition of functional biomechanical gait data has received increasing interest due to the richness and accuracy of the measured kinematic information. However, costs frequently restrict the number of subjects employed, which makes the dimensionality of the collected data far higher than the number of available samples. This paper applies discriminant analysis algorithms to the classification of patients with different types of foot lesions, in order to establish an association between foot motion and lesion formation. With primary attention to small sample size situations, we compare different types of Bayesian classifiers and evaluate their performance with various dimensionality reduction techniques for feature extraction, as well as search methods for selection of raw kinematic variables. Finally, we propose a novel integrated method which fine-tunes the classifier parameters and selects the most relevant kinematic variables simultaneously. Performance comparisons are made using robust resampling techniques such as Bootstrap and k-fold cross-validation. Results from experiments with subjects suffering from pathological plantar hyperkeratosis show that the proposed method can lead to correct classification rates with less than 10% of the original features