    Numerical Analysis for Relevant Features in Intrusion Detection (NARFid)

    Identification of cyber attacks and network services is a robust field of study in the machine learning community. Less effort has been focused on understanding the domain space of real network data in order to identify important features for cyber attack and network service classification. Such work is motivated by the prospect of anomaly detection systems that place fewer requirements on data “sniffed” off the network and on the extraction of features from the traffic, that reduce the learning time of algorithms, and that ideally improve classification performance on anomalous behavior. This thesis evaluates the usefulness of a good feature subset for the general classification task of identifying cyber attacks and network services. The generality of the selected features elucidates the relevance or irrelevance of the feature set for the classification task of intrusion detection. Additionally, the thesis provides an extension to the Bhattacharyya method, which selects features by means of inter-class separability (Bhattacharyya coefficient). The extension for multiple class problems selects a minimal set of features with the best separability across all class pairs. Several feature selection algorithms (e.g., accuracy rate with genetic algorithm, RELIEF-F, GRLVQI, median Bhattacharyya and minimum surface Bhattacharyya methods) create feature subsets that describe the decision boundary for intrusion detection problems. The selected feature subsets maintain or improve the classification performance for at least three out of the four anomaly detectors (i.e., classifiers) under test. The feature subsets, which illustrate generality for the intrusion detection problem, range in size from 12 to 27 features. The original feature set consists of 248 features. Of the feature subsets demonstrating generality, the extension to the Bhattacharyya method generates the second smallest feature subset. This thesis quantitatively demonstrates that a relatively small feature set may be used for intrusion detection with machine learning classifiers.
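
To make the separability idea concrete, here is a minimal sketch of ranking features by the Bhattacharyya coefficient. The abstract does not give the thesis's exact procedure, so this assumes histogram-based density estimates and ranks each feature by the median coefficient over all class pairs. The function names `histogram`, `bhattacharyya_coefficient`, and `rank_features` are hypothetical.

```python
import math
from itertools import combinations
from statistics import median


def histogram(values, lo, hi, bins):
    """Return a normalized histogram (sums to 1) of values over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def bhattacharyya_coefficient(p, q):
    """BC = sum_i sqrt(p_i * q_i); 1 means identical, 0 means no overlap."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def rank_features(samples, labels, bins=10):
    """Rank feature indices by median pairwise BC (least overlap first).

    Assumption: the median over class pairs is used as the multi-class score.
    """
    classes = sorted(set(labels))
    scores = []
    for f in range(len(samples[0])):
        column = [row[f] for row in samples]
        lo, hi = min(column), max(column)
        hists = {
            c: histogram([r[f] for r, y in zip(samples, labels) if y == c], lo, hi, bins)
            for c in classes
        }
        pair_bc = [
            bhattacharyya_coefficient(hists[a], hists[b])
            for a, b in combinations(classes, 2)
        ]
        scores.append((median(pair_bc), f))
    return [f for _, f in sorted(scores)]
```

A lower coefficient means less overlap between two class distributions, so features at the front of the returned list are the most separable under this assumed scoring.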

    Automated design of robust discriminant analysis classifier for foot pressure lesions using kinematic data

    In recent years, the use of motion tracking systems for acquisition of functional biomechanical gait data has received increasing interest due to the richness and accuracy of the measured kinematic information. However, costs frequently restrict the number of subjects employed, and this makes the dimensionality of the collected data far higher than the number of available samples. This paper applies discriminant analysis algorithms to the classification of patients with different types of foot lesions, in order to establish an association between foot motion and lesion formation. With primary attention to small sample size situations, we compare different types of Bayesian classifiers and evaluate their performance with various dimensionality reduction techniques for feature extraction, as well as search methods for selection of raw kinematic variables. Finally, we propose a novel integrated method which fine-tunes the classifier parameters and selects the most relevant kinematic variables simultaneously. Performance comparisons use robust resampling techniques such as Bootstrap 632+ and k-fold cross-validation. Results from experiments with lesion subjects suffering from pathological plantar hyperkeratosis show that the proposed method can lead to approximately 96% correct classification rates with less than 10% of the original features.
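
For readers unfamiliar with the resampling technique named above, the following is a minimal sketch of the .632+ bootstrap error estimate as it is commonly defined (Efron and Tibshirani). It takes three already-computed error rates as inputs. The function name and argument names are hypothetical, and nothing here is taken from the paper's own implementation.

```python
def bootstrap_632_plus(err_train, err_oob, gamma):
    """Combine resubstitution and out-of-bag error into a .632+ estimate.

    err_train: error on the training data (resubstitution error)
    err_oob:   average error on samples left out of each bootstrap replicate
    gamma:     no-information error rate (error if inputs and labels
               were independent)
    """
    # Cap the out-of-bag error at the no-information rate.
    err_oob_capped = min(err_oob, gamma)

    # Relative overfitting rate, forced into [0, 1].
    if err_oob_capped > err_train and gamma > err_train:
        overfit = (err_oob_capped - err_train) / (gamma - err_train)
    else:
        overfit = 0.0

    # Plain .632 estimate plus a correction that grows with overfitting.
    err_632 = 0.368 * err_train + 0.632 * err_oob
    correction = (
        (err_oob_capped - err_train)
        * (0.368 * 0.632 * overfit)
        / (1.0 - 0.368 * overfit)
    )
    return err_632 + correction
```

With no overfitting the estimate reduces to the plain .632 weighting; as the overfitting rate approaches 1 it moves toward the out-of-bag error, which matters in small-sample settings where the resubstitution error is optimistic.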

    A New Search Algorithm for Feature Selection in Hyperspectral Remote Sensing Images

    A new suboptimal search strategy suitable for feature selection in very high-dimensional remote-sensing images (e.g., those acquired by hyperspectral sensors) is proposed. Each solution of the feature selection problem is represented as a binary string that indicates which features are selected and which are disregarded. In turn, each binary string corresponds to a point of a multidimensional binary space. Given a criterion function to evaluate the effectiveness of a selected solution, the proposed strategy is based on the search for constrained local extrema of such a function in the above-defined binary space. In particular, two different algorithms are presented that explore the space of solutions in different ways. These algorithms are compared with the classical sequential forward selection and sequential forward floating selection suboptimal techniques, using hyperspectral remote-sensing images (acquired by the AVIRIS sensor) as a data set. Experimental results point out the effectiveness of both algorithms, which can be regarded as valid alternatives to classical methods, as they allow interesting tradeoffs between the qualities of selected feature subsets and computational cost.
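
The binary-string representation can be illustrated with a small local search. This is only a sketch and not either of the two algorithms from the paper. It assumes the constraint is a fixed number of selected features, so each move swaps one selected feature for one unselected feature, and the criterion is any user-supplied function to maximize. `local_search` is a hypothetical name.

```python
from itertools import product


def local_search(criterion, start):
    """Hill-climb over binary masks with a fixed number of selected features.

    criterion: callable taking a tuple of 0/1 flags, returning a score
               to maximize
    start:     initial tuple of 0/1 flags
    """
    current = tuple(start)
    best = criterion(current)
    improved = True
    while improved:
        improved = False
        selected = [i for i, bit in enumerate(current) if bit]
        unselected = [i for i, bit in enumerate(current) if not bit]
        for i, j in product(selected, unselected):
            candidate = list(current)
            candidate[i], candidate[j] = 0, 1  # swap one feature out, one in
            candidate = tuple(candidate)
            score = criterion(candidate)
            if score > best:
                current, best, improved = candidate, score, True
                break  # rescan the neighbourhood from the new point
    return current, best
```

The search stops at a mask that no single swap can improve, that is, a local extremum of the criterion under the assumed constraint. Sequential forward selection, by contrast, only ever adds features and never revisits an earlier choice.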

    Application of multiobjective genetic programming to the design of robot failure recognition systems

    We present an evolutionary approach using multiobjective genetic programming (MOGP) to derive optimal feature extraction preprocessing stages for robot failure detection. This data-driven machine learning method is compared both with conventional (nonevolutionary) classifiers and with a set of domain-dependent feature extraction methods. We conclude that MOGP is an effective and practical design method for failure recognition systems, with enhanced recognition accuracy over conventional classifiers, independent of domain knowledge.

    Spectral Optimization of Airborne Multispectral Camera for Land Cover Classification: Automatic Feature Selection and Spectral Band Clustering

    Hyperspectral imagery consists of hundreds of contiguous spectral bands. However, most of them are redundant. Thus a subset of well-chosen bands is generally sufficient for a specific problem, which makes it possible to design superspectral sensors adapted to a specific land cover classification task. Related both to feature selection and extraction, spectral optimization identifies the most relevant band subset for specific applications, involving a band subset relevance score as well as a method to optimize it. This study first focuses on the choice of such a relevance score. Several criteria are compared through both quantitative and qualitative analyses. To have a fair comparison, all tested criteria are compared on classic hyperspectral data sets using the same optimization heuristics: an incremental one to assess the impact of the number of selected bands and a stochastic one to obtain several possible good band subsets and to derive band importance measures out of intermediate good band subsets. Last, a specific approach is proposed to cope with the optimization of bandwidth. It consists of building a hierarchy of groups of adjacent bands, according to a score that decides which adjacent bands must be merged, before band selection is performed at the different levels of this hierarchy.
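
The hierarchy of adjacent band groups can be sketched as a greedy bottom-up merge. The abstract does not specify the merge score, so the `mean_similarity` score below (closeness of mean band values) is purely an illustrative assumption, and both function names are hypothetical.

```python
def build_band_hierarchy(spectra, score):
    """Greedily merge adjacent band groups, recording every hierarchy level.

    spectra: list of per-band value lists (band-major: spectra[band][pixel])
    score:   callable(group_a, group_b) -> float; the adjacent pair with
             the highest score is merged first
    """
    groups = [[b] for b in range(len(spectra))]
    levels = [[list(g) for g in groups]]
    while len(groups) > 1:
        best = max(
            range(len(groups) - 1),
            key=lambda i: score(groups[i], groups[i + 1]),
        )
        # Replace the two adjacent groups with their union.
        groups[best:best + 2] = [groups[best] + groups[best + 1]]
        levels.append([list(g) for g in groups])
    return levels


def mean_similarity(spectra):
    """Assumed merge score: negative gap between the groups' mean values."""
    def group_mean(group):
        values = [v for b in group for v in spectra[b]]
        return sum(values) / len(values)
    return lambda a, b: -abs(group_mean(a) - group_mean(b))
```

Each entry of the returned list is one level of the hierarchy, from single bands down to one group spanning all bands, so band selection can then be run separately at whichever levels are of interest.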

    Assessment of Dispersion and Bubble Entropy Measures for Enhancing Preterm Birth Prediction Based on Electrohysterographic Signals

    One of the remaining challenges for the scientific-technical community is predicting preterm births, for which electrohysterography (EHG) has emerged as a highly sensitive prediction technique. Sample and fuzzy entropy have been used to characterize EHG signals, although they require optimizing many internal parameters. Both bubble entropy, which only requires one internal parameter, and dispersion entropy, which can detect any changes in frequency and amplitude, have been proposed to characterize biomedical signals. In this work, we attempted to determine the clinical value of these entropy measures for predicting preterm birth by analyzing their discriminatory capacity as an individual feature and their complementarity to other EHG characteristics by developing six prediction models using obstetrical data, linear and non-linear EHG features, and linear discriminant analysis using a genetic algorithm to select the features. Both dispersion and bubble entropy better discriminated between the preterm and term groups than sample, spectral, and fuzzy entropy. Entropy metrics provided complementary information to linear features, and indeed, the improvement in model performance by including other non-linear features was negligible. The best model performance obtained an F1-score of 90.1 ± 2% on the testing dataset. This model can easily be adapted to real-time applications, thereby contributing to the transferability of the EHG technique to clinical practice.
    This work was supported by the Spanish Ministry of Economy and Competitiveness, the European Regional Development Fund (MCIU/AEI/FEDER, UE RTI2018-094449-A-I00-AR), and by the Generalitat Valenciana (AICO/2019/220).
    Nieto Del-Amor, F.; Beskhani, R.; Ye Lin, Y.; Garcia-Casado, J.; Díaz-Martínez, MDA.; Monfort-Ortiz, R.; Diago-Almela, VJ.... (2021). Assessment of Dispersion and Bubble Entropy Measures for Enhancing Preterm Birth Prediction Based on Electrohysterographic Signals. Sensors. 21(18):1-17. https://doi.org/10.3390/s21186071
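
As an illustration of one of the measures discussed, here is a minimal sketch of normalized dispersion entropy following its commonly published definition: map samples to classes through the normal CDF, count patterns of consecutive classes, and take the Shannon entropy. The defaults `m=2`, `c=6`, `delay=1` are assumptions for the example, not values taken from the paper, and the function name is hypothetical.

```python
import math
from collections import Counter


def dispersion_entropy(signal, m=2, c=6, delay=1):
    """Normalized dispersion entropy in [0, 1] for a 1-D sequence of numbers.

    m: embedding dimension, c: number of classes, delay: time delay
    """
    n = len(signal)
    mu = sum(signal) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in signal) / n)
    if sigma == 0:
        return 0.0  # a constant signal produces a single pattern

    # Map each sample to (0, 1) with the normal CDF, then to a class 1..c.
    classes = []
    for x in signal:
        y = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
        classes.append(min(max(int(round(c * y + 0.5)), 1), c))

    # Count dispersion patterns: tuples of m classes spaced by `delay`.
    n_patterns = n - (m - 1) * delay
    counts = Counter(
        tuple(classes[i + k * delay] for k in range(m))
        for i in range(n_patterns)
    )

    # Shannon entropy of the pattern distribution, normalized by ln(c^m).
    probs = [v / n_patterns for v in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(c ** m)
```

Irregular signals spread their probability over many patterns and score close to 1, while slowly varying or periodic signals reuse a few patterns and score lower.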