11,176 research outputs found

    Numerical Analysis for Relevant Features in Intrusion Detection (NARFid)

    Get PDF
    Identification of cyber attacks and network services is a robust field of study in the machine learning community. Less effort has been focused on understanding the domain space of real network data in identifying important features for cyber attack and network service classification. Motivations for such work allow for anomaly detection systems with less requirements on data “sniffed” off the network, extraction of features from the traffic, reduced learning time of algorithms, and ideally increased classification performance of anomalous behavior. This thesis evaluates the usefulness of a good feature subset for the general classification task of identifying cyber attacks and network services. The generality of the selected features elucidates the relevance or irrelevance of the feature set for the classification task of intrusion detection. Additionally, the thesis provides an extension to the Bhattacharyya method, which selects features by means of inter-class separability (Bhattacharyya coefficient). The extension for multiple class problems selects a minimal set of features with the best separability across all class pairs. Several feature selection algorithms (e.g., accuracy rate with genetic algorithm, RELIEF-F, GRLVQI, median Bhattacharyya and minimum surface Bhattacharyya methods) create feature subsets that describe the decision boundary for intrusion detection problems. The selected feature subsets maintain or improve the classification performance for at least three out of the four anomaly detectors (i.e., classifiers) under test. The feature subsets, which illustrate generality for the intrusion detection problem, range in size from 12 to 27 features. The original feature set consists of 248 features. Of the feature subsets demonstrating generality, the extension to the Bhattacharyya method generates the second smallest feature subset. This thesis quantitatively demonstrates that a relatively small feature set may be used for intrusion detection with machine learning classifiers

    Minimum Distance Estimation of Milky Way Model Parameters and Related Inference

    Get PDF
    We propose a method to estimate the location of the Sun in the disk of the Milky Way using a method based on the Hellinger distance and construct confidence sets on our estimate of the unknown location using a bootstrap based method. Assuming the Galactic disk to be two-dimensional, the sought solar location then reduces to the radial distance separating the Sun from the Galactic center and the angular separation of the Galactic center to Sun line, from a pre-fixed line on the disk. On astronomical scales, the unknown solar location is equivalent to the location of us earthlings who observe the velocities of a sample of stars in the neighborhood of the Sun. This unknown location is estimated by undertaking pairwise comparisons of the estimated density of the observed set of velocities of the sampled stars, with densities estimated using synthetic stellar velocity data sets generated at chosen locations in the Milky Way disk according to four base astrophysical models. The "match" between the pair of estimated densities is parameterized by the affinity measure based on the familiar Hellinger distance. We perform a novel cross-validation procedure to establish a desirable "consistency" property of the proposed method.Comment: 25 pages, 10 Figures. This version incorporates the suggestions made by the referees. To appear in SIAM/ASA Journal on Uncertainty Quantificatio

    A New Search Algorithm for Feature Selection in Hyperspectral Remote Sensing Images

    Get PDF
    A new suboptimal search strategy suitable for feature selection in very high-dimensional remote-sensing images (e.g. those acquired by hyperspectral sensors) is proposed. Each solution of the feature selection problem is represented as a binary string that indicates which features are selected and which are disregarded. In turn, each binary string corresponds to a point of a multidimensional binary space. Given a criterion function to evaluate the effectiveness of a selected solution, the proposed strategy is based on the search for constrained local extremes of such a function in the above-defined binary space. In particular, two different algorithms are presented that explore the space of solutions in different ways. These algorithms are compared with the classical sequential forward selection and sequential forward floating selection suboptimal techniques, using hyperspectral remote-sensing images (acquired by the AVIRIS sensor) as a data set. Experimental results point out the effectiveness of both algorithms, which can be regarded as valid alternatives to classical methods, as they allow interesting tradeoffs between the qualities of selected feature subsets and computational cost

    Thermo-visual feature fusion for object tracking using multiple spatiogram trackers

    Get PDF
    In this paper, we propose a framework that can efficiently combine features for robust tracking based on fusing the outputs of multiple spatiogram trackers. This is achieved without the exponential increase in storage and processing that other multimodal tracking approaches suffer from. The framework allows the features to be split arbitrarily between the trackers, as well as providing the flexibility to add, remove or dynamically weight features. We derive a mean-shift type algorithm for the framework that allows efficient object tracking with very low computational overhead. We especially target the fusion of thermal infrared and visible spectrum features as the most useful features for automated surveillance applications. Results are shown on multimodal video sequences clearly illustrating the benefits of combining multiple features using our framework
    corecore