
    Clustering Hyperspectral Imagery for Improved Adaptive Matched Filter Performance

    This paper improves adaptive matched filter (AMF) performance by addressing the correlation and non-homogeneity problems inherent to hyperspectral imagery (HSI). The mean vector and covariance matrix of the background should be estimated from “target-free” data: if target pixels are included in these estimates, they act as statistical outliers and severely contaminate the estimators. This motivates a two-stage process: first, remove the target data from the background using anomaly detectors; next, with the remaining data relatively “target-free,” perform signature matching. For the first stage, we tested seven anomaly detectors, some designed specifically to handle the spatial correlation of HSI data and/or the presence of anomalous pixels in local or global mean and covariance estimators. For the second stage, we investigated cluster-analytic methods to boost AMF performance. The research shows that accounting for spatial correlation effects in the detector yields nearly “target-free” data for use in an AMF, which benefits greatly from cluster analysis methods.
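    A minimal numpy sketch of the second-stage AMF scoring, assuming background statistics are estimated only from pixels the first-stage anomaly detectors left as “target-free”; the function and argument names (amf_scores, pixels, target_sig, bg_mask) are hypothetical, and this is one common form of the AMF statistic rather than necessarily the paper's exact formulation.

```python
import numpy as np

def amf_scores(pixels, target_sig, bg_mask):
    """Second-stage AMF: score every pixel against a target signature,
    with background statistics estimated only from the 'target-free'
    pixels selected by the first-stage anomaly detectors."""
    bg = pixels[:, bg_mask]                    # pixels: (bands, N); bg_mask: (N,) bool
    mu = bg.mean(axis=1, keepdims=True)        # background mean vector
    cov_inv = np.linalg.pinv(np.cov(bg))       # (pseudo-)inverse background covariance
    s = target_sig - mu.ravel()                # mean-centred target signature
    x = pixels - mu                            # mean-centred pixels
    return (s @ cov_inv @ x) ** 2 / (s @ cov_inv @ s)   # one score per pixel
```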

    Improved N-dimensional Data Visualization from Hyper-radial Values

    Higher-dimensional data, increasingly common across disciplines due to big-data problems, are inherently difficult to visualize meaningfully. While many visualization methods exist, they are often hard to interpret, involve multiple plots with overlaid points, or require simultaneous interpretations. This research adapts and extends hyper-radial visualization, a technique used to visualize Pareto fronts in multi-objective optimization, into an n-dimensional visualization tool. Hyper-radial visualization offers many advantages by presenting a low-dimensional representation of the data through easily understood calculations. First, hyper-radial visualization is extended for use with general multivariate data. Second, a method is developed to optimally group the data for hyper-radial visualization, creating a meaningful visualization based on class separation and geometric properties. Finally, this optimal visualization is expanded from two to three dimensions to support even higher-dimensional data. The utility of this work is illustrated using seven datasets of varying sizes, ranging in dimensionality from Fisher Iris, with 150 observations, 4 features, and 3 classes, to the Modified National Institute of Standards and Technology (MNIST) data, with 60,000 observations, 717 non-zero features, and 10 classes.
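    A minimal sketch of the basic hyper-radial mapping for general multivariate data, assuming each axis is the root-mean-square radius of one feature group after min-max normalization; the paper's optimal grouping method is not reproduced here, so group1 and group2 are supplied by the caller (all names are hypothetical).

```python
import numpy as np

def hyper_radial_coords(X, group1, group2):
    """Map each n-dimensional row of X to 2-D hyper-radial coordinates:
    min-max normalise every feature to [0, 1], then take the
    root-mean-square radius of each feature group as one axis."""
    Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    hrc = lambda cols: np.sqrt((Xn[:, cols] ** 2).sum(axis=1) / len(cols))
    return hrc(group1), hrc(group2)   # plot one against the other
```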

    A Locally Adaptable Iterative RX Detector

    We present an unsupervised anomaly detection method for hyperspectral imagery (HSI) based on data characteristics inherent in HSI. A locally adaptive technique of iteratively refining the well-known RX detector (LAIRX) is developed. The technique is motivated by the need for better first- and second-order statistic estimation through avoidance of anomaly presence in the estimation data. Overall, experiments show favorable Receiver Operating Characteristic (ROC) curves compared to a global anomaly detector based on the Support Vector Data Description (SVDD) algorithm, the conventional RX detector, and decomposed versions of the LAIRX detector. Furthermore, parallel and distributed processing yields fast processing times, making LAIRX applicable in an operational setting.
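    A minimal global sketch of the iterative-refinement idea behind LAIRX, assuming anomalies are trimmed by an RX-score quantile on each pass; the paper's detector is locally adaptive and parallelized, which this sketch omits (the names and the quantile threshold are assumptions).

```python
import numpy as np

def iterative_rx(X, n_iter=5, quantile=0.99):
    """Iteratively refined (global) RX detector: recompute the mean and
    covariance after discarding the most anomalous pixels, so background
    statistics are less contaminated by anomalies."""
    keep = np.ones(len(X), dtype=bool)                 # X: (pixels, bands)
    for _ in range(n_iter):
        bg = X[keep]                                   # current "background" pixels
        mu = bg.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(bg, rowvar=False))
        d = X - mu
        rx = np.einsum("ij,jk,ik->i", d, cov_inv, d)   # RX (Mahalanobis-type) score
        keep = rx <= np.quantile(rx, quantile)         # trim suspected anomalies
    return rx
```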

    Steganalysis Embedding Percentage Determination with Learning Vector Quantization

    Steganography (stego) is used primarily when the very existence of a communication signal must be kept covert. Detecting the presence of stego is a very difficult problem, made even more difficult when the embedding technique is unknown. This article investigates the process and necessary considerations in developing a new method for detecting hidden data within digital images. We demonstrate the effectiveness of learning vector quantization (LVQ) as a clustering technique that assists in discerning clean (non-stego) images from anomalous (stego) images. The comparison uses 7 features over a small set of 200 observations with embedding levels varying from 1% to 10% in increments of 1%. The results demonstrate that LVQ not only identifies images containing LSB-hidden information more accurately than k-means or the raw feature sets, but also provides a simple method for determining the embedding percentage at low embedding levels. Abstract ©2006 IEEE.
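    For reference, a minimal sketch of the basic LVQ1 update rule underlying the clustering used here, assuming pre-initialized prototypes; the paper's 7 image features and the embedding-percentage estimation step are not reproduced, and all names are hypothetical.

```python
import numpy as np

def lvq1_fit(X, y, protos, proto_labels, lr=0.05, epochs=20, seed=0):
    """Basic LVQ1: the winning (nearest) prototype is pulled toward a
    same-class sample and pushed away from a different-class sample."""
    rng = np.random.default_rng(seed)
    P = protos.copy().astype(float)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w = np.argmin(((P - X[i]) ** 2).sum(axis=1))  # nearest prototype
            sign = 1.0 if proto_labels[w] == y[i] else -1.0
            P[w] += sign * lr * (X[i] - P[w])             # attract or repel
    return P
```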

    Journal in Entirety


    Cyber-Physical Security with RF Fingerprint Classification through Distance Measure Extensions of Generalized Relevance Learning Vector Quantization

    Radio frequency (RF) fingerprinting extracts fingerprint features from RF signals to protect against masquerade attacks by enabling reliable authentication of communication devices at the “serial number” level. Facilitating this authentication are machine learning (ML) algorithms, which find meaningful statistical differences between measured data. The Generalized Relevance Learning Vector Quantization-Improved (GRLVQI) classifier is one ML algorithm that has shown efficacy for RF fingerprinting device discrimination. GRLVQI extends the Learning Vector Quantization (LVQ) family of “winner take all” classifiers, which develop prototype vectors (PVs) that represent the data. In LVQ algorithms, distances are computed between exemplars and PVs, and the PVs are iteratively moved to accurately represent the data. GRLVQI extends LVQ with a sigmoidal cost function, relevance learning, and PV update logic improvements. However, both LVQ and GRLVQI are limited by their reliance on squared Euclidean distance measures and by a seemingly complex algorithm structure when the underlying distance measure is changed. Herein, the authors (1) develop GRLVQI-D (distance), an extension of GRLVQI that accommodates alternative distance measures, and (2) present the Cosine GRLVQI classifier using this framework. To evaluate the framework, the authors consider experimentally collected Z-Wave RF signals and develop RF fingerprints to identify devices. Z-Wave devices are low-cost, low-power communication technologies seen increasingly in critical infrastructure. Performance comparisons are made for both classification and verification (claimed identity) with the new Cosine GRLVQI algorithm. The results show more robust performance for the Cosine GRLVQI algorithm compared with four algorithms in the literature. Additionally, the methodology used to create Cosine GRLVQI generalizes to alternative measures.
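    A minimal sketch of the framework's central idea, treating the distance measure as a pluggable parameter in the winner-take-all step, with cosine distance as the alternative measure; the full GRLVQI-D cost-function and relevance-learning updates, which must also be re-derived for the new measure, are omitted (all names are hypothetical).

```python
import numpy as np

def sq_euclidean(x, P):
    """Squared Euclidean distance from sample x to every prototype in P."""
    return ((P - x) ** 2).sum(axis=1)

def cosine_distance(x, P):
    """1 - cosine similarity, so smaller still means 'more similar',
    matching the winner-take-all convention of the Euclidean case."""
    return 1.0 - (P @ x) / (np.linalg.norm(P, axis=1) * np.linalg.norm(x))

def winner(x, P, dist=sq_euclidean):
    """The distance measure is a pluggable parameter: swapping dist is
    the essence of extending an LVQ-family classifier to new measures."""
    return int(np.argmin(dist(x, P)))
```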

    The Effectiveness of Using Diversity to Select Multiple Classifier Systems with Varying Classification Thresholds

    In classification applications, the goal of fusion techniques is to exploit complementary approaches and merge the information they provide into a solution superior to any single method. Associated with choosing a methodology to fuse pattern recognition algorithms is the choice of which algorithm or algorithms to fuse. Historically, classifier ensemble accuracy has been used to select which pattern recognition algorithms are included in a multiple classifier system. More recently, research has focused on creating and evaluating diversity metrics to select ensemble members more effectively. Using a wide range of classification data sets, methodologies, and fusion techniques, current diversity research is extended by expanding classifier domains before employing fusion methodologies; the expansion is made possible with a unique classification score algorithm developed for this purpose. Correlation and linear regression techniques reveal that the relationship between diversity metrics and accuracy is tenuous and that optimal ensemble selection should be based on ensemble accuracy. The strengths and weaknesses of popular diversity metrics are examined in the context of the information they provide with respect to changing classification thresholds and accuracies.
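    As illustration, a minimal sketch of two pairwise diversity metrics commonly used in this literature (the disagreement measure and Yule's Q-statistic); the paper evaluates a broader set of metrics, and these are representative examples rather than its specific list.

```python
import numpy as np

def disagreement(pred_a, pred_b, truth):
    """Fraction of samples on which exactly one of the two classifiers
    is correct; higher means more diverse."""
    a_ok, b_ok = pred_a == truth, pred_b == truth
    return float(np.mean(a_ok != b_ok))

def q_statistic(pred_a, pred_b, truth):
    """Yule's Q: near +1 when the classifiers err together (low
    diversity), near -1 when their errors are complementary."""
    a_ok, b_ok = pred_a == truth, pred_b == truth
    n11 = np.sum(a_ok & b_ok)      # both correct
    n00 = np.sum(~a_ok & ~b_ok)    # both wrong
    n10 = np.sum(a_ok & ~b_ok)     # only A correct
    n01 = np.sum(~a_ok & b_ok)     # only B correct
    return float((n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10))
```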

    anomalyDetection: Implementation of Augmented Network Log Anomaly Detection Procedures

    As the number of cyber-attacks grows daily, so does the delay in threat detection. For instance, in 2015 the Office of Personnel Management (OPM) discovered that approximately 21.5 million individual records of Federal employees and contractors had been stolen. On average, more than 200 days pass between an attack and its discovery; in the case of the OPM breach, the attack had been underway for almost a year. Cyber analysts must inspect numerous potential incidents on a daily basis but have neither the time nor the resources to examine them all. anomalyDetection aims to curtail the time frame in which anomalous cyber activities go unnoticed and to aid the efficient discovery of anomalous transactions among the millions of daily logged events by (i) providing an efficient means for pre-processing and aggregating cyber data for analysis, employing a tabular vector transformation and handling multicollinearity concerns; (ii) offering numerous built-in multivariate statistical functions, such as the Mahalanobis distance, factor analysis, and principal components analysis, to identify anomalous activity; and (iii) incorporating the pipe operator (%>%) so that it works well in the tidyverse workflow. Combined, anomalyDetection offers cyber analysts an efficient and simplified approach to breaking network events into time-segment blocks and identifying periods associated with suspected anomalies for further evaluation.
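    anomalyDetection itself is an R package; as a language-neutral illustration, here is a minimal Python analogue of steps (i) and (ii), assuming logs is a table with a datetime column and a categorical field. The tabulation into time-block state vectors and the Mahalanobis scoring follow the abstract's description; all names are hypothetical.

```python
import numpy as np
import pandas as pd

def tabulate_and_score(logs, time_col, cat_col, freq="10min"):
    """Tabular vector transformation: count events per (time block,
    category) to form one state vector per block, then score each
    block with the squared Mahalanobis distance."""
    blocks = (logs.groupby([pd.Grouper(key=time_col, freq=freq), cat_col])
                  .size()
                  .unstack(fill_value=0))              # rows: time blocks; cols: categories
    X = blocks.to_numpy(dtype=float)
    mu = X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pinv tolerates multicollinearity
    d = X - mu
    blocks["mahalanobis"] = np.einsum("ij,jk,ik->i", d, cov_inv, d)
    return blocks.sort_values("mahalanobis", ascending=False)
```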

    Cyber Anomaly Detection: Using Tabulated Vectors and Embedded Analytics for Efficient Data Mining

    Firewalls, especially at large organizations, process high-velocity internet traffic and flag suspicious events and activities. Flagged events can be benign, such as misconfigured routers, or malignant, such as a hacker trying to gain access to a specific computer. Compounding the problem, flagged events are not always obviously dangerous, and they arrive at high velocity. Current firewall log analysis is manually intensive and consumes many analyst hours to find events worth investigating, predominantly by manually sorting firewall and intrusion detection/prevention system log data. This work aims to improve analysts' ability to find events for cyber forensics analysis. A tabulated vector approach is proposed to create meaningful state vectors from time-oriented blocks. Multivariate and graphical analysis is then used to analyze the state vectors in a human–machine collaborative interface. Statistical tools, such as the Mahalanobis distance, factor analysis, and histogram matrices, are employed for outlier detection. This research also introduces the breakdown distance heuristic, a decomposition of the Mahalanobis distance that indicates which variables contributed most to its value. The tabulated vector approach is further demonstrated on collected firewall logs. Lastly, the analytic methodologies are integrated into embedded analytic tools so that front-line cyber analysts can efficiently deploy the anomaly detection capabilities.
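    The breakdown distance heuristic is described only at a high level here; the sketch below shows one natural additive decomposition of the squared Mahalanobis distance into per-variable contributions, offered as an assumption about the general idea rather than the paper's exact heuristic.

```python
import numpy as np

def breakdown_contributions(x, mu, cov_inv):
    """Split the squared Mahalanobis distance d' S^-1 d (d = x - mu)
    into one additive term per variable, so an analyst can see which
    variables drove an outlying score."""
    d = x - mu
    contrib = d * (cov_inv @ d)                  # term i of the quadratic form
    assert np.isclose(contrib.sum(), d @ cov_inv @ d)
    return contrib                               # sums to the full distance
```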

    Malware Type Recognition and Cyber Situational Awareness

    Current technologies for computer network and host defense do not provide suitable information to support strategic and tactical decision-making processes. Although pattern-based malware detection is an active research area, the additional context of the malware's type can improve cyber situational awareness. This context indicates threat capability, allowing organizations to assess information losses and focus response actions appropriately. Malware Type Recognition (MaTR) is a research initiative extending detection technologies to provide the additional context of malware type using only static heuristics. Test results with MaTR demonstrate over a 99% accurate detection rate and 59% test accuracy in malware typing.