STAND: Sanitization Tool for ANomaly Detection
The efficacy of Anomaly Detection (AD) sensors depends heavily on the quality of the data used to train them. Artificial or contrived training data may not provide a realistic view of the deployment environment. Most realistic data sets are dirty; that is, they contain a number of attacks or anomalous events. The size of these high-quality training data sets makes manual removal or labeling of attack data infeasible. As a result, sensors trained on this data can miss attacks and their variations. We propose extending the training phase of AD sensors (in a manner agnostic to the underlying AD algorithm) to include a sanitization phase. This phase generates multiple models conditioned on small slices of the training data. We use these "micro-models" to produce provisional labels for each training input, and we combine the micro-models in a voting scheme to determine which parts of the training data may represent attacks. Our results suggest that this phase automatically and significantly improves the quality of unlabeled training data by making it as "attack-free" and "regular" as possible in the absence of absolute ground truth. We also show how a collaborative approach that combines models from different networks or domains can further refine the sanitization process to thwart targeted training or mimicry attacks against a single site.
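To make the voting scheme concrete, here is a minimal Python sketch of the sanitization phase as the abstract describes it. The `ADSensor` interface, `make_sensor` factory, and simple unweighted vote are assumptions for illustration; the work itself is agnostic to the underlying AD algorithm, and the voting can be weighted.

```python
from typing import Callable, List, Protocol, Sequence

class ADSensor(Protocol):
    """Hypothetical interface; any content-based AD sensor fits."""
    def train(self, data: Sequence[bytes]) -> None: ...
    def is_alert(self, item: bytes) -> bool: ...

def sanitize(training_data: Sequence[bytes],
             make_sensor: Callable[[], ADSensor],
             slice_size: int,
             vote_threshold: float) -> List[bytes]:
    """Drop training items that enough micro-models, each trained on a
    small time-ordered slice of the data, flag as anomalous."""
    micro_models: List[ADSensor] = []
    for start in range(0, len(training_data), slice_size):
        sensor = make_sensor()
        sensor.train(training_data[start:start + slice_size])
        micro_models.append(sensor)

    sanitized = []
    for item in training_data:
        # Each micro-model casts a provisional "attack" vote on the item.
        votes = sum(1 for m in micro_models if m.is_alert(item))
        # Attacks are rare and localized in time, so they dominate only
        # a few slices; most micro-models vote them anomalous.
        if votes / len(micro_models) < vote_threshold:
            sanitized.append(item)
    return sanitized
```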
Adaptive Anomaly Detection via Self-Calibration and Dynamic Updating
The deployment and use of Anomaly Detection (AD) sensors often requires the intervention of a human expert to manually calibrate and optimize their performance. Depending on the site and the type of traffic it receives, the operators might have to provide recent and sanitized training data sets, the characteristics of expected traffic (i.e., the outlier ratio), and exceptions or even expected future modifications of the system's behavior. In this paper, we study the potential performance issues that stem from fully automating the AD sensors' day-to-day maintenance and calibration. Our goal is to remove the dependence on a human operator by using an unlabeled, and thus potentially dirty, sample of incoming traffic. To that end, we propose to enhance the training phase of AD sensors with a self-calibration phase, leading to the automatic determination of the optimal AD parameters. We show how this novel calibration phase can be employed in conjunction with previously proposed methods for training-data sanitization, resulting in a fully automated AD maintenance cycle. Our approach is completely agnostic to the underlying AD sensor algorithm. Furthermore, the self-calibration can be applied in an online fashion to ensure that the resulting AD models reflect changes in the system's behavior which would otherwise render the sensor's internal state inconsistent. We verify the validity of our approach through a series of experiments where we compare the manually obtained optimal parameters with the ones computed from the self-calibration phase. Modeling traffic from two different sources, the fully automated calibration shows, in the worst case, a 7.08% reduction in detection rate and a 0.06% increase in false positives when compared to the optimal selection of parameters. Finally, our adaptive models outperform the statically generated ones, retaining the gains in performance from the sanitization process over time.
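The abstract does not pin down the calibration criterion, so the following loop is only one plausible shape for an automatic parameter sweep: a knee-finding pass over candidate thresholds on unlabeled traffic. The `make_sensor` factory, `threshold` parameter, and stopping rule are hypothetical; this is a sketch, not the paper's algorithm.

```python
def self_calibrate(traffic, make_sensor, thresholds, tol=1e-3):
    """Sweep candidate sensitivity thresholds over unlabeled traffic
    and stop at the knee where the alert rate stabilizes."""
    half = len(traffic) // 2
    train, holdout = traffic[:half], traffic[half:]
    prev_rate = None
    for t in sorted(thresholds):
        sensor = make_sensor(threshold=t)
        sensor.train(train)
        rate = sum(sensor.is_alert(x) for x in holdout) / len(holdout)
        # The alert rate falls as the threshold loosens; once the drop
        # flattens out, further slack mostly admits genuine anomalies.
        if prev_rate is not None and abs(prev_rate - rate) < tol:
            return t
        prev_rate = rate
    return max(thresholds)
```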
Online Training and Sanitization of AD Systems
In this paper, we introduce novel techniques that enhance the training phase of Anomaly Detection (AD) sensors. Our aim is both to improve detection performance and to protect against attacks that target the training dataset. Our approach is two-pronged: we employ a novel sanitization method for large training datasets that removes attacks and traffic artifacts by measuring their frequency and position inside the dataset, and we extend the training phase in the spatial dimension to include model information from other collaborative systems. We demonstrate that by doing so we can protect all the participating systems against targeted training attacks. Another aspect of our system is its ability to adapt and update the normality model when there is a shift in the nature of inspected traffic that reflects actual changes in the back-end servers. Such "on-line" training appears to be the "Achilles' heel" of AD sensors because they fail to adapt when there is a legitimate deviation in traffic behavior, thereby flooding the operator with false positives. To counter that, we discuss the integration of what we call a shadow sensor with the AD system. This sensor complements our techniques by acting as an oracle to analyze and classify the resulting "suspect data" identified by the AD sensor. We show that our techniques can be applied to a wide range of unmodified AD sensors without incurring significant additional computational cost beyond the initial training phase.
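The shadow-sensor triage described above might look like the following sketch. The method names `is_alert`, `confirms_attack`, and `update` are hypothetical stand-ins for whatever interface the AD sensor and the heavyweight oracle actually expose.

```python
def triage(sensor, shadow, item):
    """Run the cheap AD sensor first; route anything it flags to a
    heavyweight shadow sensor that acts as an oracle."""
    if not sensor.is_alert(item):
        return "normal"
    # Only suspect data pays the shadow sensor's instrumentation cost.
    if shadow.confirms_attack(item):
        return "attack"
    # The AD flagged a legitimate shift in traffic: fold the item back
    # into the normality model instead of flooding the operator.
    sensor.update(item)
    return "normal"
```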
Data Sanitization: Improving the Forensic Utility of Anomaly Detection Systems
Anomaly Detection (AD) sensors have become an invaluable tool for forensic analysis and intrusion detection. Unfortunately, the detection accuracy of all learning-based ADs depends heavily on the quality of the training data, which is often poor, severely degrading their reliability as a protection and forensic-analysis tool. In this paper, we propose extending the training phase of an AD to include a sanitization phase that aims to improve the quality of unlabeled training data by making them as "attack-free" and "regular" as possible in the absence of absolute ground truth. Our proposed scheme is agnostic to the underlying AD, boosting its performance based solely on training-data sanitization. Our approach is to generate multiple AD models for content-based AD sensors trained on small slices of the training data. These AD "micro-models" are used to test the training data, producing alerts for each training input. We employ voting techniques to determine which of these training items are likely attacks. Our preliminary results show that sanitization increases 0-day attack detection while maintaining a low false positive rate, increasing confidence in the AD alerts. We perform an initial characterization of the performance of our system when we deploy sanitized versus unsanitized AD systems in combination with expensive host-based attack-detection systems. Finally, we provide some preliminary evidence that our system incurs only an initial modest cost, which can be amortized over time during online operation.
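The voting step can be stated compactly. The notation below is a sketch consistent with the description above, not a formula quoted from the paper: let L_ij be micro-model M_i's alert (0 or 1) on training item j, and w_i an optional per-model weight.

```latex
% Weighted vote across N micro-models; w_i = 1 recovers simple majority.
L_j = \frac{1}{W} \sum_{i=1}^{N} w_i \, L_{i,j}, \qquad W = \sum_{i=1}^{N} w_i
% Item j is deemed a likely attack and removed when L_j exceeds a
% voting threshold V; the sanitized set keeps items with L_j \le V.
```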
Post-Patch Retraining for Host-Based Anomaly Detection
Applying patches, although a disruptive activity, remains a vital part of software maintenance and defense. When host-based anomaly detection (AD) sensors monitor an application, patching the application requires a corresponding update of the sensor's behavioral model. Otherwise, the sensor may incorrectly classify new behavior as malicious (a false positive) or assert that old, incorrect behavior is normal (a false negative). Although the problem of "model drift" is an almost universally acknowledged hazard for AD sensors, relatively little work has been done to understand the process of re-training a "live" AD model, especially in response to legitimate behavioral updates such as vendor patches or repairs produced by a self-healing system. We investigate the feasibility of automatically deriving and applying a "model patch" that describes the changes necessary to update a "reasonable" host-based AD behavioral model (i.e., a model whose structure follows the core design principles of existing host-based anomaly models). We aim to avoid extensive retraining and regeneration of the entire AD model when only parts may have changed, a task that seems especially undesirable after the exhaustive testing necessary to deploy a patch.
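As a rough sketch of what a "model patch" could look like, assume the behavioral model is representable as a set of system-call n-grams, a common structure in host-based anomaly models; the paper's actual model format may differ, and the function names here are illustrative only.

```python
def derive_model_patch(old_model: set, new_model: set) -> dict:
    """Diff two behavioral models (here, sets of system-call n-grams)
    so only the changed parts of a live sensor need updating."""
    return {
        "add": new_model - old_model,     # behavior the patch introduces
        "remove": old_model - new_model,  # behavior the patch retires
    }

def apply_model_patch(live_model: set, patch: dict) -> set:
    # Touch only the delta, avoiding full retraining and regeneration
    # of the entire model.
    return (live_model - patch["remove"]) | patch["add"]
```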
Intrusion and Anomaly Detection Model Exchange for Mobile Ad-Hoc Networks
Mobile Ad-hoc NETworks (MANETs) pose unique security requirements and challenges due to their reliance on open, peer-to-peer models that often do not require authentication between nodes. Additionally, the limited processing power and battery life of the devices used in a MANET prevent the adoption of heavy-duty cryptographic techniques. While traditional misuse-based Intrusion Detection Systems (IDSes) may work in a MANET, watching for packet dropouts or unknown outsiders is difficult, as both occur frequently in malicious and non-malicious traffic alike. Anomaly detection approaches hold out more promise, as they use learning techniques to adapt to the wireless environment and flag malicious data. The anomaly detection model can also capture device behavior profiles, which peers can use to help determine a device's trustworthiness. However, computing the anomaly model itself is a time-consuming and processor-heavy task. To avoid this, we propose exchanging models as a device moves between different networks, minimizing computation and traffic utilization. Any node should be able to obtain a peer's models and evaluate them against its own model of "normal" behavior. We present this scheme, discuss scenarios in which it may be used, and provide preliminary results and a framework for future implementation.
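One plausible reading of "evaluate them against its own model" is an agreement check on locally observed traffic, sketched below. The `is_alert` interface and the agreement floor are assumptions for illustration, not the paper's specified mechanism.

```python
def trust_peer_model(own_model, peer_model, recent_traffic, floor=0.9):
    """Accept a peer's exchanged AD model only if its verdicts agree
    with our own notion of normal on locally observed traffic."""
    agree = sum(own_model.is_alert(x) == peer_model.is_alert(x)
                for x in recent_traffic)
    # High agreement suggests the peer's profile of normal behavior is
    # compatible with ours and cheap to adopt; low agreement suggests
    # a stale, mismatched, or maliciously crafted model.
    return agree / len(recent_traffic) >= floor
```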
Data Sanitization: Improving the Forensic Utility of Anomaly Detection Systems
Anomaly Detection (AD) sensors have become an invaluable tool for forensic analysis and intrusion detection. Unfortunately, the detection performance of all learning-based ADs depends heavily on the quality of the training data. In this paper, we extend the training phase of an AD to include a sanitization phase. This phase significantly improves the quality of unlabeled training data by making them as "attack-free" as possible in the absence of absolute ground truth. Our approach is agnostic to the underlying AD, boosting its performance based solely on training-data sanitization. We generate multiple AD models for content-based AD sensors trained on small slices of the training data. These AD "micro-models" are used to test the training data, producing alerts for each training input. We employ voting techniques to determine which of these training items are likely attacks. Our preliminary results show that sanitization increases 0-day attack detection while in most cases reducing the false positive rate. We analyze the performance gains when we deploy sanitized versus unsanitized AD systems in combination with expensive host-based attack-detection systems. Finally, we show that our system incurs only an initial modest cost, which can be amortized over time during online operation.
From STEM to SEAD: Speculative Execution for Automated Defense
Most computer defense systems crash the process that they protect as part of their response to an attack. In contrast, self-healing software recovers from an attack by automatically repairing the underlying vulnerability. Although recent research explores the feasibility of the basic concept, self-healing faces four major obstacles before it can protect legacy applications and COTS software. Besides the practical issues involved in applying the system to such software (e.g., not modifying source code), self-healing has encountered a number of problems: knowing when to engage, knowing how to repair, and handling communication with external entities. Our previous work on a self-healing system, STEM, left these challenges as future work. STEM provides self-healing by speculatively executing "slices" of a process. This paper improves STEM's capabilities along three lines: (1) applicability of the system to COTS software (STEM does not require source code, and it imposes a roughly 73% performance penalty on Apache's normal operation), (2) semantic correctness of the repair (we introduce virtual proxies and repair policy to assist the healing process), and (3) creating a behavior profile based on aspects of data and control flow.
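STEM itself supervises native binaries; the toy Python sketch below only illustrates the speculate-then-commit-or-rollback idea behind executing "slices" of a process, using OS process isolation on Unix. `slice_fn` and `fallback` are hypothetical stand-ins, and a real system would commit the speculated state rather than re-execute the slice.

```python
import os

def speculate(slice_fn, fallback):
    """Speculatively run a code slice in a forked child; if it faults,
    virtualize the error by returning a benign fallback value instead
    of crashing the protected process."""
    pid = os.fork()  # Unix-only sketch
    if pid == 0:
        try:
            slice_fn()
            os._exit(0)  # slice survived
        except BaseException:
            os._exit(1)  # slice faulted
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status) or os.WEXITSTATUS(status) != 0:
        # Roll back: the child's side effects die with it, and the
        # parent substitutes an error return the caller already handles.
        return fallback
    # Replay the validated slice for real (STEM commits the speculated
    # state instead of re-running).
    return slice_fn()
```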
Casting Out Demons: Sanitizing Training Data for Anomaly Sensors
The efficacy of anomaly detection (AD) sensors depends heavily on the quality of the data used to train them. Artificial or contrived training data may not provide a realistic view of the deployment environment. Most realistic data sets are dirty; that is, they contain a number of attacks or anomalous events. The size of these high-quality training data sets makes manual removal or labeling of attack data infeasible. As a result, sensors trained on this data can miss attacks and their variations. We propose extending the training phase of AD sensors (in a manner agnostic to the underlying AD algorithm) to include a sanitization phase. This phase generates multiple models conditioned on small slices of the training data. We use these "micro-models" to produce provisional labels for each training input, and we combine the micro-models in a voting scheme to determine which parts of the training data may represent attacks. Our results suggest that this phase automatically and significantly improves the quality of unlabeled training data by making it as "attack-free" and "regular" as possible in the absence of absolute ground truth. We also show how a collaborative approach that combines models from different networks or domains can further refine the sanitization process to thwart targeted training or mimicry attacks against a single site.
From STEM to SEAD: Speculative Execution for Automated Defense
Most computer defense systems crash the process that they protect as part of their response to an attack. Although recent research explores the feasibility of self-healing to automatically recover from an attack, self-healing faces some obstacles before it can protect legacy applications and COTS (Commercial Off-The-Shelf) software. Besides the practical issue of not modifying source code, self-healing must know both when to engage and how to guide a repair. Previous work on a self-healing system, STEM, left these challenges as future work. This paper improves STEM's capabilities along three lines to provide practical speculative execution for automated defense (SEAD). First, STEM is now applicable to COTS software: it does not require source code, and it imposes a roughly 73% performance penalty on Apache's normal operation. Second, we introduce repair policy to assist the healing process and improve the semantic correctness of the repair. Finally, STEM can create behavior profiles based on aspects of data and control flow.