Are Machine Learning Models for Malware Detection Ready for Prime Time?
We investigate why the performance of machine learning models for malware detection observed in a lab setting often cannot be reproduced in practice. We discuss how to set up experiments mimicking a practical deployment and how to measure the robustness of a model over time.
Detection and Threat Prioritization of Pivoting Attacks in Large Networks
Several advanced cyber attacks adopt the technique of "pivoting", through which attackers create a command propagation tunnel through two or more hosts in order to reach their final target. Identifying such malicious activities is one of the toughest research problems because of several challenges: command propagation is a rare event that cannot be detected through signatures; the huge amount of internal communications facilitates attacker evasion; and timely pivoting discovery is computationally demanding. This paper describes the first pivoting detection algorithm that is based on network flow analysis, does not rely on any a-priori assumption about protocols and hosts, and leverages an original problem formalization in terms of temporal graph analytics. We also introduce a prioritization algorithm that ranks the detected paths on the basis of a threat score, thus letting security analysts investigate only the most suspicious pivoting tunnels. The feasibility and effectiveness of our proposal are assessed through a broad set of experiments that demonstrate its higher accuracy and performance compared with related algorithms.
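The temporal-graph idea behind pivoting detection can be pictured with a small sketch (not the paper's algorithm, just its core temporal constraint): a candidate pivoting path is a chain of flows in which each hop begins shortly after the previous one, consistent with a command propagating through intermediate hosts. The function name, data layout, and thresholds below are hypothetical.

```python
from collections import defaultdict

def find_pivoting_paths(flows, max_gap, min_hops=2):
    """Enumerate candidate pivoting paths: chains A->B->C->... in
    which each hop starts after, but within max_gap of, the
    previous hop, consistent with command propagation.

    flows: list of (src, dst, timestamp) tuples.
    """
    # Index outgoing flows by source host, sorted by time.
    out = defaultdict(list)
    for src, dst, ts in flows:
        out[src].append((ts, dst))
    for lst in out.values():
        lst.sort()

    paths = []

    def extend(path, last_ts):
        host = path[-1]
        for ts, nxt in out[host]:
            # Temporal constraint: next hop must follow the previous
            # one within max_gap; avoid revisiting hosts (no cycles).
            if last_ts < ts <= last_ts + max_gap and nxt not in path:
                new_path = path + [nxt]
                if len(new_path) - 1 >= min_hops:
                    paths.append(new_path)
                extend(new_path, ts)

    for src, dst, ts in flows:
        extend([src, dst], ts)
    return paths
```

A real system would additionally score each path (the threat-score prioritization) and prune the search to stay within the computational budget the abstract mentions.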
Investigating Labelless Drift Adaptation for Malware Detection
The evolution of malware has long plagued machine learning-based detection systems, as malware authors develop innovative strategies to evade detection and chase profits. This induces concept drift as the test distribution diverges from the training, causing performance decay that requires constant monitoring and adaptation.
In this work, we analyze the adaptation strategy used by DroidEvolver, a state-of-the-art learning system that self-updates using pseudo-labels to avoid the high overhead associated with obtaining a new ground truth. After removing sources of experimental bias present in the original evaluation, we identify a number of flaws in the generation and integration of these pseudo-labels, leading to a rapid onset of performance degradation as the model poisons itself. We propose DroidEvolver++, a more robust variant of DroidEvolver, to address these issues and highlight the role of pseudo-labels in addressing concept drift. We test the tolerance of the adaptation strategy versus different degrees of pseudo-label noise and propose the adoption of methods to ensure only high-quality pseudo-labels are used for updates.
Ultimately, we conclude that pseudo-labeling remains a promising solution to limitations on labeling capacity, but great care must be taken when designing update mechanisms to avoid negative feedback loops and self-poisoning, which have catastrophic effects on performance.
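The quality gate advocated above can be sketched minimally: only predictions whose confidence clears a threshold feed back into the self-update, and uncertain samples are withheld to limit self-poisoning. The binary setup and threshold value are illustrative assumptions, not DroidEvolver++'s exact mechanism.

```python
def select_pseudo_labels(probs, threshold=0.95):
    """Keep only predictions whose confidence clears `threshold`;
    low-confidence samples are excluded from the model update.

    probs: predicted probability of the positive (malware) class.
    Returns a list of (index, pseudo_label) pairs.
    """
    selected = []
    for i, p in enumerate(probs):
        conf = max(p, 1.0 - p)  # binary-classifier confidence
        if conf >= threshold:
            label = 1 if p >= 0.5 else 0
            selected.append((i, label))
    return selected
```

Under drift, the threshold trades labeling coverage against pseudo-label noise: raising it slows adaptation but reduces the risk of the negative feedback loop described above.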
Scalable architecture for online prioritization of cyber threats
This paper proposes an innovative framework for the early detection of several cyber attacks. Its main component is an analytics core that gathers streams of raw data generated by network probes, builds several layer models representing different activities of internal hosts, and analyzes intra-layer and inter-layer information. The online analysis of internal network activities at different levels distinguishes our approach from most detection tools and algorithms, which focus on separate network levels or on interactions between internal and external hosts. Moreover, the integrated multi-layer analysis carried out through parallel processing reduces false positives and guarantees scalability with respect to the size of the network and the number of layers. As a further contribution, the proposed framework executes autonomous triage by assigning a risk score to each internal host. This key feature allows security experts to focus their attention on the few hosts with higher scores rather than wasting time on thousands of daily alerts and false alarms.
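The autonomous triage step can be illustrated with a toy aggregation (a hypothetical scheme, not the framework's actual scoring function): per-layer anomaly scores for each host are combined through layer weights into one risk score, and hosts are ranked so analysts can start from the top.

```python
def host_risk_scores(layer_scores, weights):
    """Combine per-layer anomaly scores into a single risk score
    per internal host, then rank hosts by descending risk.

    layer_scores: {layer: {host: score in [0, 1]}}
    weights: {layer: relative importance}
    """
    totals = {}
    wsum = sum(weights.values())
    for layer, scores in layer_scores.items():
        w = weights.get(layer, 0.0) / wsum  # normalized layer weight
        for host, s in scores.items():
            totals[host] = totals.get(host, 0.0) + w * s
    return sorted(totals.items(), key=lambda kv: -kv[1])
```

With such a ranking, the analyst inspects only the top few hosts instead of the raw alert stream.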
Intriguing Properties of Adversarial ML Attacks in the Problem Space
Recent research efforts on adversarial ML have investigated problem-space
attacks, focusing on the generation of real evasive objects in domains where,
unlike images, there is no clear inverse mapping to the feature space (e.g.,
software). However, the design, comparison, and real-world implications of
problem-space attacks remain underexplored. This paper makes two major
contributions. First, we propose a novel formalization for adversarial ML
evasion attacks in the problem-space, which includes the definition of a
comprehensive set of constraints on available transformations, preserved
semantics, robustness to preprocessing, and plausibility. We shed light on the
relationship between feature space and problem space, and we introduce the
concept of side-effect features as the byproduct of the inverse feature-mapping
problem. This enables us to define and prove necessary and sufficient
conditions for the existence of problem-space attacks. We further demonstrate
the expressive power of our formalization by using it to describe several
attacks from related literature across different domains. Second, building on
our formalization, we propose a novel problem-space attack on Android malware
that overcomes past limitations. Experiments on a dataset with 170K Android
apps from 2017 and 2018 show the practical feasibility of evading a
state-of-the-art malware classifier along with its hardened version. Our
results demonstrate that "adversarial-malware as a service" is a realistic
threat, as we automatically generate thousands of realistic and inconspicuous
adversarial applications at scale, where on average it takes only a few minutes
to generate an adversarial app. Our formalization of problem-space attacks
paves the way to more principled research in this domain.

Comment: This arXiv version (v2) corresponds to the one published at the IEEE Symposium on Security & Privacy (Oakland), 2020.
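One way to picture side-effect features is a toy greedy search in feature space (an illustrative reduction, not the paper's formalization): each available problem-space transformation contributes a feature vector that mixes the change the attacker intends with side-effect components, and the attacker composes transformations to approach a target feature-space point.

```python
def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def greedy_realize(target, transformations):
    """Greedily pick transformations whose combined feature-space
    effect (intended change plus side-effect features) moves the
    current point toward `target`.

    transformations: {name: feature-space effect vector}
    """
    current = [0.0] * len(target)
    chosen = []
    improved = True
    while improved:
        improved = False
        best = None
        for name, effect in transformations.items():
            if name in chosen:
                continue
            cand = [c + e for c, e in zip(current, effect)]
            # Accept only strict progress toward the target.
            if dist(cand, target) < dist(current, target) - 1e-12:
                if best is None or dist(cand, target) < dist(best[1], target):
                    best = (name, cand)
        if best:
            chosen.append(best[0])
            current = best[1]
            improved = True
    return chosen, current
```

In the example below, the third coordinate is a pure side-effect feature: each transformation perturbs it, and only a suitable combination cancels the side effects while reaching the intended change.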
TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time
Is Android malware classification a solved problem? Published F1 scores of up
to 0.99 appear to leave very little room for improvement. In this paper, we
argue that results are commonly inflated due to two pervasive sources of
experimental bias: "spatial bias" caused by distributions of training and
testing data that are not representative of a real-world deployment; and
"temporal bias" caused by incorrect time splits of training and testing sets,
leading to impossible configurations. We propose a set of space and time
constraints for experiment design that eliminates both sources of bias. We
introduce a new metric that summarizes the expected robustness of a classifier
in a real-world setting, and we present an algorithm to tune its performance.
Finally, we demonstrate how this allows us to evaluate mitigation strategies
for time decay such as active learning. We have implemented our solutions in
TESSERACT, an open source evaluation framework for comparing malware
classifiers in a realistic setting. We used TESSERACT to evaluate three Android
malware classifiers from the literature on a dataset of 129K applications
spanning over three years. Our evaluation confirms that earlier published
results are biased, while also revealing counter-intuitive performance and
showing that appropriate tuning can lead to significant improvements.

Comment: This arXiv version (v4) corresponds to the one published at the USENIX Security Symposium 2019, with a fixed typo in Equation (4), which reported an extra normalization factor of (1/N). The results in the paper and the released implementation of the TESSERACT framework remain valid and correct, as they rely on Python's numpy implementation of area under the curve.
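The temporal constraint at the heart of the above — every training sample must strictly predate every test sample, ruling out the "impossible configurations" that temporal bias creates — can be sketched as a simple guard (a minimal illustration, not the TESSERACT API):

```python
def temporal_split(samples, train_end):
    """Split (timestamp, item) pairs so that every training sample
    strictly predates every test sample.

    samples: list of (timestamp, item) pairs.
    train_end: cutoff timestamp; items at or after it go to test.
    """
    samples = sorted(samples)  # order by timestamp
    train = [item for ts, item in samples if ts < train_end]
    test = [item for ts, item in samples if ts >= train_end]
    return train, test
```

Spatial bias needs a separate check on top of this: the class ratio in the test split should match a realistic deployment rather than an artificially balanced dataset.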
INSOMNIA: Towards Concept-Drift Robustness in Network Intrusion Detection
Despite decades of research in network traffic analysis and incredible advances in artificial intelligence, network intrusion detection systems based on machine learning (ML) have yet to prove their worth. One core obstacle is the existence of concept drift, an issue for all adversary-facing security systems. Additionally, specific challenges set intrusion detection apart from other ML-based security tasks, such as malware detection.
In this work, we offer a new perspective on these challenges. We propose INSOMNIA, a semi-supervised intrusion detector which continuously updates the underlying ML model as network traffic characteristics are affected by concept drift. We use active learning to reduce latency in the model updates, label estimation to reduce labeling overhead, and apply explainable AI to better interpret how the model reacts to the shifting distribution.
To evaluate INSOMNIA, we extend TESSERACT - a framework originally proposed for performing sound time-aware evaluations of ML-based malware detectors - to the network intrusion domain. Our evaluation shows that accounting for drifting scenarios is vital for effective intrusion detection systems.
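The active-learning component described above typically relies on uncertainty sampling; a minimal sketch (our illustration, not INSOMNIA's implementation) selects the samples whose predicted probability is closest to the decision boundary for analyst labeling:

```python
def select_for_labeling(probs, budget):
    """Uncertainty sampling: return the indices of the `budget`
    samples whose predicted probability is closest to 0.5, i.e.
    the ones whose labels would teach the model the most."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:budget]
```

Under drift, spending the labeling budget on these boundary cases is what keeps the update latency low while the traffic distribution shifts.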
Scalable architecture for multi-user encrypted SQL operations on cloud database services
The success of the cloud database paradigm is strictly related to strong guarantees in terms of service availability, scalability and security, but also of data confidentiality. Any cloud provider assures the security and availability of its platform, while the implementation of scalable solutions to guarantee confidentiality of the information stored in cloud databases is an open problem left to the tenant. Existing solutions address some preliminary issues through SQL operations on encrypted data. We propose the first complete architecture that combines data encryption, key management, authentication and authorization solutions, and that addresses the issues related to typical threat scenarios for cloud database services. Formal models describe the proposed solutions for enforcing access control and for guaranteeing confidentiality of data and metadata. Experimental evaluations based on standard benchmarks and real Internet scenarios show that the proposed architecture also satisfies scalability and performance requirements.
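A common building block for SQL operations on encrypted data (used here purely as an illustration; the paper's architecture combines several techniques with key management and access control) is a deterministic keyed tag that lets the server answer equality predicates without seeing plaintext. The key and data below are hypothetical.

```python
import hashlib
import hmac

def eq_token(key, value):
    """Deterministic keyed tag: equal plaintexts yield equal tags,
    so the server can match WHERE col = 'x' on tags alone. This
    deliberately leaks equality patterns -- a known tradeoff."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

# Tenant side: store encrypted values alongside searchable tags.
key = b"tenant-shared-key"  # hypothetical key shared among tenant users
rows = ["alice", "bob", "alice"]
index = [eq_token(key, v) for v in rows]

# Query: WHERE name = 'alice' becomes a tag lookup on the server.
matches = [i for i, tag in enumerate(index) if tag == eq_token(key, "alice")]
```

Range queries and aggregates need different encryption layers, which is why a complete architecture must combine multiple schemes per column.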
Universal Adversarial Perturbations for Malware
Machine learning classification models are vulnerable to adversarial examples
-- effective input-specific perturbations that can manipulate the model's
output. Universal Adversarial Perturbations (UAPs), which identify noisy
patterns that generalize across the input space, allow the attacker to greatly
scale up the generation of these adversarial examples. Although UAPs have been
explored in application domains beyond computer vision, little is known about
their properties and implications in the specific context of realizable
attacks, such as malware, where attackers must reason about satisfying
challenging problem-space constraints.
In this paper, we explore the challenges and strengths of UAPs in the context
of malware classification. We generate sequences of problem-space
transformations that induce UAPs in the corresponding feature-space embedding
and evaluate their effectiveness across threat models that consider a varying
degree of realistic attacker knowledge. Additionally, we propose adversarial
training-based mitigations using knowledge derived from the problem-space
transformations, and compare against alternative feature-space defenses. Our
experiments limit the effectiveness of a white-box Android evasion attack to
~20% at the cost of 3% TPR at 1% FPR. We additionally show how our method
can be adapted to more restrictive application domains such as Windows malware.
We observe that while adversarial training in the feature space must deal
with large and often unconstrained regions, UAPs in the problem space identify
specific vulnerabilities that allow us to harden a classifier more effectively,
shifting the challenges and associated cost of identifying new universal
adversarial transformations back to the attacker.
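The universal-perturbation idea can be sketched with a toy feature-space search (a deliberately simplified stand-in for the paper's problem-space transformation sequences): among candidate deltas, pick the one that flips the classifier's decision on the largest fraction of inputs, so a single perturbation generalizes across samples.

```python
def fit_uap(samples, classifier, candidate_deltas):
    """Toy universal-perturbation search: return the candidate
    delta that flips `classifier` on the most samples, together
    with its flip rate."""
    best_delta, best_rate = None, -1.0
    for delta in candidate_deltas:
        flips = sum(
            classifier([x + d for x, d in zip(s, delta)]) != classifier(s)
            for s in samples
        )
        rate = flips / len(samples)
        if rate > best_rate:
            best_delta, best_rate = delta, rate
    return best_delta, best_rate
```

In the realizable-attack setting, each candidate delta would instead be the feature-space effect of a sequence of problem-space transformations, which is what constrains the attacker's search.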