4,429 research outputs found
Applications of Machine Learning to Threat Intelligence, Intrusion Detection and Malware
Artificial Intelligence (AI) and Machine Learning (ML) are emerging technologies with applications to many fields. This paper is a survey of use cases of ML for threat intelligence, intrusion detection, and malware analysis and detection. Threat intelligence, especially attack attribution, can benefit from the use of ML classification. False positives from rule-based intrusion detection systems can be reduced with the use of ML models. Malware analysis and classification can be made easier by developing ML frameworks to distill similarities between the malicious programs. Adversarial machine learning will also be discussed, because while ML can be used to solve problems or reduce analyst workload, it also introduces new attack surfaces
Herding Vulnerable Cats: A Statistical Approach to Disentangle Joint Responsibility for Web Security in Shared Hosting
Hosting providers play a key role in fighting web compromise, but their
ability to prevent abuse is constrained by the security practices of their own
customers. {\em Shared} hosting, offers a unique perspective since customers
operate under restricted privileges and providers retain more control over
configurations. We present the first empirical analysis of the distribution of
web security features and software patching practices in shared hosting
providers, the influence of providers on these security practices, and their
impact on web compromise rates. We construct provider-level features on the
global market for shared hosting -- containing 1,259 providers -- by gathering
indicators from 442,684 domains. Exploratory factor analysis of 15 indicators
identifies four main latent factors that capture security efforts: content
security, webmaster security, web infrastructure security and web application
security. We confirm, via a fixed-effect regression model, that providers exert
significant influence over the latter two factors, which are both related to
the software stack in their hosting environment. Finally, by means of GLM
regression analysis of these factors on phishing and malware abuse, we show
that the four security and software patching factors explain between 10\% and
19\% of the variance in abuse at providers, after controlling for size. For
web-application security for instance, we found that when a provider moves from
the bottom 10\% to the best-performing 10\%, it would experience 4 times fewer
phishing incidents. We show that providers have influence over patch
levels--even higher in the stack, where CMSes can run as client-side
software--and that this influence is tied to a substantial reduction in abuse
levels
Reviewer Integration and Performance Measurement for Malware Detection
We present and evaluate a large-scale malware detection system integrating
machine learning with expert reviewers, treating reviewers as a limited
labeling resource. We demonstrate that even in small numbers, reviewers can
vastly improve the system's ability to keep pace with evolving threats. We
conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years
and containing 1.1 million binaries with 778GB of raw feature data. Without
reviewer assistance, we achieve 72% detection at a 0.5% false positive rate,
performing comparable to the best vendors on VirusTotal. Given a budget of 80
accurate reviews daily, we improve detection to 89% and are able to detect 42%
of malicious binaries undetected upon initial submission to VirusTotal.
Additionally, we identify a previously unnoticed temporal inconsistency in the
labeling of training datasets. We compare the impact of training labels
obtained at the same time training data is first seen with training labels
obtained months later. We find that using training labels obtained well after
samples appear, and thus unavailable in practice for current training data,
inflates measured detection by almost 20 percentage points. We release our
cluster-based implementation, as well as a list of all hashes in our evaluation
and 3% of our entire dataset.Comment: 20 papers, 11 figures, accepted at the 13th Conference on Detection
of Intrusions and Malware & Vulnerability Assessment (DIMVA 2016
A Comparison of Clustering Techniques for Malware Analysis
In this research, we apply clustering techniques to the malware detection problem. Our goal is to classify malware as part of a fully automated detection strategy. We compute clusters using the well-known �-means and EM clustering algorithms, with scores obtained from Hidden Markov Models (HMM). The previous work in this area consists of using HMM and �-means clustering technique to achieve the same. The current effort aims to extend it to use EM clustering technique for detection and also compare this technique with the �-means clustering
- …