10,317 research outputs found
Malware distributions and graph structure of the Web
Knowledge about the graph structure of the Web is important for understanding
this complex socio-technical system and for devising proper policies supporting
its future development. Knowledge about the differences between clean and
malicious parts of the Web is important for understanding potential treats to
its users and for devising protection mechanisms. In this study, we conduct
data science methods on a large crawl of surface and deep Web pages with the
aim to increase such knowledge. To accomplish this, we answer the following
questions. Which theoretical distributions explain important local
characteristics and network properties of websites? How are these
characteristics and properties different between clean and malicious
(malware-affected) websites? What is the prediction power of local
characteristics and network properties to classify malware websites? To the
best of our knowledge, this is the first large-scale study describing the
differences in global properties between malicious and clean parts of the Web.
In other words, our work is building on and bridging the gap between
\textit{Web science} that tackles large-scale graph representations and
\textit{Web cyber security} that is concerned with malicious activities on the
Web. The results presented herein can also help antivirus vendors in devising
approaches to improve their detection algorithms
Herding Vulnerable Cats: A Statistical Approach to Disentangle Joint Responsibility for Web Security in Shared Hosting
Hosting providers play a key role in fighting web compromise, but their
ability to prevent abuse is constrained by the security practices of their own
customers. {\em Shared} hosting, offers a unique perspective since customers
operate under restricted privileges and providers retain more control over
configurations. We present the first empirical analysis of the distribution of
web security features and software patching practices in shared hosting
providers, the influence of providers on these security practices, and their
impact on web compromise rates. We construct provider-level features on the
global market for shared hosting -- containing 1,259 providers -- by gathering
indicators from 442,684 domains. Exploratory factor analysis of 15 indicators
identifies four main latent factors that capture security efforts: content
security, webmaster security, web infrastructure security and web application
security. We confirm, via a fixed-effect regression model, that providers exert
significant influence over the latter two factors, which are both related to
the software stack in their hosting environment. Finally, by means of GLM
regression analysis of these factors on phishing and malware abuse, we show
that the four security and software patching factors explain between 10\% and
19\% of the variance in abuse at providers, after controlling for size. For
web-application security for instance, we found that when a provider moves from
the bottom 10\% to the best-performing 10\%, it would experience 4 times fewer
phishing incidents. We show that providers have influence over patch
levels--even higher in the stack, where CMSes can run as client-side
software--and that this influence is tied to a substantial reduction in abuse
levels
Applications of Machine Learning to Threat Intelligence, Intrusion Detection and Malware
Artificial Intelligence (AI) and Machine Learning (ML) are emerging technologies with applications to many fields. This paper is a survey of use cases of ML for threat intelligence, intrusion detection, and malware analysis and detection. Threat intelligence, especially attack attribution, can benefit from the use of ML classification. False positives from rule-based intrusion detection systems can be reduced with the use of ML models. Malware analysis and classification can be made easier by developing ML frameworks to distill similarities between the malicious programs. Adversarial machine learning will also be discussed, because while ML can be used to solve problems or reduce analyst workload, it also introduces new attack surfaces
Hiding in Plain Sight: A Longitudinal Study of Combosquatting Abuse
Domain squatting is a common adversarial practice where attackers register
domain names that are purposefully similar to popular domains. In this work, we
study a specific type of domain squatting called "combosquatting," in which
attackers register domains that combine a popular trademark with one or more
phrases (e.g., betterfacebook[.]com, youtube-live[.]com). We perform the first
large-scale, empirical study of combosquatting by analyzing more than 468
billion DNS records---collected from passive and active DNS data sources over
almost six years. We find that almost 60% of abusive combosquatting domains
live for more than 1,000 days, and even worse, we observe increased activity
associated with combosquatting year over year. Moreover, we show that
combosquatting is used to perform a spectrum of different types of abuse
including phishing, social engineering, affiliate abuse, trademark abuse, and
even advanced persistent threats. Our results suggest that combosquatting is a
real problem that requires increased scrutiny by the security community.Comment: ACM CCS 1
Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data
Recent years have seen the rise of more sophisticated attacks including
advanced persistent threats (APTs) which pose severe risks to organizations and
governments by targeting confidential proprietary information. Additionally,
new malware strains are appearing at a higher rate than ever before. Since many
of these malware are designed to evade existing security products, traditional
defenses deployed by most enterprises today, e.g., anti-virus, firewalls,
intrusion detection systems, often fail at detecting infections at an early
stage.
We address the problem of detecting early-stage infection in an enterprise
setting by proposing a new framework based on belief propagation inspired from
graph theory. Belief propagation can be used either with "seeds" of compromised
hosts or malicious domains (provided by the enterprise security operation
center -- SOC) or without any seeds. In the latter case we develop a detector
of C&C communication particularly tailored to enterprises which can detect a
stealthy compromise of only a single host communicating with the C&C server.
We demonstrate that our techniques perform well on detecting enterprise
infections. We achieve high accuracy with low false detection and false
negative rates on two months of anonymized DNS logs released by Los Alamos
National Lab (LANL), which include APT infection attacks simulated by LANL
domain experts. We also apply our algorithms to 38TB of real-world web proxy
logs collected at the border of a large enterprise. Through careful manual
investigation in collaboration with the enterprise SOC, we show that our
techniques identified hundreds of malicious domains overlooked by
state-of-the-art security products
- …