171 research outputs found
Characterizing Phishing Threats with Natural Language Processing
Spear phishing is a widespread concern in the modern network security
landscape, but there are few metrics that measure the extent to which
reconnaissance is performed on phishing targets. Spear phishing emails closely
match the expectations of the recipient, based on details of their experiences
and interests, making them a popular propagation vector for harmful malware. In
this work we use Natural Language Processing techniques to investigate a
specific real-world phishing campaign and quantify attributes that indicate a
targeted spear phishing attack. Our phishing campaign data sample comprises 596
emails - all containing a web bug and a Curriculum Vitae (CV) PDF attachment -
sent to our institution by a foreign IP space. The campaign was found to
exclusively target specific demographics within our institution. Performing a
semantic similarity analysis between the senders' CV attachments and the
recipients' LinkedIn profiles, we conclude with high statistical certainty (p
) that the attachments contain targeted rather than randomly
selected material. Latent Semantic Analysis further demonstrates that
individuals who were a primary focus of the campaign received CVs that are
highly topically clustered. These findings differentiate this campaign from one
that leverages random spam.Comment: This paper has been accepted for publication by the IEEE Conference
on Communications and Network Security in September 2015 at Florence, Italy.
Copyright may be transferred without notice, after which this version may no
longer be accessibl
Tiresias: Predicting Security Events Through Deep Learning
With the increased complexity of modern computer attacks, there is a need for
defenders not only to detect malicious activity as it happens, but also to
predict the specific steps that will be taken by an adversary when performing
an attack. However this is still an open research problem, and previous
research in predicting malicious events only looked at binary outcomes (e.g.,
whether an attack would happen or not), but not at the specific steps that an
attacker would undertake. To fill this gap we present Tiresias, a system that
leverages Recurrent Neural Networks (RNNs) to predict future events on a
machine, based on previous observations. We test Tiresias on a dataset of 3.4
billion security events collected from a commercial intrusion prevention
system, and show that our approach is effective in predicting the next event
that will occur on a machine with a precision of up to 0.93. We also show that
the models learned by Tiresias are reasonably stable over time, and provide a
mechanism that can identify sudden drops in precision and trigger a retraining
of the system. Finally, we show that the long-term memory typical of RNNs is
key in performing event prediction, rendering simpler methods not up to the
task
Detecting and characterizing lateral phishing at scale
We present the first large-scale characterization of lateral phishing attacks, based on a dataset of 113 million employee-sent emails from 92 enterprise organizations. In a lateral phishing attack, adversaries leverage a compromised enterprise account to send phishing emails to other users, benefit-ting from both the implicit trust and the information in the hijacked user's account. We develop a classifier that finds hundreds of real-world lateral phishing emails, while generating under four false positives per every one-million employee-sent emails. Drawing on the attacks we detect, as well as a corpus of user-reported incidents, we quantify the scale of lateral phishing, identify several thematic content and recipient targeting strategies that attackers follow, illuminate two types of sophisticated behaviors that attackers exhibit, and estimate the success rate of these attacks. Collectively, these results expand our mental models of the 'enterprise attacker' and shed light on the current state of enterprise phishing attacks
Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data
Recent years have seen the rise of more sophisticated attacks including
advanced persistent threats (APTs) which pose severe risks to organizations and
governments by targeting confidential proprietary information. Additionally,
new malware strains are appearing at a higher rate than ever before. Since many
of these malware are designed to evade existing security products, traditional
defenses deployed by most enterprises today, e.g., anti-virus, firewalls,
intrusion detection systems, often fail at detecting infections at an early
stage.
We address the problem of detecting early-stage infection in an enterprise
setting by proposing a new framework based on belief propagation inspired from
graph theory. Belief propagation can be used either with "seeds" of compromised
hosts or malicious domains (provided by the enterprise security operation
center -- SOC) or without any seeds. In the latter case we develop a detector
of C&C communication particularly tailored to enterprises which can detect a
stealthy compromise of only a single host communicating with the C&C server.
We demonstrate that our techniques perform well on detecting enterprise
infections. We achieve high accuracy with low false detection and false
negative rates on two months of anonymized DNS logs released by Los Alamos
National Lab (LANL), which include APT infection attacks simulated by LANL
domain experts. We also apply our algorithms to 38TB of real-world web proxy
logs collected at the border of a large enterprise. Through careful manual
investigation in collaboration with the enterprise SOC, we show that our
techniques identified hundreds of malicious domains overlooked by
state-of-the-art security products
Adversarial behaviours knowledge area
The technological advancements witnessed by our society in recent decades have brought
improvements in our quality of life, but they have also created a number of opportunities for
attackers to cause harm. Before the Internet revolution, most crime and malicious activity
generally required a victim and a perpetrator to come into physical contact, and this limited
the reach that malicious parties had. Technology has removed the need for physical contact
to perform many types of crime, and now attackers can reach victims anywhere in the world, as long as they are connected to the Internet. This has revolutionised the characteristics of crime and warfare, allowing operations that would not have been possible before. In this document, we provide an overview of the malicious operations that are happening on the Internet today. We first provide a taxonomy of malicious activities based on the attacker’s motivations and capabilities, and then move on to the technological and human elements that adversaries require to run a successful operation. We then discuss a number of frameworks that have been proposed to model malicious operations. Since adversarial behaviours are not a purely technical topic, we draw from research in a number of fields (computer science, criminology, war studies). While doing this, we discuss how these frameworks can be used by researchers and practitioners to develop effective mitigations against malicious online operations.Published versio
Targeted Attacks: Redefining Spear Phishing and Business Email Compromise
In today's digital world, cybercrime is responsible for significant damage to
organizations, including financial losses, operational disruptions, or
intellectual property theft. Cyberattacks often start with an email, the major
means of corporate communication. Some rare, severely damaging email threats -
known as spear phishing or Business Email Compromise - have emerged. However,
the literature disagrees on their definition, impeding security vendors and
researchers from mitigating targeted attacks. Therefore, we introduce targeted
attacks. We describe targeted-attack-detection techniques as well as
social-engineering methods used by fraudsters. Additionally, we present
text-based attacks - with textual content as malicious payload - and compare
non-targeted and targeted variants
A Large-Scale Study of Phishing PDF Documents
Phishing PDFs are malicious PDF documents that do not embed malware but trick
victims into visiting malicious web pages leading to password theft or drive-by
downloads. While recent reports indicate a surge of phishing PDFs, prior works
have largely neglected this new threat, positioning phishing PDFs as
accessories distributed via email phishing campaigns.
This paper challenges this belief and presents the first systematic and
comprehensive study centered on phishing PDFs. Starting from a real-world
dataset, we first identify 44 phishing PDF campaigns via clustering and
characterize them by looking at their volumetric, temporal, and visual
features. Among these, we identify three large campaigns covering 89% of the
dataset, exhibiting significantly different volumetric and temporal properties
compared to classical email phishing, and relying on web UI elements as visual
baits. Finally, we look at the distribution vectors and show that phishing PDFs
are not only distributed via attachments but also via SEO attacks, placing
phishing PDFs outside the email distribution ecosystem.
This paper also assesses the usefulness of the VirusTotal scoring system,
showing that phishing PDFs are ranked considerably low, creating a blind spot
for organizations. While URL blocklists can help to prevent victims from
visiting the attack web pages, PDF documents seem not subjected to any form of
content-based filtering or detection
- …