54 research outputs found
Hiding in Plain Sight: A Longitudinal Study of Combosquatting Abuse
Domain squatting is a common adversarial practice where attackers register
domain names that are purposefully similar to popular domains. In this work, we
study a specific type of domain squatting called "combosquatting," in which
attackers register domains that combine a popular trademark with one or more
phrases (e.g., betterfacebook[.]com, youtube-live[.]com). We perform the first
large-scale, empirical study of combosquatting by analyzing more than 468
billion DNS records---collected from passive and active DNS data sources over
almost six years. We find that almost 60% of abusive combosquatting domains
live for more than 1,000 days, and even worse, we observe increased activity
associated with combosquatting year over year. Moreover, we show that
combosquatting is used to perform a spectrum of different types of abuse
including phishing, social engineering, affiliate abuse, trademark abuse, and
even advanced persistent threats. Our results suggest that combosquatting is a
real problem that requires increased scrutiny by the security community.Comment: ACM CCS 1
From Chatbots to PhishBots? -- Preventing Phishing scams created using ChatGPT, Google Bard and Claude
The advanced capabilities of Large Language Models (LLMs) have made them
invaluable across various applications, from conversational agents and content
creation to data analysis, research, and innovation. However, their
effectiveness and accessibility also render them susceptible to abuse for
generating malicious content, including phishing attacks. This study explores
the potential of using four popular commercially available LLMs - ChatGPT (GPT
3.5 Turbo), GPT 4, Claude and Bard to generate functional phishing attacks
using a series of malicious prompts. We discover that these LLMs can generate
both phishing emails and websites that can convincingly imitate well-known
brands, and also deploy a range of evasive tactics for the latter to elude
detection mechanisms employed by anti-phishing systems. Notably, these attacks
can be generated using unmodified, or "vanilla," versions of these LLMs,
without requiring any prior adversarial exploits such as jailbreaking. As a
countermeasure, we build a BERT based automated detection tool that can be used
for the early detection of malicious prompts to prevent LLMs from generating
phishing content attaining an accuracy of 97\% for phishing website prompts,
and 94\% for phishing email prompts
An Automated Methodology for Validating Web Related Cyber Threat Intelligence by Implementing a Honeyclient
Loodud töö panustab küberkaitse valdkonda pakkudes alternatiivse viisi, kuidas hoida ohuteadmus andmebaas uuendatuna. Veebilehti kasutatakse ära viisina toimetada pahatahtlik kood ohvrini. Peale veebilehe klassifitseerimist pahaloomuliseks lisatakse see ohuteadmus andmebaasi kui pahaloomulise indikaatorina. Lõppkokkuvõtteks muutuvad sellised andmebaasid mahukaks ja sisaldavad aegunud kirjeid. Lahendus on automatiseerida aegunud kirjete kontrollimist klient-meepott tarkvaraga ning kogu protsess on täielikult automatiseeritav eesmärgiga hoida kokku aega. Jahtides kontrollitud ja kinnitatud indikaatoreid aitab see vältida valedel alustel küberturbe intsidentide menetlemist.This paper is contributing to the open source cybersecurity community by providing an alternative methodology for analyzing web related cyber threat intelligence. Websites are used commonly as an attack vector to spread malicious content crafted by any malicious party. These websites become threat intelligence which can be stored and collected into corresponding databases. Eventually these cyber threat databases become obsolete and can lead to false positive investigations in cyber incident response. The solution is to keep the threat indicator entries valid by verifying their content and this process can be fully automated to keep the process less time consuming. The proposed technical solution is a low interaction honeyclient regularly tasked to verify the content of the web based threat indicators. Due to the huge amount of database entries, this way most of the web based threat indicators can be automatically validated with less time consumption and they can be kept relevant for monitoring purposes and eventually can lead to avoiding false positives in an incident response processes
Using Context to Improve Network-based Exploit Kit Detection
Today, our computers are routinely compromised while performing seemingly innocuous activities like reading articles on trusted websites (e.g., the NY Times). These compromises are perpetrated via complex interactions involving the advertising networks that monetize these sites. Web-based compromises such as exploit kits are similar to any other scam -- the attacker wants to lure an unsuspecting client into a trap to steal private information, or resources -- generating 10s of millions of dollars annually. Exploit kits are web-based services specifically designed to capitalize on vulnerabilities in unsuspecting client computers in order to install malware without a user's knowledge. Sadly, it only takes a single successful infection to ruin a user's financial life, or lead to corporate breaches that result in millions of dollars of expense and loss of customer trust. Exploit kits use a myriad of techniques to obfuscate each attack instance, making current network-based defenses such as signature-based network intrusion detection systems far less effective than in years past. Dynamic analysis or honeyclient analysis on these exploits plays a key role in identifying new attacks for signature generation, but provides no means of inspecting end-user traffic on the network to identify attacks in real time. As a result, defenses designed to stop such malfeasance often arrive too late or not at all resulting in high false positive and false negative (error) rates. In order to deal with these drawbacks, three new detection approaches are presented. To deal with the issue of a high number of errors, a new technique for detecting exploit kit interactions on a network is proposed. The technique capitalizes on the fact that an exploit kit leads its potential victim through a process of exploitation by forcing the browser to download multiple web resources from malicious servers. This process has an inherent structure that can be captured in HTTP traffic and used to significantly reduce error rates. The approach organizes HTTP traffic into tree-like data structures, and, using a scalable index of exploit kit traces as samples, models the detection process as a subtree similarity search problem. The technique is evaluated on 3,800 hours of web traffic on a large enterprise network, and results show that it reduces false positive rates by four orders of magnitude over current state-of-the-art approaches. While utilizing structure can vastly improve detection rates over current approaches, it does not go far enough in helping defenders detect new, previously unseen attacks. As a result, a new framework that applies dynamic honeyclient analysis directly on network traffic at scale is proposed. The framework captures and stores a configurable window of reassembled HTTP objects network wide, uses lightweight content rendering to establish the chain of requests leading up to a suspicious event, then serves the initial response content back to the honeyclient in an isolated network. The framework is evaluated on a diverse collection of exploit kits as they evolve over a 1 year period. The empirical evaluation suggests that the approach offers significant operational value, and a single honeyclient can support a campus deployment of thousands of users. While the above approaches attempt to detect exploit kits before they have a chance to infect the client, they cannot protect a client that has already been infected. The final technique detects signs of post infection behavior by intrusions that abuses the domain name system (DNS) to make contact with an attacker. Contemporary detection approaches utilize the structure of a domain name and require hundreds of DNS messages to detect such malware. As a result, these detection mechanisms cannot detect malware in a timely manner and are susceptible to high error rates. The final technique, based on sequential hypothesis testing, uses the DNS message patterns of a subset of DNS traffic to detect malware in as little as four DNS messages, and with orders of magnitude reduction in error rates. The results of this work can make a significant operational impact on network security analysis, and open several exciting future directions for network security research.Doctor of Philosoph
Measuring and Disrupting Malware Distribution Networks: An Interdisciplinary Approach
Malware Delivery Networks (MDNs) are networks of webpages, servers, computers, and computer files that are used by cybercriminals to proliferate malicious software (or malware) onto victim machines. The business of malware delivery is a complex and multifaceted one that has become increasingly profitable over the last few years. Due to the ongoing arms race between cybercriminals and the security community, cybercriminals are constantly evolving and streamlining their techniques to beat security countermeasures and avoid disruption to their operations, such as by security researchers infiltrating their botnet operations, or law enforcement taking down their infrastructures and arresting those involved. So far, the research community has conducted insightful but isolated studies into the different facets of malicious file distribution. Hence, only a limited picture of the malicious file delivery ecosystem has been provided thus far, leaving many questions unanswered. Using a data-driven and interdisciplinary approach, the purpose of this research is twofold. One, to study and measure the malicious file delivery ecosystem, bringing prior research into context, and to understand precisely how these malware operations respond to security and law enforcement intervention. And two, taking into account the overlapping research efforts of the information security and crime science communities towards preventing cybercrime, this research aims to identify mitigation strategies and intervention points to disrupt this criminal economy more effectively
An Evasion Attack against ML-based Phishing URL Detectors
Background: Over the year, Machine Learning Phishing URL classification
(MLPU) systems have gained tremendous popularity to detect phishing URLs
proactively. Despite this vogue, the security vulnerabilities of MLPUs remain
mostly unknown. Aim: To address this concern, we conduct a study to understand
the test time security vulnerabilities of the state-of-the-art MLPU systems,
aiming at providing guidelines for the future development of these systems.
Method: In this paper, we propose an evasion attack framework against MLPU
systems. To achieve this, we first develop an algorithm to generate adversarial
phishing URLs. We then reproduce 41 MLPU systems and record their baseline
performance. Finally, we simulate an evasion attack to evaluate these MLPU
systems against our generated adversarial URLs. Results: In comparison to
previous works, our attack is: (i) effective as it evades all the models with
an average success rate of 66% and 85% for famous (such as Netflix, Google) and
less popular phishing targets (e.g., Wish, JBHIFI, Officeworks) respectively;
(ii) realistic as it requires only 23ms to produce a new adversarial URL
variant that is available for registration with a median cost of only
$11.99/year. We also found that popular online services such as Google
SafeBrowsing and VirusTotal are unable to detect these URLs. (iii) We find that
Adversarial training (successful defence against evasion attack) does not
significantly improve the robustness of these systems as it decreases the
success rate of our attack by only 6% on average for all the models. (iv)
Further, we identify the security vulnerabilities of the considered MLPU
systems. Our findings lead to promising directions for future research.
Conclusion: Our study not only illustrate vulnerabilities in MLPU systems but
also highlights implications for future study towards assessing and improving
these systems.Comment: Draft for ACM TOP
Analyzing and Defending Against Evolving Web Threats
The browser has evolved from a simple program that displays static web pages into a continuously-changing platform that is shaping the Internet as we know it today. The fierce competition among browser vendors has led to the introduction of a plethora of features in the past few years. At the same time, it remains the de facto way to access the Internet for billions of users. Because of such rapid evolution and wide popularity, the browser has attracted attackers, who pose new threats to unsuspecting Internet surfers.In this dissertation, I present my work on securing the browser againstcurrent and emerging threats. First, I discuss my work on honeyclients,which are tools that identify malicious pages that compromise the browser, and how one can evade such systems. Then, I describe a new system that I built, called Revolver, that automatically tracks the evolution of JavaScriptand is capable of identifying evasive web-based malware by finding similarities in JavaScript samples with different classifications. Finally, I present Hulk, a system that automatically analyzes and classifies browser extensions
PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool
In this paper, we propose a feature-free method for detecting phishing
websites using the Normalized Compression Distance (NCD), a parameter-free
similarity measure which computes the similarity of two websites by compressing
them, thus eliminating the need to perform any feature extraction. It also
removes any dependence on a specific set of website features. This method
examines the HTML of webpages and computes their similarity with known phishing
websites, in order to classify them. We use the Furthest Point First algorithm
to perform phishing prototype extractions, in order to select instances that
are representative of a cluster of phishing webpages. We also introduce the use
of an incremental learning algorithm as a framework for continuous and adaptive
detection without extracting new features when concept drift occurs. On a large
dataset, our proposed method significantly outperforms previous methods in
detecting phishing websites, with an AUC score of 98.68%, a high true positive
rate (TPR) of around 90%, while maintaining a low false positive rate (FPR) of
0.58%. Our approach uses prototypes, eliminating the need to retain long term
data in the future, and is feasible to deploy in real systems with a processing
time of roughly 0.3 seconds.Comment: 34 pages, 20 figure
- …