54 research outputs found

    Web感染型攻撃における潜在的特徴の解析法

    Get PDF
    早大学位記番号:新7789早稲田大

    Hiding in Plain Sight: A Longitudinal Study of Combosquatting Abuse

    Full text link
    Domain squatting is a common adversarial practice where attackers register domain names that are purposefully similar to popular domains. In this work, we study a specific type of domain squatting called "combosquatting," in which attackers register domains that combine a popular trademark with one or more phrases (e.g., betterfacebook[.]com, youtube-live[.]com). We perform the first large-scale, empirical study of combosquatting by analyzing more than 468 billion DNS records---collected from passive and active DNS data sources over almost six years. We find that almost 60% of abusive combosquatting domains live for more than 1,000 days, and even worse, we observe increased activity associated with combosquatting year over year. Moreover, we show that combosquatting is used to perform a spectrum of different types of abuse including phishing, social engineering, affiliate abuse, trademark abuse, and even advanced persistent threats. Our results suggest that combosquatting is a real problem that requires increased scrutiny by the security community.Comment: ACM CCS 1

    From Chatbots to PhishBots? -- Preventing Phishing scams created using ChatGPT, Google Bard and Claude

    Full text link
    The advanced capabilities of Large Language Models (LLMs) have made them invaluable across various applications, from conversational agents and content creation to data analysis, research, and innovation. However, their effectiveness and accessibility also render them susceptible to abuse for generating malicious content, including phishing attacks. This study explores the potential of using four popular commercially available LLMs - ChatGPT (GPT 3.5 Turbo), GPT 4, Claude and Bard to generate functional phishing attacks using a series of malicious prompts. We discover that these LLMs can generate both phishing emails and websites that can convincingly imitate well-known brands, and also deploy a range of evasive tactics for the latter to elude detection mechanisms employed by anti-phishing systems. Notably, these attacks can be generated using unmodified, or "vanilla," versions of these LLMs, without requiring any prior adversarial exploits such as jailbreaking. As a countermeasure, we build a BERT based automated detection tool that can be used for the early detection of malicious prompts to prevent LLMs from generating phishing content attaining an accuracy of 97\% for phishing website prompts, and 94\% for phishing email prompts

    An Automated Methodology for Validating Web Related Cyber Threat Intelligence by Implementing a Honeyclient

    Get PDF
    Loodud töö panustab küberkaitse valdkonda pakkudes alternatiivse viisi, kuidas hoida ohuteadmus andmebaas uuendatuna. Veebilehti kasutatakse ära viisina toimetada pahatahtlik kood ohvrini. Peale veebilehe klassifitseerimist pahaloomuliseks lisatakse see ohuteadmus andmebaasi kui pahaloomulise indikaatorina. Lõppkokkuvõtteks muutuvad sellised andmebaasid mahukaks ja sisaldavad aegunud kirjeid. Lahendus on automatiseerida aegunud kirjete kontrollimist klient-meepott tarkvaraga ning kogu protsess on täielikult automatiseeritav eesmärgiga hoida kokku aega. Jahtides kontrollitud ja kinnitatud indikaatoreid aitab see vältida valedel alustel küberturbe intsidentide menetlemist.This paper is contributing to the open source cybersecurity community by providing an alternative methodology for analyzing web related cyber threat intelligence. Websites are used commonly as an attack vector to spread malicious content crafted by any malicious party. These websites become threat intelligence which can be stored and collected into corresponding databases. Eventually these cyber threat databases become obsolete and can lead to false positive investigations in cyber incident response. The solution is to keep the threat indicator entries valid by verifying their content and this process can be fully automated to keep the process less time consuming. The proposed technical solution is a low interaction honeyclient regularly tasked to verify the content of the web based threat indicators. Due to the huge amount of database entries, this way most of the web based threat indicators can be automatically validated with less time consumption and they can be kept relevant for monitoring purposes and eventually can lead to avoiding false positives in an incident response processes

    Using Context to Improve Network-based Exploit Kit Detection

    Get PDF
    Today, our computers are routinely compromised while performing seemingly innocuous activities like reading articles on trusted websites (e.g., the NY Times). These compromises are perpetrated via complex interactions involving the advertising networks that monetize these sites. Web-based compromises such as exploit kits are similar to any other scam -- the attacker wants to lure an unsuspecting client into a trap to steal private information, or resources -- generating 10s of millions of dollars annually. Exploit kits are web-based services specifically designed to capitalize on vulnerabilities in unsuspecting client computers in order to install malware without a user's knowledge. Sadly, it only takes a single successful infection to ruin a user's financial life, or lead to corporate breaches that result in millions of dollars of expense and loss of customer trust. Exploit kits use a myriad of techniques to obfuscate each attack instance, making current network-based defenses such as signature-based network intrusion detection systems far less effective than in years past. Dynamic analysis or honeyclient analysis on these exploits plays a key role in identifying new attacks for signature generation, but provides no means of inspecting end-user traffic on the network to identify attacks in real time. As a result, defenses designed to stop such malfeasance often arrive too late or not at all resulting in high false positive and false negative (error) rates. In order to deal with these drawbacks, three new detection approaches are presented. To deal with the issue of a high number of errors, a new technique for detecting exploit kit interactions on a network is proposed. The technique capitalizes on the fact that an exploit kit leads its potential victim through a process of exploitation by forcing the browser to download multiple web resources from malicious servers. This process has an inherent structure that can be captured in HTTP traffic and used to significantly reduce error rates. The approach organizes HTTP traffic into tree-like data structures, and, using a scalable index of exploit kit traces as samples, models the detection process as a subtree similarity search problem. The technique is evaluated on 3,800 hours of web traffic on a large enterprise network, and results show that it reduces false positive rates by four orders of magnitude over current state-of-the-art approaches. While utilizing structure can vastly improve detection rates over current approaches, it does not go far enough in helping defenders detect new, previously unseen attacks. As a result, a new framework that applies dynamic honeyclient analysis directly on network traffic at scale is proposed. The framework captures and stores a configurable window of reassembled HTTP objects network wide, uses lightweight content rendering to establish the chain of requests leading up to a suspicious event, then serves the initial response content back to the honeyclient in an isolated network. The framework is evaluated on a diverse collection of exploit kits as they evolve over a 1 year period. The empirical evaluation suggests that the approach offers significant operational value, and a single honeyclient can support a campus deployment of thousands of users. While the above approaches attempt to detect exploit kits before they have a chance to infect the client, they cannot protect a client that has already been infected. The final technique detects signs of post infection behavior by intrusions that abuses the domain name system (DNS) to make contact with an attacker. Contemporary detection approaches utilize the structure of a domain name and require hundreds of DNS messages to detect such malware. As a result, these detection mechanisms cannot detect malware in a timely manner and are susceptible to high error rates. The final technique, based on sequential hypothesis testing, uses the DNS message patterns of a subset of DNS traffic to detect malware in as little as four DNS messages, and with orders of magnitude reduction in error rates. The results of this work can make a significant operational impact on network security analysis, and open several exciting future directions for network security research.Doctor of Philosoph

    Measuring and Disrupting Malware Distribution Networks: An Interdisciplinary Approach

    Get PDF
    Malware Delivery Networks (MDNs) are networks of webpages, servers, computers, and computer files that are used by cybercriminals to proliferate malicious software (or malware) onto victim machines. The business of malware delivery is a complex and multifaceted one that has become increasingly profitable over the last few years. Due to the ongoing arms race between cybercriminals and the security community, cybercriminals are constantly evolving and streamlining their techniques to beat security countermeasures and avoid disruption to their operations, such as by security researchers infiltrating their botnet operations, or law enforcement taking down their infrastructures and arresting those involved. So far, the research community has conducted insightful but isolated studies into the different facets of malicious file distribution. Hence, only a limited picture of the malicious file delivery ecosystem has been provided thus far, leaving many questions unanswered. Using a data-driven and interdisciplinary approach, the purpose of this research is twofold. One, to study and measure the malicious file delivery ecosystem, bringing prior research into context, and to understand precisely how these malware operations respond to security and law enforcement intervention. And two, taking into account the overlapping research efforts of the information security and crime science communities towards preventing cybercrime, this research aims to identify mitigation strategies and intervention points to disrupt this criminal economy more effectively

    An Evasion Attack against ML-based Phishing URL Detectors

    Full text link
    Background: Over the year, Machine Learning Phishing URL classification (MLPU) systems have gained tremendous popularity to detect phishing URLs proactively. Despite this vogue, the security vulnerabilities of MLPUs remain mostly unknown. Aim: To address this concern, we conduct a study to understand the test time security vulnerabilities of the state-of-the-art MLPU systems, aiming at providing guidelines for the future development of these systems. Method: In this paper, we propose an evasion attack framework against MLPU systems. To achieve this, we first develop an algorithm to generate adversarial phishing URLs. We then reproduce 41 MLPU systems and record their baseline performance. Finally, we simulate an evasion attack to evaluate these MLPU systems against our generated adversarial URLs. Results: In comparison to previous works, our attack is: (i) effective as it evades all the models with an average success rate of 66% and 85% for famous (such as Netflix, Google) and less popular phishing targets (e.g., Wish, JBHIFI, Officeworks) respectively; (ii) realistic as it requires only 23ms to produce a new adversarial URL variant that is available for registration with a median cost of only $11.99/year. We also found that popular online services such as Google SafeBrowsing and VirusTotal are unable to detect these URLs. (iii) We find that Adversarial training (successful defence against evasion attack) does not significantly improve the robustness of these systems as it decreases the success rate of our attack by only 6% on average for all the models. (iv) Further, we identify the security vulnerabilities of the considered MLPU systems. Our findings lead to promising directions for future research. Conclusion: Our study not only illustrate vulnerabilities in MLPU systems but also highlights implications for future study towards assessing and improving these systems.Comment: Draft for ACM TOP

    Analyzing and Defending Against Evolving Web Threats

    Get PDF
    The browser has evolved from a simple program that displays static web pages into a continuously-changing platform that is shaping the Internet as we know it today. The fierce competition among browser vendors has led to the introduction of a plethora of features in the past few years. At the same time, it remains the de facto way to access the Internet for billions of users. Because of such rapid evolution and wide popularity, the browser has attracted attackers, who pose new threats to unsuspecting Internet surfers.In this dissertation, I present my work on securing the browser againstcurrent and emerging threats. First, I discuss my work on honeyclients,which are tools that identify malicious pages that compromise the browser, and how one can evade such systems. Then, I describe a new system that I built, called Revolver, that automatically tracks the evolution of JavaScriptand is capable of identifying evasive web-based malware by finding similarities in JavaScript samples with different classifications. Finally, I present Hulk, a system that automatically analyzes and classifies browser extensions

    PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool

    Full text link
    In this paper, we propose a feature-free method for detecting phishing websites using the Normalized Compression Distance (NCD), a parameter-free similarity measure which computes the similarity of two websites by compressing them, thus eliminating the need to perform any feature extraction. It also removes any dependence on a specific set of website features. This method examines the HTML of webpages and computes their similarity with known phishing websites, in order to classify them. We use the Furthest Point First algorithm to perform phishing prototype extractions, in order to select instances that are representative of a cluster of phishing webpages. We also introduce the use of an incremental learning algorithm as a framework for continuous and adaptive detection without extracting new features when concept drift occurs. On a large dataset, our proposed method significantly outperforms previous methods in detecting phishing websites, with an AUC score of 98.68%, a high true positive rate (TPR) of around 90%, while maintaining a low false positive rate (FPR) of 0.58%. Our approach uses prototypes, eliminating the need to retain long term data in the future, and is feasible to deploy in real systems with a processing time of roughly 0.3 seconds.Comment: 34 pages, 20 figure
    corecore