38 research outputs found
Hiding in Plain Sight: A Longitudinal Study of Combosquatting Abuse
Domain squatting is a common adversarial practice where attackers register
domain names that are purposefully similar to popular domains. In this work, we
study a specific type of domain squatting called "combosquatting," in which
attackers register domains that combine a popular trademark with one or more
phrases (e.g., betterfacebook[.]com, youtube-live[.]com). We perform the first
large-scale, empirical study of combosquatting by analyzing more than 468
billion DNS records---collected from passive and active DNS data sources over
almost six years. We find that almost 60% of abusive combosquatting domains
live for more than 1,000 days, and even worse, we observe increased activity
associated with combosquatting year over year. Moreover, we show that
combosquatting is used to perform a spectrum of different types of abuse
including phishing, social engineering, affiliate abuse, trademark abuse, and
even advanced persistent threats. Our results suggest that combosquatting is a
real problem that requires increased scrutiny by the security community.Comment: ACM CCS 1
CharBot: A Simple and Effective Method for Evading DGA Classifiers
Domain generation algorithms (DGAs) are commonly leveraged by malware to
create lists of domain names which can be used for command and control (C&C)
purposes. Approaches based on machine learning have recently been developed to
automatically detect generated domain names in real-time. In this work, we
present a novel DGA called CharBot which is capable of producing large numbers
of unregistered domain names that are not detected by state-of-the-art
classifiers for real-time detection of DGAs, including the recently published
methods FANCI (a random forest based on human-engineered features) and LSTM.MI
(a deep learning approach). CharBot is very simple, effective and requires no
knowledge of the targeted DGA classifiers. We show that retraining the
classifiers on CharBot samples is not a viable defense strategy. We believe
these findings show that DGA classifiers are inherently vulnerable to
adversarial attacks if they rely only on the domain name string to make a
decision. Designing a robust DGA classifier may, therefore, necessitate the use
of additional information besides the domain name alone. To the best of our
knowledge, CharBot is the simplest and most efficient black-box adversarial
attack against DGA classifiers proposed to date
Defending Against Typosquatting Attacks in Programming Language-Based Package Repositories
Program size and complexity have dramatically increased over time. To reduce their workload, developers began to utilize package managers. These packages managers allow third-party functionality, contained in units called packages, to be quickly imported into a project. Due to their utility, packages have become remarkably popular. The largest package repository, npm, has more than 1.2 million publicly available packages and serves more than 80 billion package downloads per month. In recent years, this popularity has attracted the attention of malicious users. Attackers have the ability to upload packages which contain malware. To increase the number of victims, attackers regularly leverage a tactic called typosquatting, which involves giving the malicious package a name that is very similar to the name of a popular package. Users who make a typo when trying to install the popular package fall victim to the attack and are instead served the malicious payload. The consequences of typosquatting attacks can be catastrophic. Historical typosquatting attacks have exported passwords, stolen cryptocurrency, and opened reverse shells. This thesis focuses on typosquatting attacks in package repositories. It explores the extent to which typosquatting exists in npm and PyPI (the de facto standard package repositories for Node.js and Python, respectively), proposes a practical defense against typosquatting attacks, and quantifies the efficacy of the proposed defense. The presented solution incurs an acceptable temporal overhead of 2.5% on the standard package installation process and is expected to affect approximately 0.5% of all weekly package downloads. Furthermore, it has been used to discover a particularly high-profile typosquatting perpetrator, which was then reported and has since been deprecated by npm. Typosquatting is an important yet preventable problem. This thesis recommends package creators to protect their own packages with a technique called defensive typosquatting and repository maintainers to protect all users through augmentations to their package managers or automated monitoring of the package namespace
CharBot : a simple and effective method for evading DGA classifiers
Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names, which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real-time. In this paper, we present a novel DGA called CharBot, which is capable of producing large numbers of unregistered domain names that are not detected by state-of-the-art classifiers for real-time detection of the DGAs, including the recently published methods FANCI (a random forest based on human-engineered features) and LSTM.MI (a deep learning approach). The CharBot is very simple, effective, and requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently vulnerable to adversarial attacks if they rely only on the domain name string to make a decision. Designing a robust DGA classifier may, therefore, necessitate the use of additional information besides the domain name alone. To the best of our knowledge, the CharBot is the simplest and most efficient black-box adversarial attack against DGA classifiers proposed to date
Lost in Translation: AI-based Generator of Cross-Language Sound-Squatting
Sound-squatting is a phishing attack that tricks users into accessing malicious resources by exploiting similarities in the pronunciation of words. It is an understudied threat that gains traction with the popularity of smart-speakers and the resurgence of content consumption exclusively via audio, such as podcasts. Defending against sound-squatting is complex, and existing solutions rely on manually curated lists of homophones, which limits the search to a few (and mostly existing) words only.
We introduce Sound-squatter, a multi-language AI-based system that generates sound-squatting candidates for a proactive defence that covers over 80\% of exact homophones and further generates thousands of high-quality approximated homophones. Sound-squatter relies on a state-of-art Transformer Network to learn transliteration. We search for Sound-squatter generated cross-language sound-squatting domains over hundreds of millions of emitted TLS certificates compared with other types of squatting candidates.
Our finding reveals that around 6% of generated sound-squatting candidates have emitted TLS certificates, compared to 8% of other types of squatting candidates. We believe \Sound-squatter uncovers the usage of multilingual sound-squatting phenomenon on the Internet and it is a crucial asset for proactive protection against sound-squatting