54 research outputs found
Hiding in Plain Sight: A Longitudinal Study of Combosquatting Abuse
Domain squatting is a common adversarial practice where attackers register
domain names that are purposefully similar to popular domains. In this work, we
study a specific type of domain squatting called "combosquatting," in which
attackers register domains that combine a popular trademark with one or more
phrases (e.g., betterfacebook[.]com, youtube-live[.]com). We perform the first
large-scale, empirical study of combosquatting by analyzing more than 468
billion DNS records---collected from passive and active DNS data sources over
almost six years. We find that almost 60% of abusive combosquatting domains
live for more than 1,000 days, and even worse, we observe increased activity
associated with combosquatting year over year. Moreover, we show that
combosquatting is used to perform a spectrum of different types of abuse
including phishing, social engineering, affiliate abuse, trademark abuse, and
even advanced persistent threats. Our results suggest that combosquatting is a
real problem that requires increased scrutiny by the security community.Comment: ACM CCS 1
Measuring CDNs susceptible to Domain Fronting
Domain fronting is a network communication technique that involves leveraging
(or abusing) content delivery networks (CDNs) to disguise the final destination
of network packets by presenting them as if they were intended for a different
domain than their actual endpoint. This technique can be used for both benign
and malicious purposes, such as circumventing censorship or hiding
malware-related communications from network security systems. Since domain
fronting has been known for a few years, some popular CDN providers have
implemented traffic filtering approaches to curb its use at their CDN
infrastructure. However, it remains unclear to what extent domain fronting has
been mitigated.
To better understand whether domain fronting can still be effectively used,
we propose a systematic approach to discover CDNs that are still prone to
domain fronting. To this end, we leverage passive and active DNS traffic
analysis to pinpoint domain names served by CDNs and build an automated tool
that can be used to discover CDNs that allow domain fronting in their
infrastructure. Our results reveal that domain fronting is feasible in 22 out
of 30 CDNs that we tested, including some major CDN providers like Akamai and
Fastly. This indicates that domain fronting remains widely available and can be
easily abused for malicious purposes
SENet: Visual Detection of Online Social Engineering Attack Campaigns
Social engineering (SE) aims at deceiving users into performing actions that
may compromise their security and privacy. These threats exploit weaknesses in
human's decision making processes by using tactics such as pretext, baiting,
impersonation, etc. On the web, SE attacks include attack classes such as
scareware, tech support scams, survey scams, sweepstakes, etc., which can
result in sensitive data leaks, malware infections, and monetary loss. For
instance, US consumers lose billions of dollars annually due to various SE
attacks. Unfortunately, generic social engineering attacks remain understudied,
compared to other important threats, such as software vulnerabilities and
exploitation, network intrusions, malicious software, and phishing. The few
existing technical studies that focus on social engineering are limited in
scope and mostly focus on measurements rather than developing a generic
defense. To fill this gap, we present SEShield, a framework for in-browser
detection of social engineering attacks. SEShield consists of three main
components: (i) a custom security crawler, called SECrawler, that is dedicated
to scouting the web to collect examples of in-the-wild SE attacks; (ii) SENet,
a deep learning-based image classifier trained on data collected by SECrawler
that aims to detect the often glaring visual traits of SE attack pages; and
(iii) SEGuard, a proof-of-concept extension that embeds SENet into the web
browser and enables real-time SE attack detection. We perform an extensive
evaluation of our system and show that SENet is able to detect new instances of
SE attacks with a detection rate of up to 99.6% at 1% false positive, thus
providing an effective first defense against SE attacks on the web
Understanding Malvertising Through Ad-Injecting Browser Extensions
Malvertising is a malicious activity that leverages advertising to distribute various forms of malware. Because advertising is the key revenue generator for numerous Internet companies, large ad networks, such as Google, Yahoo and Microsoft, invest a lot of effort to mitigate malicious ads from their ad networks. This drives adversaries to look for alternative methods to deploy malvertising. In this paper, we show that browser extensions that use ads as their monetization strategy often facilitate the deployment of malver-tising. Moreover, while some extensions simply serve ads from ad networks that support malvertising, other extensions maliciously alter the content of visited webpages to force users into installing malware. To measure the extent of these behaviors we developed Expector, a system that automatically inspects and identifies browser extensions that inject ads, and then classifies these ads as malicious or benign based on their landing pages. Using Expector, we auto-matically inspected over 18,000 Chrome browser extensions. We found 292 extensions that inject ads, and detected 56 extensions that participate in malvertising using 16 different ad networks and with a total user base of 602,417
Practical Attacks Against Graph-based Clustering
Graph modeling allows numerous security problems to be tackled in a general
way, however, little work has been done to understand their ability to
withstand adversarial attacks. We design and evaluate two novel graph attacks
against a state-of-the-art network-level, graph-based detection system. Our
work highlights areas in adversarial machine learning that have not yet been
addressed, specifically: graph-based clustering techniques, and a global
feature space where realistic attackers without perfect knowledge must be
accounted for (by the defenders) in order to be practical. Even though less
informed attackers can evade graph clustering with low cost, we show that some
practical defenses are possible.Comment: ACM CCS 201
BOTection: bot detection by building Markov Chain models of bots network behavior
This paper was presented at the 15th ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS 2020), 5-9 October 2020, Taipei, Taiwan. This is the accepted manuscript version of the paper. The final version is available online from the Association for Computing Machinery at: https://doi.org/10.1145/3320269.3372202.Botnets continue to be a threat to organizations, thus various machine learning-based botnet detectors have been proposed. However, the capability of such systems in detecting new or unseen botnets is crucial to ensure its robustness against the rapid evolution of botnets. Moreover, it prolongs the effectiveness of the system in detecting bots, avoiding frequent and time-consuming classifier re-training. We present BOTection, a privacy-preserving bot detection system that models the bot network flow behavior as a Markov Chain. The Markov Chain state transitions capture the bots' network behavior using high-level flow features as states, producing content-agnostic and encryption resilient behavioral features. These features are used to train a classifier to first detect flows produced by bots, and then identify their bot families. We evaluate our system on a dataset of over 7M malicious flows from 12 botnet families, showing its capability of detecting bots' network traffic with 99.78% F-measure and classifying it to a malware family with a 99.09% F-measure. Notably, due to the modeling of general bot network behavior by the Markov Chains, BOTection can detect traffic belonging to unseen bot families with an F-measure of 93.03% making it robust against malware evolution.Accepted manuscrip
Vamo: Towards a fully automated malware clustering validity analysis
ABSTRACT Malware clustering is commonly applied by malware analysts to cope with the increasingly growing number of distinct malware variants collected every day from the Internet. While malware clustering systems can be useful for a variety of applications, assessing the quality of their results is intrinsically hard. In fact, clustering can be viewed as an unsupervised learning process over a dataset for which the complete ground truth is usually not available. Previous studies propose to evaluate malware clustering results by leveraging the labels assigned to the malware samples by multiple anti-virus scanners (AVs). However, the methods proposed thus far require a (semi-)manual adjustment and mapping between labels generated by different AVs, and are limited to selecting a reference sub-set of samples for which an agreement regarding their labels can be reached across a majority of AVs. This approach may bias the reference set towards "easy to cluster" malware samples, thus potentially resulting in an overoptimistic estimate of the accuracy of the malware clustering results. In this paper we propose VAMO, a system that provides a fully automated quantitative analysis of the validity of malware clustering results. Unlike previous work, VAMO does not seek a majority voting-based consensus across different AV labels, and does not discard the malware samples for which such a consensus cannot be reached. Rather, VAMO explicitly deals with the inconsistencies typical of multiple AV labels to build a more representative reference set, compared to majority voting-based approaches. Furthermore, VAMO avoids the need of a (semi-)manual mapping between AV labels from different scanners that was required in previous work. Through an extensive evaluation in a controlled setting and a real-world application, we show that VAMO outperforms majority voting-based approaches, and provides a better way for malware analysts to automatically assess the quality of their malware clustering results
Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems
Unsupervised or unlabeled learning approaches for network anomaly detection have been recently proposed. In particular, recent work on unlabeled anomaly detection focused on high speed classification based on simple payload statistics. For example, PAYL, an anomaly IDS, measures the occurrence frequency in the payload of n-grams. A simple model of normal traffic is then constructed according to this description of the packets ’ content. It has been demonstrated that anomaly detectors based on payload statistics can be “evaded ” by mimicry attacks using byte substitution and padding techniques. In this paper we propose a new approach to construct high speed payload-based anomaly IDS intended to be accurate and hard to evade. We propose a new technique to extract the features from the payload. We use a feature clustering algorithm originally proposed for text classification problems to reduce the dimensionality of the feature space. Accuracy and hardness of evasion are obtained by constructing our anomaly-based IDS using an ensemble of one-class SVM classifiers that work on different feature spaces.
- …