Padding Ain't Enough: Assessing the Privacy Guarantees of Encrypted DNS
DNS over TLS (DoT) and DNS over HTTPS (DoH) encrypt DNS to guard user privacy
by hiding DNS resolutions from passive adversaries. Yet, past attacks have
shown that encrypted DNS is still sensitive to traffic analysis. As a
consequence, RFC 8467 proposes to pad messages prior to encryption, which
obscures the size characteristics of the encrypted traffic. In this paper, we
show that padding alone is insufficient to counter DNS traffic analysis. We
propose a novel traffic analysis method that combines size and timing
information to infer the websites a user visits purely based on encrypted and
padded DNS traces. To this end, we model DNS sequences that capture the
complexity of websites that usually trigger dozens of DNS resolutions instead
of just a single DNS transaction. A closed world evaluation based on the Alexa
top-10k websites reveals that attackers can deanonymize at least half of the
test traces in 80.2% of all websites, and even correctly label all traces for
32.0% of the websites. Our findings undermine the privacy goals of
state-of-the-art message padding strategies in DoT/DoH. We conclude by showing
that successful mitigations against such attacks have to remove the entropy of
inter-arrival timings between query responses.
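The size-and-timing idea can be illustrated with a minimal sketch. Assuming each encrypted DNS trace is recorded as a sequence of (message size, timestamp) pairs, even block-padded sizes and inter-arrival gaps yield distinguishing features; the feature choice below is hypothetical, not the paper's actual model.

```python
# Hedged sketch: RFC 8467 recommends padding DNS queries to multiples of
# 128 bytes, yet message counts, padded volume, and timing gaps survive.

def pad_to_block(size: int, block: int = 128) -> int:
    """Round a message size up to the next padding block boundary."""
    return ((size + block - 1) // block) * block

def trace_features(trace):
    """trace: list of (size_bytes, timestamp_s) pairs for one website load.
    Returns a small, illustrative feature vector for a classifier."""
    sizes = [pad_to_block(s) for s, _ in trace]
    gaps = [t2 - t1 for (_, t1), (_, t2) in zip(trace, trace[1:])]
    return (
        len(trace),                   # number of DNS messages in the trace
        sum(sizes),                   # total padded traffic volume
        max(gaps) if gaps else 0.0,   # longest inter-arrival gap
    )
```

Feature vectors like these could feed any standard classifier; the paper's mitigation argument follows directly, since constant-rate sending would zero out the gap feature.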
A distributed alerting service for open digital library software
Alerting is an important and useful feature of Digital Libraries (DL) for their users. To date, two independent services and a few publisher-hosted proprietary services have been developed. Here, we address the problem of integrating alerting as a functionality into open-source software for distributed digital libraries. DL software is one application out of many that constitute so-called meta-software: software whose installation determines the properties of the actual running system (here: the Digital Library system). For this type of application, existing alerting solutions are insufficient; new ways have to be found to support a fragmented network of distributed digital library servers. We propose the design and usage of a distributed Directory Service. This paper also introduces our hybrid approach, which uses two networks and a combination of different distributed routing strategies for event filtering.
IPv6-specific misconfigurations in the DNS
With the Internet transitioning from IPv4 to IPv6, the number of IPv6-specific DNS records (AAAA) increases. Misconfigurations in these records often go unnoticed, as most systems are provided with connectivity over both IPv4 and IPv6, and automatically fall back to IPv4 in case of connection problems. With IPv6-only networks on the rise, such misconfigurations result in servers or services being rendered unreachable. Using long-term active DNS measurements over multiple zones, we qualify and quantify these IPv6-specific misconfigurations. Applying pattern matching on AAAA records revealed which configuration mistakes occur most, the distribution of faulty records per DNS operator, and how these numbers evolved over time. We show that more than 97% of invalid records can be categorized into one of our ten defined main configuration mistakes. Furthermore, we show that while the number and ratio of invalid records decreased over the last two years, the number of DNS operators with at least one faulty AAAA record increased. This emphasizes the need for easily applicable checks in DNS management systems, for which we provide recommendations in the conclusions of this work.
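The abstract does not enumerate the ten mistake categories, but pattern matching on AAAA record values can be sketched with a few common cases (the category names below are hypothetical stand-ins, not the paper's taxonomy):

```python
import ipaddress

def classify_aaaa(value: str) -> str:
    """Label an AAAA record value with a plausible misconfiguration category.
    Categories here are illustrative examples, not the paper's own list."""
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return "unparsable"
    if addr.version == 4:
        return "ipv4-literal-in-aaaa"   # an A-record value pasted into AAAA
    if addr.ipv4_mapped is not None:
        return "ipv4-mapped"            # e.g. ::ffff:198.51.100.1
    if addr.is_unspecified:
        return "unspecified"            # ::
    if addr.is_loopback:
        return "loopback"               # ::1
    if not addr.is_global:
        return "non-global"             # link-local, ULA, documentation, ...
    return "ok"
```

Checks of this kind are cheap enough to run inside DNS management systems at record-entry time, which is the direction the paper's recommendations point toward.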
Rusty Clusters? Dusting an IPv6 Research Foundation
The long-running IPv6 Hitlist service is an important foundation for IPv6
measurement studies. It helps to overcome infeasible, complete address space
scans by collecting valuable, unbiased IPv6 address candidates and regularly
testing their responsiveness. However, the Internet itself is a quickly
changing ecosystem that can affect long-running services, potentially
introducing biases and artifacts into ongoing data collection. Frequent
analyses as well as updates are necessary to keep the service valuable to the community.
In this paper, we show that the existing hitlist is highly impacted by the
Great Firewall of China, and we offer a cleaned view on the development of
responsive addresses. While the accumulated input shows an increasing bias
towards some networks, the cleaned set of responsive addresses is well
distributed and shows a steady increase.
Although it is a best practice to remove aliased prefixes from IPv6 hitlists,
we show that this also removes major content delivery networks. More than 98%
of all IPv6 addresses announced by Fastly were labeled as aliased and
Cloudflare prefixes hosting more than 10M domains were excluded. Depending on
the hitlist usage, e.g., higher layer protocol scans, inclusion of addresses
from these providers can be valuable.
Lastly, we evaluate different new address candidate sources, including target
generation algorithms to improve the coverage of the current IPv6 Hitlist. We
show that a combination of different methodologies is able to identify 5.6M
new, responsive addresses. This accounts for an increase of 174%, and combined
with the current IPv6 Hitlist, we identify 8.8M responsive addresses.
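The aliased-prefix filtering the paper scrutinizes is commonly implemented as a probing heuristic: if pseudo-random addresses inside a prefix all respond, one machine likely answers for the whole prefix. A minimal sketch, with the probe function left abstract and all parameters assumed rather than taken from the paper:

```python
import random

def is_aliased(prefix64: int, probe, n: int = 16, threshold: float = 1.0) -> bool:
    """Heuristic aliased-prefix test (illustrative, not the hitlist's exact method).
    prefix64: the upper 64 bits of a /64 prefix as an integer.
    probe: callable(addr_int) -> bool, True if the address responds."""
    rng = random.Random(42)                # deterministic probe targets
    hits = 0
    for _ in range(n):
        iid = rng.getrandbits(64)          # random interface identifier
        if probe((prefix64 << 64) | iid):
            hits += 1
    return hits / n >= threshold
```

The paper's caveat follows naturally: a CDN edge that genuinely serves content on every address in a prefix passes this test and is dropped, even though those addresses are valuable targets for higher-layer scans.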
DNS weighted footprints for web browsing analytics
The monetization of the large amounts of data that ISPs hold about their users is still in its early stages. Specifically, knowledge of the websites that specific users or aggregates of users visit opens new business opportunities, after convenient sanitization. However, the construction of accurate DNS-based web-user profiles on large networks is a challenge, not only because of the requirements that capturing traffic entails, but also given the use of DNS caches, the proliferation of botnets, and the complexity of current websites (i.e., when a user visits a website, a set of self-triggered DNS queries for banners, from both the same company and third-party services, as well as for preloaded and prefetched content, takes place). We therefore propose to count the intentional visits users make to websites by means of DNS weighted footprints. This novel approach considers that a website was actively visited if an empirically estimated fraction of the DNS queries of both the website itself and the set of self-triggered websites is found. This approach has been implemented in a final system named DNSprints. After its parameterization (i.e., balancing the importance of a website in a footprint with respect to the total set of footprints), we have measured that our proposal is able to identify visits and their durations with false and true positive rates between 2 and 9% and over 90%, respectively, at throughputs between 800,000 and 1.4 million DNS packets per second in diverse scenarios, thus proving both its refinement and applicability. The authors would like to acknowledge funding received through the TRAFICA (TEC2015-69417-C2-1-R) grant from the Spanish R&D programme. The authors thank Víctor Uceda for his collaboration in the early stages of this work.
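The footprint test itself can be sketched compactly. Assuming a site's footprint maps each self-triggered domain to a weight (e.g., how often it appears across training visits), a visit is counted when the observed weight fraction crosses an empirically chosen threshold; the names and the 0.6 threshold below are illustrative assumptions:

```python
def is_visit(observed: set, footprint: dict, threshold: float = 0.6) -> bool:
    """Weighted-footprint visit test (illustrative sketch of the idea).
    observed: domains seen in the user's DNS traffic window.
    footprint: domain -> weight for one website's typical DNS queries."""
    total = sum(footprint.values())
    seen = sum(w for domain, w in footprint.items() if domain in observed)
    return total > 0 and seen / total >= threshold
```

Weighting matters because shared third-party domains (CDNs, ad networks) appear in many footprints; down-weighting them reduces false positives from unrelated visits.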
Inline detection of DGA domains using side information
Malware applications typically use a command and control (C&C) server to manage bots to perform malicious activities. Domain Generation Algorithms (DGAs) are popular methods for generating pseudo-random domain names that can be used to establish communication between an infected bot and the C&C server. In recent years, machine learning based systems have been widely used to detect DGAs. There are several well known state-of-the-art classifiers in the literature that can detect DGA domain names in real-time applications with high predictive performance. However, these DGA classifiers are highly vulnerable to adversarial attacks in which adversaries purposely craft domain names to evade DGA detection classifiers. In our work, we focus on hardening DGA classifiers against adversarial attacks. To this end, we train and evaluate state-of-the-art deep learning and random forest (RF) classifiers for DGA detection using side information that is harder for adversaries to manipulate than the domain name itself. Additionally, the side information features are selected such that they are easily obtainable in practice to perform inline DGA detection. The performance and robustness of these models are assessed by exposing them to one day of real-traffic data as well as domains generated by adversarial attack algorithms. We found that the DGA classifiers that rely on both the domain name and side information have high performance and are more robust against adversaries.
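The abstract does not list the side features, so the sketch below pairs a standard name-derived feature (character entropy, which DGA names tend to score high on) with hypothetical side-information stand-ins such as query volume and record TTL, signals an adversary cannot freely forge by crafting the name alone:

```python
import math
from collections import Counter

def name_entropy(domain: str) -> float:
    """Shannon entropy of the first label's characters (a classic DGA feature)."""
    label = domain.split(".")[0]
    counts = Counter(label)
    n = len(label)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def feature_vector(domain: str, side: dict) -> list:
    """Combine name-derived and side-information features for a classifier.
    The side-feature names are assumptions, not the paper's feature set."""
    return [
        name_entropy(domain),
        len(domain),
        side.get("query_volume", 0),   # hypothetical side feature
        side.get("ttl", 0),            # hypothetical side feature
    ]
```

Vectors like these could be fed to a random forest; the hardening argument is that even a perfectly benign-looking name cannot fake the traffic-level columns.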
Latency-Based Anycast Geolocation: Algorithms, Software, and Datasets
Use of IP-layer anycast has increased in the last few years beyond the DNS realm. Yet, existing measurement techniques to identify and enumerate anycast replicas exploit specifics of the DNS protocol, which limits their applicability to this particular service. With this paper, we not only propose and thoroughly validate a protocol-agnostic technique for anycast replica discovery and geolocation, but also provide the community with open source software and datasets to replicate our experimental results, as well as to facilitate the development of new techniques such as ours. In particular, our proposed method achieves thorough enumeration and city-level geolocalization of anycast instances from a set of known vantage points. The algorithm features an iterative workflow, pipelining enumeration (an optimization problem using latency as input) and geolocalization (a classification problem using side channel information such as city population) of anycast replicas. Results of a thorough validation campaign show our algorithm to be robust to measurement noise, and very lightweight as it requires only a handful of latency measurements.
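The core latency argument behind protocol-agnostic enumeration can be sketched: each vantage point's RTT bounds the target's distance, and if two vantage points' latency disks cannot overlap, they must be reaching different replicas. The fiber propagation constant below is a common rule of thumb, and the simplified two-point test is an illustration, not the paper's full iterative algorithm:

```python
import math

C_FIBER_KM_PER_MS = 100.0   # ~2/3 speed of light in fiber, a rule of thumb

def max_radius_km(rtt_ms: float) -> float:
    """Upper bound on the target's distance from a vantage point given RTT."""
    return (rtt_ms / 2.0) * C_FIBER_KM_PER_MS

def great_circle_km(lat1, lon1, lat2, lon2) -> float:
    """Haversine distance between two vantage points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def detects_anycast(vp1, vp2) -> bool:
    """vp = (lat, lon, rtt_ms). Disjoint latency disks imply >= 2 replicas."""
    d = great_circle_km(vp1[0], vp1[1], vp2[0], vp2[1])
    return max_radius_km(vp1[2]) + max_radius_km(vp2[2]) < d
```

Running this pairwise over many vantage points yields a lower bound on the replica count, which the paper's optimization step then refines before geolocating each replica.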