Beyond Counting: New Perspectives on the Active IPv4 Address Space
In this study, we report on techniques and analyses that enable us to capture
Internet-wide activity at individual IP address-level granularity by relying on
server logs of a large commercial content delivery network (CDN) that serves
close to 3 trillion HTTP requests on a daily basis. Across the whole of 2015,
these logs recorded client activity involving 1.2 billion unique IPv4
addresses, the highest ever measured, in agreement with recent estimates.
Monthly client IPv4 address counts showed constant growth for years prior, but
since 2014, the IPv4 count has stagnated while IPv6 counts have grown. Thus, it
seems we have entered an era marked by increased complexity, one in which the
sole enumeration of active IPv4 addresses is of little use to characterize
recent growth of the Internet as a whole.
With this observation in mind, we consider new points of view in the study of
global IPv4 address activity. First, our analysis shows significant churn in
active IPv4 addresses: the set of active addresses varies by as much as 25%
over the course of a year. Second, by looking across the active addresses in a
prefix, we are able to identify and attribute activity patterns to network
restructurings, user behaviors, and, in particular, various address assignment
practices. Third, by combining spatio-temporal measures of address utilization
with measures of traffic volume, and sampling-based estimates of relative host
counts, we present novel perspectives on worldwide IPv4 address activity,
including empirical observation of under-utilization in some areas, and
complete utilization, or exhaustion, in others.
Comment: in Proceedings of ACM IMC 201
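The churn measure described above can be sketched as a simple set comparison between two observation periods. This is an illustrative reconstruction, not the paper's actual pipeline; the address sets here are invented documentation-range examples.

```python
def churn(active_a: set, active_b: set) -> float:
    """Fraction of addresses active in period A that were no longer
    active in period B -- a simple notion of address-set churn."""
    if not active_a:
        return 0.0
    return len(active_a - active_b) / len(active_a)

# Toy monthly snapshots (RFC 5737 documentation addresses).
jan = {"192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.9"}
dec = {"192.0.2.1", "192.0.2.2", "203.0.113.10"}

print(f"churn: {churn(jan, dec):.0%}")  # 2 of 4 January addresses gone -> 50%
```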
Tracking Users across the Web via TLS Session Resumption
User tracking on the Internet can come in various forms, e.g., via cookies or
by fingerprinting web browsers. A technique that got less attention so far is
user tracking based on TLS and specifically based on the TLS session resumption
mechanism. To the best of our knowledge, we are the first to investigate the
applicability of TLS session resumption for user tracking. To that end, we
evaluated the configurations of 48 popular browsers and the one million most
popular websites. Moreover, we present a so-called prolongation attack, which
allows extending the tracking period beyond the lifetime of the session
resumption mechanism. To show that tracking via TLS session resumption is
feasible under the observed browser configurations, we also analyzed DNS data
to determine the longest consecutive tracking period for a user by a
particular website. Our results indicate that with the standard setting of the
session resumption lifetime in many current browsers, the average user can be
tracked for up to eight days. With a session resumption lifetime of seven days,
the recommended upper limit in the draft of TLS version 1.3, 65% of all users
in our dataset can be tracked permanently.
Comment: 11 pages
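The core of the prolongation attack can be illustrated with a small simulation: each visit within the ticket lifetime yields a fresh session ticket, resetting the clock, so tracking only breaks when the gap between consecutive visits exceeds the lifetime. This is a hedged sketch of the mechanism, not the authors' measurement code; the visit times are invented.

```python
def tracking_periods(visit_times_h, lifetime_h):
    """Split visit timestamps (in hours) into consecutive tracking periods.
    A new period starts whenever the gap since the previous visit exceeds
    the session-resumption lifetime (i.e., the ticket has expired)."""
    periods = []
    for t in sorted(visit_times_h):
        if periods and t - periods[-1][-1] <= lifetime_h:
            periods[-1].append(t)   # resumed: new ticket issued, clock resets
        else:
            periods.append([t])     # ticket expired: tracking chain broken
    return periods

visits = [0, 20, 45, 120, 130]      # hypothetical visit times in hours
for p in tracking_periods(visits, lifetime_h=24):
    print(f"tracked continuously for {p[-1] - p[0]} h over {len(p)} visit(s)")
```

With a 24-hour lifetime the first two visits chain together, while the 25-hour gap before hour 45 breaks the chain; a longer lifetime (e.g., the seven days discussed above) merges far more visits into one period.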
Entropy/IP: Uncovering Structure in IPv6 Addresses
In this paper, we introduce Entropy/IP: a system that discovers Internet
address structure based on analyses of a subset of IPv6 addresses known to be
active, i.e., training data, gleaned by readily available passive and active
means. The system is completely automated and employs a combination of
information-theoretic and machine learning techniques to probabilistically
model IPv6 addresses. We present results showing that our system is effective
in exposing structural characteristics of portions of the IPv6 Internet address
space populated by active client, service, and router addresses.
In addition to visualizing the address structure for exploration, the system
uses its models to generate candidate target addresses for scanning. For each
of 15 evaluated datasets, we train on 1K addresses and generate 1M candidates
for scanning. We achieve some success in 14 datasets, finding up to 40% of the
generated addresses to be active. In 11 of these datasets, we find active
network identifiers (e.g., /64 prefixes or `subnets') not seen in training.
Thus, we provide the first evidence that it is practical to discover subnets
and hosts by scanning probabilistically selected areas of the IPv6 address
space not known to contain active hosts a priori.
Comment: Paper presented at the ACM IMC 2016 in Santa Monica, USA
(https://dl.acm.org/citation.cfm?id=2987445). Live Demo site available at
http://www.entropy-ip.com
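The information-theoretic starting point of such a system can be sketched as per-nybble Shannon entropy over a training set of active addresses: nybbles with zero entropy are structural constants, while high-entropy nybbles vary freely. This is a minimal illustration of the entropy computation only, not the full Entropy/IP model; the training addresses are documentation-prefix examples.

```python
import math
from collections import Counter
from ipaddress import IPv6Address

def nybble_entropy(addrs):
    """Per-nybble Shannon entropy (bits) across a set of IPv6 addresses;
    0 means a hex digit is constant, 4 means it is uniformly random."""
    hexes = [f"{int(IPv6Address(a)):032x}" for a in addrs]
    n = len(hexes)
    ents = []
    for i in range(32):
        counts = Counter(h[i] for h in hexes)
        ents.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return ents

addrs = ["2001:db8::1", "2001:db8::2", "2001:db8::3", "2001:db8::4"]
ent = nybble_entropy(addrs)
print(ent[:8])   # network-prefix nybbles are constant: entropy 0
print(ent[31])   # last nybble takes 4 distinct values: entropy 2 bits
```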
Scanning the internet for liveness
Internet-wide scanning depends on a notion of liveness: does a target IP address respond to a probe packet? However, the interpretation of such responses, or lack of them, is nuanced and depends on multiple factors, including: how we probed, how different protocols in the network stack interact, the presence of filtering policies near the target, and temporal churn in IP responsiveness. Although often neglected, these factors can significantly affect the results of active measurement studies. We develop a taxonomy of liveness which we employ to develop a method to perform concurrent IPv4 scans using ICMP, five TCP-based, and two UDP-based protocols, comprehensively capturing all responses to our probes, including negative and cross-layer responses. Leveraging our methodology, we present a systematic analysis of liveness and how it manifests in active scanning campaigns, yielding practical insights and methodological improvements for the design and the execution of active Internet measurement studies.
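The idea of a liveness taxonomy over concurrent multi-protocol probes can be sketched as a classifier over the set of responses a target produced. The category names and response encodings below are illustrative assumptions, not the paper's actual taxonomy.

```python
def classify(responses: dict) -> str:
    """responses maps probe name -> reply seen, e.g.
    {'icmp_echo': 'echo-reply', 'tcp80': 'syn-ack', 'tcp22': 'rst',
     'udp53': 'icmp-port-unreachable'}; missing keys mean no reply."""
    if any(v == "syn-ack" for v in responses.values()):
        return "host and port active"
    if responses.get("icmp_echo") == "echo-reply":
        return "host active (network layer)"
    # Negative and cross-layer responses still prove the host is live.
    if any(v in ("rst", "icmp-port-unreachable") for v in responses.values()):
        return "host active, port closed (negative response)"
    if any(v == "icmp-admin-prohibited" for v in responses.values()):
        return "filtered (policy near target)"
    return "no evidence of liveness"

print(classify({"tcp22": "rst"}))   # negative response still implies liveness
print(classify({}))                 # silence is not proof of absence
```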
Efficient Passive ICS Device Discovery and Identification by MAC Address Correlation
Owing to a growing number of attacks, the assessment of Industrial Control
Systems (ICSs) has gained in importance. An integral part of an assessment is
the creation of a detailed inventory of all connected devices, enabling
vulnerability evaluations. For this purpose, scans of networks are crucial.
Active scanning, which generates irregular traffic, is a method to get an
overview of connected and active devices. Since such additional traffic may
lead to an unexpected behavior of devices, active scanning methods should be
avoided in critical infrastructure networks. In such cases, passive network
monitoring offers an alternative, which is often used in conjunction with
complex deep-packet inspection techniques. There are very few publications on
lightweight passive scanning methodologies for industrial networks. In this
paper, we propose a lightweight passive network monitoring technique using an
efficient Media Access Control (MAC) address-based identification of industrial
devices. Based on an incomplete set of known MAC-address-to-device
associations, the presented method can infer correct device and vendor
information. To prove the feasibility of the method, we introduce an
implementation and evaluate its efficiency. We demonstrate that a specific
device/vendor combination can be predicted when similar devices are present
in the database. In our ICS testbed, we reached a host discovery rate of 100%
at an identification rate of more than 66%, outperforming the results of
existing tools.
Comment: http://dx.doi.org/10.14236/ewic/ICS2018.
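The passive identification step can be sketched as a lookup of the OUI (the first three octets of a MAC address) against an incomplete table of known associations. The table entries and device labels below are illustrative assumptions, not the paper's database.

```python
# Illustrative, incomplete OUI table: vendor/device-type guesses keyed by
# the first three octets of a MAC address (entries are examples only).
KNOWN_OUIS = {
    "00:1d:9c": ("Rockwell Automation", "PLC"),
    "00:0e:8c": ("Siemens", "controller"),
}

def identify(mac: str):
    """Return a (vendor, device_type) guess for a MAC seen in passive
    monitoring, or ('unknown', 'unknown') if the OUI is not in the table."""
    return KNOWN_OUIS.get(mac.lower()[:8], ("unknown", "unknown"))

for mac in ["00:1D:9C:11:22:33", "AA:BB:CC:01:02:03"]:
    vendor, kind = identify(mac)
    print(f"{mac} -> {vendor} ({kind})")
```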
On the Origin of Scanning: The Impact of Location on Internet-Wide Scans
Fast IPv4 scanning has enabled researchers to answer a wealth of security and networking questions. Yet, despite widespread use, there has been little validation of the methodology’s accuracy, including whether a single scan provides sufficient coverage. In this paper, we analyze how scan origin affects the results of Internet-wide scans by completing three HTTP, HTTPS, and SSH scans from seven geographically and topologically diverse networks. We find that individual origins miss an average of 1.6–8.4% of HTTP, 1.5–4.6% of HTTPS, and 8.3–18.2% of SSH hosts. We analyze why origins see different hosts, and show how permanent and temporary blocking, packet loss, geographic biases, and transient outages affect scan results. We discuss the implications for scanning and provide recommendations for future studies.
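The per-origin coverage comparison can be sketched by taking the union of hosts seen by any origin as the baseline and computing each origin's miss rate against it. The origin names and host sets below are toy data, not the study's measurements.

```python
# origin -> set of responsive hosts that origin's scan observed (toy data).
scans = {
    "origin-eu": {"a", "b", "c", "d"},
    "origin-us": {"a", "b", "d", "e"},
    "origin-as": {"a", "c", "d", "e"},
}

# Baseline: every host seen by at least one origin.
union = set().union(*scans.values())
for origin, hosts in scans.items():
    miss = 1 - len(hosts) / len(union)
    print(f"{origin}: sees {len(hosts)}/{len(union)} hosts, misses {miss:.0%}")
```

A single-origin scan reports only its own row; comparing against the union is what exposes the per-origin miss rate.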
A haystack full of needles: scalable detection of IoT devices in the wild
Consumer Internet of Things (IoT) devices are extremely popular, providing users with rich and diverse functionalities, from voice assistants to home appliances. These functionalities often come with significant privacy and security risks, with notable recent large-scale coordinated global attacks disrupting large service providers. Thus, an important first step to address these risks is to know what IoT devices are where in a network. While some limited solutions exist, a key question is whether device discovery can be done by Internet service providers that only see sampled flow statistics. In particular, it is challenging for an ISP to efficiently and effectively track and trace activity from IoT devices deployed by its millions of subscribers, all with sampled network data. In this paper, we develop and evaluate a scalable methodology to accurately detect and monitor IoT devices at subscriber lines with limited, highly sampled data in the wild. Our findings indicate that millions of IoT devices are detectable and identifiable within hours, both at a major ISP and at an IXP, using passive, sparsely sampled network flow headers. Our methodology is able to detect devices from more than 77% of the studied IoT manufacturers, including popular devices such as smart speakers. While our methodology is effective for providing network analytics, it also highlights significant privacy consequences.
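Detection from sampled flow headers can be sketched as matching a subscriber's observed remote endpoints against per-device backend signatures: even sparse samples eventually hit a device's characteristic (server, port) pairs. The signatures and device names below are invented for illustration, not the paper's actual signature set.

```python
# Hypothetical per-device signatures: backend (ip, port) endpoints that a
# device of this type contacts (documentation-range addresses, invented).
SIGNATURES = {
    "smart-speaker-X": {("198.51.100.10", 443), ("198.51.100.11", 443)},
    "camera-Y": {("203.0.113.5", 8883)},
}

def detect(flows, min_hits=1):
    """flows: iterable of (remote_ip, remote_port) pairs from sampled flow
    records of one subscriber line. Returns the set of matched devices."""
    observed = set(flows)
    return {dev for dev, sig in SIGNATURES.items()
            if len(sig & observed) >= min_hits}

flows = [("198.51.100.10", 443), ("8.8.8.8", 53)]
print(detect(flows))  # only the speaker's backend endpoint was sampled
```

Raising `min_hits` trades recall for precision, which matters when popular CDN endpoints are shared across many services.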
Evolution of communities of software: using tensor decompositions to compare software ecosystems
© 2019 The Authors. Published by Springer. This is an open access article available under a Creative Commons licence.
The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.1007/s41109-019-0193-5
Modern software development is often a collaborative effort involving many authors through the re-use and sharing of code via software libraries. Modern software “ecosystems” are complex socio-technical systems that can be represented as multilayer dynamic networks. Many of these libraries and software packages are open-source and developed in the open on sites such as , so a large amount of data is available about these networks. Studying these networks could be of interest to anyone choosing or designing a programming language. In this work, we use tensor factorisation to explore the dynamics of communities of software, and then compare these dynamics between languages on a dataset of approximately 1 million software projects. We hope to inform the debate on software dependencies, recently re-ignited by the malicious takeover of the npm package and other incidents, by giving a clearer picture of the structure of software dependency networks, and by exploring how the choices of language designers (for example, the size of standard libraries, or the standards to which packages are held before admission to a language ecosystem is granted) may have shaped their language ecosystems. We establish that adjusted mutual information is a valid metric by which to assess the number of communities in a tensor decomposition, and we find striking differences between the communities found across different software ecosystems, whose communities also experience large and interpretable changes in activity over time.
The differences between the elm and R software ecosystems, which see some communities decline over time, and the more conventional software ecosystems of Python, Java and JavaScript, which do not see many declining communities, are particularly marked.
OAB’s work was supported as part of an Engineering and Physical Sciences Research Council (EPSRC) grant, project reference EP/I028099/1.
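The tensor-factorisation idea above can be illustrated with a minimal rank-1 CP decomposition by higher-order power iteration over a (project x dependency x month) activity tensor. This is a simplified stand-in for the paper's method, which fits rank R > 1 and real ecosystem data; the toy tensor below is constructed to be exactly rank 1.

```python
import numpy as np

def rank1_cp(T, iters=50):
    """Rank-1 CP decomposition of a 3-mode tensor via alternating
    higher-order power iteration: T ~ lam * a (x) b (x) c."""
    a = np.ones(T.shape[0]); b = np.ones(T.shape[1]); c = np.ones(T.shape[2])
    for _ in range(iters):
        a = np.einsum('ijk,j,k->i', T, b, c); a /= np.linalg.norm(a)
        b = np.einsum('ijk,i,k->j', T, a, c); b /= np.linalg.norm(b)
        c = np.einsum('ijk,i,j->k', T, a, b); c /= np.linalg.norm(c)
    lam = np.einsum('ijk,i,j,k->', T, a, b, c)
    return lam, a, b, c

# Toy "community": projects {0,1} use dependencies {1,2}, activity growing
# over three months -- encoded as an outer product, hence exactly rank 1.
a0 = np.array([1., 1., 0.])   # project memberships
b0 = np.array([0., 1., 1.])   # dependency memberships
c0 = np.array([1., 2., 3.])   # temporal activity profile
T = np.einsum('i,j,k->ijk', a0, b0, c0)

lam, a, b, c = rank1_cp(T)
print(np.round(c, 2))   # temporal factor recovers the growth trend
```

With real data the factors are not exact, and the number of components must be chosen, which is where a criterion such as adjusted mutual information between decompositions comes in.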