6Rover: Leveraging Reinforcement Learning-based Address Pattern Mining Approach for Discovering Active Targets in IPv6 Unseeded Space
The discovery of active IPv6 addresses represents a pivotal challenge in IPv6
network survey, as it is a prerequisite for downstream tasks such as network
topology measurements and security analysis. With the rapid spread of IPv6
networks in recent years, many researchers have focused on improving the hit
rate, efficiency, and coverage of IPv6 scanning methods, resulting in
considerable advancements. However, existing approaches remain heavily
dependent on seed addresses, thereby limiting their effectiveness in unseeded
prefixes. Consequently, this paper proposes 6Rover, a reinforcement
learning-based model for active address discovery in unseeded environments. To
overcome the reliance on seeded addresses, 6Rover constructs patterns with
higher generality that reflect the actual address allocation strategies of
network administrators, thereby avoiding biased transfers of patterns from
seeded to unseeded prefixes. After that, 6Rover employs a multi-armed bandit
model to optimize the probing resource allocation when applying patterns to
unseeded spaces. It models the challenge of discovering optimal patterns in
unseeded spaces as an exploration-exploitation dilemma, and progressively
uncovers the potential patterns applied in unseeded spaces, leading to the
efficient discovery of active addresses without seed addresses as prior
knowledge. Experiments on large-scale unseeded datasets show that 6Rover has a
higher hit rate than existing methods in the absence of any seed addresses as
prior knowledge. In real network environments, 6Rover achieved a 5% to 8% hit
rate in seedless spaces at a budget scale of 100 million, representing an
approximate 200% improvement over existing state-of-the-art methods.
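As an illustration of the exploration-exploitation idea in the 6Rover abstract, the following minimal sketch allocates a probing budget across candidate address patterns with the standard UCB1 bandit rule. It is not 6Rover's algorithm: the abstract does not say which bandit strategy is used, and the pattern names, hit rates, and the simulated_probe function are hypothetical stand-ins for real probing.

```python
import math
import random


def ucb1_allocate(patterns, probe, budget):
    """Spend `budget` probes across `patterns` using the UCB1 rule.

    `probe(pattern)` must return True when the probed address responded.
    Returns per-pattern probe counts and hit counts.
    """
    pulls = {p: 0 for p in patterns}
    hits = {p: 0 for p in patterns}
    for t in range(1, budget + 1):
        untried = [p for p in patterns if pulls[p] == 0]
        if untried:
            # Try every pattern once before comparing scores.
            choice = untried[0]
        else:
            # Observed hit rate plus an exploration bonus that shrinks
            # as a pattern accumulates probes.
            choice = max(
                patterns,
                key=lambda p: hits[p] / pulls[p]
                + math.sqrt(2 * math.log(t) / pulls[p]),
            )
        pulls[choice] += 1
        if probe(choice):
            hits[choice] += 1
    return pulls, hits


# Hypothetical patterns and hit rates, used only to simulate probing.
TRUE_RATE = {"low-byte": 0.40, "embedded-port": 0.05, "random-iid": 0.01}
rng = random.Random(0)


def simulated_probe(pattern):
    """Stand-in for sending a real probe: succeed with the pattern's rate."""
    return rng.random() < TRUE_RATE[pattern]


pulls, hits = ucb1_allocate(list(TRUE_RATE), simulated_probe, budget=3000)
for name in TRUE_RATE:
    print(name, pulls[name], hits[name])
```

In this toy setting most of the budget drifts toward the pattern with the highest simulated hit rate, while the others still receive occasional probes.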
Rusty Clusters? Dusting an IPv6 Research Foundation
The long-running IPv6 Hitlist service is an important foundation for IPv6
measurement studies. It helps to overcome infeasible, complete address space
scans by collecting valuable, unbiased IPv6 address candidates and regularly
testing their responsiveness. However, the Internet itself is a quickly
changing ecosystem that can affect long-running services, potentially
introducing biases and obscurities into ongoing data collection. Frequent
analyses and updates are necessary to keep the service valuable to the community.
In this paper, we show that the existing hitlist is highly impacted by the
Great Firewall of China, and we offer a cleaned view on the development of
responsive addresses. While the accumulated input shows an increasing bias
towards some networks, the cleaned set of responsive addresses is well
distributed and shows a steady increase.
Although it is a best practice to remove aliased prefixes from IPv6 hitlists,
we show that this also removes major content delivery networks. More than 98%
of all IPv6 addresses announced by Fastly were labeled as aliased and
Cloudflare prefixes hosting more than 10M domains were excluded. Depending on
the hitlist usage, e.g., higher layer protocol scans, inclusion of addresses
from these providers can be valuable.
Lastly, we evaluate different new address candidate sources, including target
generation algorithms to improve the coverage of the current IPv6 Hitlist. We
show that a combination of different methodologies is able to identify 5.6M
new, responsive addresses. This accounts for an increase of 174%; combined
with the current IPv6 Hitlist, we identify 8.8M responsive addresses.
Discovering the IPv6 Network Periphery
We consider the problem of discovering the IPv6 network periphery, i.e., the
last-hop router connecting end hosts in the IPv6 Internet. Finding the IPv6
periphery using active probing is challenging due to the IPv6 address space
size, wide variety of provider addressing and subnetting schemes, and
incomplete topology traces. As such, existing topology mapping systems can miss
the large footprint of the IPv6 periphery, disadvantaging applications ranging
from IPv6 census studies to geolocation and network resilience. We introduce
"edgy," an approach to explicitly discover the IPv6 network periphery, and use
it to find more than 64M IPv6 periphery router addresses and more than 87M links to these last
hops -- several orders of magnitude more than in currently available IPv6
topologies. Further, only 0.2% of edgy's discovered addresses are known to
existing IPv6 hitlists.
Stratosphere: Finding Vulnerable Cloud Storage Buckets
Misconfigured cloud storage buckets have leaked hundreds of millions of
medical, voter, and customer records. These breaches are due to a combination
of easily-guessable bucket names and error-prone security configurations,
which, together, allow attackers to easily guess and access sensitive data. In
this work, we investigate the security of buckets, finding that prior studies
have largely underestimated cloud insecurity by focusing on simple,
easy-to-guess names. By leveraging prior work in the password analysis space,
we introduce Stratosphere, a system that learns how buckets are named in
practice in order to efficiently guess the names of vulnerable buckets. Using
Stratosphere, we find widespread exploitation of buckets and vulnerable
configurations continuing to increase over the years. We conclude with
recommendations for operators, researchers, and cloud providers.
Comment: Proceedings of the 24th International Symposium on Research in
Attacks, Intrusions and Defenses. 202
In the IP of the Beholder: Strategies for Active IPv6 Topology Discovery
Existing methods for active topology discovery within the IPv6 Internet
largely mirror those of IPv4. In light of the large and sparsely populated
address space, in conjunction with aggressive ICMPv6 rate limiting by routers,
this work develops a different approach to Internet-wide IPv6 topology mapping.
First, we adopt randomized probing techniques in order to distribute probing load,
minimize the effects of rate limiting, and probe at higher rates. Second, we
extensively analyze the efficiency and efficacy of various IPv6 hitlists and
target generation methods when used for topology discovery, and synthesize new
target lists based on our empirical results to provide both breadth (coverage
across networks) and depth (to find potential subnetting). Employing our
probing strategy, we discover more than 1.3M IPv6 router interface addresses
from a single vantage point. Finally, we share our prober implementation,
synthesized target lists, and discovered IPv6 topology results.
Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists
Network measurements are an important tool in understanding the Internet. Due
to the expanse of the IPv6 address space, exhaustive scans as in IPv4 are not
possible for IPv6. In recent years, several studies have proposed the use of
target lists of IPv6 addresses, called IPv6 hitlists.
In this paper, we show that addresses in IPv6 hitlists are heavily clustered.
We present novel techniques that allow IPv6 hitlists to be pushed from quantity
to quality. We perform a longitudinal active measurement study over 6 months,
targeting more than 50M addresses. We develop a rigorous method to detect
aliased prefixes, which identifies 1.5% of our prefixes as aliased, pertaining
to about half of our target addresses. Using entropy clustering, we group the
entire hitlist into just 6 distinct addressing schemes. Furthermore, we perform
client measurements by leveraging crowdsourcing.
To encourage reproducibility in network measurement research and to serve as
a starting point for future IPv6 studies, we publish source code, analysis
tools, and data.
Comment: See https://ipv6hitlist.github.io for daily IPv6 hitlists, historical
data, and additional analyses.
Addressless: A New Internet Server Model to Prevent Network Scanning
Eliminating unnecessary exposure is a principle of server security. The huge
IPv6 address space enhances security by making scanning infeasible; however,
with recent advances in IPv6 scanning technologies, network scanning is again
threatening server security. In this paper, we propose a new model named
addressless server, which separates the server into an entrance module and a
main service module, and assigns an IPv6 prefix instead of an IPv6 address to
the main service module. The entrance module generates a legitimate IPv6
address under this prefix by encrypting the client address, so that the client
can access the main server on a destination address that is different in each
connection. In this way, the model provides isolation to the main server,
prevents network scanning, and minimizes exposure. Moreover, it provides a novel
framework that supports flexible load balancing, high-availability, and other
desirable features. The model is simple and does not require any modification
to the client or the network. We implement a prototype and experiments show
that our model can prevent the main server from being scanned at a slight
performance cost.
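To make the address-generation step in the Addressless abstract more concrete, here is a minimal sketch of deriving a per-client destination address inside a server prefix. It swaps in a keyed hash (HMAC-SHA256, truncated) for the encryption the abstract mentions, so it is a stand-in and not the paper's construction. The function names, the key, and the example prefix are assumptions. The sketch also keys only on the client address, so the per-connection variation described in the abstract would need an extra input that is omitted here.

```python
import hashlib
import hmac
import ipaddress


def derive_destination(prefix, client_addr, key):
    """Entrance-module side: map a client address to an address in `prefix`.

    Assumption: a truncated HMAC-SHA256 over the client address fills the
    host bits of the prefix (a keyed hash, not reversible encryption).
    """
    net = ipaddress.IPv6Network(prefix)
    host_bits = 128 - net.prefixlen
    client = ipaddress.IPv6Address(client_addr)
    digest = hmac.new(key, client.packed, hashlib.sha256).digest()
    suffix = int.from_bytes(digest, "big") & ((1 << host_bits) - 1)
    return ipaddress.IPv6Address(int(net.network_address) | suffix)


def is_legitimate(prefix, client_addr, dest_addr, key):
    """Main-service side: accept only a destination derived for this client."""
    expected = derive_destination(prefix, client_addr, key)
    actual = ipaddress.IPv6Address(dest_addr)
    return hmac.compare_digest(expected.packed, actual.packed)


# Hypothetical values from the IPv6 documentation range.
PREFIX = "2001:db8:1:2::/64"
KEY = b"demo-key-not-for-production"

dest = derive_destination(PREFIX, "2001:db8:ffff::10", KEY)
print(dest)
print(is_legitimate(PREFIX, "2001:db8:ffff::10", str(dest), KEY))
print(is_legitimate(PREFIX, "2001:db8:ffff::11", str(dest), KEY))
```

A scanner that guesses addresses inside the prefix without the key would have to hit the exact suffix derived for its own source address, which is the property this sketch is meant to show.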
Characterizing the IoT ecosystem at scale
Internet of Things (IoT) devices are extremely popular with home, business, and industrial users. To provide their services, they typically rely on a backend server infrastructure on the Internet, which collectively form the IoT Ecosystem. This ecosystem is rapidly growing and offers users an increasing number of services. It also has been a source and target of significant security and privacy risks. One notable example is the recent large-scale coordinated global attacks, like Mirai, which disrupted large service providers. Thus, characterizing this ecosystem yields insights that help end-users, network operators, policymakers, and researchers better understand it, obtain a detailed view, and keep track of its evolution. In addition, they can use these insights to inform their decision-making process for mitigating this ecosystem’s security and privacy risks. In this dissertation, we characterize the IoT ecosystem at scale by (i) detecting the IoT devices in the wild, (ii) conducting a case study to measure how deployed IoT devices can affect users’ privacy, and (iii) detecting and measuring the IoT backend infrastructure. To conduct our studies, we collaborated with a large European Internet Service Provider (ISP) and a major European Internet eXchange Point (IXP). They routinely collect large volumes of passive, sampled data, e.g., NetFlow and IPFIX, for their operational purposes. These data sources help providers obtain insights about their networks, and we used them to characterize the IoT ecosystem at scale. We start with IoT devices and study how to track and trace their activity in the wild. We developed and evaluated a scalable methodology to accurately detect and monitor IoT devices with limited, sparsely sampled data in the ISP and IXP. Next, we conduct a case study to measure how a myriad of deployed devices can affect the privacy of ISP subscribers.
Unfortunately, we found that the privacy of a substantial fraction of IPv6 end-users is at risk. We noticed that a single device at home that encodes its MAC address into the IPv6 address could be utilized as a tracking identifier for the entire end-user prefix, even if other devices use IPv6 privacy extensions. Our results showed that IoT devices contribute the most to this privacy leakage. Finally, we focus on the backend server infrastructure and propose a methodology to identify and locate IoT backend servers operated by cloud services and IoT vendors. We analyzed their IoT traffic patterns as observed in the ISP. Our analysis sheds light on their diverse operational and deployment strategies. The need for issuing a priori unknown network-wide queries against large volumes of network flow capture data, which we used in our studies, motivated us to develop Flowyager. It is a system built on top of existing traffic capture utilities, and it relies on flow summarization techniques to reduce (i) the storage and transfer cost of flow captures and (ii) query response time. We deployed a prototype of Flowyager at both the IXP and ISP.
Internet of Things (IoT) devices have become indispensable in many households, offices, and industrial plants. To provide their services, IoT devices typically rely on a backend server infrastructure on the Internet, which as a whole forms the IoT ecosystem. This ecosystem is growing rapidly and offers users more and more services. However, the IoT ecosystem is both a source and a target of significant security and privacy risks. One notable example is the recent large-scale, coordinated global attacks such as Mirai, which disrupted large service providers. It is therefore important to characterize this ecosystem, obtain a holistic view, and track its evolution so that researchers, policymakers, end users, and network operators gain insights and a better understanding. In addition, all participants in the ecosystem can use these insights to improve their decision-making processes for preventing security and privacy risks. In this dissertation, we characterize the IoT ecosystem as a whole by (i) detecting IoT devices on the Internet, (ii) conducting a case study on the influence of deployed IoT devices on users' privacy, and (iii) uncovering and measuring the IoT backend infrastructure. To conduct our studies, we collaborate with a large European Internet Service Provider (ISP) and a large European Internet Exchange Point (IXP). They routinely collect large volumes of passive, sampled data (e.g., NetFlow or IPFIX) for operational purposes. These data sources help network operators gain insights into their networks, and we used them to characterize the IoT ecosystem holistically. We begin our analyses with IoT devices and investigate how they can be tracked and traced on the Internet. To this end, we developed and evaluated a scalable methodology to accurately detect and monitor IoT devices using limited, sampled data from the ISP and IXP. Next, we conduct a case study in which we measure how a myriad of deployed devices can affect the privacy of ISP users. Unfortunately, we found that the privacy of a substantial share of IPv6 end users is at risk. We discovered that even a single device in the home that encodes its MAC address into its IPv6 address can be misused as a tracking identifier for the entire end-user prefix, even if other devices use IPv6 privacy extensions. Our results showed that IoT devices cause most of this privacy loss. Finally, we focus on the backend server infrastructure and propose a methodology for identifying and locating IoT backend servers operated by cloud services and IoT vendors. We analyzed patterns in the IoT traffic observed by the ISP. Our analysis sheds light on the different strategies by which IoT backend servers are operated and deployed. The need to issue a priori unknown network-wide queries against large volumes of network flow data, which we use in our studies, motivated us to develop Flowyager. It is a system built on top of existing network traffic tools, and it relies on flow summarization to reduce (i) the cost of archiving and transferring flow data and (ii) query response time. We deployed a prototype of Flowyager at both the IXP and the ISP.
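The privacy finding above rests on devices that encode their MAC address into the IPv6 address. A minimal sketch of how such an address can be recognized follows, assuming the standard modified EUI-64 layout (ff:fe in the middle of the interface identifier, universal/local bit flipped). The function names and the /56 end-user prefix length are assumptions for illustration and are not taken from the dissertation.

```python
import ipaddress
from collections import defaultdict
from typing import Optional


def embedded_mac(address: str) -> Optional[str]:
    """Return the MAC encoded in a modified EUI-64 interface ID, else None."""
    iid = ipaddress.IPv6Address(address).packed[8:]  # low 64 bits
    if iid[3:5] != b"\xff\xfe":
        return None  # no ff:fe marker, e.g. a privacy-extension address
    # Undo the universal/local bit flip applied to the first MAC octet.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{octet:02x}" for octet in mac)


def macs_by_prefix(addresses, prefix_len=56):
    """Group embedded MACs by end-user prefix (prefix_len is an assumption)."""
    seen = defaultdict(set)
    for addr in addresses:
        mac = embedded_mac(addr)
        if mac is not None:
            net = ipaddress.IPv6Network(f"{addr}/{prefix_len}", strict=False)
            seen[str(net)].add(mac)
    return dict(seen)


# Hypothetical addresses from the IPv6 documentation range.
observed = [
    "2001:db8:0:1a0:211:22ff:fe33:4455",   # EUI-64 device
    "2001:db8:0:1a1:1c2a:9b3f:7d10:5e88",  # randomized interface ID
]
print(embedded_mac(observed[0]))
print(macs_by_prefix(observed))
```

If the same MAC later shows up under a different prefix, an observer can link the two prefixes to one household even though the second device uses a randomized identifier.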