Beyond Counting: New Perspectives on the Active IPv4 Address Space
In this study, we report on techniques and analyses that enable us to capture
Internet-wide activity at individual IP address-level granularity by relying on
server logs of a large commercial content delivery network (CDN) that serves
close to 3 trillion HTTP requests on a daily basis. Across the whole of 2015,
these logs recorded client activity involving 1.2 billion unique IPv4
addresses, the highest ever measured, in agreement with recent estimates.
Monthly client IPv4 address counts showed constant growth for years prior, but
since 2014, the IPv4 count has stagnated while IPv6 counts have grown. Thus, it
seems we have entered an era marked by increased complexity, one in which the
sole enumeration of active IPv4 addresses is of little use to characterize
recent growth of the Internet as a whole.
With this observation in mind, we consider new points of view in the study of
global IPv4 address activity. Our analysis shows significant churn in active
IPv4 addresses: the set of active IPv4 addresses varies by as much as 25% over
the course of a year. Second, by looking across the active addresses in a
prefix, we are able to identify and attribute activity patterns to network
restructurings, user behaviors, and, in particular, various address assignment
practices. Third, by combining spatio-temporal measures of address utilization
with measures of traffic volume, and sampling-based estimates of relative host
counts, we present novel perspectives on worldwide IPv4 address activity,
including empirical observation of under-utilization in some areas, and
complete utilization, or exhaustion, in others.
Comment: in Proceedings of ACM IMC 201
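The churn figure in the abstract above can be illustrated with a small sketch. The abstract does not give the paper's exact churn definition, so the two ratios below are an assumption, and the addresses are documentation-range placeholders rather than measured data.

```python
def churn(before: set[str], after: set[str]) -> dict[str, float]:
    """Compare two snapshots of active addresses (hypothetical definition)."""
    gone = before - after   # active in the first snapshot, silent in the second
    new = after - before    # silent in the first snapshot, active in the second
    return {
        "disappeared": len(gone) / len(before),
        "appeared": len(new) / len(after),
    }


# Made-up snapshots of active addresses at the start and end of a year.
jan = {"192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"}
dec = {"192.0.2.2", "192.0.2.3", "192.0.2.4", "198.51.100.7"}
result = churn(jan, dec)
```

In this toy input the total count stays at four addresses while a quarter of the set turns over, which is the kind of change a plain address count would hide.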
Measuring the Internet during Covid-19 to Evaluate Work-from-Home
The Covid-19 pandemic has radically changed our lives, and people have reacted
to it in different ways depending on their circumstances. One reaction is
working from home, since lockdowns have been announced in many regions around
the world. For some places, however, a lack of information means we do not know
whether people really work from home. Given this uncertainty, detecting the
reaction to the Covid-19 pandemic would help us understand what is really
happening in these places. Working from home implies that people have changed
the way they interact with the Internet. People used to access the Internet at
work or at school during the day; now they are more likely to access it from
home in the daytime. Therefore, changes in network usage in a place can
indicate whether people there actually work from home. In this work, we reuse
and analyze Trinocular outage data (around 5.1M responsive /24 blocks) over 6
months to find network usage changes with a newly designed algorithm. We apply
the algorithm to sets of /24 blocks in several cities and compare the detected
network usage changes with real-world Covid-19 events to verify that the
algorithm captures changes made in reaction to the pandemic. By applying the
algorithm to all measurable /24 blocks to detect network usage changes, we
conclude that network usage can be an indicator of the reaction to the Covid-19
pandemic.
The End of the Canonical IoT Botnet: A Measurement Study of Mirai's Descendants
Since the burgeoning days of IoT, Mirai has been established as the canonical
IoT botnet. Not long after the public release of its code, researchers found
many Mirai variants competing with one another for many of the same vulnerable
hosts. Over time, the myriad Mirai variants evolved to incorporate unique
vulnerabilities, defenses, and regional concentrations. In this paper, we ask:
have Mirai variants evolved to the point that they are fundamentally distinct?
We answer this question by measuring two of the most popular Mirai descendants:
Hajime and Mozi. To actively scan both botnets simultaneously, we developed a
robust measurement infrastructure, BMS, and ran it for more than eight months.
The resulting datasets show that these two popular botnets have diverged in
their evolutions from their common ancestor in multiple ways: they have
virtually no overlapping IP addresses, they respond differently to
network events such as diurnal rate limiting in China, and more. Collectively,
our results show that there is no longer one canonical IoT botnet. We discuss
the implications of this finding for researchers and practitioners.
Efficient Latent Semantic Extraction from Cross Domain Data with Declarative Language
With large amounts of data continuously generated by intelligent devices, efficient analysis of huge data collections to unearth valuable insights has become one of the most elusive challenges for both academia and industry. The key elements of a scalable analysis framework should include (1) an intuitive interface to describe the desired outcome, (2) a well-crafted model that integrates all available information sources to derive the optimal outcome, and (3) an efficient algorithm that performs the data integration and extraction within a reasonable amount of time. In this dissertation, we address these challenges by proposing (1) a cross-language interface for the succinct expression of recursive queries, (2) a domain-specific neural network model that can incorporate information from multiple modalities, and (3) a sample-efficient training method that can be used even for classifiers with an extremely large number of output classes. Our contributions in this thesis are thus threefold. First, for the recursive queries that are ubiquitous in advanced data analytics, we design, on top of BigDatalog and Apache Spark, a succinct and expressive analytics tool encapsulating the functionality and classical algorithms of Datalog, a quintessential logic programming language. We provide the Logical Library (LLib), a Spark MLlib-like high-level API supporting a wide range of recursive algorithms, and the Logical DataFrame (LFrame), an extension to Spark DataFrame supporting both relational and logical operations. LLib and LFrame enable smooth collaboration between logical applications and other Spark libraries, as well as cross-language logical programming in Scala, Java, or Python. Second, we use variants of the recurrent neural network (RNN) to incorporate sequential information overlooked by conventional work in two different domains: Spoken Language Understanding (SLU) and Internet Embedding (IE).
In SLU, we address the problem caused by relying solely on the first-best interpretation (hypothesis) of an audio command through a series of new architectures comprising bidirectional LSTM and pooling layers. These architectures jointly use the texts or embedding vectors of the other hypotheses, which are usually neglected but carry valuable information missing from the first-best hypothesis. In IE, we propose DIP, an extension of the RNN, to build an Internet coordinate system from IP address sequences, which conventional distance-based Internet embedding algorithms ignore but which encode structural information about the network. Both DIP and the integration of all hypotheses bring significant performance improvements for the corresponding downstream tasks. Finally, we investigate the training algorithm for multi-class classifiers with a large number of output classes, which are common in deep neural networks and typically implemented as a softmax final layer with one output neuron per class. To avoid the expensive computation of the softmax normalizing constant for each training data point, we analyze the well-known negative sampling algorithm and improve it into the amplified negative sampling algorithm, which achieves much higher performance at lower training cost.
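The cost argument for negative sampling can be made concrete with a short sketch. This shows only the standard negative sampling loss next to the full softmax loss; the abstract does not define the amplified variant, so it is not reproduced here. The function names, the uniform sampling distribution, and the example sizes are illustrative assumptions.

```python
import math
import random


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def full_softmax_loss(scores: list[float], target: int) -> float:
    # Exact cross-entropy: the normalizing constant sums over every class.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def negative_sampling_loss(scores, target, num_negatives, rng):
    # Standard negative sampling: score the true class against a few
    # uniformly sampled other classes instead of all of them.
    negatives = []
    while len(negatives) < num_negatives:
        c = rng.randrange(len(scores))
        if c != target:  # reject the true class
            negatives.append(c)
    loss = -log_sigmoid(scores[target])
    loss -= sum(log_sigmoid(-scores[c]) for c in negatives)
    return loss, negatives


rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # made-up logits
ns_loss, negatives = negative_sampling_loss(scores, target=42, num_negatives=5, rng=rng)
exact_loss = full_softmax_loss(scores, target=42)
```

Here the sampled loss touches six scores per training example, while the exact loss touches all 10,000. The two losses are different objectives, so their values are not expected to match.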
Analyzing Internet reliability remotely with probing-based techniques
Internet reliability for home users is increasingly important as a variety of services that we use migrate to the Internet. Yet, we lack authoritative measures of residential Internet reliability. Measuring reliability requires the detection of Internet outage events experienced by home users. But residential Internet outages are rare events. Further, they can affect relatively few users. Thus, detecting residential Internet outages requires broad and longitudinal measurements of individual users' Internet connections. However, such measurements of Internet reliability are challenging to obtain accurately and at scale.
Probing-based remote outage detection techniques can scale but their accuracy is questionable. These techniques detect Internet outages across time as well as across the IPv4 address space by sending active probes, such as pings and traceroutes, to users' IP addresses and use probe responses to infer Internet connectivity. However, they can infer false outages since their foundational assumption can sometimes be invalid: that the lack of response to an active probe is indicative of failure. In this dissertation, I show how to use probing-based techniques to measure residential Internet reliability by defending the following thesis: It is possible to remotely and accurately detect substantial outages experienced by any device with a stable public IP address that typically responds to active probes and use these outages to compare reliability across ISPs, media-types, geographical areas, and weather conditions.
In the first part of the dissertation, I address the inaccuracy of probing-based techniques' detected outages and show how to use probe responses to correctly detect outages. I illustrate two scenarios where the lack of response to an active probe is not indicative of failure. In the first scenario, responses are delayed beyond the prober's timeout, leading these techniques to infer packet-loss instead of delay. In the second scenario, these techniques can falsely infer packet-loss when the address they are probing gets dynamically reassigned. I examine how often delayed responses and dynamic reassignment occur across ISPs to quantify the inaccuracy of these techniques. I show how outages can be inferred correctly even in networks with dynamic reassignment using complementary datasets that can reveal whether an address was dynamically reassigned before, during, and after a detected outage for that address.
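The first scenario, a response that arrives after the prober's timeout, can be sketched as follows. The 3-second timeout and the round-trip times are made-up values for illustration, and `classify_probe` is a hypothetical helper, not part of any tool named in the dissertation.

```python
from typing import Optional


def classify_probe(rtt_seconds: Optional[float], timeout_seconds: float) -> str:
    """Label one probe as a prober with a fixed timeout would experience it."""
    if rtt_seconds is None:
        return "lost"      # no response ever arrived
    if rtt_seconds > timeout_seconds:
        return "late"      # a response arrived, but after the prober gave up
    return "answered"


# Hypothetical round-trip times in seconds; None means no response at all.
rtts = [0.04, 0.05, 4.2, None, 0.06, 7.5]
labels = [classify_probe(r, timeout_seconds=3.0) for r in rtts]

# A prober that only sees "no answer before the timeout" counts late
# responses as losses, so its apparent loss rate exceeds the true one.
apparent_loss = sum(label != "answered" for label in labels) / len(labels)
true_loss = sum(label == "lost" for label in labels) / len(labels)
```

With these inputs, two of the three probes that look lost were only delayed.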
In the second part of the dissertation, I motivate why the detection of individual addresses' outages is necessary for analyzing residential reliability. An individual address typically represents one residential customer; therefore, detecting outages for individual addresses can allow capturing even small outages. Prior probing-based techniques focus upon the detection of edge network outages affecting a substantial set of addresses belonging to a BGP prefix or to a /24 address block. Here, I quantitatively demonstrate the extent to which prior techniques can miss residential outages. I show that even individual address outages occur rarely in most networks. When multiple simultaneous outages of related individual addresses occur, there is likely a common underlying cause. With this insight, I develop and evaluate an approach to find outage events that are statistically unlikely to have occurred independently. I show that the majority of such events do not affect entire /24 address blocks or BGP prefixes, and are therefore not likely to be detected by existing techniques which look for outages at these granularities.
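One way to make "statistically unlikely to have occurred independently" concrete is a binomial tail probability. The abstract does not say which test the dissertation uses, so the independence model, the per-address outage probability, and the significance threshold below are all assumptions chosen for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Interpreted here as the chance that k or more of n addresses fail in
    the same time bin if each fails independently with probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical group: 200 related addresses, each with a 0.1% chance of an
# outage in a given time bin, and 5 of them observed down together.
p_value = prob_at_least(k=5, n=200, p=0.001)
likely_common_cause = p_value < 1e-4  # assumed threshold
```

A small tail probability suggests the simultaneous outages share an underlying cause, even when they cover far fewer addresses than a full /24 block.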
In the final part of the dissertation, I show how to use individual addresses' outages detected by probing-based techniques to assess Internet reliability across media-types, geographical areas, and weather conditions. Individual outages are not direct measures of reliability: they can occur independently because users disable equipment or can be observed falsely due to dynamic address renumbering. I use the insight that the statistical change in outage rate in different challenging environments (e.g., thunderstorm) can quantitatively expose actual outage “inflation”. I show how to study the effect of a challenging environment upon the reliability of a group of addresses by analyzing the inflation in outage rate for that group while the environment is present.
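The idea of outage-rate inflation can be sketched as a comparison of rates with and without the challenging environment. Expressing inflation as a ratio is an assumption (the abstract does not give the formula), and the counts below are invented for illustration.

```python
def outage_rate(outages: int, address_hours: float) -> float:
    """Outages per address-hour of observation."""
    return outages / address_hours


def inflation(during: float, baseline: float) -> float:
    """Outage rate during the condition relative to the baseline rate."""
    return during / baseline


# Made-up observations for one group of addresses.
baseline = outage_rate(outages=40, address_hours=1_000_000)  # no thunderstorm
storm = outage_rate(outages=12, address_hours=50_000)        # thunderstorm present
ratio = inflation(storm, baseline)
```

Outages caused by users switching off equipment or by renumbering should contribute to both rates alike, so a ratio well above 1 points to outages caused by the environment itself.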
This dissertation's contributions will help achieve comprehensive measurements of Internet reliability that can be used to identify vulnerable networks and their challenges, inform which enhancements can help networks improve reliability, and evaluate the efficacy of deployed enhancements over time.