    IMPROVING ACCURACY AND EFFICIENCY OF NETWORK MEASUREMENT BY IDENTIFYING HOMOGENEOUS IPV4 ADDRESSES

    Active Internet measurement relies on responses to active probes such as ICMP Echo Request or TCP SYN messages. It is valuable because it enables researchers to measure the Internet without privileged data from ISPs: researchers use active measurement to study Internet topology, route dynamics, and link bandwidth by sending many packets through selected links, and to measure RTTs and reliability by probing many addresses. A fundamental challenge in active measurement design is allocating and limiting measurement traffic by carefully choosing where measurements are sent and how many samples are taken per measurement. Minimizing measurement load is important because heavy measurement traffic may appear malicious; if network operators treat measurement traffic as an attack, they can blacklist its sources and thereby harm the completeness and accuracy of the measurement. Another challenge of active measurement is that biases can arise from unresponsive destinations or from biased selection of destinations. Biases can lead to misleading conclusions and should therefore be minimized. In this dissertation, I develop a general approach to reducing the measurement load and biases of active Internet measurement, based on the insight that both can be reduced by letting individual Internet addresses represent larger aggregates. I first develop a technique, called Hobbit, that identifies and aggregates topologically proximate addresses by comparing traceroute results. Hobbit handles load-balanced paths, which can cause incorrect inferences of topological proximity, by distinguishing route differences due to load balancing from those due to distinct route entries. A unique contribution of Hobbit is that it can aggregate even discontiguous addresses, which matters because fragmented allocations of IPv4 addresses are common in the Internet. I apply Hobbit to IPv4 addresses and identify 0.51M aggregates of addresses (i.e., Hobbit blocks) that contain 1.77M /24 blocks. I evaluate the homogeneity of Hobbit blocks using RTTs and show that they are as homogeneous as /24s even though they are generally larger than /24s. I then demonstrate that Hobbit blocks improve the efficiency of Internet topology mapping by comparing strategies that select destinations from Hobbit and /24 blocks, and I quantify the efficiency improvement in latency estimation that can be achieved by using Hobbit blocks. I show that Hobbit blocks tend to be stable over time and analyze the measurement cost of generating them. Finally, I demonstrate that Hobbit blocks can improve the representativeness of network measurement. I develop a methodology that measures the representativeness of a measurement and show that active Internet measurement may not be representative even if the entire IPv4 space is probed. Using Hobbit blocks, I adapt weighting adjustment, a common bias-correction technique in surveys, to active Internet measurement. I evaluate the weighting adjustment on various kinds of samples and show that it reduces biases in most cases; if Hobbit blocks are given, the weighting adjustment incurs no additional measurement cost. I make Hobbit blocks publicly available and update them every month for researchers who want to perform weighting adjustment or to improve the efficiency of their network measurements.
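
    The weighting adjustment described above can be illustrated with a small post-stratification sketch: each sampled address is up-weighted by the number of addresses its aggregate (e.g., a Hobbit block) represents, divided by how many samples fell into that aggregate. The data layout and function names below are illustrative assumptions, not the dissertation's code.

        from collections import defaultdict

        def weighted_mean_rtt(samples, block_of, block_size):
            """Post-stratification-style weighting adjustment (illustrative sketch).

            samples    : list of (ip, rtt_ms) measurements actually obtained
            block_of   : dict mapping ip -> aggregate id (e.g. a Hobbit block)
            block_size : dict mapping aggregate id -> number of addresses it represents
            """
            per_block = defaultdict(list)
            for ip, rtt in samples:
                per_block[block_of[ip]].append(rtt)

            num = den = 0.0
            for blk, rtts in per_block.items():
                # Each sample in block `blk` stands in for
                # block_size[blk] / len(rtts) addresses of the full population.
                w = block_size[blk] / len(rtts)
                num += w * sum(rtts)
                den += w * len(rtts)
            return num / den if den else float("nan")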

    In the IP of the Beholder: Strategies for Active IPv6 Topology Discovery

    Existing methods for active topology discovery within the IPv6 Internet largely mirror those of IPv4. In light of the large and sparsely populated address space, in conjunction with aggressive ICMPv6 rate limiting by routers, this work develops a different approach to Internet-wide IPv6 topology mapping. First, we adopt randomized probing techniques in order to distribute probing load, minimize the effects of rate limiting, and probe at higher rates. Second, we extensively analyze the efficiency and efficacy of various IPv6 hitlists and target-generation methods when used for topology discovery, and synthesize new target lists based on our empirical results to provide both breadth (coverage across networks) and depth (finding potential subnetting). Employing our probing strategy, we discover more than 1.3M IPv6 router interface addresses from a single vantage point. Finally, we share our prober implementation, synthesized target lists, and discovered IPv6 topology results.
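
    As a rough illustration of the randomized-probing idea, the sketch below shuffles IPv6 targets within their /48 prefixes and interleaves the prefixes round-robin, so consecutive probes rarely traverse the same routers. This is an assumed toy ordering for intuition, not the paper's prober.

        import ipaddress
        import random
        from itertools import zip_longest

        def randomized_probe_order(targets, prefix_len=48):
            """Return a probing order that spreads load across prefixes to
            soften per-router ICMPv6 rate limiting (illustrative only)."""
            buckets = {}
            for t in targets:
                net = ipaddress.ip_network(f"{t}/{prefix_len}", strict=False)
                buckets.setdefault(net, []).append(t)
            for group in buckets.values():
                random.shuffle(group)  # randomize order within each prefix
            order = []
            # Round-robin across prefixes: at most one target per prefix per pass.
            for batch in zip_longest(*buckets.values()):
                order.extend(t for t in batch if t is not None)
            return order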

    Longitudinal Study of an IP Geolocation Database

    IP geolocation - the process of mapping network identifiers to physical locations - has myriad applications. We examine a large collection of snapshots from a popular geolocation database and take a first look at its longitudinal properties. We define metrics of IP geo-persistence, prevalence, coverage, and movement, and analyse 10 years of geolocation data at different location granularities. Across different classes of IP addresses, we find that significant location differences can exist even between successive instances of the database - a previously underappreciated source of potential error when using geolocation data: 47% of end-user IP addresses moved by more than 40 km in 2019. To assess the sensitivity of research results to the instance of the geolocation database, we reproduce prior research that depended on geolocation lookups. In this case study, which analyses geolocation database performance on routers, we demonstrate the impact of these temporal effects: the median distance from ground truth shifted from 167 km to 40 km between two snapshots taken two months apart. Based on our findings, we make recommendations for best practices when using geolocation databases in order to encourage reproducibility and sound measurement. Comment: technical report related to a paper that appeared in the Network Traffic Measurement and Analysis Conference (TMA 2021).
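
    The movement metric can be sketched as a great-circle distance between an IP's locations in two database snapshots; the snapshot layout below (ip -> (lat, lon)) is a simplifying assumption for illustration, not the paper's data model.

        from math import radians, sin, cos, asin, sqrt

        def haversine_km(lat1, lon1, lat2, lon2):
            """Great-circle distance in kilometres between two (lat, lon) points."""
            lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
            a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
            return 2 * 6371.0 * asin(sqrt(a))

        def moved_ips(snapshot_a, snapshot_b, threshold_km=40):
            """IPs whose geolocation moved more than threshold_km between two
            snapshots, each a dict: ip -> (lat, lon)."""
            common = snapshot_a.keys() & snapshot_b.keys()
            return [ip for ip in common
                    if haversine_km(*snapshot_a[ip], *snapshot_b[ip]) > threshold_km]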

    Evaluating and Improving Internet Load Balancing with Large-Scale Latency Measurements

    Load balancing is used in the Internet to distribute load across resources at different levels, from global load balancing, which distributes client requests across servers at the Internet level, to path-level load balancing, which balances traffic across load-balanced paths. These load-balancing algorithms generally work under certain assumptions of performance similarity. Specifically, global load balancing divides the Internet address space into client aggregations and assumes that clients in the same aggregation have similar performance to the same server; load-balanced paths are generally selected for load balancing as if they have similar performance. However, because performance similarity is typically approximated by similarity in path properties, e.g., topology and hop count, which do not necessarily lead to similar performance, performance between clients in the same aggregation and between load-balanced paths can differ significantly. This dissertation evaluates and improves global and path-level load balancing in terms of performance similarity. We achieve this with large-scale latency measurements, which not only allow us to systematically identify and evaluate the performance issues of Internet load balancing at scale, but also enable us to develop data-driven approaches to improving performance. Specifically, this dissertation consists of three parts. First, we study the issues of existing client aggregations for global load balancing and then design AP-atoms, a data-driven client aggregation learned from passive large-scale latency measurements. Second, we show that the latency imbalance between load-balanced paths, previously deemed insignificant, is now both significant and prevalent. We present Flipr, a network prober that actively collects large-scale latency measurements to characterize the latency imbalance issue. Lastly, we design another network prober, Congi, that can detect congestion at scale, and use Congi to study the congestion imbalance problem at scale. For both latency and congestion imbalance, we demonstrate that they can greatly affect the performance of various applications.
    PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/168012/1/yibo_1.pd
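
    One way to make the latency-imbalance notion concrete is to compare per-path median RTTs between the load-balanced paths of the same source-destination pair, as in the sketch below; this is an illustrative metric under assumed data structures, not Flipr's exact measurement procedure.

        from statistics import median

        def latency_imbalance_ms(path_rtts):
            """Spread of per-path median RTTs for one source-destination pair.

            path_rtts : dict mapping path id -> list of RTT samples (ms),
                        one entry per load-balanced path observed for the pair.
            """
            medians = [median(samples) for samples in path_rtts.values() if samples]
            if len(medians) < 2:
                return 0.0  # imbalance needs at least two load-balanced paths
            return max(medians) - min(medians)

        # Example: two load-balanced paths with noticeably different latencies.
        print(latency_imbalance_ms({"pathA": [20.1, 19.8, 20.4],
                                    "pathB": [34.7, 35.2, 33.9]}))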