1,524 research outputs found
HLOC: Hints-Based Geolocation Leveraging Multiple Measurement Frameworks
Geographically locating an IP address is of interest for many purposes. There
are two major ways to obtain the location of an IP address: querying commercial
databases or conducting latency measurements. For structural Internet nodes,
such as routers, commercial databases are limited by low accuracy, while
current measurement-based approaches overwhelm users with setup overhead and
scalability issues. In this work we present our system HLOC, aiming to combine
the ease of database use with the accuracy of latency measurements. We evaluate
HLOC on a comprehensive router data set of 1.4M IPv4 and 183k IPv6 routers.
HLOC first extracts location hints from rDNS names, and then conducts
multi-tier latency measurements. Configuration complexity is minimized by using
publicly available large-scale measurement frameworks such as RIPE Atlas. Using
this measurement, we can confirm or disprove the location hints found in domain
names. We publicly release HLOC's ready-to-use source code, enabling
researchers to easily increase geolocation accuracy with minimum overhead.Comment: As published in TMA'17 conference:
http://tma.ifip.org/main-conference
Measuring the Relationships between Internet Geography and RTT
When designing distributed systems and Internet protocols, designers can benefit from statistical models of the Internet that can be used to estimate their performance. However, it is frequently impossible for these models to include every property of interest. In these cases, model builders have to select a reduced subset of network properties, and the rest will have to be estimated from those available. In this paper we present a technique for the analysis of Internet round trip times (RTT) and its relationship with other geographic and network properties. This technique is applied on a novel dataset comprising ∼19 million RTT measurements derived from ∼200 million RTT samples between ∼54 thousand DNS servers. Our main contribution is an information-theoretical analysis that allows us to determine the amount of information that a given subset of geographic or network variables (such as RTT or great circle distance between geolocated hosts) gives about other variables of interest. We then provide bounds on the error that can be expected when using statistical estimators for the variables of interest based on subsets of other variables
Passport: Enabling Accurate Country-Level Router Geolocation using Inaccurate Sources
When does Internet traffic cross international borders? This question has
major geopolitical, legal and social implications and is surprisingly difficult
to answer. A critical stumbling block is a dearth of tools that accurately map
routers traversed by Internet traffic to the countries in which they are
located. This paper presents Passport: a new approach for efficient, accurate
country-level router geolocation and a system that implements it. Passport
provides location predictions with limited active measurements, using machine
learning to combine information from IP geolocation databases, router
hostnames, whois records, and ping measurements. We show that Passport
substantially outperforms existing techniques, and identify cases where paths
traverse countries with implications for security, privacy, and performance
Passport: enabling accurate country-level router geolocation using inaccurate sources
When does Internet traffic cross international borders? This question has major geopolitical, legal and social implications and is surprisingly difficult to answer. A critical stumbling block is a dearth of tools that accurately map routers traversed by Internet traffic to the countries in which they are located. This paper presents Passport: a new approach for efficient, accurate country-level router geolocation and a system that implements it. Passport provides location predictions with limited active measurements, using machine learning to combine information from IP geolocation databases, router hostnames, whois records, and ping measurements. We show that Passport substantially outperforms existing techniques, and identify cases where paths traverse countries with implications for security, privacy, and performance.First author draf
Smartphone-based geolocation of Internet hosts
The location of Internet hosts is frequently used in distributed applications and networking services. Examples include customized advertising, distribution of content, and position-based security. Unfortunately the relationship between an IP address and its position is in general very weak. This motivates the study of measurement-based IP geolocation techniques, where the position of the target host is actively estimated using the delays between a number of landmarks and the target itself. This paper discusses an IP geolocation method based on crowdsourcing where the smartphones of users operate as landmarks. Since smartphones rely on wireless connections, a specific delay-distance model was derived to capture the characteristics of this novel operating scenario
Beyond Counting: New Perspectives on the Active IPv4 Address Space
In this study, we report on techniques and analyses that enable us to capture
Internet-wide activity at individual IP address-level granularity by relying on
server logs of a large commercial content delivery network (CDN) that serves
close to 3 trillion HTTP requests on a daily basis. Across the whole of 2015,
these logs recorded client activity involving 1.2 billion unique IPv4
addresses, the highest ever measured, in agreement with recent estimates.
Monthly client IPv4 address counts showed constant growth for years prior, but
since 2014, the IPv4 count has stagnated while IPv6 counts have grown. Thus, it
seems we have entered an era marked by increased complexity, one in which the
sole enumeration of active IPv4 addresses is of little use to characterize
recent growth of the Internet as a whole.
With this observation in mind, we consider new points of view in the study of
global IPv4 address activity. Our analysis shows significant churn in active
IPv4 addresses: the set of active IPv4 addresses varies by as much as 25% over
the course of a year. Second, by looking across the active addresses in a
prefix, we are able to identify and attribute activity patterns to network
restructurings, user behaviors, and, in particular, various address assignment
practices. Third, by combining spatio-temporal measures of address utilization
with measures of traffic volume, and sampling-based estimates of relative host
counts, we present novel perspectives on worldwide IPv4 address activity,
including empirical observation of under-utilization in some areas, and
complete utilization, or exhaustion, in others.Comment: in Proceedings of ACM IMC 201
Teleoperation of passivity-based model reference robust control over the internet
This dissertation offers a survey of a known theoretical approach and novel experimental results in establishing a live communication medium through the internet to host a virtual communication environment for use in Passivity-Based Model Reference Robust Control systems with delays. The controller which is used as a carrier to support a robust communication between input-to-state stability is designed as a control strategy that passively compensates for position errors that arise during contact tasks and strives to achieve delay-independent stability for controlling of aircrafts or other mobile objects. Furthermore the controller is used for nonlinear systems, coordination of multiple agents, bilateral teleoperation, and collision avoidance thus maintaining a communication link with an upper bound of constant delay is crucial for robustness and stability of the overall system. For utilizing such framework an elucidation can be formulated by preparing site survey for analyzing not only the geographical distances separating the nodes in which the teleoperation will occur but also the communication parameters that define the virtual topography that the data will travel through. This survey will first define the feasibility of the overall operation since the teleoperation will be used to sustain a delay based controller over the internet thus obtaining a hypothetical upper bound for the delay via site survey is crucial not only for the communication system but also the delay is required for the design of the passivity-based model reference robust control. Following delay calculation and measurement via site survey, bandwidth tests for unidirectional and bidirectional communication is inspected to ensure that the speed is viable to maintain a real-time connection. Furthermore from obtaining the results it becomes crucial to measure the consistency of the delay throughout a sampled period to guarantee that the upper bound is not breached at any point within the communication to jeopardize the robustness of the controller. Following delay analysis a geographical and topological overview of the communication is also briefly examined via a trace-route to understand the underlying nodes and their contribution to the delay and round-trip consistency. To accommodate the communication channel for the controller the input and output data from both nodes need to be encapsulated within a transmission control protocol via a multithreaded design of a robust program within the C language. The program will construct a multithreaded client-server relationship in which the control data is transmitted. For added stability and higher level of security the channel is then encapsulated via an internet protocol security by utilizing a protocol suite for protecting the communication by authentication and encrypting each packet of the session using negotiation of cryptographic keys during each session
Confounds and Consequences in Geotagged Twitter Data
Twitter is often used in quantitative studies that identify
geographically-preferred topics, writing styles, and entities. These studies
rely on either GPS coordinates attached to individual messages, or on the
user-supplied location field in each profile. In this paper, we compare these
data acquisition techniques and quantify the biases that they introduce; we
also measure their effects on linguistic analysis and text-based geolocation.
GPS-tagging and self-reported locations yield measurably different corpora, and
these linguistic differences are partially attributable to differences in
dataset composition by age and gender. Using a latent variable model to induce
age and gender, we show how these demographic variables interact with geography
to affect language use. We also show that the accuracy of text-based
geolocation varies with population demographics, giving the best results for
men above the age of 40.Comment: final version for EMNLP 201
- …