
    HLOC: Hints-Based Geolocation Leveraging Multiple Measurement Frameworks

    Geographically locating an IP address is of interest for many purposes. There are two major ways to obtain the location of an IP address: querying commercial databases or conducting latency measurements. For structural Internet nodes, such as routers, commercial databases are limited by low accuracy, while current measurement-based approaches overwhelm users with setup overhead and scalability issues. In this work we present our system HLOC, aiming to combine the ease of database use with the accuracy of latency measurements. We evaluate HLOC on a comprehensive router data set of 1.4M IPv4 and 183k IPv6 routers. HLOC first extracts location hints from rDNS names, and then conducts multi-tier latency measurements. Configuration complexity is minimized by using publicly available large-scale measurement frameworks such as RIPE Atlas. Using these measurements, we can confirm or disprove the location hints found in domain names. We publicly release HLOC's ready-to-use source code, enabling researchers to easily increase geolocation accuracy with minimum overhead. Comment: As published in TMA'17 conference: http://tma.ifip.org/main-conference
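The idea of confirming or disproving a location hint with a latency measurement can be illustrated with a minimal sketch. This is not HLOC's released code: the function names, the coordinates, and the assumed propagation speed of roughly 200 km per millisecond (about two thirds of the speed of light, a common rule of thumb for fibre) are all assumptions made for illustration. A hint is rejected when the probe could not physically have reached the hinted location within the measured round-trip time.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: signals propagate at roughly 2/3 of the speed of light in fibre,
# which is about 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_distance_km(rtt_ms):
    """Upper bound on the one-way distance implied by a round-trip time."""
    return (rtt_ms / 2.0) * FIBRE_KM_PER_MS


def hint_is_feasible(probe, hint, rtt_ms):
    """True if the hinted location lies within the distance the RTT allows.

    probe and hint are hypothetical (lat, lon) tuples: the known position of
    the measurement probe and the location suggested by the rDNS name.
    """
    return haversine_km(*probe, *hint) <= max_distance_km(rtt_ms)
```

A feasible result does not prove the hint on its own; it only means the measurement does not contradict it. The smaller the RTT, the tighter the region in which the hint must lie for the check to pass.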

    Queries to Google Search as Predictors of Migration Flows from Latin America to Spain

    This study evaluates the relationship between changes in the proportion of migration-related queries reported by Google Trends and changes in the volume of migration flows between origin and destination countries. The study assesses whether the cost-free Google Trends service improves the prediction of international migratory flows, and whether it could be proposed as a tool for organizations and policymakers. Previous research has used the activity of email users and other online services to track human mobility. At the same time, IP geolocation linked to Google Search has proven to be efficient in geographically tracking outbreaks of illnesses, as well as in predicting changes in economic indicators and travel patterns. This research draws from both experiences. It uses a regression analysis of time series data to compare the popularity of migration-related queries entered into Google Search in Colombia, Argentina and Peru with changes in the number of residents’ registrations in Spain made by immigrants from these countries between 2005 and 2010. The results show a significant correlation and weak to moderate predictability at lags of several months, depending on the country. The findings demonstrate that trends in Google Search queries, as provided by Google Trends, might constitute a useful predictor of migration flows. At the same time, they indicate the need for further technological developments to improve analytical capacities.

    Factor analysis of Internet traffic destinations from similar source networks

    This article is (©) Emerald Group Publishing and permission has been granted for this version to appear here (http://www.emeraldinsight.com/doi/full/10.1108/10662241211199951). Emerald does not grant permission for this article to be further copied/distributed or hosted elsewhere without the express permission from Emerald Group Publishing Limited. Purpose – This study aims to assess whether similar user populations in the Internet produce similar geographical traffic destination patterns on a per-country basis. Design/methodology/approach – We have collected a country-wide NetFlow trace over four months, which encompasses the whole Spanish academic network, comprising more than 350 institutions and one million users. This trace comprises several campus networks that are similar in terms of population size and structure. To compare their behaviors, we propose a mixture model, which is primarily based on the Zipf-Mandelbrot power law to capture the heavy-tailed nature of the per-country traffic distribution. Then, factor analysis is performed to understand the relation between the response variable, number of bytes or packets per day, with dependent variables such as the source IP network, traffic direction, and country. Findings – Surprisingly, the results show that the geographical distribution is strongly dependent on the source IP network. Furthermore, even though there are thousands of users in a typical campus network, it turns out that the aggregation level which is required to observe a stable geographical pattern is even larger. Consequently, our results show a slow convergence rate to the domain of attraction of the model; specifically, we have found that at least 35 days' worth of data are necessary to reach stability of the model’s estimated parameters. Practical implications – Based on these findings, conclusions drawn for one network cannot be directly extrapolated to different ones. Therefore, ISPs’ traffic measurement campaigns should include an extensive set of networks to cope with the space diversity, and also encompass a significant period of time due to the large transient time. Originality/value – Current state of the art includes some analysis of geographical patterns, but not comparisons between networks with similar populations. Such comparison can be useful for the design of Content Distribution Networks and the cost-optimization of peering agreements. This work has been partially funded by the Spanish Ministry of Education and Science under project ANFORA (TEC2009-13385), European Union CELTIC initiative program under project TRAMMS, European Union project OneLab, and the F.P.U. and F.P.I. Research Fellowship programs of Spain. The authors would also like to thank the anonymous reviewers who helped us to improve the quality of the paper.

    A deep dive into the accuracy of IP geolocation databases and its impact on online advertising

    The quest for an ever more personalized Internet experience relies on enriched contextual information about each user. Online advertising also follows this approach. Location is certainly part of the context information that advertising stakeholders leverage. However, when this information is not directly available from end users, advertising stakeholders infer it using geolocation databases, matching IP addresses to a position on earth. The accuracy of this approach has often been questioned in the past; however, a reality check on an advertising stakeholder shows that this technique accounts for a large fraction of the served advertisements. In this paper, we revisit the work in the field, most of which dates from almost a decade ago, through the lens of big data. More specifically, we i) benchmark two commercial Internet geolocation databases, evaluating the quality of their information using a ground-truth database of user positions containing over 2 billion samples, ii) analyze the internals of these databases, deriving a theoretical upper bound on the quality of the Internet geolocation approach, and iii) run an empirical study that unveils the monetary impact of this technology by considering the costs associated with a real-world ad impressions dataset. This work was supported in part by the European Union's Horizon 2020 innovation action programme under the PIMCITY Project under Grant 871370, in part by the TESTABLE Project under Grant 101019206, in part by Agencia Estatal de Investigacion (AEI) under the ACHILLES Project under Grants PID2019-104207RB-I00/AEI/10.13039/501100011033, in part by the Spanish Ministry of Economic Affairs and Digital Transformation and European Union-Next GenerationEU through the UNICO 5G I+D 6G-RIEMANN-FR, in part by the agreement between the Community of Madrid and the Universidad Carlos III de Madrid for the funding of research projects on SARS-CoV-2 and COVID-19 disease, through the project "Multi-source and multi-method prediction to support COVID-19 policy decision making", which was supported with REACT-EU funds from the European Regional Development Fund "A way of making Europe", and in part by the TAPTAP-UC3M Chair in advanced AI and Data Science applied to advertising and marketing.

    On the network geography of the Internet

    The geographic layout of the physical Internet inherently determines important network properties and traffic characteristics. To give insight into the geography of the Internet, we examine the spatial properties of the topology and routing. To represent the network we conducted a geographically dispersed traceroute campaign, and embedded the extracted topology into the geographic space by applying a novel IP geolocalization service, called Spotter. In this paper we present the frequency analysis of link lengths, quantify path circuitousness and explore the symmetry of end-to-end Internet routes.
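Path circuitousness can be illustrated with a short sketch. The abstract does not give the paper's definition, so the one used here is an assumption: the ratio of the geographic length of a route (the sum of great-circle distances between consecutive geolocated hops) to the direct great-circle distance between its endpoints. The function names and the hop coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def circuitousness(hops):
    """Ratio of routed geographic length to direct end-to-end distance.

    hops is a list of (lat, lon) tuples for the geolocated hops of one route,
    in order. A value of 1.0 means the route follows the great circle between
    its endpoints; larger values indicate a detour.
    """
    if len(hops) < 2:
        raise ValueError("a route needs at least two geolocated hops")
    direct = haversine_km(hops[0], hops[-1])
    if direct == 0:
        raise ValueError("endpoints coincide; the ratio is undefined")
    routed = sum(haversine_km(p, q) for p, q in zip(hops, hops[1:]))
    return routed / direct
```

Under this definition, a route whose only hops are its two endpoints has a circuitousness of exactly 1, and any intermediate hop off the great circle pushes the value above 1.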

    The queries to Google Search as predictors of migration flows from Latin America to Spain

    Recently, the development of global networks and ICT has provided new opportunities to improve the estimation and predictability of migration flows. The activity of users of e-mail and other web-based services has been compared in time and space in order to track international human mobility. At the same time, IP-based geolocation linked to Google Search has proved to be efficient in geographically tracking the outbreaks of several illnesses, and also in predicting changes in economic indicators and travel patterns. This research draws from both experiences. It compares the popularity of migration-to-Spain related queries entered into Google Search in Argentina, Colombia and Peru with changes in the number of residents’ registrations in Spain made by immigrants from these countries between 2005 and 2010. Following a preliminary visual trend analysis, the time series are pre-whitened in order to formally test for a time-shifted correlation and predictability that is not influenced by a general series trend. The analysis was performed on datasets of query popularity derived from Google Trends and on anonymized micro-data of the Residential Variation Statistics, based on the Municipal Register of Spain. The lags of one or more months that proved to be significantly correlated according to the Cross-Correlation Function were then used to evaluate predictability with regression analysis. The results show a significant correlation and weak to moderate predictability at lags of several months, depending on the country. The findings support the assumption that the popularity of Google Search queries, as provided by Google Trends, might constitute a useful predictor of migration flows, while also indicating that further developments are necessary to improve its analytical capacities.
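The time-shifted correlation step can be sketched as follows. This is a simplified illustration, not the study's procedure: it computes a plain Pearson correlation between query popularity at month t and registrations at month t + lag, and it omits the pre-whitening that the study applies before testing. The function names and the toy series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


def lagged_correlation(queries, registrations, lag):
    """Correlate queries[t] with registrations[t + lag] for a lag >= 0."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return pearson(queries[: len(queries) - lag], registrations[lag:])


def best_lag(queries, registrations, max_lag):
    """Return the lag in 0..max_lag with the highest correlation."""
    return max(
        range(max_lag + 1),
        key=lambda lag: lagged_correlation(queries, registrations, lag),
    )
```

On real monthly data, a shared trend in both series can inflate these correlations, which is why the study pre-whitens the series first; the sketch only shows how a candidate lag is scored.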

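    A technique for geolocating Internet hosts based on crowdsourcing and mobile devices (original title: Una tecnica per la geolocalizzazione di host Internet basata su crowdsourcing e dispositivi mobili)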

    Detecting the geographic position of an Internet host is a particularly interesting research topic because of its many applications, in both research and commercial settings. Studies conducted to date have used dedicated systems to probe and detect host positions, obtaining good results in the regions of the world where these systems are present (typically Europe and North America). This thesis presents a new approach to the problem based on mobile devices connected to wireless networks, using data collected through crowdsourcing by the Portolan project. By analyzing the data collected over two years of activity, a geolocation method was developed that takes into account the limitations imposed by the use of mobile devices. The goal is to have a number of probes several orders of magnitude larger than in previous methods, especially in areas of the world where research networks are not yet developed.