16 research outputs found

    Rusty Clusters? Dusting an IPv6 Research Foundation

    The long-running IPv6 Hitlist service is an important foundation for IPv6 measurement studies. It helps to overcome infeasible, complete address space scans by collecting valuable, unbiased IPv6 address candidates and regularly testing their responsiveness. However, the Internet itself is a quickly changing ecosystem that can affect long-running services, potentially introducing biases and obscurities into ongoing data collection. Frequent analyses and updates are necessary to keep the service valuable to the community. In this paper, we show that the existing hitlist is highly impacted by the Great Firewall of China, and we offer a cleaned view on the development of responsive addresses. While the accumulated input shows an increasing bias towards some networks, the cleaned set of responsive addresses is well distributed and shows a steady increase. Although it is a best practice to remove aliased prefixes from IPv6 hitlists, we show that this also removes major content delivery networks. More than 98% of all IPv6 addresses announced by Fastly were labeled as aliased, and Cloudflare prefixes hosting more than 10M domains were excluded. Depending on the hitlist usage, e.g., higher layer protocol scans, inclusion of addresses from these providers can be valuable. Lastly, we evaluate different new address candidate sources, including target generation algorithms, to improve the coverage of the current IPv6 Hitlist. We show that a combination of different methodologies is able to identify 5.6M new, responsive addresses. This accounts for an increase of 174% and, combined with the current IPv6 Hitlist, we identify 8.8M responsive addresses.
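
    The closing figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 174% increase is measured against the responsive addresses already in the hitlist. The abstract does not state that baseline, so it is inferred here as 8.8M minus 5.6M, and all variable names are illustrative.

```python
# Rounded figures quoted in the abstract (millions of responsive addresses).
new_responsive_m = 5.6
combined_m = 8.8

# Assumption: the baseline is the combined total minus the newly found addresses.
baseline_m = combined_m - new_responsive_m

# Relative increase contributed by the new addresses, in percent.
increase_pct = 100.0 * new_responsive_m / baseline_m

print(f"inferred baseline: {baseline_m:.1f}M")
print(f"relative increase: {increase_pct:.0f}%")
```

    With the rounded inputs this gives roughly 175%, in line with the reported 174%, which was presumably computed from unrounded counts.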

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Over the past 20 years, the Internet saw an exponential growth of traffic, users, services and applications. Currently, it is estimated that the Internet is used every day by more than 3.6 billion users, who generate 20 TB of traffic per second. Such a huge amount of data challenges network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies, machine learning and big data approaches have emerged to build systems that aim at automatically extracting information from the raw data that the network monitoring infrastructures offer. In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how, by following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN and the Apache Spark big data framework. The results show that, although mathematical, statistical, and set theory tools can be used to characterize a problem, machine learning methodologies are very useful to discover hidden information in the raw data. Using the DBSCAN clustering algorithm, I will show how to use YouLighter, an unsupervised methodology to group caches serving YouTube traffic into edge-nodes, and later, by using the notion of Pattern Dissimilarity, how to identify changes in their usage over time. By using YouLighter over 10-month-long traces, I will pinpoint sudden changes in the YouTube edge-nodes usage, changes that also impair the end users’ Quality of Experience.
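
The density-based clustering idea behind DBSCAN can be illustrated with a minimal pure-Python sketch. This is the textbook algorithm, not YouLighter itself: the abstract does not say which features describe a cache, so the 2D points and the `eps` and `min_pts` values below are hypothetical placeholders.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabeled as a border point
            continue
        cluster += 1                   # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins the cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep expanding
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point receives the noise label, which is what makes DBSCAN attractive when the number of groups is not known in advance.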
I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. By using SeLINA, I will show how to automatically detect the changes of the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I will show how to use mathematical and set theory methodologies to investigate the browsing habits of Internauts. By using a two-week dataset, I will show how, over this period, the Internauts continue discovering new websites. Moreover, I will show that it is hard to build a reliable profiler using only DNS information. Instead, by exploiting mathematical and statistical tools, I will show how to characterize Anycast-enabled CDNs (A-CDNs). I will show that A-CDNs are widely used for both stateless and stateful services, that A-CDNs are quite popular, as more than 50% of web users contact an A-CDN every day, and that stateful services can benefit from A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Time. Finally, I will conclude by showing how I used BGPStream, an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. By using BGPStream in real-time mode, I will show how I detected a Multiple Origin AS (MOAS) event, and how I studied the propagation of the black-holing community, showing the effect of this community on the network. Then, by using BGPStream in historical mode and the Apache Spark big data framework over 16 years of data, I will show different results, such as the continuous growth of IPv4 prefixes and the growth of MOAS events over time. All these studies have the aim of showing how monitoring is a fundamental task in different scenarios.
In particular, they highlight the importance of machine learning and big data methodologies.
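
A MOAS event, as mentioned above, occurs when the same prefix is announced by more than one origin AS. The sketch below shows the core check on already-parsed announcements. It does not use the BGPStream API: the `(prefix, as_path)` tuple format and the function name `find_moas` are assumptions made for illustration, AS sets are ignored, and the prefixes and AS numbers come from the ranges reserved for documentation.

```python
from collections import defaultdict

def find_moas(announcements):
    """Map each prefix seen with more than one origin AS to its sorted origins."""
    origins = defaultdict(set)
    for prefix, as_path in announcements:
        if as_path:
            # The origin AS is the last AS on the path.
            origins[prefix].add(as_path[-1])
    return {p: sorted(o) for p, o in origins.items() if len(o) > 1}

# Hypothetical announcements: (prefix, AS path as a list of AS numbers).
announcements = [
    ("192.0.2.0/24", [64500, 64501]),
    ("192.0.2.0/24", [64502, 64503]),      # same prefix, different origin
    ("198.51.100.0/24", [64500, 64504]),
    ("198.51.100.0/24", [64505, 64504]),   # different path, same origin
]

moas = find_moas(announcements)
print(moas)
```

Only the first prefix is reported, because the second one reaches the same origin AS over two different paths.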

    Calibration of DART Radiative Transfer Model with Satellite Images for Simulating Albedo and Thermal Irradiance Images and 3D Radiative Budget of Urban Environment

    Remote sensing is increasingly used for managing urban environments. In this context, the H2020 project URBANFLUXES aims to improve our knowledge of urban anthropogenic heat fluxes, with the specific study of three cities: London, Basel and Heraklion. Usually, one expects to derive two major urban parameters directly from remote sensing: albedo and thermal irradiance. However, the determination of these two parameters is seriously hampered by the complexity of urban architecture. For example, urban reflectance and brightness temperature are far from isotropic and are spatially heterogeneous. Hence, radiative transfer models that consider the complexity of urban architecture when simulating remote sensing signals are essential tools. Even for these sophisticated models, there is a major constraint on the operational use of remote sensing: the complex 3D distribution of optical properties and temperatures in urban environments. Here, the work is conducted with the DART (Discrete Anisotropic Radiative Transfer) model. It is a comprehensive, physically based 3D radiative transfer model that simulates optical signals at the entrance of imaging spectro-radiometers and LiDAR scanners on board satellites and airplanes, as well as the 3D radiative budget of urban and natural landscapes, for any experimental (atmosphere, topography,…) and instrumental (sensor altitude, spatial resolution, UV to thermal infrared,…) configuration. Paul Sabatier University distributes free licenses for research activities. This paper presents the calibration of the DART model with high spatial resolution satellite images (Landsat 8, Sentinel 2, etc.) that are acquired in the visible (VIS) / near infrared (NIR) domain and in the thermal infrared (TIR) domain. Here, the work is conducted with an atmospherically corrected Landsat 8 image and the city of Basel, with its urban database.
The calibration approach in the VIS/NIR domain encompasses 5 steps for computing the 2D distribution (image) of urban albedo at satellite spatial resolution. (1) DART simulation of the satellite image at very high spatial resolution (e.g., 50 cm) per satellite spectral band. Atmosphere conditions are specific to the satellite image acquisition. (2) Spatial resampling of the DART image to the coarser spatial resolution of the available satellite image, per spectral band. (3) Iterative derivation of the optical properties of urban surfaces (roofs, walls, streets, vegetation,…) from a pixel-wise comparison of DART and satellite images, independently per spectral band. (4) Computation of the band albedo image of the city, per spectral band. (5) Computation of the image of the city albedo and VIS/NIR exitance, as an integral over all satellite spectral bands. In order to get a time series of albedo and VIS/NIR exitance, even in the absence of satellite images, ECMWF information about local irradiance and atmosphere conditions is used. A similar approach is used for calculating the city thermal exitance using satellite images acquired in the thermal infrared domain. Finally, DART simulations that are conducted with the optical properties derived from remote sensing images also give the 3D radiative budget of the city at any date, including the date of the satellite image acquisition.
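
Steps (2) and (5) above can be sketched in a few lines. The code does not call DART. It assumes that resampling is a plain block average (the abstract does not name the scheme) and that the spectral integral is approximated by an irradiance-weighted sum over bands. The function names and all numeric values are hypothetical.

```python
def block_average(image, factor):
    """Downsample a 2D list by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be multiples of factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse

def broadband_albedo(band_albedo, band_irradiance):
    """Irradiance-weighted mean of per-band albedos (reflected / incident energy)."""
    reflected = sum(a * e for a, e in zip(band_albedo, band_irradiance))
    return reflected / sum(band_irradiance)

# Step (2): hypothetical 4x4 fine-resolution band image reduced to 2x2.
fine = [
    [0.1, 0.3, 0.2, 0.2],
    [0.1, 0.3, 0.2, 0.2],
    [0.4, 0.4, 0.0, 0.2],
    [0.4, 0.4, 0.2, 0.0],
]
coarse = block_average(fine, 2)

# Step (5): two hypothetical bands for one pixel, with irradiances in W/m^2.
albedo = broadband_albedo([0.1, 0.3], [300.0, 100.0])
print(coarse, albedo)
```

Weighting by band irradiance means that a band carrying more incoming energy contributes more to the broadband value, which is why the irradiance information is needed for every date in the time series.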