226 research outputs found

    SeLINA: a Self-Learning Insightful Network Analyzer

    Understanding the behavior of a network from a large-scale traffic dataset is a challenging problem. Big data frameworks offer scalable algorithms to extract information from raw data, but often require sophisticated fine-tuning and detailed knowledge of machine learning algorithms. To streamline this process, we propose SeLINA (Self-Learning Insightful Network Analyzer), a generic, self-tuning, simple tool to extract knowledge from network traffic measurements. SeLINA includes different data analytics techniques that add self-learning capabilities to state-of-the-art scalable approaches, together with parameter auto-selection that relieves the network expert of parameter tuning. We combine unsupervised and supervised approaches to mine data in a scalable way. SeLINA embeds mechanisms to check whether new data fits the model, to detect possible changes in the traffic, and to trigger model rebuilding, possibly automatically. The result is a system that offers human-readable models of the data with minimal user intervention, supporting domain experts in extracting actionable knowledge and highlighting possibly meaningful interpretations. SeLINA’s current implementation runs on Apache Spark. We tested it on large collections of real-world passive network measurements from a nationwide ISP, investigating YouTube and P2P traffic. The experimental results confirmed the ability of SeLINA to provide insights and detect changes in the data that suggest further analyses

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Over the past 20 years, the Internet has seen exponential growth in traffic, users, services and applications. Currently, it is estimated that the Internet is used every day by more than 3.6 billion users, who generate 20 TB of traffic per second. Such a huge amount of data challenges network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies, machine learning and big data approaches have emerged to build systems that aim to automatically extract information from the raw data that network monitoring infrastructures offer. In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how, by following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN and the Apache Spark big data framework. The results show that, although mathematical, statistical, and set theory tools can be used to characterize a problem, machine learning methodologies are very useful for discovering hidden information in the raw data. Using the DBSCAN clustering algorithm, I will show how to use YouLighter, an unsupervised methodology that groups caches serving YouTube traffic into edge-nodes and later, using the notion of Pattern Dissimilarity, identifies changes in their usage over time. By using YouLighter over 10-month-long traces, I will pinpoint sudden changes in YouTube edge-node usage, changes that also impair the end users’ Quality of Experience. 
I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. By using SeLINA, I will show how to automatically detect the changes in the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I will show how to use mathematical and set theory methodologies to investigate the browsing habits of Internauts. Using a two-week dataset, I will show that over this period the Internauts keep discovering new websites. Moreover, I will show that it is hard to build a reliable profiler using only DNS information. Instead, by exploiting mathematical and statistical tools, I will show how to characterize Anycast-enabled CDNs (A-CDNs). I will show that A-CDNs are widely used for both stateless and stateful services; that A-CDNs are quite popular, as more than 50% of web users contact an A-CDN every day; and that stateful services can benefit from A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Time. Finally, I will conclude by showing how I used BGPStream, an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. By using BGPStream in real-time mode, I will show how I detected a Multiple Origin AS (MOAS) event and how I studied the propagation of the black-holing community, showing the effect of this community on the network. Then, by using BGPStream in historical mode and the Apache Spark big data framework over 16 years of data, I will show different results, such as the continuous growth of IPv4 prefixes and the growth of MOAS events over time. All these studies aim to show that monitoring is a fundamental task in different scenarios, 
highlighting in particular the importance of machine learning and big data methodologies
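
    As a rough illustration of the MOAS idea mentioned in the abstract above: a prefix is a Multiple Origin AS case when it is announced with more than one distinct origin AS. The sketch below is only an assumption about how such a check might look. The record layout (pairs of prefix and AS path), the function name find_moas, and the sample data are hypothetical; this is not BGPStream's API and not the thesis's implementation.

```python
from collections import defaultdict


def find_moas(announcements):
    """Return {prefix: sorted origin ASNs} for prefixes seen with more than one origin AS.

    `announcements` is an iterable of (prefix, as_path) pairs (a hypothetical layout),
    where as_path is a list of AS numbers and its last element is taken as the origin.
    """
    origins = defaultdict(set)
    for prefix, as_path in announcements:
        if as_path:  # ignore records that carry no AS path
            origins[prefix].add(as_path[-1])
    # Keep only the prefixes that have two or more distinct origin ASes.
    return {p: sorted(o) for p, o in origins.items() if len(o) > 1}


# Made-up sample using documentation prefixes and private-use AS numbers.
sample = [
    ("192.0.2.0/24", [64500, 64501]),
    ("192.0.2.0/24", [64502, 64503]),
    ("198.51.100.0/24", [64500, 64501]),
]
moas = find_moas(sample)
```

    In this sample only 192.0.2.0/24 is reported, because it is the only prefix whose paths end in two different origin ASes.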

    Simulation and data analysis of peer-to-peer traffic for live video streaming

    Evaluating and testing changes or configurations to peer-to-peer systems or even understanding their behaviour can be complicated. One approach is to simulate a large peer-to-peer system and visualise the results. In this master's thesis a study is performed to understand how an actual implementation of a hybrid peer-to-peer live video streaming system behaves and performs under different scenarios. The behaviour and performance of a hybrid live video streaming system consisting of an unstructured mesh-pull-based P2P network and a classic content delivery network solution is studied by simulating the system with different scenarios such as flash crowds and flash disconnects. The simulation system includes a network model taking latency and bandwidth into consideration. As expected the mesh-based system performed well under user churn. Although the system consisted of approximately 80% free-riders the utilisation of the content distribution network was reduced by 95% on average. The data analysis was successful in improving the system's overall performance. Furthermore, the visualisations and data analysis were used to understand the system's behaviour

    Profiling and Identification of Web Applications in Computer Network

    Characterising network traffic is a critical step for detecting network intrusion or misuse. The traditional way to identify the application associated with a set of traffic flows uses port numbers and DPI (Deep Packet Inspection), but it is affected by the use of dynamic ports and encryption. The research community has proposed models for traffic classification that determined the most important requirements and recommendations for a successful approach. The suggested alternatives can be categorised into four techniques: port-based, packet-payload-based, host-behavioural, and statistical. The traditional approach to identifying traffic flows typically relies on IANA-assigned port numbers and deep packet inspection (DPI). However, the increasing number of Internet applications that use dynamic port assignments and encrypted traffic renders these techniques ineffective for real-time traffic identification. In recent years, two other techniques have been introduced, focusing on host behaviour and statistical methods, to avoid these limitations. The former technique is based on the idea that hosts generate different communication patterns at the transport layer; by extracting these behavioural patterns, activities and applications can be classified. It cannot, however, identify specific applications; for example, it classifies both Yahoo and Gmail simply as email. Studies have therefore focused on using statistical features with machine learning algorithms to identify the traffic associated with applications. This method relies on characteristics of IP flows, minimising the overhead limitations associated with other schemes. The classification accuracy of statistical flow-based approaches, however, depends on the discrimination ability of the traffic features used. NetFlow is the de facto standard for monitoring and analysing network traffic, but the information it provides is not enough to describe application behaviour. 
The primary challenge is to fully describe the activity within and among network flows in order to understand application usage and user behaviour. This thesis proposes novel features that precisely describe the behaviour of a web application in order to separate various user activities. Extracting the most discriminative features that characterise web applications is key to achieving higher accuracy without being biased by either users or network circumstances. This work investigates novel features that characterise the behaviour of an application based on the arrival timing of packets and flows. As part of describing the application behaviour, the research considered the on/off data transfer, defining characteristics for many typical applications, and the amount of data transferred or exchanged. Furthermore, the research considered the timing and patterns of user events as part of a network application session. Using an extended set of traffic features extracted from traffic captures, a supervised machine learning classifier was developed. To this end, the present work customised the popular tcptrace utility to generate classification features based on traffic burstiness and periods of inactivity for everyday Internet usage. A C5.0 decision tree classifier was applied using the proposed features to eleven different Internet applications, with traffic generated by ten users. Overall, the newly proposed features achieved high accuracy (~98%) in classifying the respective applications. Afterwards, uncontrolled data collected in a real environment from a group of 20 users accessing different applications was used to evaluate the proposed features. The evaluation tests indicated that the method has an accuracy of 87% in identifying the correct network application. Iraqi cultural Attach
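
    To make the notion of burstiness and inactivity features in the abstract above more concrete, here is a minimal sketch that splits packet arrival times into "on" bursts separated by "off" gaps. Everything in it is an assumption for illustration: the function name on_off_features, the 1-second idle threshold, and the three summary values are hypothetical. They are not the thesis's actual feature set and not the output of tcptrace.

```python
def on_off_features(timestamps, idle_threshold=1.0):
    """Summarise on/off behaviour of a flow from sorted packet arrival times (seconds).

    A gap longer than `idle_threshold` (an assumed value) ends the current burst.
    Returns the number of bursts, the mean burst duration and the mean idle gap.
    """
    if not timestamps:
        return {"bursts": 0, "mean_on": 0.0, "mean_off": 0.0}

    on_periods, off_periods = [], []
    start = prev = timestamps[0]
    for t in timestamps[1:]:
        gap = t - prev
        if gap > idle_threshold:
            on_periods.append(prev - start)  # close the burst that just ended
            off_periods.append(gap)          # record the idle period
            start = t                        # a new burst begins at this packet
        prev = t
    on_periods.append(prev - start)          # close the final burst

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "bursts": len(on_periods),
        "mean_on": mean(on_periods),
        "mean_off": mean(off_periods),
    }


# Made-up arrival times: three bursts separated by two idle gaps.
features = on_off_features([0.0, 0.5, 1.0, 5.0, 5.5, 12.0])
```

    Values such as these could be placed in a feature vector for a supervised classifier; which features are actually discriminative is what the thesis itself investigates.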

    Video-on-Demand over Internet: a survey of existing systems and solutions

    Video-on-Demand is a service where movies are delivered to distributed users with low delay and free interactivity. The traditional client/server architecture experiences scalability issues when providing video streaming services, so many systems have been proposed to solve this issue, mostly based on a peer-to-peer or a hybrid server/peer-to-peer solution. This work presents a survey of the existing or proposed systems and solutions, based upon a subset of representative systems, and defines selection criteria for classifying these systems. These criteria are based on common questions such as: is it video-on-demand or live streaming; is the architecture based on a content delivery network, peer-to-peer, or both; is the delivery overlay tree-based or mesh-based; is the system push-based or pull-based, single-stream or multi-stream; does it use data coding; and how do the clients choose their peers. Representative systems are briefly described to give a summarized overview of the proposed solutions, and four of them are analyzed in detail. Finally, an attempt is made to evaluate the most promising solutions for future experiments. Résumé: Video-on-demand is a service in which films are delivered remotely to users with a

    Design and Implementation of a Communication Protocol to Improve Multimedia QoS and QoE in Wireless Ad Hoc Networks

    This dissertation addresses the problem of multimedia delivery over multi-hop ad hoc wireless networks, and especially over wireless sensor networks. Due to their low power consumption, low processing capacity and low memory capacity, these networks have major difficulties in achieving the optimal quality levels demanded by end users in such communications. In the first part of this work, a study was carried out to determine the behavior of a variety of multimedia streams and how they are affected by network conditions when they are transmitted over topologies formed by devices of different technologies in multi-hop wireless ad hoc mode. To achieve this goal, we performed experimental tests using a test bench that combines the main codecs used in audio and video streaming over IP networks with different sound and video captures representing the characteristic patterns of multimedia services such as phone calls, video communications, IPTV and video on demand (VOD). With the information gathered in the laboratory, we were able to establish the correlation between the induced changes in the physical and logical topology and the network parameters that measure the quality of service (QoS) of a multimedia transmission, such as latency, jitter or packet loss. At this stage of the investigation, a study was performed to determine the state of the art of the proposed protocols, algorithms, and practical implementations that have been explicitly developed to optimize multimedia transmission over wireless ad hoc networks, especially in ad hoc networks using clusters of nodes distributed over a geographic area and in wireless sensor networks. The next step of this research was the development of an algorithm for the logical organization of clusters formed by nodes capable of adapting to the circumstances of real-time traffic. 
The stated goal was to achieve maximum utilization of the resources offered by the set of nodes that form the network, allowing all types of content to be sent through them reliably and efficiently at the same time, and mixing conventional IP data traffic with multimedia traffic that has stringent QoS and QoE requirements. Using the information gathered in the previous phase, we developed a network architecture that improves overall network performance and multimedia streaming. In parallel, a communication protocol was designed and programmed that makes it possible to implement the proposal and test its operation on real network infrastructures. In the last phase of this thesis, we focused on sending multimedia in wireless sensor networks (WSN). Based on the above results, we adapted both the architecture and the communication protocol for this particular type of network, whose use has grown considerably in recent years.
Díaz Santos, JR. (2016). Design and Implementation of a Communication Protocol to Improve Multimedia QoS and QoE in Wireless Ad Hoc Networks [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62162

    Effective and Economical Content Delivery and Storage Strategies for Cloud Systems

    Cloud computing has proved to be an effective infrastructure to host various applications and provide reliable and stable services. Content delivery and storage are two main services provided by the cloud. A high-performance cloud can reduce costs for both cloud providers and customers, while providing high application performance to cloud clients. Thus, the performance of such cloud-based services is closely related to three issues. First, when delivering contents from the cloud to users or transferring contents between cloud datacenters, it is important to reduce the payment costs and transmission time. Second, when transferring contents between cloud datacenters, it is important to reduce the payment costs to the internet service providers (ISPs). Third, when storing contents in the datacenters, it is crucial to reduce the file read latency and power consumption of the datacenters. In this dissertation, we study how to effectively deliver and store contents on the cloud, with a focus on cloud gaming and video streaming services. In particular, we aim to address three problems. i) Cost-efficient cloud computing system to support thin-client Massively Multiplayer Online Game (MMOG): how to achieve high Quality of Service (QoS) in cloud gaming and reduce the cloud bandwidth consumption; ii) Cost-efficient inter-datacenter video scheduling: how to reduce the bandwidth payment cost by fully utilizing link bandwidth when cloud providers transfer videos between datacenters; iii) Energy-efficient adaptive file replication: how to adapt to time-varying file popularities to achieve a good tradeoff between data availability and efficiency, as well as reduce the power consumption of the datacenters. In this dissertation, we propose methods to solve each of the aforementioned challenges on the cloud. 
The outcome is a cloud system that combines a cost-efficient system to support cloud clients, an inter-datacenter video scheduling algorithm for video transmission on the cloud, and an adaptive file replication algorithm for the cloud storage system. This cloud system not only benefits cloud providers by reducing the cloud cost, but also benefits cloud customers by reducing their payment cost and improving cloud application performance (i.e., user experience). Finally, we conducted extensive experiments on many testbeds, including PeerSim, PlanetLab, EC2 and a real-world cluster, which demonstrate the efficiency and effectiveness of our proposed methods. In our future work, we will study how to further improve user experience in receiving contents and reduce the cost of content transfer

    On the feasibility of using current data centre infrastructure for latency-sensitive applications

    It has been claimed that the deployment of fog and edge computing infrastructure is a necessity to make high-performance cloud-based applications a possibility. However, there are a large number of middle-ground latency-sensitive applications such as online gaming, interactive photo editing and multimedia conferencing that require servers deployed closer to users than in globally centralised clouds but do not necessarily need the extreme low-latency provided by a new infrastructure of micro data centres located at the network edge, e.g., in base stations and ISP Points of Presence. In this paper we analyse a snapshot of today's data centres and the distribution of users around the globe and conclude that existing infrastructure provides a sufficiently distributed platform for middle-ground applications requiring a response time of 20-200 ms. However, while placement and selection of edge servers for extreme low-latency applications is a relatively straightforward matter of choosing the closest, providing a high quality of experience for middle-ground latency applications that use the more widespread distribution of today's data centres, as we advocate in this paper, raises new management challenges to develop algorithms for optimising the placement of and the per-request selection between replicated service instances