
    High-speed TCP flow record extraction using GPUs

    The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-015-1478-9. Traffic analysis is an essential part of capacity planning, quality-of-service assurance, and security enforcement in current telecommunication networks. Traffic volume increases with network speed, and the analysis of large traffic traces is computationally intensive. This paper presents, for the first time, flow extraction software that obtains complex TCP-aware flow records at 4.4 million packets per second on a single GPU. Such TCP flow records include the number of retransmissions and duplicates per flow, which are very challenging to compute at high speed. Our software significantly increases the processing performance of recently proposed high-speed sniffers based on commodity hardware and demonstrates the advantages of applying massively parallel processing devices to traffic analysis.
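
    The paper's abstract does not list its data structures, so the following is only a minimal sketch of what a GPU-resident TCP-aware flow record and its per-packet update could look like; all field names and the retransmission/duplicate tests are illustrative assumptions, not the authors' implementation.

    #include <cstdint>

    struct FlowRecord {
        uint32_t src_ip, dst_ip;       // 5-tuple flow key
        uint16_t src_port, dst_port;
        uint8_t  proto;
        uint64_t packets, bytes;
        uint32_t next_seq;             // next expected data sequence number
        uint32_t retransmissions;      // resends that overlap already-seen data
        uint32_t duplicates;           // exact repeats of previously seen data
    };

    // Per-packet update, assuming one thread is responsible for a given flow.
    // Sequence-number wraparound is ignored to keep the sketch short.
    __device__ void update_flow(FlowRecord *f, uint32_t seq, uint16_t payload_len)
    {
        f->packets++;
        f->bytes += payload_len;
        if (payload_len == 0) return;                  // pure ACKs carry no data
        if (seq == f->next_seq) {
            f->next_seq = seq + payload_len;           // in-order segment
        } else if (seq + payload_len <= f->next_seq) {
            f->duplicates++;                           // segment fully seen before
        } else if (seq < f->next_seq) {
            f->retransmissions++;                      // partial resend with new data
            f->next_seq = seq + payload_len;
        }
        // seq > next_seq (a hole) needs reassembly state, which is the costly
        // part the paper moves onto the GPU
    }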

    GPU Accelerated protocol analysis for large and long-term traffic traces

    This thesis describes the design and implementation of GPF+, a complete general packet classification system developed using Nvidia CUDA for Compute Capability 3.5+ GPUs. This system was developed with the aim of accelerating the analysis of arbitrary network protocols within network traffic traces using inexpensive, massively parallel commodity hardware. GPF+ and its supporting components are specifically intended to support the processing of large, long-term network packet traces such as those produced by network telescopes, which are currently difficult and time-consuming to analyse. The GPF+ classifier is based on prior research in the field, which produced a prototype classifier called GPF, targeted at Compute Capability 1.3 GPUs. GPF+ greatly extends the GPF model, improving runtime flexibility and scalability whilst maintaining high execution efficiency. GPF+ incorporates a compact, lightweight register-based state machine that supports massively parallel, multi-match filter predicate evaluation, as well as efficient arbitrary field extraction. GPF+ tracks packet composition during execution and adjusts processing at runtime through warp voting to avoid redundant memory transactions and unnecessary computation. GPF+ additionally incorporates a 128-bit in-thread cache, accelerated through register shuffling, to speed up access to packet data in slow GPU global memory. GPF+ uses a high-level DSL to simplify protocol and filter creation, whilst better facilitating protocol reuse. The system is supported by a pipeline of multi-threaded, high-performance host components, which communicate asynchronously through 0MQ messaging middleware to buffer, index, and dispatch packet data on the host system. The system was evaluated using high-end Kepler (Nvidia GTX Titan) and entry-level Maxwell (Nvidia GTX 750) GPUs. The results of this evaluation showed high system performance, limited only by device-side IO (600 MB/s) in all tests. GPF+ maintained high occupancy and device utilisation in all tests, without significant serialisation, and showed improved scaling to more complex filter sets. Results were used to visualise captures of up to 160 GB in seconds, and to extract and pre-filter captures small enough to be easily analysed in applications such as Wireshark.
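
    GPF+'s source is not reproduced here, but the two CUDA primitives the abstract refers to, warp voting and register shuffling, can be illustrated with a small hypothetical kernel. The filter logic, data layout, and parameter names below are invented for the example and do not follow the GPF+ DSL or its state machine.

    #include <cstdint>

    __global__ void classify(const uint32_t *packets, const int *is_ipv4,
                             int *matches, int n_packets, int words_per_packet)
    {
        int pkt = blockIdx.x * blockDim.x + threadIdx.x;
        if (pkt >= n_packets) return;

        // Warp vote: if no thread in this warp holds an IPv4 packet, the whole
        // warp skips the IPv4 filter chain without divergent branching.
        unsigned active = __activemask();
        bool ipv4 = is_ipv4[pkt] != 0;
        if (!__any_sync(active, ipv4)) { matches[pkt] = 0; return; }

        // "In-thread cache": each thread keeps 4 x 32-bit words (128 bits) of
        // its packet in registers instead of re-reading global memory per field.
        uint32_t cache[4];
        #pragma unroll
        for (int i = 0; i < 4; ++i)
            cache[i] = packets[pkt * words_per_packet + i];

        // Register shuffle: lane 0's first cached word is broadcast to the rest
        // of the warp without going through shared memory.
        uint32_t broadcast = __shfl_sync(active, cache[0], 0);

        // Arbitrary example predicate, purely to show the cached words in use.
        matches[pkt] = ipv4 && ((cache[2] & 0xFFu) == (broadcast & 0xFFu));
    }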

    A framework for network traffic analysis using GPUs

    Over the last years, computer networks have become an important part of our society. Networks have kept growing in size and complexity, making their management, traffic monitoring, and analysis more complex due to the huge amount of data and computation involved. In the last decade, several researchers found it effective to use graphics processing units (GPUs) rather than traditional processors (CPUs) to accelerate algorithms not related to graphics (GPGPU). In 2006, the GPU manufacturer NVIDIA launched CUDA, a platform that allows software developers to use their GPUs for general-purpose computation from the C programming language. This thesis presents a framework that aims to simplify the programming of network traffic analysis with CUDA. The objectives of the framework are to abstract the task of obtaining network packets, to simplify the creation of network analysis programs using CUDA, and to offer an easy way to reuse analysis code. Several network traffic analyses have also been developed with the framework.
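
    The abstract does not describe the framework's API, so the following is only a hypothetical outline of the kind of abstraction such a framework could expose: the host batches captured packets, copies them to the GPU, and launches a user-supplied CUDA analysis kernel. All names are invented for illustration.

    #include <cuda_runtime.h>
    #include <cstdint>
    #include <vector>

    struct PacketBatch {
        std::vector<uint8_t> data;      // raw packet bytes, packed back to back
        std::vector<uint32_t> offsets;  // start of each packet within `data`
    };

    // User-supplied analysis kernel: one thread per packet.
    // Assumes Ethernet + IPv4 framing (protocol field at byte 14 + 9).
    __global__ void count_tcp(const uint8_t *data, const uint32_t *offsets,
                              int n, unsigned int *tcp_count)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        const uint8_t *pkt = data + offsets[i];
        if (pkt[14 + 9] == 6) atomicAdd(tcp_count, 1u);
    }

    // Framework-style helper: hide the device allocation, copy, and launch.
    void analyse_batch(const PacketBatch &b, unsigned int *h_count)
    {
        uint8_t *d_data; uint32_t *d_off; unsigned int *d_count;
        int n = (int)b.offsets.size();
        cudaMalloc(&d_data, b.data.size());
        cudaMalloc(&d_off, n * sizeof(uint32_t));
        cudaMalloc(&d_count, sizeof(unsigned int));
        cudaMemcpy(d_data, b.data.data(), b.data.size(), cudaMemcpyHostToDevice);
        cudaMemcpy(d_off, b.offsets.data(), n * sizeof(uint32_t), cudaMemcpyHostToDevice);
        cudaMemset(d_count, 0, sizeof(unsigned int));
        count_tcp<<<(n + 255) / 256, 256>>>(d_data, d_off, n, d_count);
        cudaMemcpy(h_count, d_count, sizeof(unsigned int), cudaMemcpyDeviceToHost);
        cudaFree(d_data); cudaFree(d_off); cudaFree(d_count);
    }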

    Scaling a convolutional neural network based Flower counting application in a distributed GPU cluster

    Taking advantage of modern data acquisition techniques, researchers of P2IRC at the University of Saskatchewan developed an application to monitor flower growth during different phases of the blooming period and to predict the yield of canola crops. Although the application could predict a near-accurate number of flowers in a few scenarios, its inability to cope with challenging situations, such as misinterpreting sun reflections or dust along the roadside as flowers, motivated the researchers to find an alternative approach to counting flowers. In addition to being more accurate, the new application should infer the number of flowers faster and scale in distributed environments. With these goals in mind, this thesis develops and evaluates a convolutional neural network (CNN) based flower counting application, taking inspiration from two previous works in which CNNs were used to count heads in dense crowds and to predict the number of bacterial cells in medical imagery. The application addresses the performance and accuracy goals mentioned above. Two challenges of using a neural network are that (a) training needs a large volume of data to converge to a low error and (b) training is computationally expensive and takes a long time to complete. To address the first challenge, experiments were run both with "ground truth" estimated using a modified version of the previous flower counter and with ground truth from manual annotation. To address the long training time, two distributed versions of the proposed application were created, based on two different distributed architectures: Parameter Server and Ring-AllReduce. Moreover, a detailed explanation of the proposed CNN's architecture, along with its memory footprint and GPU utilisation, is organised as an in-depth case study to help trace the model's memory consumption during training. Across different sets of experiments, the new flower counter is observed to be more accurate than its previous version, and both distributed implementations successfully reduced the total completion time, scaling linearly as more workers are added to the training. The Ring-AllReduce version performed slightly better than the Parameter Server version, but the differences were not substantial.
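
    The abstract names Ring-AllReduce without describing it. The host-only sketch below simulates its two-phase communication schedule (reduce-scatter followed by all-gather) for a handful of in-process workers, which is why per-worker traffic stays roughly constant as workers are added; real implementations exchange GPU gradient buffers over a communication library, which is not modelled here.

    #include <cstdio>
    #include <vector>

    int main()
    {
        const int N = 4;                       // workers in the ring
        const int L = 8;                       // gradient length, split into N chunks
        const int chunk = L / N;
        std::vector<std::vector<float>> grad(N, std::vector<float>(L, 0.f));
        for (int w = 0; w < N; ++w)
            for (int i = 0; i < L; ++i) grad[w][i] = (float)(w + 1);  // dummy gradients

        // Phase 1, reduce-scatter: after N-1 steps, worker w owns the complete
        // sum of chunk (w+1) mod N.
        for (int step = 0; step < N - 1; ++step)
            for (int w = 0; w < N; ++w) {
                int send = ((w - step) % N + N) % N;   // chunk this worker sends
                int to = (w + 1) % N;                  // right neighbour
                for (int i = send * chunk; i < (send + 1) * chunk; ++i)
                    grad[to][i] += grad[w][i];
            }

        // Phase 2, all-gather: completed chunks travel once more around the
        // ring; after N-1 steps every worker holds the full summed gradient.
        for (int step = 0; step < N - 1; ++step)
            for (int w = 0; w < N; ++w) {
                int send = ((w + 1 - step) % N + N) % N;
                int to = (w + 1) % N;
                for (int i = send * chunk; i < (send + 1) * chunk; ++i)
                    grad[to][i] = grad[w][i];
            }

        // Every worker should now hold the elementwise sum 1+2+...+N = N(N+1)/2.
        printf("worker 0, element 0: %.1f (expected %.1f)\n",
               grad[0][0], N * (N + 1) / 2.0f);
        return 0;
    }

    With N = 4 workers the program prints 10.0 for every element, the elementwise sum of the four dummy gradients.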

    Design and implementation of hardware accelerators using multilevel parallelization and application-oriented data layout (マルチレベル並列化とアプリケーション指向データレイアウトを用いるハードウェアアクセラレータの設計と実装)

    Degree type: course-based doctorate (課程博士). Examination committee: (chief examiner) Professor 稲葉 雅幸, The University of Tokyo; Professor 須田 礼仁, The University of Tokyo; Professor 五十嵐 健夫, The University of Tokyo; Professor 山西 健司, The University of Tokyo; Associate Professor 稲葉 真理, The University of Tokyo; Lecturer 中山 英樹, The University of Tokyo. Institution: University of Tokyo (東京大学).

    Online detection of pathological TCP flows with retransmissions in high-speed networks

    Online Quality of Service (QoS) assessment in high-speed networks is one of the key concerns for service providers, namely detecting QoS degradation on the fly, as soon as possible, to avoid customers' complaints. In this regard, a Key Performance Indicator (KPI) is the number of TCP retransmissions per flow, which is related to packet losses or increased network and/or client/server latency. However, to accurately detect TCP retransmissions, the whole sequence-number list should be tracked, which is a challenging task in multi-Gb/s networks. In this paper we show that the simplest approach, counting as a retransmission any packet whose sequence number is smaller than the previous one, is enough to detect pathological flows with severe retransmissions. Such a lightweight approach eliminates the need to track the whole TCP flow history, which severely restricts traffic analysis throughput. Our findings show that low False Positive Rates (FPR) and False Negative Rates (FNR) can be achieved in the detection of such pathological flows with severe retransmissions, which are of paramount importance for QoS monitoring. Most importantly, we show that live detection of such pathological flows at 10 Gb/s per processing core is feasible. This work has been partially funded by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund under the projects TRÁFICA (MINECO/FEDER TEC2015-69417-C2-1-R), Preproceso Inteligente de Tráfico (MINECO/FEDER TEC2015-69417-C2-2-R) and RACING DRONES (MINECO/FEDER RTC-2016-4744-7).
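
    As a rough sketch of the lightweight heuristic described in this abstract (not the authors' code): keep only the last data sequence number per flow and count any backwards step as a retransmission, rather than tracking the full sequence-number history.

    #include <cstdint>

    struct FlowState {
        uint32_t last_seq;      // sequence number of the last data segment seen
        uint32_t retx_count;    // segments that stepped "backwards"
        uint64_t data_packets;  // segments carrying payload
    };

    // Returns true if the segment is counted as a retransmission. Sequence-number
    // wraparound and reordering are deliberately ignored; the point of the paper
    // is that this cheap test still singles out flows with severe retransmissions.
    inline bool observe_segment(FlowState *s, uint32_t seq, uint16_t payload_len)
    {
        if (payload_len == 0) return false;    // pure ACKs carry no data
        bool retx = (s->data_packets > 0) && (seq < s->last_seq);
        s->data_packets++;
        s->last_seq = seq;                     // always compare with the previous segment
        if (retx) s->retx_count++;
        return retx;
    }

    A flow could then be flagged as pathological once retx_count exceeds some fraction of data_packets; the threshold is a tuning choice this sketch does not fix.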