7 research outputs found

    GPU Accelerated protocol analysis for large and long-term traffic traces

    This thesis describes the design and implementation of GPF+, a complete general packet classification system developed using Nvidia CUDA for Compute Capability 3.5+ GPUs. This system was developed with the aim of accelerating the analysis of arbitrary network protocols within network traffic traces using inexpensive, massively parallel commodity hardware. GPF+ and its supporting components are specifically intended to support the processing of large, long-term network packet traces such as those produced by network telescopes, which are currently difficult and time-consuming to analyse. The GPF+ classifier is based on prior research in the field, which produced a prototype classifier called GPF, targeted at Compute Capability 1.3 GPUs. GPF+ greatly extends the GPF model, improving runtime flexibility and scalability whilst maintaining high execution efficiency. GPF+ incorporates a compact, lightweight register-based state machine that supports massively parallel, multi-match filter predicate evaluation, as well as efficient arbitrary field extraction. GPF+ tracks packet composition during execution and adjusts processing at runtime through warp voting to avoid redundant memory transactions and unnecessary computation. GPF+ additionally incorporates a 128-bit in-thread cache, accelerated through register shuffling, to speed up access to packet data held in slow GPU global memory. GPF+ uses a high-level DSL to simplify protocol and filter creation, whilst better facilitating protocol reuse. The system is supported by a pipeline of multi-threaded, high-performance host components, which communicate asynchronously through 0MQ messaging middleware to buffer, index, and dispatch packet data on the host system. The system was evaluated using high-end Kepler (Nvidia GTX Titan) and entry-level Maxwell (Nvidia GTX 750) GPUs. The results of this evaluation showed high system performance, limited only by device-side IO (600 MB/s) in all tests. GPF+ maintained high occupancy and device utilisation without significant serialisation, and showed improved scaling to more complex filter sets. Results were used to visualise captures of up to 160 GB in seconds, and to extract and pre-filter captures small enough to be easily analysed in applications such as Wireshark.
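    To make the warp-voting and in-register caching ideas concrete, the sketch below shows one way such a classifier kernel could be structured in CUDA. It is a minimal illustration under assumed details, not the GPF+ source: the pre-gathered 16-byte header array, the EtherType offsets, and the toy IPv4 verdict are all hypothetical.

```cuda
// Hedged sketch (not GPF+ code): a 128-bit in-register cache of header bytes,
// a warp vote to skip work no packet in the warp needs, and a register shuffle
// for warp-level cooperation.
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

#define FULL_MASK 0xffffffffu

__global__ void classify(const uint4* __restrict__ headers,  // first 16 B of each packet, pre-gathered (assumed layout)
                         uint8_t* __restrict__ verdict,
                         int n_packets)
{
    int pkt = blockIdx.x * blockDim.x + threadIdx.x;
    // Lanes that actually hold a packet, computed while the warp is still converged.
    unsigned mask = __ballot_sync(FULL_MASK, pkt < n_packets);
    if (pkt >= n_packets) return;

    // 128-bit in-thread cache: one coalesced 16-byte load into registers, so
    // later field reads stay out of slow global memory.
    uint4 cache = headers[pkt];

    // EtherType sits in bytes 12-13 of an untagged Ethernet frame (network byte order).
    uint16_t ethertype = (uint16_t)(((cache.w & 0xFFu) << 8) | ((cache.w >> 8) & 0xFFu));
    bool is_vlan = (ethertype == 0x8100u);

    // Warp vote: if no packet in this warp carries a VLAN tag, every thread
    // skips the VLAN branch -- no divergence and no redundant memory traffic.
    if (__ballot_sync(mask, is_vlan) != 0 && is_vlan) {
        // ... parse the 802.1Q header out of the register cache here ...
    }

    bool is_ipv4 = (ethertype == 0x0800u);
    verdict[pkt] = is_ipv4 ? 1 : 0;

    // Register shuffles let the warp cooperate without shared memory, e.g. a
    // warp-wide count of IPv4 packets (valid here because the toy launch below
    // keeps every warp full).
    unsigned count = is_ipv4 ? 1u : 0u;
    for (int d = 16; d > 0; d >>= 1)
        count += __shfl_down_sync(mask, count, d);
    // Lane 0 now holds the warp's IPv4 count (unused in this toy kernel).
}

int main() {
    const int n = 1 << 20;                       // dummy workload
    uint4 *d_hdr = nullptr; uint8_t *d_verdict = nullptr;
    cudaMalloc(&d_hdr, n * sizeof(uint4));
    cudaMalloc(&d_verdict, n * sizeof(uint8_t));
    cudaMemset(d_hdr, 0, n * sizeof(uint4));     // placeholder packet bytes
    classify<<<(n + 255) / 256, 256>>>(d_hdr, d_verdict, n);
    cudaDeviceSynchronize();
    printf("classified %d packets\n", n);
    cudaFree(d_hdr); cudaFree(d_verdict);
    return 0;
}
```

    Voting before the VLAN branch means an entire warp either takes or skips it together, which keeps divergence and redundant loads out of the common case.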

    Efficient tree-based content-based routing schemes

    This thesis is about routing and forwarding for inherently multicast communication such as the communication typical of information-centric networks. The notion of Information-Centric Networking (ICN) is an evolution of the Internet from the current host-centric architecture to a new architecture in which communication is based on “named information”. The ambitious goal of ICN is to effectively support the exchange and use of information in an ever more connected world, with billions of devices, many of which are mobile, producing and consuming large amounts of data. ICN is intended to support scalable content distribution, mobility, and security, for such applications as video on demand and networks of sensors or the so-called Internet of Things. Many ICN architectures have emerged in the past decade, and the ICN community has made significant progress in terms of infrastructure, test-bed deployments, and application case studies. And yet, despite the impressive research effort, the fundamental problems of routing and forwarding remain open. In particular, none of the proposed architectures has developed truly scalable name-based routing schemes and efficient name-based forwarding algorithms. This is not surprising, since the problem of routing based on names, in its most general formulation, is known to be fundamentally difficult. In general, one would want to support application-defined names (as opposed to network-defined addresses) with a compact routing scheme (small routing tables) that uses optimal paths and minimizes congestion, and that admits a fast forwarding algorithm. Furthermore, one would want to construct this routing scheme with a decentralized and incremental protocol for administrative autonomy and efficient dynamic updates. However, there are clear theoretical limits that simply make it impossible to achieve all these goals. In this thesis we explore the design space of routing and forwarding in an information-centric network. Our purpose is to develop routing schemes and forwarding algorithms that combine many desirable properties. We consider two forms of addressing, one tied to network locations, and one based on more expressive content descriptors. We then consider trees as basic routing structures, and with those we develop routing schemes that are intended to minimize path lengths and congestion, separately or together. For one of these schemes, based on expressive content descriptors, we also develop a fast forwarding algorithm specialized for massively parallel architectures such as GPUs. In summary, this thesis presents two efficient and scalable routing algorithms for two different types of networks, plus one scalable forwarding algorithm. We summarize each individual contribution below.
    Low-congestion geographic routing for wireless networks. We develop a low-congestion, multicast routing scheme designed specifically for wireless networks. The scheme supports geographical multicast routing, meaning routing to a set of nodes addressed by their physical position. The scheme builds a geometric minimum spanning tree connecting the source to all the destinations. Then, for each edge in this tree, the scheme routes a message through a random intermediate node, chosen independently of the set of multicast requests. The intermediate node is chosen in the vicinity of the corresponding edge such that congestion is reduced without stretching routes by more than a constant factor.
    Multi-tree scheme for content-based routing in ICN. We develop a tree-based routing scheme designed for large-scale wired networks such as the Internet. The scheme supports two forms of addresses: application-defined content descriptors, and network-defined locators. We first show that the scheme is effective in terms of stretch and congestion on the current AS-level Internet graph even with only a few spanning trees. Then we show that our content descriptors, which consist of sets of tags and are more expressive than the name prefixes used in mainstream ICN, aggregate well in practice under our scheme. We also explain in detail how to use descriptors and locators, together with unique content identifiers, to support the efficient transmission and sharing of information through scalable and loop-free routes.
    Tag-based forwarding (partial matching) algorithm on GPUs. To accompany our ICN routing scheme, we develop a fast forwarding algorithm that matches incoming packets against forwarding tables with tens of millions of entries. To achieve high performance, we develop a practical solution for the partial matching problem that lies at the heart of this forwarding scheme. This solution amounts to a massively parallel algorithm specifically designed for a hybrid CPU/GPU architecture.
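    The partial matching at the heart of this forwarding scheme lends itself to a simple GPU formulation. The CUDA sketch below is a hedged illustration, not the algorithm from the thesis: descriptors are encoded as fixed-width Bloom-filter-style signatures (the 192-bit width, the Sig192 type, and the per-entry interface bitmaps are assumptions), and an entry matches a packet when the entry's bits are a subset of the packet's, tested with bitwise operations by one thread per forwarding-table entry.

```cuda
// Hedged sketch of tag-set (partial) matching on a GPU: an entry matches a
// packet when (entry & ~packet) == 0 over the whole signature.
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

struct Sig192 { uint64_t w[3]; };   // 192-bit signature width is an assumption

__global__ void match_entries(const Sig192* __restrict__ table,
                              const uint32_t* __restrict__ out_ifaces,  // one interface bitmap per entry
                              int n_entries,
                              Sig192 packet,
                              uint32_t* result_ifaces)  // OR of interfaces of all matching entries
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_entries) return;

    const Sig192 e = table[i];
    bool subset = ((e.w[0] & ~packet.w[0]) == 0) &&
                  ((e.w[1] & ~packet.w[1]) == 0) &&
                  ((e.w[2] & ~packet.w[2]) == 0);

    if (subset)
        atomicOr(result_ifaces, out_ifaces[i]);  // collect the set of output interfaces
}

int main() {
    const int n = 1 << 20;                       // toy table with ~1M entries
    Sig192 *d_table = nullptr;  uint32_t *d_ifaces = nullptr, *d_out = nullptr;
    cudaMalloc(&d_table, n * sizeof(Sig192));
    cudaMalloc(&d_ifaces, n * sizeof(uint32_t));
    cudaMalloc(&d_out, sizeof(uint32_t));
    cudaMemset(d_table, 0, n * sizeof(Sig192));  // placeholder contents
    cudaMemset(d_ifaces, 0, n * sizeof(uint32_t));
    cudaMemset(d_out, 0, sizeof(uint32_t));

    Sig192 pkt = {{~0ull, ~0ull, ~0ull}};        // dummy packet carrying "all tags"
    match_entries<<<(n + 255) / 256, 256>>>(d_table, d_ifaces, n, pkt, d_out);

    uint32_t ifaces = 0;
    cudaMemcpy(&ifaces, d_out, sizeof(ifaces), cudaMemcpyDeviceToHost);
    printf("forward on interface bitmap: 0x%08x\n", ifaces);
    cudaFree(d_table); cudaFree(d_ifaces); cudaFree(d_out);
    return 0;
}
```

    One thread per entry keeps the table reads coalesced; a production design would also batch many packets per launch and compact the set of matching entries rather than only OR-ing interface bitmaps.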

    Just-in-time Analytics Over Heterogeneous Data and Hardware

    Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of datasets to gain insights. At the same time, data variety increases continuously across multiple axes. First, data comes in multiple formats, such as the binary tabular data of a DBMS, raw textual files, and domain-specific formats. Second, different datasets follow different data models, such as the relational and the hierarchical one. Data location also varies: Some datasets reside in a central "data lake", whereas others lie in remote data sources. In addition, users execute widely different analysis tasks over all these data types. Finally, the process of gathering and integrating diverse datasets introduces several inconsistencies and redundancies in the data, such as duplicate entries for the same real-world concept. In summary, heterogeneity significantly affects the way data analysis is performed. In this thesis, we aim for data virtualization: Abstracting data out of its original form and manipulating it regardless of the way it is stored or structured, without a performance penalty. To achieve data virtualization, we design and implement systems that i) mask heterogeneity through the use of heterogeneity-aware, high-level building blocks and ii) offer fast responses through on-demand adaptation techniques. Regarding the high-level building blocks, we use a query language and algebra to handle multiple collection types, such as relations and hierarchies, express transformations between these collection types, as well as express complex data cleaning tasks over them. In addition, we design a location-aware compiler and optimizer that masks away the complexity of accessing multiple remote data sources. Regarding on-demand adaptation, we present a design to produce a new system per query. The design uses customization mechanisms that trigger runtime code generation to mimic the system most appropriate to answer a query fast: Query operators are thus created based on the query workload and the underlying data models; the data access layer is created based on the underlying data formats. In addition, we exploit emerging hardware by customizing the system implementation based on the available heterogeneous processors (CPUs and GPGPUs). We thus pair each workload with its ideal processor type. The end result is a just-in-time database system that is specific to the query, data, workload, and hardware instance. This thesis redesigns the data management stack to natively cater for data heterogeneity and exploit hardware heterogeneity. Instead of centralizing all relevant datasets, converting them to a single representation, and loading them in a monolithic, static, suboptimal system, our design embraces heterogeneity. Overall, our design decouples the type of performed analysis from the original data layout; users can perform their analysis across data stores, data models, and data formats, but at the same time experience the performance offered by a custom system that has been built on demand to serve their specific use case.
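    As a concrete illustration of what such on-demand specialization can produce, the CUDA sketch below shows, written by hand, the kind of fused filter-and-aggregate kernel a JIT query compiler might emit once it knows the query (a hypothetical SELECT SUM(price) FROM sales WHERE quantity > 10), the data layout (columnar int32), and the target processor (a GPU). The table, column names, constants, and kernel shape are assumptions for illustration, not generated code from the thesis's system.

```cuda
// Hypothetical generated operator for: SELECT SUM(price) FROM sales WHERE quantity > 10
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void q1_filter_sum(const int32_t* __restrict__ quantity,
                              const int32_t* __restrict__ price,
                              int n_rows,
                              unsigned long long* sum_out)
{
    unsigned long long local = 0;
    // Grid-stride loop: the operator covers the whole column regardless of launch size.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n_rows;
         i += blockDim.x * gridDim.x)
        if (quantity[i] > 10)              // predicate constant baked in at codegen time
            local += price[i];             // toy assumes non-negative prices

    // Simple aggregation; a real system would first reduce within the block.
    atomicAdd(sum_out, local);
}

int main() {
    const int n = 1 << 22;                 // dummy column of ~4M rows
    int32_t *d_qty = nullptr, *d_price = nullptr;
    unsigned long long *d_sum = nullptr;
    cudaMalloc(&d_qty, n * sizeof(int32_t));
    cudaMalloc(&d_price, n * sizeof(int32_t));
    cudaMalloc(&d_sum, sizeof(unsigned long long));
    cudaMemset(d_qty, 0, n * sizeof(int32_t));    // placeholder data
    cudaMemset(d_price, 0, n * sizeof(int32_t));
    cudaMemset(d_sum, 0, sizeof(unsigned long long));

    q1_filter_sum<<<256, 256>>>(d_qty, d_price, n, d_sum);

    unsigned long long sum = 0;
    cudaMemcpy(&sum, d_sum, sizeof(sum), cudaMemcpyDeviceToHost);
    printf("SUM(price) = %llu\n", sum);
    cudaFree(d_qty); cudaFree(d_price); cudaFree(d_sum);
    return 0;
}
```

    Because the predicate constant, column layout, and processor target are fixed at generation time, the emitted operator carries no interpretation overhead; a different query, data format, or processor would simply yield a different specialized kernel.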

    Media gateway using a GPU

    Master's dissertation in Computer and Telematics Engineering.

    LASSO – an observatorium for the dynamic selection, analysis and comparison of software

    Mining software repositories at the scale of 'big code' (i.e., big data) is a challenging activity. As well as finding a suitable software corpus and making it programmatically accessible through an index or database, researchers and practitioners have to establish an efficient analysis infrastructure and precisely define the metrics and data extraction approaches to be applied. Moreover, for analysis results to be generalisable, these tasks have to be applied at a large enough scale to have statistical significance, and if they are to be repeatable, the artefacts need to be carefully maintained and curated over time. Today, however, a lot of this work is still performed by human beings on a case-by-case basis, with the level of effort involved often having a significant negative impact on the generalisability and repeatability of studies, and thus on their overall scientific value. The general-purpose 'code mining' repositories and infrastructures that have emerged in recent years represent a significant step forward because they automate many software mining tasks at an ultra-large scale and allow researchers and practitioners to focus on defining the questions they would like to explore at an abstract level. However, they are currently limited to static analysis and data extraction techniques, and thus cannot support (i.e., help automate) any studies which involve the execution of software systems. This includes experimental validations of techniques and tools that hypothesise about the behaviour (i.e., semantics) of software, or data analysis and extraction techniques that aim to measure dynamic properties of software. In this thesis a platform called LASSO (Large-Scale Software Observatorium) is introduced that overcomes this limitation by automating the collection of dynamic (i.e., execution-based) information about software alongside static information. It features a single, ultra-large-scale corpus of executable software systems created by amalgamating existing Open Source software repositories, and a dedicated DSL for defining abstract selection and analysis pipelines. Its key innovations are integrated capabilities for searching for and selecting software systems based on their exhibited behaviour, and an 'arena' that allows their responses to software tests to be compared in a purely data-driven way. We call the platform a 'software observatorium' since it is a place where the behaviour of large numbers of software systems can be observed, analysed and compared.