
    Security Applications of GPUs

    Despite recent advances in software security hardening techniques, vulnerabilities can still be exploited if the attackers are sufficiently determined. Regardless of the protections enabled, successful exploitation can still be achieved, even though, admittedly, it is much harder today than it was in the past. Since securing software is still an area of ongoing research, the community also investigates detection methods in order to protect software. Three of the most promising such methods monitor (i) the network, (ii) the filesystem, and (iii) host memory for possible exploitation. Whenever a malicious operation is detected, the monitor should be able to terminate it and/or alert the administrator. In this chapter, we explore how to utilize the highly parallel capabilities of modern commodity graphics processing units (GPUs) to improve the performance of different security tools operating at the network, storage, and memory level, and to offload the CPU whenever possible. Our results show that modern GPUs can be very efficient and highly effective at accelerating the pattern matching operations of network intrusion detection systems and antivirus tools, as well as at monitoring the integrity of the base computing systems.
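    The pattern matching workload described above maps naturally onto GPU threads. The sketch below is a minimal, self-contained CUDA toy, not the chapter's implementation: it builds a KMP-style DFA for a single pattern on the host, splits the scanned buffer into overlapping chunks, and lets each GPU thread run the DFA over one chunk. All names (scan_chunks, the chunk size, the toy traffic) are illustrative assumptions.

```cuda
// Hedged sketch: a toy showing how DFA-based pattern matching -- the core of
// NIDS and antivirus scanning -- can be parallelized across GPU threads.
// Single pattern, KMP-style DFA; names and sizes are illustrative only.
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

#define ALPHABET 256   // one DFA transition per possible input byte

// Each thread runs the DFA over one chunk of the input buffer.
__global__ void scan_chunks(const unsigned char *data, int total_len,
                            int chunk_len, int plen, const int *dfa,
                            const char *accept, int *match_count)
{
    int tid   = blockIdx.x * blockDim.x + threadIdx.x;
    int start = tid * chunk_len;
    if (start >= total_len) return;

    // Scan plen-1 bytes past the chunk so matches straddling two chunks are
    // found, but count a match only if it starts inside this thread's chunk.
    int end = min(start + chunk_len + plen - 1, total_len);
    int state = 0, hits = 0;
    for (int i = start; i < end; ++i) {
        state = dfa[state * ALPHABET + data[i]];
        if (accept[state] && i - plen + 1 < start + chunk_len) ++hits;
    }
    match_count[tid] = hits;
}

int main()
{
    // Toy DFA for the single pattern "evil", built KMP-style on the host.
    // A real scanner would compile thousands of signatures (e.g. Aho-Corasick).
    const char *pat = "evil";
    int m = (int)strlen(pat), nstates = m + 1;
    std::vector<int>  dfa(nstates * ALPHABET, 0);
    std::vector<char> accept(nstates, 0);
    accept[m] = 1;
    dfa[(unsigned char)pat[0]] = 1;
    int x = 0;
    for (int s = 1; s < m; ++s) {
        for (int c = 0; c < ALPHABET; ++c)
            dfa[s * ALPHABET + c] = dfa[x * ALPHABET + c];    // mismatch: fall back
        dfa[s * ALPHABET + (unsigned char)pat[s]] = s + 1;    // match: advance
        x = dfa[x * ALPHABET + (unsigned char)pat[s]];
    }
    for (int c = 0; c < ALPHABET; ++c)                        // after a full match,
        dfa[m * ALPHABET + c] = dfa[x * ALPHABET + c];        // restart from border

    const char *traffic = "benign data ... evil payload ... more evil bytes";
    int len = (int)strlen(traffic), chunk = 16;
    int nchunks = (len + chunk - 1) / chunk;

    unsigned char *d_data; int *d_dfa, *d_cnt; char *d_acc;
    cudaMalloc((void **)&d_data, len);
    cudaMalloc((void **)&d_dfa,  dfa.size() * sizeof(int));
    cudaMalloc((void **)&d_acc,  accept.size());
    cudaMalloc((void **)&d_cnt,  nchunks * sizeof(int));
    cudaMemcpy(d_data, traffic, len, cudaMemcpyHostToDevice);
    cudaMemcpy(d_dfa,  dfa.data(),    dfa.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_acc,  accept.data(), accept.size(),            cudaMemcpyHostToDevice);

    scan_chunks<<<(nchunks + 127) / 128, 128>>>(d_data, len, chunk, m,
                                                d_dfa, d_acc, d_cnt);

    std::vector<int> cnt(nchunks);
    cudaMemcpy(cnt.data(), d_cnt, nchunks * sizeof(int), cudaMemcpyDeviceToHost);
    int total = 0;
    for (int i = 0; i < nchunks; ++i) total += cnt[i];
    printf("matches: %d\n", total);   // 2 for this toy input
    return 0;
}
```

    A production scanner would instead compile thousands of signatures into an Aho-Corasick automaton and stream large batches of packets or file blocks through the same kind of kernel; that batched regime is where the GPU gains discussed in the chapter are obtained.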

    Arax: A Runtime Framework for Decoupling Applications from Heterogeneous Accelerators

    Today, using multiple heterogeneous accelerators efficiently from applications and high-level frameworks, such as TensorFlow and Caffe, poses significant challenges in three respects: (a) sharing accelerators, (b) allocating available resources elastically during application execution, and (c) reducing the required programming effort. In this paper, we present Arax, a runtime system that decouples applications from heterogeneous accelerators within a server. First, Arax maps application tasks dynamically to available resources, managing all required task state, memory allocations, and task dependencies. As a result, Arax can share accelerators across applications in a server and adjust the resources used by each application as load fluctuates over time. Additionally, Arax offers a simple API and includes Autotalk, a stub generator that automatically generates stub libraries for applications already written for specific accelerator types, such as NVIDIA GPUs. Consequently, Arax applications are written once, without considering physical details such as the number and type of accelerators. Our results show that applications such as Caffe, TensorFlow, and Rodinia can run using Arax with minimal effort and low overhead compared to native execution, about 12% (geometric mean). Arax supports efficient accelerator sharing, offering up to 20% better execution times compared to NVIDIA MPS, which supports NVIDIA GPUs only. Arax can transparently provide elasticity, decreasing total application turnaround time by up to 2x compared to native execution without elasticity support.
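    As a rough illustration of what decoupling applications from accelerators means at the API level, the mock below shows an application that only describes tasks over runtime-managed buffers and never selects a physical device. All names here (Runtime, Task, submit, alloc) are hypothetical and are not Arax's actual API; the scheduler is faked with a CPU loop standing in for whichever accelerator the runtime would pick.

```cuda
// Hedged sketch: a mock of the decoupling idea. The application submits
// device-agnostic tasks; a stand-in scheduler dispatches them. None of these
// types or functions are Arax's real interface.
#include <cstddef>
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

struct Buffer { std::vector<float> data; };              // runtime-owned memory

struct Task {
    const char *name;
    std::vector<Buffer *> args;
    std::function<void(std::vector<Buffer *> &)> body;   // device-agnostic kernel
};

class Runtime {
    std::queue<Task> q;
public:
    Buffer *alloc(size_t n) { return new Buffer{std::vector<float>(n, 0.f)}; }
    void submit(Task t)     { q.push(std::move(t)); }    // app never names a GPU
    void run() {                                         // stand-in scheduler: a real
        while (!q.empty()) {                             // runtime would bind each task
            Task t = std::move(q.front()); q.pop();      // to a free accelerator
            printf("dispatching %s\n", t.name);
            t.body(t.args);
        }
    }
};

int main() {
    Runtime rt;
    Buffer *a = rt.alloc(4), *b = rt.alloc(4), *c = rt.alloc(4);
    for (int i = 0; i < 4; ++i) { a->data[i] = (float)i; b->data[i] = (float)(2 * i); }

    rt.submit({"vec_add", {a, b, c}, [](std::vector<Buffer *> &v) {
        for (size_t i = 0; i < v[2]->data.size(); ++i)
            v[2]->data[i] = v[0]->data[i] + v[1]->data[i];
    }});
    rt.run();
    printf("c[3] = %.1f\n", c->data[3]);                 // 3 + 6 = 9
    return 0;
}
```

    A stub generator in the spirit of Autotalk would emit this kind of submission code behind an existing CUDA-style interface, which is how applications already written for a specific accelerator could run without modification.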

    Co-Evaluation of Pattern Matching Algorithms on IoT Devices with Embedded GPUs

    Pattern matching is an important building block for many security applications, including Network Intrusion Detection Systems (NIDS). As NIDS grow in functionality and complexity, the time overhead and energy consumption of pattern matching become a significant consideration that limits the deployability of such systems, especially on resource-constrained devices. On the other hand, the emergence of new computing platforms, such as embedded devices with integrated, general-purpose Graphics Processing Units (GPUs), brings new, interesting challenges and opportunities for algorithm design in this setting: how to make use of new architectural features and how to evaluate their effect on algorithm performance. Up to now, work that focuses on pattern matching for such platforms has been limited to specific algorithms in isolation. In this work, we present a systematic and comprehensive benchmark that allows us to co-evaluate both existing and new pattern matching algorithms on heterogeneous devices equipped with embedded GPUs, suitable for medium- to high-end IoT deployments. We evaluate the algorithms on such a heterogeneous device, in close connection with the architectural features of the platform, and provide insights on how these features affect the algorithms' behavior. We find that, on our target embedded platform, GPU-based pattern matching algorithms have competitive performance compared to the CPU and consume half as much energy as the CPU-based variants. Based on these insights, we also propose HYBRID, a new pattern matching approach that efficiently combines techniques from existing approaches and outperforms them by 1.4x across a range of realistic and synthetic data sets. Our benchmark details the effect of various optimizations, thus providing a path forward to making existing security mechanisms, such as NIDS, deployable on IoT devices.
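    Two ingredients such a co-evaluation has to account for are the algorithm class and the platform's memory architecture. The sketch below is an illustrative assumption, not the paper's benchmark code: a deliberately naive matcher, where each GPU thread tests one starting offset, reading the scanned buffer through a zero-copy mapping. On integrated embedded GPUs that share DRAM with the CPU, such mappings avoid the copy step that dominates on discrete cards. Buffer sizes, the planted pattern, and all names are made up.

```cuda
// Hedged sketch: naive (per-offset) pattern matching over a zero-copy mapped
// buffer, the kind of variant an embedded-GPU benchmark would compare against
// DFA-based approaches. Illustrative only.
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Each thread tests whether the pattern starts at its offset.
__global__ void naive_match(const unsigned char *data, int n,
                            const unsigned char *pat, int m, int *hits)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + m > n) return;
    for (int k = 0; k < m; ++k)
        if (data[i + k] != pat[k]) return;
    atomicAdd(hits, 1);                       // pattern occurs at offset i
}

int main()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);    // allow zero-copy mappings

    const char *pat = "attack";
    int m = (int)strlen(pat), n = 1 << 20;

    // The scanned buffer stays in CPU memory; the integrated GPU reads it in place.
    unsigned char *buf, *d_buf, *d_pat;
    cudaHostAlloc((void **)&buf, n, cudaHostAllocMapped);
    memset(buf, 'a', n);
    memcpy(buf + 1234, pat, m);               // plant one occurrence
    cudaHostGetDevicePointer((void **)&d_buf, buf, 0);

    cudaMalloc((void **)&d_pat, m);
    cudaMemcpy(d_pat, pat, m, cudaMemcpyHostToDevice);
    int *d_hits;
    cudaMalloc((void **)&d_hits, sizeof(int));
    cudaMemset(d_hits, 0, sizeof(int));

    naive_match<<<(n + 255) / 256, 256>>>(d_buf, n, d_pat, m, d_hits);

    int hits = 0;
    cudaMemcpy(&hits, d_hits, sizeof(int), cudaMemcpyDeviceToHost);
    printf("occurrences: %d\n", hits);        // 1 for this toy buffer
    return 0;
}
```

    DFA-based approaches such as Aho-Corasick trade this simplicity for fewer memory reads per scanned byte; quantifying that kind of trade-off in time and energy on a shared-memory embedded platform is what the benchmark is about.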

    G-Safe: Safe GPU Sharing in Multi-Tenant Environments

    Modern GPU applications, such as machine learning (ML) frameworks, can only partially utilize beefy GPUs, leading to GPU underutilization in cloud environments. Sharing GPUs across multiple applications from different users can improve resource utilization and, consequently, cost, energy, and power efficiency. However, GPU sharing creates memory safety concerns because kernels must share a single GPU address space (GPU context). Previous GPU memory protection approaches have limited deployability because they require specialized hardware extensions or access to source code, which is often unavailable for the GPU-accelerated libraries heavily used by ML frameworks. In this paper, we present G-Safe, a PTX-level bounds checking approach for GPUs that confines the GPU kernels of each application to the memory partition allocated to them. G-Safe relies on three mechanisms: (1) it divides the common GPU address space into separate partitions for different applications; (2) it intercepts and checks data transfers, fencing erroneous operations; (3) it instruments all GPU kernels at the PTX level (available even in closed GPU libraries), fencing all kernel memory accesses outside the application's memory bounds. We implement G-Safe as an external, dynamically linked library that can be pre-loaded at application startup time. G-Safe's approach is transparent to applications and can support real-life, complex frameworks, such as Caffe and PyTorch, that issue billions of GPU kernels. Our evaluation shows that the overhead of G-Safe compared to native (unprotected) execution for such frameworks is between 4% and 12%, and 9% on average.
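    G-Safe performs its instrumentation on PTX, so it works even when only binary GPU libraries are available; the CUDA-source analogue below only illustrates the shape of the guard that ends up in front of each global memory access. The partition base/size arguments and the sacrificial fence word are illustrative assumptions, not G-Safe's actual mechanism details.

```cuda
// Hedged sketch: a CUDA-level analogue of a per-access bounds check. Any
// address outside the application's partition is redirected to a sacrificial
// "fence" word, so a misbehaving kernel cannot touch another tenant's memory.
#include <cstdio>
#include <cuda_runtime.h>

__device__ float *guard(float *p, char *part_base, size_t part_size, float *fence)
{
    char *a  = reinterpret_cast<char *>(p);
    bool  ok = (a >= part_base) && (a + sizeof(float) <= part_base + part_size);
    return ok ? p : fence;                    // out-of-partition access is fenced
}

__global__ void scale(float *buf, int n, float k,
                      char *part_base, size_t part_size, float *fence)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float *p = guard(&buf[i], part_base, part_size, fence);   // inserted check
        *p = *p * k;
    }
}

int main()
{
    const int n = 256;
    float *d_buf, *d_fence;
    cudaMalloc((void **)&d_buf,   n * sizeof(float));
    cudaMalloc((void **)&d_fence, sizeof(float));
    cudaMemset(d_buf, 0, n * sizeof(float));

    // Pretend the runtime granted this application exactly the partition holding d_buf.
    scale<<<1, 256>>>(d_buf, n, 2.0f,
                      reinterpret_cast<char *>(d_buf), n * sizeof(float), d_fence);
    cudaDeviceSynchronize();
    printf("kernel finished; any out-of-partition access would have hit the fence\n");
    return 0;
}
```

    In this toy every access happens to be in range; the point is the shape of the check that the instrumentation inserts, which at the PTX level is applied to every load and store regardless of whether kernel source code exists.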

    Towards specification of a software architecture for cross-sectoral big data applications

    The proliferation of Big Data applications puts pressure on improving and optimizing the handling of diverse datasets across different domains. Among several challenges, major difficulties arise in data-sensitive domains such as banking and telecommunications, where strict regulations make it very difficult to upload and experiment with real data on external cloud resources. In addition, most Big Data research and development efforts aim to address the needs of IT experts, while Big Data analytics tools remain largely unavailable to non-expert users. In this paper, we report on the work in progress carried out in the context of the H2020 project I-BiDaaS (Industrial-Driven Big Data as a Self-service Solution), which aims to address the above challenges. The project will design and develop a novel architecture stack that can be easily configured and adjusted to address cross-sectoral needs, helping to resolve data privacy barriers in sensitive domains while remaining usable by non-experts. This paper discusses and motivates the need for Big Data as a self-service, reviews the relevant literature, and identifies gaps with respect to the challenges described above. We then present the I-BiDaaS paradigm for Big Data as a self-service, position it in the context of existing references, and report on initial work towards the conceptual specification of the I-BiDaaS software architecture. This work is supported by the I-BiDaaS project, funded by the European Commission under Grant Agreement No. 780787.

    Accelerating network packet processing using graphics cards

    The need for differentiated services (such as firewalls, network intrusion detection/prevention systems, and traffic classification applications) that lie in the core of the Internet, instead of at the end points, constantly increases. These services need to perform complex packet processing operations at upper networking layers, which, unfortunately, are not supported by traditional edge routers. To address this evolution, specialized network appliances (called "middleboxes") are deployed, which typically perform complex packet processing operations, ranging from deep packet inspection to packet encryption and redundancy elimination. Packet processing implemented in software promises to enable the fast deployment of new, sophisticated processing without the need to buy and deploy expensive new equipment. In this thesis, we propose to increase the throughput of packet processing operations by using Graphics Processing Units (GPUs). GPUs have evolved into massively parallel computational devices, containing hundreds of processing cores that can be used for general-purpose computing beyond graphics rendering. GPUs, however, have a different set of constraints and properties that can prevent existing software from obtaining the improved throughput benefits GPUs can provide. This dissertation analyzes the tradeoffs of using modern graphics processors for stateful packet processing and describes the software techniques needed to improve its performance. First, we present a deep study into accelerating packet processing operations using discrete modern graphics cards. Second, we present a broader multi-parallel stateful packet processing architecture that carefully parallelizes network traffic processing and analysis at three levels, using multi-queue network interfaces (NICs), multiple CPUs, and multiple GPUs. Last, we explore the design of a GPU-based stateful packet processing framework, identifying a modular mechanism for writing GPU-based packet processing applications and eliminating the excessive data transfers and redundant work found in monolithic GPU-assisted applications. Our experimental results demonstrate that properly architecting stateful packet processing software for modern GPU architectures can drastically improve throughput compared to a multi-core CPU implementation.
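    The batching pattern underlying this line of work is straightforward to sketch. The toy below uses illustrative assumptions throughout and is not the thesis framework: variable-length packets are gathered into one contiguous buffer with an offset table, the whole batch is shipped to the GPU in a single transfer, and one thread processes one packet; the per-packet "processing" is a trivial first-byte check standing in for real inspection or decoding work.

```cuda
// Hedged sketch: batched, one-thread-per-packet GPU processing.
// Packet contents, offsets, and the verdict logic are toy values.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Flag packets whose first byte looks like an IPv4 header byte (0x45);
// a stand-in for real per-packet work such as signature matching or decoding.
__global__ void process_batch(const unsigned char *buf, const int *off,
                              int npkts, int *verdict)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= npkts) return;
    int start = off[p], len = off[p + 1] - off[p];
    verdict[p] = (len > 0 && buf[start] == 0x45) ? 1 : 0;
}

int main()
{
    // A toy batch of 3 "packets" laid out back to back, tracked by an offset table.
    std::vector<unsigned char> batch = {0x45, 1, 2, 3,    // pkt 0 (4 bytes)
                                        0x60, 9, 9,       // pkt 1 (3 bytes)
                                        0x45, 7};         // pkt 2 (2 bytes)
    std::vector<int> off = {0, 4, 7, 9};
    int npkts = 3;

    unsigned char *d_buf; int *d_off, *d_verdict;
    cudaMalloc((void **)&d_buf, batch.size());
    cudaMalloc((void **)&d_off, off.size() * sizeof(int));
    cudaMalloc((void **)&d_verdict, npkts * sizeof(int));
    cudaMemcpy(d_buf, batch.data(), batch.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(d_off, off.data(), off.size() * sizeof(int), cudaMemcpyHostToDevice);

    process_batch<<<1, 32>>>(d_buf, d_off, npkts, d_verdict);

    std::vector<int> verdict(npkts);
    cudaMemcpy(verdict.data(), d_verdict, npkts * sizeof(int), cudaMemcpyDeviceToHost);
    for (int p = 0; p < npkts; ++p)
        printf("packet %d: %s\n", p, verdict[p] ? "IPv4-like" : "other");
    return 0;
}
```

    Amortizing the PCIe transfer and kernel-launch cost over large batches is what makes this profitable on discrete GPUs, at the price of some added per-packet latency; stateful processing additionally requires keeping flow state consistent across batches.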