Security Applications of GPUs
Despite recent advances in software security hardening techniques, vulnerabilities can still be exploited by sufficiently determined attackers. Regardless of the protections enabled, successful exploitation remains achievable, even though, admittedly, it is much harder today than it was in the past. Since securing software is still an open research problem, the community also investigates detection methods to protect software. Three of the most promising such methods monitor (i) the network, (ii) the filesystem, and (iii) host memory for possible exploitation. Whenever a malicious operation is detected, the monitor should be able to terminate it and/or alert the administrator. In this chapter, we explore how to utilize the highly parallel capabilities of modern commodity graphics processing units (GPUs) to improve the performance of different security tools operating at the network, storage, and memory level, and how they can offload the CPU whenever possible. Our results show that modern GPUs can be very efficient and highly effective at accelerating the pattern matching operations of network intrusion detection systems and antivirus tools, as well as at monitoring the integrity of the base computing systems.
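The host-memory monitoring mentioned above boils down to comparing the current state of critical regions against a known-good baseline. The sketch below is ours, not taken from the chapter (the region names are purely illustrative); it shows the basic snapshot-and-verify loop such a monitor performs, here on the CPU with cryptographic hashes.

```python
# Minimal sketch of hash-based integrity monitoring (illustrative, not the
# chapter's implementation): record baseline digests, then rescan and flag
# any region whose contents diverge.
import hashlib

def snapshot(regions):
    """Record a baseline SHA-256 digest for each named region."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in regions.items()}

def verify(regions, baseline):
    """Return the names of regions that no longer match the baseline."""
    return [name for name, data in regions.items()
            if hashlib.sha256(data).hexdigest() != baseline[name]]

# Hypothetical region names, for illustration only.
regions = {"kernel_text": b"\x90\x90\xc3", "syscall_table": b"\x00\x01\x02"}
base = snapshot(regions)
regions["syscall_table"] = b"\x00\xff\x02"   # simulated tampering
print(verify(regions, base))                  # -> ['syscall_table']
```

On a GPU, the per-region hashing would run as parallel kernels over many regions at once, which is where the acceleration described above comes from.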
Arax: A Runtime Framework for Decoupling Applications from Heterogeneous Accelerators
Today, using multiple heterogeneous accelerators efficiently from
applications and high-level frameworks, such as TensorFlow and Caffe, poses
significant challenges in three respects: (a) sharing accelerators, (b)
allocating available resources elastically during application execution, and
(c) reducing the required programming effort. In this paper, we present Arax, a
runtime system that decouples applications from heterogeneous accelerators
within a server. First, Arax maps application tasks dynamically to available
resources, managing all required task state, memory allocations, and task
dependencies. As a result, Arax can share accelerators across applications in a
server and adjust the resources used by each application as load fluctuates
over time. Additionally, Arax offers a simple API and includes Autotalk, a stub
generator that automatically generates stub libraries for applications already
written for specific accelerator types, such as NVIDIA GPUs. Consequently, Arax
applications are written once without considering physical details, including
the number and type of accelerators. Our results show that applications, such
as Caffe, TensorFlow, and Rodinia, can run using Arax with minimum effort and
low overhead compared to native execution, about 12% (geometric mean). Arax
supports efficient accelerator sharing, by offering up to 20% improved
execution times compared to NVIDIA MPS, which supports NVIDIA GPUs only. Arax
can transparently provide elasticity, decreasing total application turn-around
time by up to 2x compared to native execution without elasticity support.
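The core idea of decoupling can be illustrated with a toy model: applications submit tasks to a runtime, and the runtime, not the application, decides which physical accelerator runs each task. The class and method names below are our own assumptions for illustration, not Arax's actual API.

```python
# Toy model of accelerator decoupling (illustrative, not Arax's code):
# applications never name a physical device; the runtime maps each task
# to whichever accelerator is free, so devices can be shared and reused.
from collections import deque

class Runtime:
    def __init__(self, accelerators):
        self.free = deque(accelerators)   # physical devices, hidden from apps
        self.log = []                     # (app, task, device) dispatch history

    def submit(self, app, task):
        """Run `task` on any free accelerator; the app never picks a device."""
        dev = self.free.popleft()         # dynamic mapping decision
        self.log.append((app, task, dev))
        self.free.append(dev)             # device returns to the shared pool
        return dev

rt = Runtime(["gpu0", "gpu1"])
print(rt.submit("caffe", "conv1"))    # -> gpu0
print(rt.submit("tf", "matmul"))      # -> gpu1
print(rt.submit("caffe", "conv2"))    # -> gpu0  (devices are shared and reused)
```

Because applications only see the `submit` interface, the runtime is free to grow or shrink each application's share of devices as load fluctuates, which is the elasticity property described above.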
Co-Evaluation of Pattern Matching Algorithms on IoT Devices with Embedded GPUs
Pattern matching is an important building block for many security applications, including Network Intrusion Detection Systems (NIDS). As NIDS grow in functionality and complexity, the time overhead and energy consumption of pattern matching become a significant consideration that limits the deployability of such systems, especially on resource-constrained devices. On the other hand, the emergence of new computing platforms, such as embedded devices with integrated, general-purpose Graphics Processing Units (GPUs), brings new, interesting challenges and opportunities for algorithm design in this setting: how to make use of new architectural features and how to evaluate their effect on algorithm performance. Up to now, work that focuses on pattern matching for such platforms has been limited to specific algorithms in isolation. In this work, we present a systematic and comprehensive benchmark that allows us to co-evaluate both existing and new pattern matching algorithms on heterogeneous devices equipped with embedded GPUs, suitable for medium- to high-level IoT deployments. We evaluate the algorithms on such a heterogeneous device, in close connection with the architectural features of the platform, and provide insights on how these features affect the algorithms' behavior. We find that, on our target embedded platform, GPU-based pattern matching algorithms have competitive performance compared to the CPU and consume half as much energy as the CPU-based variants. Based on these insights, we also propose HYBRID, a new pattern matching approach that efficiently combines techniques from existing approaches and outperforms them by 1.4x across a range of realistic and synthetic data sets. Our benchmark details the effect of various optimizations, thus providing a path forward to make existing security mechanisms such as NIDS deployable on IoT devices.
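A representative of the algorithm family such benchmarks compare is the Aho-Corasick automaton, which matches many patterns against a stream in a single pass. The compact implementation below is our own sketch for illustration; it is not the paper's HYBRID approach or its benchmark code.

```python
# Compact Aho-Corasick multi-pattern matcher (our sketch, not the paper's
# code): build a trie with failure links, then scan the text in one pass,
# reporting (start_index, pattern) for every match.
from collections import deque

def build(patterns):
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:                      # insert each pattern into the trie
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(p)
    q = deque(goto[0].values())             # BFS to compute failure links
    while q:
        s = q.popleft()
        for c, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f][c] if c in goto[f] and goto[f][c] != t else 0
            out[t] |= out[fail[t]]          # inherit matches via failure link
    return goto, fail, out

def search(text, patterns):
    goto, fail, out = build(patterns)
    s, hits = 0, []
    for i, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]                     # follow failure links on mismatch
        s = goto[s].get(c, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)

print(search("ushers", ["he", "she", "his", "hers"]))
# -> [(1, 'she'), (2, 'he'), (2, 'hers')]
```

GPU variants of this idea typically store the automaton as a flat state-transition table so that many threads can scan different chunks of input in parallel.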
G-Safe: Safe GPU Sharing in Multi-Tenant Environments
Modern GPU applications, such as machine learning (ML) frameworks, can only
partially utilize beefy GPUs, leading to GPU underutilization in cloud
environments. Sharing GPUs across multiple applications from different users
can improve resource utilization and consequently cost, energy, and power
efficiency. However, GPU sharing creates memory safety concerns because kernels
must share a single GPU address space (GPU context). Previous GPU memory
protection approaches have limited deployability because they require
specialized hardware extensions or access to source code. This is often
unavailable in GPU-accelerated libraries heavily utilized by ML frameworks. In
this paper, we present G-Safe, a PTX-level bounds checking approach for GPUs
that limits GPU kernels of each application to stay within the memory partition
allocated to them. G-Safe relies on three mechanisms: (1) It divides the common
GPU address space into separate partitions for different applications. (2) It
intercepts and checks data transfers, fencing erroneous operations. (3) It
instruments all GPU kernels at the PTX level (available in closed GPU
libraries) fencing all kernel memory accesses outside application memory
bounds. We implement G-Safe as an external, dynamically linked library that can
be pre-loaded at application startup time. G-Safe's approach is transparent to
applications and can support real-life, complex frameworks, such as Caffe and
PyTorch, that issue billions of GPU kernels. Our evaluation shows that the
overhead of G-Safe compared to native (unprotected) for such frameworks is
between 4% and 12%, and 9% on average.
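The bounds-checking idea behind mechanism (2) can be modeled in a few lines: each application owns one partition of the shared address space, and every transfer is fenced against it. This is a conceptual model of ours, not G-Safe's code; the names and addresses are hypothetical.

```python
# Conceptual model of partition-based transfer checking (illustrative, not
# G-Safe's implementation): a copy is allowed only if its whole byte range
# lies inside the issuing application's partition of the GPU address space.
class Partition:
    def __init__(self, base, size):
        self.base, self.size = base, size

def check_transfer(part, addr, nbytes):
    """Allow a transfer only if [addr, addr + nbytes) fits in the partition."""
    return part.base <= addr and addr + nbytes <= part.base + part.size

p = Partition(base=0x1000, size=0x1000)   # app owns [0x1000, 0x2000)
print(check_transfer(p, 0x1800, 0x100))   # True  -> in bounds, proceed
print(check_transfer(p, 0x1f00, 0x200))   # False -> overflows, fenced
```

Mechanism (3) applies the same predicate inside every kernel, by inserting equivalent compare-and-branch sequences around memory accesses at the PTX level.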
Towards specification of a software architecture for cross-sectoral big data applications
The proliferation of Big Data applications puts pressure on improving and optimizing the handling of diverse datasets across different domains. Among several challenges, major difficulties arise in data-sensitive domains like banking, telecommunications, etc., where strict regulations make it very difficult to upload and experiment with real data on external cloud resources. In addition, most Big Data research and development efforts aim to address the needs of IT experts, while Big Data analytics tools remain largely unavailable to non-expert users. In this paper, we report on the work-in-progress carried out in the context of the H2020 project I-BiDaaS (Industrial-Driven Big Data as a Self-service Solution), which aims to address the above challenges. The project will design and develop a novel architecture stack that can be easily configured and adjusted to address cross-sectoral needs, helping to resolve data privacy barriers in sensitive domains, while at the same time being usable by non-experts. This paper discusses and motivates the need for Big Data as a self-service, reviews the relevant literature, and identifies gaps with respect to the challenges described above. We then present the I-BiDaaS paradigm for Big Data as a self-service, position it in the context of existing references, and report on initial work towards the conceptual specification of the I-BiDaaS software architecture. This work is supported by the I-BiDaaS project, funded by the European Commission under Grant Agreement No. 780787.
Cybersecurity for industrial Internet of Things: architecture, models and lessons learned
Modern industrial systems now, more than ever, require secure and efficient ways of communication. The trend of building connected, smart architectures is beginning to show in various fields of industry, such as manufacturing and logistics. Industry leaders want to define business processes that are reliable, reproducible, and can be effortlessly monitored, and as the number of connected industrial systems rises, the number of IoT (Internet of Things) devices used in them grows as well, bringing new challenges. Cybersecurity in these types of systems is crucial for their wide adoption: without secure communication and threat detection and prevention techniques, it can be very difficult to use smart, connected systems in an industrial setting. In this paper, we describe two real-world examples of such systems, focusing on our architectural choices and lessons learned. We demonstrate our vision for implementing a connected industrial system with secure data flow and threat detection and mitigation strategies on real-world data and IoT devices. While our system is not an off-the-shelf product, our architecture design and results show the advantages of using technologies such as Deep Learning for threat detection and Blockchain-enhanced communication in industrial IoT systems, and how these technologies can be implemented. We demonstrate empirical results for various components of our system, as well as the performance of our system as a whole.
Accelerating Network Packet Processing Using Graphics Cards
The need for differentiated services (such as firewalls, network intrusion detection/prevention systems, and traffic classification applications) that lie in the core of the Internet, instead of at the end points, constantly increases. These services need to perform complex packet processing operations at upper networking layers, which, unfortunately, are not supported by traditional edge routers. To address this evolution, specialized network appliances (called "middleboxes") are deployed, which typically perform complex packet processing operations, ranging from deep packet inspection to packet encryption and redundancy elimination. Packet processing implemented in software promises to enable the fast deployment of new, sophisticated processing without the need to buy and deploy expensive new equipment. In this thesis, we propose to increase the throughput of packet processing operations by using Graphics Processing Units (GPUs). GPUs have evolved into massively parallel computational devices, containing hundreds of processing cores that can be used for general-purpose computing beyond graphics rendering. GPUs, however, have a different set of constraints and properties that can prevent existing software from obtaining the improved throughput benefits GPUs can provide. This dissertation analyzes the tradeoffs of using modern graphics processors for stateful packet processing and describes the software techniques needed to improve its performance. First, we present a deep study into accelerating packet processing operations using discrete modern graphics cards. Second, we present a broader multi-parallel stateful packet processing architecture that carefully parallelizes network traffic processing and analysis at three levels, using multi-queue network interfaces (NICs), multiple CPUs, and multiple GPUs. Last, we explore the design of a GPU-based stateful packet processing framework, identifying a modular mechanism for writing GPU-based packet processing applications and eliminating excessive data transfers as well as redundant work found in monolithic GPU-assisted applications. Our experimental results demonstrate that properly architecting stateful packet processing software for modern GPU architectures can drastically improve throughput compared to a multi-core CPU implementation.
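A recurring technique in GPU packet processing of the kind described above is batching: packets are accumulated into large batches before being handed to the device, because the fixed cost of a host-to-GPU transfer is amortized only over bulk work. The sketch below is illustrative, not the thesis code.

```python
# Illustrative sketch (not the thesis implementation) of packet batching:
# collect packets into fixed-size batches, each of which a GPU kernel would
# then process with one thread per packet.
def batches(packets, batch_size):
    """Yield consecutive fixed-size batches (the last one may be shorter)."""
    for i in range(0, len(packets), batch_size):
        yield packets[i:i + batch_size]

pkts = [f"pkt{i}" for i in range(10)]
sizes = [len(b) for b in batches(pkts, 4)]
print(sizes)   # -> [4, 4, 2]
```

The batch size is the central tuning knob: larger batches improve transfer efficiency and GPU occupancy, but add buffering latency per packet, which matters for stateful, latency-sensitive middlebox services.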