82 research outputs found

    API Comparison of CPU-To-GPU Command Offloading Latency on Embedded Platforms (Artifact)

    Get PDF
    High-performance heterogeneous embedded platforms allow offloading of parallel workloads to an integrated accelerator, such as General Purpose-Graphic Processing Units (GP-GPUs). A time-predictable characterization of task submission is a must in real-time applications. We provide a profiler of the time spent by the CPU for submitting stereotypical GP-GPU workload shaped as a Deep Neural Network of parameterized complexity. The submission is performed using the latest API available: NVIDIA CUDA, including its various techniques, and Vulkan. Complete automation for the test on Jetson Xavier is also provided by scripts that install software dependencies, run the experiments, and collect results in a PDF report

    Novel Methodologies for Predictable CPU-To-GPU Command Offloading

    Get PDF
    There is an increasing industrial and academic interest towards a more predictable characterization of real-time tasks on high-performance heterogeneous embedded platforms, where a host system offloads parallel workloads to an integrated accelerator, such as General Purpose-Graphic Processing Units (GP-GPUs). In this paper, we analyze an important aspect that has not yet been considered in the real-time literature, and that may significantly affect real-time performance if not properly treated, i.e., the time spent by the CPU for submitting GP-GPU operations. We will show that the impact of CPU-to-GPU kernel submissions may be indeed relevant for typical real-time workloads, and that it should be properly factored in when deriving an integrated schedulability analysis for the considered platforms. This is the case when an application is composed of many small and consecutive GPU compute/copy operations. While existing techniques mitigate this issue by batching kernel calls into a reduced number of persistent kernel invocations, in this work we present and evaluate three other approaches that are made possible by recently released versions of the NVIDIA CUDA GP-GPU API, and by Vulkan, a novel open standard GPU API that allows an improved control of GPU command submissions. We will show that this added control may significantly improve the application performance and predictability due to a substantial reduction in CPU-to-GPU driver interactions, making Vulkan an interesting candidate for becoming the state-of-the-art API for heterogeneous Real-Time systems. Our findings are evaluated on a latest generation NVIDIA Jetson AGX Xavier embedded board, executing typical workloads involving Deep Neural Networks of parameterized complexity

    AMD GPUs as an Alternative to NVIDIA for Supporting Real-Time Workloads

    Get PDF

    OpenCL acceleration on FPGA vs CUDA on GPU

    Get PDF

    Fog Computing

    Get PDF
    Everything that is not a computer, in the traditional sense, is being connected to the Internet. These devices are also referred to as the Internet of Things and they are pressuring the current network infrastructure. Not all devices are intensive data producers and part of them can be used beyond their original intent by sharing their computational resources. The combination of those two factors can be used either to perform insight over the data closer where is originated or extend into new services by making available computational resources, but not exclusively, at the edge of the network. Fog computing is a new computational paradigm that provides those devices a new form of cloud at a closer distance where IoT and other devices with connectivity capabilities can offload computation. In this dissertation, we have explored the fog computing paradigm, and also comparing with other paradigms, namely cloud, and edge computing. Then, we propose a novel architecture that can be used to form or be part of this new paradigm. The implementation was tested on two types of applications. The first application had the main objective of demonstrating the correctness of the implementation while the other application, had the goal of validating the characteristics of fog computing.Tudo o que não é um computador, no sentido tradicional, está sendo conectado à Internet. Esses dispositivos também são chamados de Internet das Coisas e estão pressionando a infraestrutura de rede atual. Nem todos os dispositivos são produtores intensivos de dados e parte deles pode ser usada além de sua intenção original, compartilhando seus recursos computacionais. A combinação desses dois fatores pode ser usada para realizar processamento dos dados mais próximos de onde são originados ou estender para a criação de novos serviços, disponibilizando recursos computacionais periféricos à rede. Fog computing é um novo paradigma computacional que fornece a esses dispositivos uma nova forma de nuvem a uma distância mais próxima, onde “Things” e outros dispositivos com recursos de conectividade possam delegar processamento. Nesta dissertação, exploramos fog computing e também comparamos com outros paradigmas, nomeadamente cloud e edge computing. Em seguida, propomos uma nova arquitetura que pode ser usada para formar ou fazer parte desse novo paradigma. A implementação foi testada em dois tipos de aplicativos. A primeira aplicação teve o objetivo principal de demonstrar a correção da implementação, enquanto a outra aplicação, teve como objetivo validar as características de fog computing

    Managing Event-Driven Applications in Heterogeneous Fog Infrastructures

    Get PDF
    The steady increase in digitalization propelled by the Internet of Things (IoT) has led to a deluge of generated data at unprecedented pace. Thereby, the promise to realize data-driven decision-making is a major innovation driver in a myriad of industries. Based on the widely used event processing paradigm, event-driven applications allow to analyze data in the form of event streams in order to extract relevant information in a timely manner. Most recently, graphical flow-based approaches in no-code event processing systems have been introduced to significantly lower technological entry barriers. This empowers non-technical citizen technologists to create event-driven applications comprised of multiple interconnected event-driven processing services. Still, today’s event-driven applications are focused on centralized cloud deployments that come with inevitable drawbacks, especially in the context of IoT scenarios that require fast results, are limited by the available bandwidth, or are bound by the regulations in terms of privacy and security. Despite recent advances in the area of fog computing which mitigate these shortcomings by extending the cloud and moving certain processing closer to the event source, these approaches are hardly established in existing systems. Inherent fog computing characteristics, especially the heterogeneity of resources alongside novel application management demands, particularly the aspects of geo-distribution and dynamic adaptation, pose challenges that are currently insufficiently addressed and hinder the transition to a next generation of no-code event processing systems. The contributions of this thesis enable citizen technologists to manage event-driven applications in heterogeneous fog infrastructures along the application life cycle. Therefore, an approach for a holistic application management is proposed which abstracts citizen technologists from underlying technicalities. This allows to evolve present event processing systems and advances the democratization of event-driven application management in fog computing. Individual contributions of this thesis are summarized as follows: 1. A model, manifested in a geo-distributed system architecture, to semantically describe characteristics specific to node resources, event-driven applications and their management to blend application-centric and infrastructure-centric realms. 2. Concepts for geo-distributed deployment and operation of event-driven applications alongside strategies for flexible event stream management. 3. A methodology to support the evolution of event-driven applications including methods to dynamically reconfigure, migrate and offload individual event-driven processing services at run-time. The contributions are introduced, applied and evaluated along two scenarios from the manufacturing and logistics domain

    Data Parallel C++

    Get PDF
    Learn how to accelerate C++ programs using data parallelism. This open access book enables C++ programmers to be at the forefront of this exciting and important new development that is helping to push computing to new levels. It is full of practical advice, detailed explanations, and code examples to illustrate key topics. Data parallelism in C++ enables access to parallel resources in a modern heterogeneous system, freeing you from being locked into any particular computing device. Now a single C++ application can use any combination of devices—including GPUs, CPUs, FPGAs and AI ASICs—that are suitable to the problems at hand. This book begins by introducing data parallelism and foundational topics for effective use of the SYCL standard from the Khronos Group and Data Parallel C++ (DPC++), the open source compiler used in this book. Later chapters cover advanced topics including error handling, hardware-specific programming, communication and synchronization, and memory model considerations. Data Parallel C++ provides you with everything needed to use SYCL for programming heterogeneous systems. What You'll Learn Accelerate C++ programs using data-parallel programming Target multiple device types (e.g. CPU, GPU, FPGA) Use SYCL and SYCL compilers Connect with computing’s heterogeneous future via Intel’s oneAPI initiative Who This Book Is For Those new data-parallel programming and computer programmers interested in data-parallel programming using C++

    Data Parallel C++

    Get PDF
    Learn how to accelerate C++ programs using data parallelism. This open access book enables C++ programmers to be at the forefront of this exciting and important new development that is helping to push computing to new levels. It is full of practical advice, detailed explanations, and code examples to illustrate key topics. Data parallelism in C++ enables access to parallel resources in a modern heterogeneous system, freeing you from being locked into any particular computing device. Now a single C++ application can use any combination of devices—including GPUs, CPUs, FPGAs and AI ASICs—that are suitable to the problems at hand. This book begins by introducing data parallelism and foundational topics for effective use of the SYCL standard from the Khronos Group and Data Parallel C++ (DPC++), the open source compiler used in this book. Later chapters cover advanced topics including error handling, hardware-specific programming, communication and synchronization, and memory model considerations. Data Parallel C++ provides you with everything needed to use SYCL for programming heterogeneous systems. What You'll Learn Accelerate C++ programs using data-parallel programming Target multiple device types (e.g. CPU, GPU, FPGA) Use SYCL and SYCL compilers Connect with computing’s heterogeneous future via Intel’s oneAPI initiative Who This Book Is For Those new data-parallel programming and computer programmers interested in data-parallel programming using C++

    Comparison of OTA update frameworks for Linux based IoT devices

    Get PDF
    Remotely updating IoT devices introduces several issues concerning the device and the delivery of updates. The devices should be able to recover from failed updates to a functional state. Delivery of updates should scale with the size of the fleet in addition to the possibility of monitoring the state of the devices. The complexity of the updates has an effect on how difficult these previous issues are to overcome. Only updating parts of the software is simpler and possible to implement in a way where the integrity of the operating system is not endangered. Updating the entire operating system makes it possible to deliver more overarching updates remotely, but failures in installation can endanger the integrity of the system. This paper introduces an use case based on an existing IoT system and compares three update frameworks, SWUpdate, Mender and BalenaOS, for their suitability for providing remote updates. The comparison is first done based on relevant documentation and other literature and by further examining one with a proof of concept style prototype. This work only evaluates the frameworks in prototype use cases. In production use all of them would require additional work to integrate with the required security features. Of the three frameworks, BalenaOS provides most of the required features while avoiding some of the shortcomings of the others. However some of the advantages it has over other solutions come with significant vendor lock-in. Additionally getting started with prototyping systems with BalenaOS is somewhat difficult, if additional libraries have to be added to the root operating system

    Contributions to Edge Computing

    Get PDF
    Efforts related to Internet of Things (IoT), Cyber-Physical Systems (CPS), Machine to Machine (M2M) technologies, Industrial Internet, and Smart Cities aim to improve society through the coordination of distributed devices and analysis of resulting data. By the year 2020 there will be an estimated 50 billion network connected devices globally and 43 trillion gigabytes of electronic data. Current practices of moving data directly from end-devices to remote and potentially distant cloud computing services will not be sufficient to manage future device and data growth. Edge Computing is the migration of computational functionality to sources of data generation. The importance of edge computing increases with the size and complexity of devices and resulting data. In addition, the coordination of global edge-to-edge communications, shared resources, high-level application scheduling, monitoring, measurement, and Quality of Service (QoS) enforcement will be critical to address the rapid growth of connected devices and associated data. We present a new distributed agent-based framework designed to address the challenges of edge computing. This actor-model framework implementation is designed to manage large numbers of geographically distributed services, comprised from heterogeneous resources and communication protocols, in support of low-latency real-time streaming applications. As part of this framework, an application description language was developed and implemented. Using the application description language a number of high-order management modules were implemented including solutions for resource and workload comparison, performance observation, scheduling, and provisioning. A number of hypothetical and real-world use cases are described to support the framework implementation
    • …
    corecore