550 research outputs found

    Intel oneAPI for Heterogeneous Computing

    Get PDF
    Trabajo de Fin de Grado (final-year undergraduate thesis) in Computer Science Engineering, Facultad de Informática UCM, Departamento de Arquitectura de Computadores y Automática, academic year 2020/2021.

    "oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across accelerator architectures—for faster application performance, more productivity, and greater innovation." (www.oneapi.com) The Intel DPC++ Compatibility Tool is a component of the Intel oneAPI Base Toolkit; it automatically transforms CUDA code into Data Parallel C++ (DPC++), assisting in the migration process. This project consists of an analysis of the DPC++ Compatibility Tool, considering the manual intervention required and the problems encountered while migrating the Rodinia benchmarks, together with a comparative study of the performance of the migrated code.
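To give a feel for what the migration target looks like, here is a minimal hand-written SYCL/DPC++ vector-addition kernel of the kind a CUDA kernel plus its cudaMalloc/cudaMemcpy calls would map to. It assumes a SYCL 2020 compiler (e.g., Intel DPC++) and is an illustrative sketch, not actual output of the Compatibility Tool.

```cpp
// Illustrative SYCL/DPC++ equivalent of a CUDA vector-addition kernel.
// Not generated by the Intel DPC++ Compatibility Tool; hand-written sketch.
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    constexpr size_t N = 1 << 20;
    sycl::queue q;  // default device selection (CPU, GPU, or FPGA emulator)

    // Unified shared memory replaces explicit cudaMalloc/cudaMemcpy pairs.
    float *a = sycl::malloc_shared<float>(N, q);
    float *b = sycl::malloc_shared<float>(N, q);
    float *c = sycl::malloc_shared<float>(N, q);
    for (size_t i = 0; i < N; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // parallel_for plays the role of a CUDA kernel launch (<<<grid, block>>>).
    q.parallel_for(sycl::range<1>{N}, [=](sycl::id<1> i) {
        c[i] = a[i] + b[i];
    }).wait();

    std::cout << "c[0] = " << c[0] << "\n";  // expect 3
    sycl::free(a, q); sycl::free(b, q); sycl::free(c, q);
    return 0;
}
```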

    A Computation Offloading System for Edge Cloud Environments

    Get PDF
    Thesis (Ph.D.), Graduate School of Seoul National University, College of Engineering, Department of Electrical and Computer Engineering, February 2020. Advisor: Soo-Mook Moon.

    The purpose of my dissertation is to build lightweight edge computing systems that provide seamless offloading services even when users move across multiple edge servers. I focus on two application domains: 1) web applications and 2) DNN applications. First, I propose an edge computing system that offloads computations from web-supported devices to edge servers. The proposed system exploits the portability of web apps, i.e., that they are distributed as source code and runnable without installation, when migrating the execution state of a web app. This significantly reduces the complexity of state migration, allowing a web app to migrate within a few seconds. The system also supports offloading of WebAssembly, a standard low-level instruction format for web apps, achieving up to 8.4x speedup compared to offloading pure JavaScript code. Second, I propose incremental offloading of neural network (IONN), which offloads DNN execution while the DNN model is still being deployed, reducing the overhead of DNN model deployment. I also extend IONN to support large-scale edge server environments by proactively migrating DNN layers to the edge servers that mobile users are predicted to visit, improving cold-start performance. Simulation with an open-source mobility dataset shows that the proposed system can significantly reduce the overhead of deploying a DNN model. (A sketch of the split-point reasoning behind IONN-style partitioning follows the contents list below.)

    Contents: Chapter 1. Introduction; Chapter 2. Seamless Offloading of Web App Computations; Chapter 3. IONN: Incremental Offloading of Neural Network Computations; Chapter 4. PerDNN: Offloading DNN Computations to Pervasive Edge Servers; Chapter 5. Related Works; Chapter 6. Conclusion
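The core decision in IONN-style partitioning is where to split a DNN between the mobile device and the edge server so that total latency (local compute + transfer of an intermediate tensor + server compute) is minimized. The sketch below uses invented per-layer timings and considers only a single split in a linear chain; the actual IONN algorithm formulates partitioning on an NN execution graph (Chapter 3) and uploads DNN partitions incrementally.

```cpp
// Simplified latency-driven DNN split between client and edge server.
// All numbers are made up for illustration.
#include <cstdio>
#include <vector>

struct Layer {
    double client_ms;  // execution time on the mobile device
    double server_ms;  // execution time on the edge server
    double out_kb;     // size of the layer's output tensor
};

// Latency if layers [0, k) run on the client and [k, n) on the server.
double split_latency(const std::vector<Layer>& net, size_t k,
                     double input_kb, double kb_per_ms) {
    double t = 0.0;
    for (size_t i = 0; i < k; ++i) t += net[i].client_ms;
    if (k < net.size()) {
        double sent_kb = (k == 0) ? input_kb : net[k - 1].out_kb;
        t += sent_kb / kb_per_ms;  // transfer the intermediate tensor
        for (size_t i = k; i < net.size(); ++i) t += net[i].server_ms;
    }
    return t;  // k == n means everything runs locally, no transfer
}

int main() {
    std::vector<Layer> net = {{12, 2, 800}, {30, 4, 400}, {25, 3, 100}, {8, 1, 4}};
    double input_kb = 600.0, kb_per_ms = 50.0;  // assumed input size, bandwidth
    size_t best = 0; double best_ms = 1e18;
    for (size_t k = 0; k <= net.size(); ++k) {
        double ms = split_latency(net, k, input_kb, kb_per_ms);
        if (ms < best_ms) { best_ms = ms; best = k; }
    }
    std::printf("best split before layer %zu: %.1f ms\n", best, best_ms);
}
```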

    HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC

    Full text link
    This paper presents HALO 1.0, an open-ended, extensible multi-agent software framework that implements a set of proposed hardware-agnostic accelerator orchestration (HALO) principles. HALO implements a novel compute-centric message passing interface (C^2MPI) specification for enabling the performance-portable execution of a hardware-agnostic host application across heterogeneous accelerators. Experimental results from evaluating eight widely used HPC subroutines on Intel Xeon E5-2620 CPUs, Intel Arria 10 GX FPGAs, and NVIDIA GeForce RTX 2080 Ti GPUs show that HALO 1.0 allows a unified control flow for host programs to run across all the computing devices with a consistently top performance portability score, up to five orders of magnitude higher than that of the OpenCL-based solution.
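To make the orchestration idea concrete, here is a hypothetical sketch of a compute-centric dispatch layer: host code requests a named subroutine, and a runtime policy binds it to whichever accelerator backend has registered an implementation, so the host program never changes per device. All names here (Orchestrator, add, run) are invented for illustration and are not the actual C^2MPI specification.

```cpp
// Hypothetical illustration of compute-centric dispatch: the host addresses
// accelerators through one uniform task interface and stays agnostic of
// CPU/FPGA/GPU specifics. Invented API, not the real C^2MPI.
#include <cstdio>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Kernel = std::function<void(std::vector<float>&)>;

class Orchestrator {
    // kernel name -> device name -> implementation
    std::map<std::string, std::map<std::string, Kernel>> impls_;
public:
    void add(const std::string& kernel, const std::string& device, Kernel k) {
        impls_[kernel][device] = std::move(k);
    }
    // Host code asks for a kernel by name; the device choice is a runtime
    // policy, so the same host program runs on any registered hardware.
    void run(const std::string& kernel, const std::string& device,
             std::vector<float>& data) {
        impls_.at(kernel).at(device)(data);
    }
};

int main() {
    Orchestrator halo;
    halo.add("scale2x", "cpu", [](std::vector<float>& v) {
        for (auto& x : v) x *= 2.0f;  // reference CPU implementation
    });
    // A GPU or FPGA backend would register a different body under "scale2x".
    std::vector<float> v{1, 2, 3};
    halo.run("scale2x", "cpu", v);
    std::printf("%.1f %.1f %.1f\n", v[0], v[1], v[2]);
}
```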

    Efficient hardware implementations of bio-inspired networks

    Get PDF
    The human brain, with its massive computational capability and power efficiency in a small form factor, continues to inspire the ultimate goal of building machines that can perform tasks without being explicitly programmed. In an effort to mimic the natural information-processing paradigms observed in the brain, several neural network generations have been proposed over the years. Among the neural networks inspired by biology, second-generation Artificial or Deep Neural Networks (ANNs/DNNs) use memoryless neuron models and have shown unprecedented success, surpassing humans in a wide variety of tasks. Unlike ANNs, third-generation Spiking Neural Networks (SNNs) closely mimic biological neurons by operating on discrete, sparse events in time called spikes, which are obtained by the time integration of previous inputs. Implementation of data-intensive neural network models on computers based on the von Neumann architecture is limited mainly by the continuous data transfer between the physically separated memory and processing units. Hence, non-von Neumann architectural solutions are essential for processing these memory-intensive bio-inspired neural networks in an energy-efficient manner. Among non-von Neumann architectures, implementations employing non-volatile memory (NVM) devices are the most promising due to their compact size and low operating power. However, it is non-trivial to integrate these nanoscale devices on conventional computational substrates due to their non-idealities, such as limited dynamic range, finite bit resolution, and programming variability. This dissertation demonstrates architectural and algorithmic optimizations for implementing bio-inspired neural networks using emerging nanoscale devices.

    The first half of the dissertation focuses on the hardware acceleration of DNN implementations. A 4-layer stochastic DNN in a crossbar architecture with memristive devices at the cross points is analyzed for accelerating DNN training. This network is then used as a baseline to explore the impact of experimental memristive device behavior on network performance. Programming variability is found to play a critical role in determining network performance compared to the other non-ideal characteristics of the devices. In addition, noise-resilient inference engines are demonstrated using stochastic memristive DNNs with 100 bits for stochastic encoding during inference and 10 bits for the expensive training.

    The second half of the dissertation focuses on a novel probabilistic framework for SNNs that uses Generalized Linear Model (GLM) neurons to capture neuronal behavior. This work demonstrates that probabilistic SNNs achieve performance comparable to equivalent ANNs on two popular benchmarks: handwritten-digit classification and human activity recognition. Considering the potential of SNNs for energy-efficient implementations, a hardware accelerator for inference is proposed, termed the Spintronic Accelerator for Probabilistic SNNs (SpinAPS). The learning algorithm is optimized for a hardware-friendly implementation and uses a first-to-spike decoding scheme for low-latency inference. With binary spintronic synapses and digital CMOS logic neurons for computations, SpinAPS achieves a performance improvement of 4x in terms of GSOPS/W/mm² compared to a conventional SRAM-based design.

    Collectively, this work demonstrates the potential of emerging memory technologies for building energy-efficient hardware architectures for deep and spiking neural networks. The design strategies adopted in this work can be extended to other spike- and non-spike-based systems for building embedded solutions with power/energy constraints.
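As a schematic illustration of the GLM neuron dynamics and first-to-spike readout mentioned above, the sketch below computes a sigmoid spiking probability from a stimulus kernel and a self-feedback kernel, samples a spike train, and reports the earliest spike. All kernel taps, the bias, and the input spikes are invented values.

```cpp
// Schematic GLM (Generalized Linear Model) spiking neuron with a
// first-to-spike readout. Invented parameters, illustration only.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Spike probability at step t: sigma(bias + stimulus term + feedback term).
double glm_spike_prob(const std::vector<int>& in_spikes,
                      const std::vector<int>& own_spikes,
                      const std::vector<double>& w_stim,
                      const std::vector<double>& w_fb,
                      double bias, size_t t) {
    double u = bias;
    for (size_t d = 0; d < w_stim.size() && d < t; ++d)
        u += w_stim[d] * in_spikes[t - 1 - d];  // stimulus kernel over past inputs
    for (size_t d = 0; d < w_fb.size() && d < t; ++d)
        u += w_fb[d] * own_spikes[t - 1 - d];   // feedback kernel over own spikes
    return 1.0 / (1.0 + std::exp(-u));          // sigmoid firing probability
}

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<int> input = {1, 0, 1, 1, 0, 1, 1, 1};  // assumed input spikes
    std::vector<int> spikes(input.size(), 0);
    std::vector<double> w_stim = {1.5, 0.8, 0.3};  // assumed stimulus taps
    std::vector<double> w_fb = {-1.0, -0.5};       // refractory-like feedback

    for (size_t t = 0; t < input.size(); ++t) {
        double p = glm_spike_prob(input, spikes, w_stim, w_fb, -1.2, t);
        spikes[t] = (unif(rng) < p) ? 1 : 0;  // stochastic firing
        if (spikes[t]) {
            // First-to-spike decoding: the earliest-firing output decides.
            std::printf("first spike at step %zu\n", t);
            break;
        }
    }
}
```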