
    Multi-tier GPU virtualization for deep learning in cloud-edge systems

    Accelerator virtualization offers several advantages in the context of cloud-edge computing. Relatively weak user devices can improve performance when running workloads by accessing virtualized accelerators available on other resources in the cloud-edge continuum. However, cloud-edge systems are heterogeneous, which often leads to compatibility issues arising from the various hardware and software stacks present in the system. One mechanism to alleviate this issue is to deploy workloads in containers, which isolate applications and their dependencies and store them as images that can run on any device. In addition, user devices may move during application execution, so mechanisms such as container migration are required to move running workloads from one resource in the network to another. Furthermore, an optimal destination must be determined when migrating between virtual accelerators; scheduling and placement strategies are incorporated to choose the best possible location depending on the workload requirements. This paper presents AVEC, a framework for accelerator virtualization in cloud-edge computing. The AVEC framework enables the offloading of deep learning inference workloads from weak user devices to computationally more powerful devices in a cloud-edge network. AVEC incorporates a mechanism that efficiently manages and schedules the virtualization of accelerators, and it supports migration between accelerators to enable stateless container migration. The experimental analysis highlights that AVEC can achieve up to 7x speedup by offloading applications to remote resources, with a migration downtime of less than 5 seconds.
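
    The abstract does not reproduce AVEC's scheduling algorithm; the following host-side C++ sketch only illustrates the kind of placement heuristic such a scheduler might use, picking the destination accelerator that minimizes an estimated per-request latency. All names (Node, Workload, pickDestination) and the cost model are illustrative assumptions, not taken from the paper.

        // Hypothetical placement heuristic in the spirit of AVEC's scheduler.
        #include <cstdio>
        #include <limits>
        #include <string>
        #include <vector>

        struct Node {
            std::string id;
            double gflops;      // available accelerator throughput
            double freeMemGiB;  // free accelerator memory
            double rttMs;       // network round trip from the user device
        };

        struct Workload {
            double memGiB;        // model + activation memory required
            double computeGFLOP;  // work per inference request
        };

        // Choose the node that minimizes estimated latency (compute time
        // plus network round trip) among the nodes that can fit the model.
        const Node* pickDestination(const std::vector<Node>& nodes, const Workload& w) {
            const Node* best = nullptr;
            double bestMs = std::numeric_limits<double>::max();
            for (const Node& n : nodes) {
                if (n.freeMemGiB < w.memGiB) continue;  // capacity filter
                double ms = 1e3 * w.computeGFLOP / n.gflops + n.rttMs;
                if (ms < bestMs) { bestMs = ms; best = &n; }
            }
            return best;  // nullptr if no node can host the workload
        }

        int main() {
            std::vector<Node> nodes = {{"edge-1", 4000, 8, 5}, {"cloud-1", 20000, 32, 40}};
            Workload w{2.0, 50.0};  // 2 GiB model, 50 GFLOP per request
            if (const Node* n = pickDestination(nodes, w))
                std::printf("offload to %s\n", n->id.c_str());
        }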

    On the Functional Test of Special Function Units in GPUs

    The usage of Graphics Processing Units (GPUs) has extended from graphics applications to other domains where their high computational power is exploited (e.g., to implement Artificial Intelligence algorithms). These complex applications usually require highly intensive computations based on floating-point transcendental functions. GPUs can compute these functions efficiently in hardware using ad hoc Special Function Units (SFUs). However, a permanent fault in such units can be critical (e.g., in safety-critical automotive applications); thus, test methodologies for SFUs are strictly required to achieve the target reliability and safety levels. In this work, we present a functional test method based on a Software-Based Self-Test (SBST) approach targeting the SFUs in GPUs. This method exploits different approaches to build a test program and applies several optimization strategies that exploit the GPU's parallelism to speed up the test procedure and reduce the required memory. The effectiveness of this methodology was proven on an open-source GPU model (FlexGripPlus) compatible with NVIDIA GPUs. The experimental results show that the proposed technique achieves 90.75% fault coverage and up to 94.26% testable fault coverage, reducing the required memory and test duration with respect to pseudorandom strategies proposed by other authors.
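
    The paper's actual test programs are not reproduced here, but the core idea of an SBST procedure for SFUs can be sketched in CUDA: a kernel applies a stimulus set to the fast-math intrinsics that NVIDIA GPUs execute on the SFUs and folds the results into a signature that is compared against a fault-free golden value. The stimulus set and signature scheme below are illustrative assumptions, not the paper's.

        // Minimal SBST-style sketch: exercise SFU operations in parallel and
        // build a test signature. __sinf/__cosf/__expf/__log2f are executed
        // on the Special Function Units of NVIDIA GPUs.
        #include <cstdio>
        #include <cuda_runtime.h>

        __global__ void sfuTest(const float* stimuli, int n, unsigned* signature) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float x = stimuli[i];
            float r = __sinf(x) + __cosf(x) + __expf(x) + __log2f(fabsf(x) + 1.0f);
            // Fold the result bits into a global signature; a mismatch with
            // the fault-free golden signature flags a permanent SFU fault.
            atomicXor(signature, __float_as_uint(r) ^ (unsigned)i);
        }

        int main() {
            const int n = 1024;
            float h[n];
            for (int i = 0; i < n; ++i) h[i] = -4.0f + 8.0f * i / n;  // stimuli
            float* d; unsigned* dSig; unsigned sig = 0;
            cudaMalloc(&d, n * sizeof(float));
            cudaMalloc(&dSig, sizeof(unsigned));
            cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
            cudaMemcpy(dSig, &sig, sizeof(unsigned), cudaMemcpyHostToDevice);
            sfuTest<<<(n + 255) / 256, 256>>>(d, n, dSig);
            cudaMemcpy(&sig, dSig, sizeof(unsigned), cudaMemcpyDeviceToHost);
            std::printf("signature = 0x%08x\n", sig);  // compare to golden value
        }

    Applying all stimuli concurrently across threads is one way a test procedure can exploit GPU parallelism to shorten the test duration, as the abstract notes.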

    Toward Reliable, Secure, and Energy-Efficient Multi-Core System Design

    Computer hardware researchers have perennially focused on improving the performance of computers while keeping energy consumption under a strict budget. While several innovations over the years have led to high-performance and energy-efficient computers, new challenges have also emerged as a consequence. For example, the smaller transistor devices in modern multi-core systems are afflicted with several reliability and security concerns that were inconceivable even a decade ago, and tackling these bottlenecks tends to negatively impact the power and performance of the computers. This dissertation explores novel techniques to gracefully solve some of the pressing challenges of modern computer design. Specifically, the proposed techniques improve the reliability of the on-chip communication fabric under high power supply noise, increase the energy efficiency of low-power graphics processing units, and demonstrate an unprecedented security loophole of the low-power computing paradigm through rigorous hardware-based experiments.

    Parallel prediction of radio propagation

    Master's thesis. Informatics and Computing Engineering. Cranfield University. School of Engineering. 201

    MRI Data Processing Acceleration on GPU

    This BSc thesis was carried out during a study stay at the Universita della Svizzera italiana, Switzerland. The identification of trajectories of neuron fibres within the human brain is of great importance in many medical applications, such as neurological diagnostics, neuronavigation, treatment of epilepsy, surgical removal of tumors, and so on. By using diffusion MRI data as input and employing Monte-Carlo-like methods, possible trajectories are generated and the most likely ones can be visualized. These can serve as input for advanced medical diagnosis and treatments. Due to the huge amount of data to be analyzed and the many iterations required, this is a time-consuming process. For purposes such as statistical analysis and comparison over several datasets or several patients, the computational time requirements are enormous. Faster diagnosis can also improve routine throughput and provide earlier treatment of illness. At this time, only very few implementations of neural tractography software exist, and for probabilistic neural tractography the list is even shorter. Today's implementations, which execute all operations serially on the CPU, suffer from high time consumption. The goal is to provide an efficient implementation that makes use of GPGPUs and exploits the parallelism in the method. For the GPU implementation, a comparison of the CUDA and OpenCL technologies is provided, and the more suitable one is used.
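
    As a rough illustration of how such a Monte-Carlo tractography step maps onto a GPU, the CUDA sketch below propagates one candidate fibre trajectory per thread, perturbing a local direction with Gaussian noise at each step. The direction model (a placeholder returning a constant) and all names are assumptions standing in for the thesis's actual diffusion-MRI sampling.

        // One thread = one sampled streamline; cuRAND supplies per-thread
        // random numbers for the Monte-Carlo perturbation.
        #include <cuda_runtime.h>
        #include <curand_kernel.h>

        struct Vec3 { float x, y, z; };

        // Placeholder: a real tracker would interpolate the local fibre
        // direction from the diffusion-MRI field at position p.
        __device__ Vec3 principalDirection(const float* /*field*/, Vec3 /*p*/) {
            return Vec3{1.0f, 0.0f, 0.0f};
        }

        __global__ void traceStreamlines(const float* field, Vec3 seed, int steps,
                                         float stepLen, float sigma,
                                         Vec3* paths, unsigned long long baseSeed) {
            int t = blockIdx.x * blockDim.x + threadIdx.x;
            curandState rng;
            curand_init(baseSeed, t, 0, &rng);
            Vec3 p = seed;
            for (int s = 0; s < steps; ++s) {
                Vec3 d = principalDirection(field, p);
                d.x += sigma * curand_normal(&rng);   // perturb direction to
                d.y += sigma * curand_normal(&rng);   // sample one possible
                d.z += sigma * curand_normal(&rng);   // trajectory
                float inv = rsqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
                p.x += stepLen * d.x * inv;
                p.y += stepLen * d.y * inv;
                p.z += stepLen * d.z * inv;
                paths[t * steps + s] = p;             // record sampled point
            }
        }

        int main() {
            const int threads = 256, steps = 128;
            Vec3* dPaths;
            cudaMalloc(&dPaths, threads * steps * sizeof(Vec3));
            traceStreamlines<<<1, threads>>>(nullptr, Vec3{0, 0, 0}, steps,
                                             0.5f, 0.2f, dPaths, 1234ULL);
            cudaDeviceSynchronize();  // all trajectories propagate in parallel
            cudaFree(dPaths);
        }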