15 research outputs found

    New Techniques for On-line Testing and Fault Mitigation in GPUs

    The abstract is in the attachment.

    A survey of near-data processing architectures for neural networks

    Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von Neumann architecture. As data movement and energy consumption become key bottlenecks in the design of computing systems, interest in unconventional approaches such as Near-Data Processing (NDP), machine learning, and especially neural network (NN)-based accelerators has grown significantly. Emerging memory technologies, such as ReRAM and 3D-stacked memories, are promising for efficiently architecting NDP-based accelerators for NNs because they can serve both as high-density, low-energy storage and as in/near-memory computation and search engines. In this paper, we present a survey of techniques for designing NDP architectures for NNs. By classifying the techniques based on the memory technology employed, we underscore their similarities and differences. Finally, we discuss open challenges and future perspectives that need to be explored in order to improve and extend the adoption of NDP architectures for future computing platforms. This paper will be valuable for computer architects, chip designers, and researchers in the area of machine learning. This work has been supported by the CoCoUnit ERC Advanced Grant of the EU’s Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, and the ICREA Academia program. Peer reviewed. Postprint (published version).
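
    To make the in/near-memory computation idea concrete, the following Python sketch models, purely in software, how a ReRAM crossbar can evaluate a neural-network layer as an analog matrix-vector product. It is a minimal illustration under assumed parameters (quantization levels, noise model, array size); none of these details come from the survey itself.

    # Minimal sketch (assumptions, not details from the survey): a ReRAM
    # crossbar performing an analog matrix-vector multiply, the core
    # primitive that NDP accelerators for neural networks map onto memory.
    import numpy as np

    def quantize_to_conductances(weights, levels=16):
        """Map trained weights onto a limited number of conductance levels,
        mimicking the finite precision of ReRAM cells (assumed 4-bit here)."""
        w_min, w_max = weights.min(), weights.max()
        step = (w_max - w_min) / (levels - 1)
        return np.round((weights - w_min) / step) * step + w_min

    def crossbar_matvec(conductances, input_voltages, noise_std=0.01):
        """Ohm's law plus Kirchhoff's current law: each column current is the
        dot product of input voltages and cell conductances, plus analog noise."""
        currents = input_voltages @ conductances
        return currents + np.random.normal(0.0, noise_std, currents.shape)

    rng = np.random.default_rng(0)
    weights = rng.standard_normal((128, 64))   # one NN layer, stored in the array
    activations = rng.standard_normal(128)     # layer input, applied as voltages

    g = quantize_to_conductances(weights)
    in_memory = crossbar_matvec(g, activations)
    digital = activations @ weights            # reference result computed away from memory
    print("mean abs error vs. digital baseline:", np.abs(in_memory - digital).mean())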

    Design and Analysis of Memory Management Techniques for Next-Generation GPUs

    Graphics Processing Unit (GPU)-based architectures have become the default accelerator choice for a large number of data-parallel applications because they are able to provide high compute throughput at a competitive power budget. Unlike CPUs, which typically have limited multi-threading capability, GPUs execute large numbers of threads concurrently to achieve high thread-level parallelism (TLP). While the computation of each thread requires its corresponding data to be loaded from or stored to memory, the key to supporting the high TLP of GPUs lies in the high bandwidth provided by the GPU memory system. However, with the continuous scaling of GPUs, the challenges of designing an efficient GPU memory system have become two-fold. On one hand, to keep the growing compute and memory resources highly utilized, co-locating two or more kernels in the GPU has become an inevitable trend. One of the major roadblocks to achieving the maximum benefits of multi-application execution is the difficulty of designing mechanisms that can efficiently and fairly manage application interference in the shared caches and the main memory. On the other hand, to maintain the continuous scaling of GPU performance, the increasing energy consumption of the memory system has become a major problem because of its limited power budget. This energy limitation restricts the memory system's maximum theoretical bandwidth and in turn limits overall throughput. To address these challenges, this dissertation proposes three different approaches. First, it shows that high efficiency and fairness can be achieved for GPU multi-programming with novel TLP management techniques. We propose a new metric, effective bandwidth (EB), to accurately estimate the use of shared resources in the GPU memory hierarchy, and a pattern-based searching scheme (PBS) that can quickly and accurately achieve efficiency or fairness by managing the TLP of each application. Second, to reduce data movement and improve GPU throughput, this dissertation develops the Address-Stride Assisted Approximate Value Predictor (ASAP) for GPUs. We show that by exploiting the correlation between address strides and value strides present in GPGPU applications, significant data-movement reduction and throughput improvement can be achieved at much lower application quality loss and hardware overhead. ASAP achieves this by predicting load values when it detects strides in their corresponding addresses. Third, this dissertation shows that GPU memory energy can be significantly reduced by novel memory scheduling techniques. We propose a lazy memory scheduler which significantly improves the row-buffer locality of GPU memory by leveraging the latency and error tolerance of GPGPU applications. Finally, our new work targets data-movement reduction with flexible data precisions. We present initial results to motivate novel data types and architectural support that dynamically reduce the data size transferred per memory operation. Altogether, this dissertation develops several innovative techniques to improve the efficiency of the GPU memory system, which are necessary for enabling the development of next-generation GPUs.
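
    The stride-based prediction at the heart of ASAP, as summarized above, can be illustrated with a short, hedged Python sketch. It is a software-level approximation of the idea (predict a load's value when both its addresses and its returned values follow regular strides), not the dissertation's hardware design; the table layout, confidence threshold, and names are assumptions.

    # Hedged sketch of address-stride-assisted load-value prediction: if a
    # load's addresses form a regular stride and its returned values do too,
    # predict the next value instead of fetching it. All structure names,
    # sizes and the confidence threshold are assumptions for illustration.
    from collections import defaultdict

    class StrideValuePredictor:
        def __init__(self, confidence_threshold=2):
            self.table = defaultdict(lambda: {"addr": None, "addr_stride": None,
                                              "val": None, "val_stride": None,
                                              "confidence": 0})
            self.threshold = confidence_threshold

        def observe(self, pc, addr, value):
            """Record a committed load and update stride/confidence state."""
            e = self.table[pc]
            if e["addr"] is not None:
                addr_stride = addr - e["addr"]
                val_stride = value - e["val"]
                if addr_stride == e["addr_stride"] and val_stride == e["val_stride"]:
                    e["confidence"] += 1
                else:
                    e["confidence"] = 0
                e["addr_stride"], e["val_stride"] = addr_stride, val_stride
            e["addr"], e["val"] = addr, value

        def predict(self, pc, addr):
            """Return an approximate value if the address matches the learned
            stride and confidence is high enough; otherwise fall back to memory."""
            e = self.table[pc]
            if (e["confidence"] >= self.threshold
                    and e["addr"] is not None
                    and addr == e["addr"] + e["addr_stride"]):
                return e["val"] + e["val_stride"]
            return None  # miss: issue the real memory request

    # Example: a load walking an array of values 10, 13, 16, ... with stride-4 addresses.
    p = StrideValuePredictor()
    for a, v in [(0x100, 10), (0x104, 13), (0x108, 16), (0x10c, 19)]:
        p.observe(0x42, a, v)
    print(p.predict(0x42, 0x110))   # -> 22, predicted without touching memory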

    Adaptive Intelligent Systems for Extreme Environments

    As embedded processors become more powerful, a growing number of embedded systems equipped with artificial intelligence (AI) algorithms are being used in radiation environments to perform routine tasks and reduce radiation risk for human workers. On the one hand, commercial off-the-shelf devices and components are becoming increasingly popular because of their low price, making such tasks more affordable. On the other hand, their use presents new challenges: improving radiation tolerance, supporting multiple AI tasks, and delivering power efficiency in harsh environments. Three aspects of research work have been completed in this thesis: 1) a fast simulation method for the analysis of single event effects (SEE) in integrated circuits, 2) a self-refresh scheme to detect and correct bit-flips in random access memory (RAM), and 3) a hardware AI system with dynamic hardware accelerators and AI models for increased flexibility and efficiency. Variations in the physical parameters of a practical implementation, such as the nature of the particle, the linear energy transfer and circuit characteristics, can have a large impact on simulation accuracy and significantly increase the complexity and cost of transistor-level simulation workflows, making it difficult to conduct SEE simulations for large-scale circuits. Therefore, in the first research work, a new SEE simulation scheme is proposed to offer a fast and cost-efficient method to evaluate and compare the performance of large-scale circuits that are subject to the effects of radiation particles. The advantages of transistor-level and hardware description language (HDL) simulations are combined to produce accurate SEE digital error models for rapid error analysis in large-scale circuits. Under the proposed scheme, time-consuming back-end steps are skipped, and the SEE analysis for large-scale circuits can be completed in just a few hours. In high-radiation environments, bit-flips in RAMs not only occur but may also accumulate, and typical error mitigation methods cannot handle high error rates at low hardware cost. In the second work, an adaptive scheme combining error-correcting codes and refreshing techniques is proposed to correct errors and mitigate error accumulation in extreme radiation environments. The scheme continuously refreshes the data in RAMs so that errors cannot accumulate. Furthermore, because the proposed design shares the same ports as the user module without changing the timing sequence, it can easily be applied to systems whose hardware modules are designed with fixed read and write latency. It is a challenge to implement intelligent systems with constrained hardware resources. In the third work, an adaptive hardware resource management system for multiple AI tasks in harsh environments was designed. Inspired by the “refreshing” concept of the second work, we utilise a key feature of FPGAs, partial reconfiguration, to improve the reliability and efficiency of the AI system. More importantly, this feature provides the capability to manage hardware resources for deep learning acceleration. In the proposed design, the on-chip hardware resources are dynamically managed to improve the flexibility, performance and power efficiency of deep learning inference systems. The deep learning units provided by Xilinx are used to perform multiple AI tasks simultaneously, and the experiments show significant improvements in power efficiency for a wide range of scenarios with different workloads. To further improve system performance, the concept of reconfiguration was extended, and an adaptive DL software framework was designed. This framework can provide a significant level of adaptability for various deep learning algorithms on an FPGA-based edge computing platform. To meet the specific accuracy and latency requirements derived from the running applications and operating environments, the platform may dynamically update its hardware and software (e.g., processing pipelines) to achieve better cost, power, and processing efficiency compared to a static system.
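
    The refresh-and-correct idea behind the second contribution can be illustrated in software. The sketch below is a rough Python analogy of the scheme, not the thesis's FPGA design: it pairs a Hamming(7,4) single-error-correcting code with a periodic scrubbing pass that rewrites corrected words so that bit-flips cannot accumulate. The code parameters and function names are assumptions for illustration only.

    # Software analogy (assumed parameters) of ECC plus periodic refresh to
    # stop bit-flip accumulation in RAM exposed to radiation.
    def hamming74_encode(nibble):
        """Encode a 4-bit value into a 7-bit Hamming(7,4) codeword."""
        d = [(nibble >> i) & 1 for i in range(4)]
        p1 = d[0] ^ d[1] ^ d[3]
        p2 = d[0] ^ d[2] ^ d[3]
        p4 = d[1] ^ d[2] ^ d[3]
        # Bit layout, positions 1..7 (LSB first): p1 p2 d0 p4 d1 d2 d3
        bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
        return sum(b << i for i, b in enumerate(bits))

    def hamming74_correct(word):
        """Return (corrected nibble, syndrome); a non-zero syndrome is the
        position of the single flipped bit."""
        bits = [(word >> i) & 1 for i in range(7)]
        s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
        s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
        s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
        syndrome = s1 | (s2 << 1) | (s3 << 2)
        if syndrome:
            bits[syndrome - 1] ^= 1          # flip the faulty bit back
        nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
        return nibble, syndrome

    def scrub(memory):
        """Periodic refresh pass: read each codeword, correct a single bit-flip
        if present, and write the clean codeword back so errors cannot accumulate."""
        corrected = 0
        for i, word in enumerate(memory):
            nibble, err = hamming74_correct(word)
            if err:
                memory[i] = hamming74_encode(nibble)
                corrected += 1
        return corrected

    # Example: store two nibbles, inject a single-event upset, then scrub.
    mem = [hamming74_encode(0xA), hamming74_encode(0x5)]
    mem[0] ^= 1 << 3                         # radiation flips one bit
    print(scrub(mem), [hex(hamming74_correct(w)[0]) for w in mem])   # 1 ['0xa', '0x5']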

    Understanding Quantum Technologies 2022

    Understanding Quantum Technologies 2022 is a creative-commons ebook that provides a unique 360-degree overview of quantum technologies, from science and technology to geopolitical and societal issues. It covers quantum physics history, quantum physics 101, gate-based quantum computing, quantum computing engineering (including quantum error correction and quantum computing energetics), quantum computing hardware (all qubit types, including the quantum annealing and quantum simulation paradigms, history, science, research, implementation and vendors), quantum enabling technologies (cryogenics, control electronics, photonics, component fabs, raw materials), quantum computing algorithms, software development tools and use cases, unconventional computing (potential alternatives to quantum and classical computing), quantum telecommunications and cryptography, quantum sensing, quantum technologies around the world, the societal impact of quantum technologies, and even quantum fake sciences. The main audience is computer science engineers, developers and IT specialists, as well as quantum scientists and students who want to acquire a global view of how quantum technologies work, and particularly quantum computing. This version is an extensive update to the 2021 edition published in October 2021. Comment: 1132 pages, 920 figures, Letter format.