210 research outputs found

    Processor Enhancements for Media Streaming Applications

    Get PDF
    The development of more processing demanding applications on the Internet (video broadcasting) on one hand and the popularity of recent devices at the user level (digital cameras, wireless videophones, ...) on the other hand introduce challenges at several levels. Today, such devices present processing capabilities and bandwidth settings that are inefficient to manage scalable QoS requirements in a typical media delivery framework. In this paper, we present an impact study of such a scalable data representation optimized for QoS (Matching Pursuit 3D algorithms) on processor architectures to achieve the best performance and power efficiency. A review of state of the art techniques for processor architecture enhancement let us expect promising opportunities from the latest developments in the reconfigurable computing research field. We present here the first design steps of an efficient reconfigurable coprocessor especially designed to cope with future video delivery and multimedia processing requirements. Architecture perspectives are proposed with respect to low development cost constraints, backward compatibilty and easy coprocessor usage using an original strategy based on a hardware/software codesign methodolog

    A RISC-V Matrix Multiplier Using Systolic Arrays

    Get PDF
    Many modern day applications can be solved with the usage of machine learning, which involves training a computer to learn on large amounts of data without direct programmer guidance. Conventional computers typically use normal general purpose central processing units, though more specialized tasks may take advantage of more parallel hardware such as graphics processing units. In the pursuit of increased performance to facilitate increasingly more complex machine learning models, researchers in both academia and industry look towards field-programmable gate arrays and application specific integrated circuits for their needs. Various implementations, both theoretical and practical, exist across a wide variety of designs. A custom design, using systolic arrays and built on the existing RISC-V Instruction Set Architecture, will be used to accelerate matrix calculations, with example performance on the MNIST dataset measured

    Memory and information processing in neuromorphic systems

    Full text link
    A striking difference between brain-inspired neuromorphic processors and current von Neumann processors architectures is the way in which memory and processing is organized. As Information and Communication Technologies continue to address the need for increased computational power through the increase of cores within a digital processor, neuromorphic engineers and scientists can complement this need by building processor architectures where memory is distributed with the processing. In this paper we present a survey of brain-inspired processor architectures that support models of cortical networks and deep neural networks. These architectures range from serial clocked implementations of multi-neuron systems to massively parallel asynchronous ones and from purely digital systems to mixed analog/digital systems which implement more biological-like models of neurons and synapses together with a suite of adaptation and learning mechanisms analogous to the ones found in biological nervous systems. We describe the advantages of the different approaches being pursued and present the challenges that need to be addressed for building artificial neural processing systems that can display the richness of behaviors seen in biological systems.Comment: Submitted to Proceedings of IEEE, review of recently proposed neuromorphic computing platforms and system

    Design and test of a neural microprocessor

    Get PDF
    En aquest projecte, es dissenya un microprocessador neuronal per ser implementat en FPGAs. Aquesta tecnologia consisteix en un processador softcore basat en RISC-V descrit amb SystemVerilog que s'utilitza per controlar un coprocessador encarregat d'executar una xarxa neuronal spiking amb propagació directa descrita amb VHDL. El control es fa amb senyals que es generen a partir d'instruccions SIMD personalitzades definides en una extensió del conjunt d’instruccions RSIC-V. Per fer-ho, es modifica el processador de manera que pugui detectar i descodificar les noves instruccions emmagatzemades a la seva memòria de programa. Per facilitar la tasca de definir el contingut de la memòria del programa, s'utilitza un codi escrit en C i es desenvolupa un conjunt d'instruccions C personalitzades. Aquestes instruccions es basen en l'ús de macros i inline assembly, i la seva finalitat és facilitar i permetre l'ús de les instruccions personalitzades RISC-V en el codi d'alt nivell. Per demostrar el correcte funcionament del projecte, se simula el microprocessador neuronal i després es prova a l'FPGA d'una placa de desenvolupament Nexys 4, amb el coprocessador implementat per resoldre el problema XOR. La implementació del coprocessador es replica amb C i s'executa a l'FPGA utilitzant només el processador predeterminat sense modificar. Finalment, els resultats s'analitzen i es comparen per determinar les compensacions entre els dos enfocaments en termes de temps d'execució, consum d'energia i espai utilitzat.En este proyecto, se diseña un microprocesador neuronal para su implementación en FPGAs. Esta tecnología consiste en un procesador softcore basado en RISC-V descrito con SystemVerilog que se utiliza para controlar a un coprocesador encargado de ejecutar una red neuronal spiking con propagación directa descrita con VHDL. El control se realiza con señales que se generan a partir de instrucciones SIMD personalizadas definidas en una extensión del conjunto de instrucciones RSIC-V. Para ello, se modifica el procesador de forma que pueda detectar y descodificar las nuevas instrucciones almacenadas en su memoria de programa. Para facilitar la tarea de definir el contenido de la memoria del programa, se utiliza un código escrito en C y se desarrolla un conjunto de instrucciones C personalizadas. Estas instrucciones se basan en el uso de macros e inline assembly, y su finalidad es facilitar y permitir el uso de las instrucciones personalizadas RISC-V en el código de alto nivel. Para demostrar el correcto funcionamiento del proyecto, se simula el microprocesador neuronal y después se prueba en la FPGA de una placa de desarrollo Nexys 4, con el coprocesador implementado para resolver el problema XOR. La implementación del coprocesador se replica con C y se ejecuta en la FPGA utilizando sólo el procesador predeterminado sin modifcar. Por último, los resultados se analizan y se comparan para determinar las compensaciones entre ambos enfoques en términos de tiempo de ejecución, consumo de energía y espacio utilizado.In this project, a neural microprocessor is designed to be implemented in FPGAs. This technology consists of a RISC-V-based soft processor described in SystemVerilog that is used to control a coprocessor in charge of executing a feedforward spiking neural network described in VHDL. The control is done with signals that are generated from custom-designed SIMD instructions defined in a RISC-V ISA extension. To do it, the processor is modified such that it can detect and decode the new instructions stored in its program memory. To facilitate the task of defining the program memory contents, a code written in C is used and a set of custom C instructions is developed. These instructions are based on the use of macros and inline assembly, and their purpose is to facilitate and allow the use of the RISC-V custom instructions in the high-level code. To demonstrate the correct operation of the project, the neural microprocessor is simulated and then tested on the FPGA of a Nexys 4 development board, with the coprocessor implemented for solving the XOR problem. The coprocessor implementation is replicated with C and executed in the FPGA using only the default processor without being modified. Finally, the results are analyzed and compared to determine the trade-offs between the two approaches in terms of execution time, power consumption, and utilized space

    Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey

    Get PDF
    In the modern-day era of technology, a paradigm shift has been witnessed in the areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications such as computer vision, image and video processing, robotics, etc. In the context of developed digital technologies and the availability of authentic data and data handling infrastructure, DNNs have been a credible choice for solving more complex real-life problems. The performance and accuracy of a DNN is a way better than human intelligence in certain situations. However, it is noteworthy that the DNN is computationally too cumbersome in terms of the resources and time to handle these computations. Furthermore, general-purpose architectures like CPUs have issues in handling such computationally intensive algorithms. Therefore, a lot of interest and efforts have been invested by the research fraternity in specialized hardware architectures such as Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) in the context of effective implementation of computationally intensive algorithms. This paper brings forward the various research works carried out on the development and deployment of DNNs using the aforementioned specialized hardware architectures and embedded AI accelerators. The review discusses the detailed description of the specialized hardware-based accelerators used in the training and/or inference of DNN. A comparative study based on factors like power, area, and throughput, is also made on the various accelerators discussed. Finally, future research and development directions are discussed, such as future trends in DNN implementation on specialized hardware accelerators. This review article is intended to serve as a guide for hardware architectures for accelerating and improving the effectiveness of deep learning research.publishedVersio
    corecore