Search CORE

2 research outputs found

Flexibly programmable opto-electronic analogic CNN computer (POAC) implementation applying an efficient, unconventional optical correlator architecture

Author: Ayoub Ahmed
Orzó László
Roska Tamás
Tőkés Szabolcs
Publication venue
Publication date: 01/01/2003
Field of study

Diseño e implementación sobre hardware reconfigurable de una arquitectura para la emulación en tiempo real de redes neuronales celulares

Author: Martínez Álvarez José Javier
Publication venue: 'Universidad Politecnica de Cartagena'
Publication date: 01/01/2012
Field of study

[SPA] En esta Tesis se propone el diseño y la implementación sobre hardware reconfigurable de una arquitectura para la emulación en tiempo real de redes neuronales celulares (CNN). El proceso de diseño de la arquitectura, comienza con el planteamiento de diferentes métodos de discretización del modelo continuo original de la red CNN. A partir de dichos métodos se obtienen distintas aproximaciones que son simuladas y comparadas entre sí con el fin de comprobar su funcionalidad y determinar cuál de ellas proporciona los mejores resultados con el menor coste computacional. La aproximación con mejores prestaciones es elegida para desarrollar el algoritmo de cómputo que describe la arquitectura hardware de la red CNN. La metodología de desarrollo utilizada, explora diferentes alternativas para optimizar la arquitectura CNN desde el punto de vista de su implementación hardware sobre FPGAs. A partir de la paralelización y adaptación del algoritmo de cómputo se desarrollan dos arquitecturas hardware diferentes denominadas Carthago y Carthagonova. Estas arquitecturas describen el funcionamiento de una Celda CNN, desenrollada en Etapas, que permite emular secuencialmente el procesamiento realizado por las redes CNN. La principal característica de estas arquitecturas es la capacidad que tienen para procesar la información en flujo de datos y en tiempo real. Las soluciones propuestas tiene como principal objetivo conseguir el mejor equilibrio entre la velocidad de procesamiento y el consumo de recursos hardware de la FPGA, así como evitar el uso de dispositivos de memoria externa que reducen la velocidad de procesamiento del sistema e incrementan su tamaño. Se proponen diferentes alternativas para implementar las arquitecturas sobre dispositivos FPGAs. Una de ellas consiste en utilizar una técnica de sincronización self-timed, eficiente en área-tiempo, que es definida mediante un lenguaje de descripción hardware tradicional (VHDL), instanciando primitivas de bajo nivel y realizando el emplazamiento de los componentes de forma manual. Otra alternativa consiste en una descripción en VHDL estructural a nivel RTL y sincronización convencional, donde los componentes self-timed son sustituidos por componentes estándar. Se propone además la implementación de una de las arquitecturas sobre un computador reconfigurable de altas prestaciones (HPRC), compuesto por un microprocesador de propósito general y un coprocesador basado en FPGAs, encargado de acelerar la ejecución de los algoritmos mediante hardware. El particionamiento hardware/software y el proceso de co-diseño se realizan usando las herramientas de desarrollo a nivel de sistema (ESL) de Impulse Accelerated Technologies (Impulse-C) y la plataforma HPRC DS1002 de DRC Computers. Los principales resultados obtenidos de las diferentes implementaciones son mostrados con el fin de demostrar la funcionalidad de las arquitecturas y analizar sus principales prestaciones. Las diferentes combinaciones consideradas, entre técnicas de implementación y las arquitecturas propuestas, muestran que la arquitectura Carthagonova, implementada a nivel estructural, presenta importantes ventajas a considerar. En primer lugar, la arquitectura facilita la emulación de redes CNN complejas, compuestas por cientos de miles de millones de neuronas, sobre sistemas embebidos basados en FPGAs. En segundo lugar, el excelente compromiso alcanzado entre velocidad de procesamiento y consumo de recursos hardware hace que sea una interesante solución a considerar frente a otras alternativas de la literatura. Finalmente, la versatilidad y las prestaciones de la arquitectura diseñada permiten dar soporte al desarrollo de sistemas de procesamiento de vídeo en tiempo real y al diseño de aplicaciones basadas en modelos neuronales bioinspirados. La arquitectura CNN propuesta es utilizada para desarrollar un modelo artificial de la primera sinapsis de la retina, incorporando algunas de las principales características de los circuitos neuronales considerados. El modelo está basado en los campos receptores de las células bipolares y su objetivo es emular, mediante hardware reconfigurable, el procesamiento espacial básico realizado por la retina. Al igual que ocurre en la primera sinapsis de la retina, se observa que el modelo artificial propuesto lleva a cabo la detección del contraste y la discriminación visual de detalles en función de la influencia de los factores de convergencia y de inhibición lateral de los circuitos neuronales implementados. Finalmente, se propone el diseño y la implementación de un sistema de cómputo distribuido, basado en múltiples FPGAs, que permite el desarrollo de aplicaciones embebidas de procesamiento de vídeo en tiempo real con redes CNN multi-capa (ML-CNNs) complejas y de gran tamaño. El sistema procesa la información de vídeo en flujo de datos (en modo progresivo) y proporciona una salida de vídeo estándar compatible con el formato VGA industria. [ENG] This thesis proposes the design and development of a hardware architecture for real-time emulation of non-linear multilayer Cellular Neural Networks (CNN). This approach is focused on CNN implementation on reconfigurable hardware architectures. The architecture design begins from a discrete model obtained from different transformations made to the original continuous model of CNN. Each discrete approach is simulated and compared with the rest of approaches in order to verify their functionality and to find the approach which best emulates the continuous model at minimum computational cost. The best discrete model found is then used to develop the hardware architecture of the CNN. The development methodology used, explores different alternatives to optimize the architecture from the point of view of its hardware implementation on FPGAs. The architectures Carthago and Carthagonova are developed from the hardware adaptation and parallelization of the sequential algorithm that describes the functionality of the selected CNN discrete model. These architectures are based on an unrolling cell which is employed to emulate CNNs with large number of neurons. The key characteristic of these architectures is their capability to process information in real time in a sequential manner. The proposed solution aims to find a suitable tradeoff between area and speed, reducing the use of hardware resources on the FPGA and avoiding the use of external memory devices which make slower the processing rate and higher size and cost. We propose different solutions to the internal implementation of both architectures on an FPGA. The first one is a novel self-timed architecture, areatime efficient. It is described using traditional Hardware Description Languages (HDL) from low level hardware primitives instantiation and manual placement. The second one consists in a high level description in structural VHDL using conventional synchronization, instead of self-timed blocks. We also propose an implementation architecture that makes use of a High Performance Reconfigurable Computer (HPRC), combining general purpose microprocessors with custom hardware accelerators based on FPGAs, to speed up execution time. The hardware/ software partitioning and co-design process are carried out using high level design tools. The architecture Carthago is implemented using Electronic System Level (ESL) tools from Impulse Accelerated Technologies and the DS1002 HPRC platform from DRC Computers. The most relevant results obtained from different implementations are shown in order to verify the functionality of the proposed neural hardware architectures and to analyze their performance. The best combination of architecture and implementation model, the Carthagonova-structural, presents some important advantages. Firstly, the architecture is developed to help emulating highperformance discrete CNNs with hundreds or millions of neurons on embedded FPGA-based systems. Secondly, due to its balanced trade-off between speed and area, this architecture is an interesting alternative to consider among others in the literature. Finally, its versatility facilitates the export of neural hardware architecture to applications such as signal and image processing, implementation of the neurological models inspired on human systems, etc. The hardware architecture proposed is used to build a CNN-based model of the fist synapse of the retina, which incorporates the main neural circuits found in the different retinal regions. The aim of this bioinspired model is to implement the basic spatial processing of the retina in reconfigurable hardware. The model is based on the bipolar cells receptive fields and mimics the retinal architecture achieving its processing capabilities. As occurs in the processing of first synapse of the retina, it is observed that contrast detection and detail resolution are influenced by the convergence factor of neurons and by lateral inhibition, which are specific parameters of each neural circuit. We also propose the design and implementation of an embedded system based on multiple FPGAs that can be used to process real time video streams for applications that require the use of large Multi-Layer CNNs (ML-CNNs). The system processes video in progressive mode and provides a standard VGA output format. The main features of the system are determined by using a distributed computing architecture, based on Processing Modules (PM), which facilitates system expansion and adaptation to new applications. Several FPGA-based processing modules can be cascaded together with a video acquisition stage and an output interface to a frame grabber for video output storage, all sharing a common communication interface. Each PM is composed by an FPGA board that can hold one or more CNN layers. The total computing capacity of the system is determined by the number of MP used and the amount of resources available in the FPGAs. The pre-verified CNN components, the modular architecture, and the expandable hardware platform provide an excellent workbench for fast and confident developing of CNN applications, based on traditional cloned templates, but also time-variant and space-variant templates.Universidad Politécnica de Cartagen

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Digital de la Universidad Politécnica de Cartagena