103 research outputs found

    An end-user platform for FPGA-based design and rapid prototyping of feedforward artificial neural networks with on-chip backpropagation learning

    Get PDF
    The hardware implementation of an artificial neural network (ANN) using field-programmable gate arrays (FPGAs) is a research field that has attracted much interest and attention. With the developments made, the programmer is now forced to face various challenges, such as the need to master various complex hardware-software development platforms, hardware description languages, and advanced ANN knowledge. Moreover, such an implementation is very time consuming. To address these challenges, this paper presents a novel neural design methodology using a holistic modeling approach. Based on the end-user programming concept, the presented solution empowers end users by means of abstracting the low-level hardware functionalities, streamlining the FPGA design process and supporting rapid ANN prototyping. A case study of an ANN as a pattern recognition module of an artificial olfaction system trained to identify four coffee brands is presented. The recognition rate versus training data features and data representation was analyzed extensively

    A Novel Systolic Parallel Hardware Architecture for the FPGA Acceleration of Feedforward Neural Networks

    Get PDF
    New chips for machine learning applications appear, they are tuned for a specific topology, being efficient by using highly parallel designs at the cost of high power or large complex devices. However, the computational demands of deep neural networks require flexible and efficient hardware architectures able to fit different applications, neural network types, number of inputs, outputs, layers, and units in each layer, making the migration from software to hardware easy. This paper describes novel hardware implementing any feedforward neural network (FFNN): multilayer perceptron, autoencoder, and logistic regression. The architecture admits an arbitrary input and output number, units in layers, and a number of layers. The hardware combines matrix algebra concepts with serial-parallel computation. It is based on a systolic ring of neural processing elements (NPE), only requiring as many NPEs as neuron units in the largest layer, no matter the number of layers. The use of resources grows linearly with the number of NPEs. This versatile architecture serves as an accelerator in real-time applications and its size does not affect the system clock frequency. Unlike most approaches, a single activation function block (AFB) for the whole FFNN is required. Performance, resource usage, and accuracy for several network topologies and activation functions are evaluated. The architecture reaches 550 MHz clock speed in a Virtex7 FPGA. The proposed implementation uses 18-bit fixed point achieving similar classification performance to a floating point approach. A reduced weight bit size does not affect the accuracy, allowing more weights in the same memory. Different FFNN for Iris and MNIST datasets were evaluated and, for a real-time application of abnormal cardiac detection, a x256 acceleration was achieved. The proposed architecture can perform up to 1980 Giga operations per second (GOPS), implementing the multilayer FFNN of up to 3600 neurons per layer in a single chip. The architecture can be extended to bigger capacity devices or multi-chip by the simple NPE ring extension

    Run-time reconfiguration for efficient tracking of implanted magnets with a myokinetic control interface applied to robotic hands

    Get PDF
    Tese (doutorado)—Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Mecânica, 2021.Este trabalho introduz a aplicação de soluções de aprendizagem de máquinas visado ao problema do rastreamento de posição do antebraço baseado em sensores magnéticos. Especi ficamente, emprega-se uma estratégia baseada em dados para criar modelos matemáticos que possam traduzir as informações magnéticas medidas em entradas utilizáveis para dispositivos protéticos. Estes modelos são implementados em FPGAs usando operadores customizados de ponto flutuante para otimizar o consumo de hardware e energia, que são importantes em dispositivos embarcados. A arquitetura de hardware é proposta para ser implementada como um sistema com reconfiguração dinâmica parcial, reduzindo potencialmente a utilização de recursos e o consumo de energia da FPGA. A estratégia de dados proposta e sua implemen tação de hardware pode alcançar uma latência na ordem de microssegundos e baixo consumo de energia, o que encoraja mais pesquisas para melhorar os métodos aqui desenvolvidos para outras aplicações.Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).This work introduces the application of embedded machine learning solutions for the problem of magnetic sensors-based limb tracking. Namely, we employ a data-driven strat egy to create mathematical models that can translate the magnetic information measured to usable inputs for prosthetic devices. These models are implemented in FPGAs using cus tomized floating-point operations to optimize hardware and energy consumption, which are important in wearable devices. The hardware architecture is proposed to be implemented as a dynamically partial reconfigured system, potentially reducing resource utilization and power consumption of the FPGA. The proposed data-driven strategy and its hardware implementa tion can achieve a latency in the order of microseconds and low energy consumption, which encourages further research on improving the methods herein devised for other application

    A neural network face detector design using bit-width reduced FPU in FPGA

    Get PDF
    This thesis implemented a field programmable gate array (FPGA)-based face detector using a neural network (NN), as well as a bit-width reduced floating-point unit (FPU). An NN was used to easily separate face data and non-face data in the face detector. The NN performs time consuming repetitive calculation. This time consuming problem was solved by a Field Programmable Gate Array (FPGA) device and a bit-width reduced FPU in this thesis. A floating-point bit-width reduction provided a significant saving of hardware resources, such as area and power.The analytical error model, using the maximum relative representation error (MRRE) and the average relative representation error (ARRE), was developed to obtain the maximum and average output errors for the bit-width reduced FPUs. After the development of the analytical error model, the bit-width reduced FPUs and an NN were designed using MATLAB and VHDL. Finally, the analytical (MATLAB) results, along with the experimental (VHDL) results, were compared. The analytical results and the experimental results showed conformity of shape. It was also found that while maintaining 94.1% detection accuracy, a reduction in bit-width from 32 bits to 16 bits reduced the size of memory and arithmetic units by 50%, and the total power consumption by 14.7%

    Real-time multi-domain optimization controller for multi-motor electric vehicles using automotive-suitable methods and heterogeneous embedded platforms

    Get PDF
    Los capítulos 2,3 y 7 están sujetos a confidencialidad por el autor. 145 p.In this Thesis, an elaborate control solution combining Machine Learning and Soft Computing techniques has been developed, targeting a chal lenging vehicle dynamics application aiming to optimize the torque distribution across the wheels with four independent electric motors.The technological context that has motivated this research brings together potential -and challenges- from multiple dom ains: new automotive powertrain topologies with increased degrees of freedom and controllability, which can be approached with innovative Machine Learning algorithm concepts, being implementable by exploiting the computational capacity of modern heterogeneous embedded platforms and automated toolchains. The complex relations among these three domains that enable the potential for great enhancements, do contrast with the fourth domain in this context: challenging constraints brought by industrial aspects and safe ty regulations. The innovative control architecture that has been conce ived combines Neural Networks as Virtual Sensor for unmeasurable forces , with a multi-objective optimization function driven by Fuzzy Logic , which defines priorities basing on the real -time driving situation. The fundamental principle is to enhance vehicle dynamics by implementing a Torque Vectoring controller that prevents wheel slip using the inputs provided by the Neural Network. Complementary optimization objectives are effici ency, thermal stress and smoothness. Safety -critical concerns are addressed through architectural and functional measures.Two main phases can be identified across the activities and milestones achieved in this work. In a first phase, a baseline Torque Vectoring controller was implemented on an embedded platform and -benefiting from a seamless transition using Hardware-in -the -Loop - it was integrated into a real Motor -in -Wheel vehicle for race track tests. Having validated the concept, framework, methodology and models, a second simulation-based phase proceeds to develop the more sophisticated controller, targeting a more capable vehicle, leading to the final solution of this work. Besides, this concept was further evolved to support a joint research work which lead to outstanding FPGA and GPU based embedded implementations of Neural Networks. Ultimately, the different building blocks that compose this work have shown results that have met or exceeded the expectations, both on technical and conceptual level. The highly non-linear multi-variable (and multi-objective) control problem was tackled. Neural Network estimations are accurate, performance metrics in general -and vehicle dynamics and efficiency in particular- are clearly improved, Fuzzy Logic and optimization behave as expected, and efficient embedded implementation is shown to be viable. Consequently, the proposed control concept -and the surrounding solutions and enablers- have proven their qualities in what respects to functionality, performance, implementability and industry suitability.The most relevant contributions to be highlighted are firstly each of the algorithms and functions that are implemented in the controller solutions and , ultimately, the whole control concept itself with the architectural approaches it involves. Besides multiple enablers which are exploitable for future work have been provided, as well as an illustrative insight into the intricacies of a vivid technological context, showcasing how they can be harmonized. Furthermore, multiple international activities in both academic and professional contexts -which have provided enrichment as well as acknowledgement, for this work-, have led to several publications, two high-impact journal papers and collateral work products of diverse nature

    Dynamically reconfigurable bio-inspired hardware

    Get PDF
    During the last several years, reconfigurable computing devices have experienced an impressive development in their resource availability, speed, and configurability. Currently, commercial FPGAs offer the possibility of self-reconfiguring by partially modifying their configuration bitstream, providing high architectural flexibility, while guaranteeing high performance. These configurability features have received special interest from computer architects: one can find several reconfigurable coprocessor architectures for cryptographic algorithms, image processing, automotive applications, and different general purpose functions. On the other hand we have bio-inspired hardware, a large research field taking inspiration from living beings in order to design hardware systems, which includes diverse topics: evolvable hardware, neural hardware, cellular automata, and fuzzy hardware, among others. Living beings are well known for their high adaptability to environmental changes, featuring very flexible adaptations at several levels. Bio-inspired hardware systems require such flexibility to be provided by the hardware platform on which the system is implemented. In general, bio-inspired hardware has been implemented on both custom and commercial hardware platforms. These custom platforms are specifically designed for supporting bio-inspired hardware systems, typically featuring special cellular architectures and enhanced reconfigurability capabilities; an example is their partial and dynamic reconfigurability. These aspects are very well appreciated for providing the performance and the high architectural flexibility required by bio-inspired systems. However, the availability and the very high costs of such custom devices make them only accessible to a very few research groups. Even though some commercial FPGAs provide enhanced reconfigurability features such as partial and dynamic reconfiguration, their utilization is still in its early stages and they are not well supported by FPGA vendors, thus making their use difficult to include in existing bio-inspired systems. In this thesis, I present a set of architectures, techniques, and methodologies for benefiting from the configurability advantages of current commercial FPGAs in the design of bio-inspired hardware systems. Among the presented architectures there are neural networks, spiking neuron models, fuzzy systems, cellular automata and random boolean networks. For these architectures, I propose several adaptation techniques for parametric and topological adaptation, such as hebbian learning, evolutionary and co-evolutionary algorithms, and particle swarm optimization. Finally, as case study I consider the implementation of bio-inspired hardware systems in two platforms: YaMoR (Yet another Modular Robot) and ROPES (Reconfigurable Object for Pervasive Systems); the development of both platforms having been co-supervised in the framework of this thesis

    Real-time multi-domain optimization controller for multi-motor electric vehicles using automotive-suitable methods and heterogeneous embedded platforms

    Get PDF
    Los capítulos 2,3 y 7 están sujetos a confidencialidad por el autor. 145 p.In this Thesis, an elaborate control solution combining Machine Learning and Soft Computing techniques has been developed, targeting a chal lenging vehicle dynamics application aiming to optimize the torque distribution across the wheels with four independent electric motors.The technological context that has motivated this research brings together potential -and challenges- from multiple dom ains: new automotive powertrain topologies with increased degrees of freedom and controllability, which can be approached with innovative Machine Learning algorithm concepts, being implementable by exploiting the computational capacity of modern heterogeneous embedded platforms and automated toolchains. The complex relations among these three domains that enable the potential for great enhancements, do contrast with the fourth domain in this context: challenging constraints brought by industrial aspects and safe ty regulations. The innovative control architecture that has been conce ived combines Neural Networks as Virtual Sensor for unmeasurable forces , with a multi-objective optimization function driven by Fuzzy Logic , which defines priorities basing on the real -time driving situation. The fundamental principle is to enhance vehicle dynamics by implementing a Torque Vectoring controller that prevents wheel slip using the inputs provided by the Neural Network. Complementary optimization objectives are effici ency, thermal stress and smoothness. Safety -critical concerns are addressed through architectural and functional measures.Two main phases can be identified across the activities and milestones achieved in this work. In a first phase, a baseline Torque Vectoring controller was implemented on an embedded platform and -benefiting from a seamless transition using Hardware-in -the -Loop - it was integrated into a real Motor -in -Wheel vehicle for race track tests. Having validated the concept, framework, methodology and models, a second simulation-based phase proceeds to develop the more sophisticated controller, targeting a more capable vehicle, leading to the final solution of this work. Besides, this concept was further evolved to support a joint research work which lead to outstanding FPGA and GPU based embedded implementations of Neural Networks. Ultimately, the different building blocks that compose this work have shown results that have met or exceeded the expectations, both on technical and conceptual level. The highly non-linear multi-variable (and multi-objective) control problem was tackled. Neural Network estimations are accurate, performance metrics in general -and vehicle dynamics and efficiency in particular- are clearly improved, Fuzzy Logic and optimization behave as expected, and efficient embedded implementation is shown to be viable. Consequently, the proposed control concept -and the surrounding solutions and enablers- have proven their qualities in what respects to functionality, performance, implementability and industry suitability.The most relevant contributions to be highlighted are firstly each of the algorithms and functions that are implemented in the controller solutions and , ultimately, the whole control concept itself with the architectural approaches it involves. Besides multiple enablers which are exploitable for future work have been provided, as well as an illustrative insight into the intricacies of a vivid technological context, showcasing how they can be harmonized. Furthermore, multiple international activities in both academic and professional contexts -which have provided enrichment as well as acknowledgement, for this work-, have led to several publications, two high-impact journal papers and collateral work products of diverse nature

    Scalable event-driven modelling architectures for neuromimetic hardware

    Get PDF
    Neural networks present a fundamentally different model of computation from the conventional sequential digital model. Dedicated hardware may thus be more suitable for executing them. Given that there is no clear consensus on the model of computation in the brain, model flexibility is at least as important a characteristic of neural hardware as is performance acceleration. The SpiNNaker chip is an example of the emerging 'neuromimetic' architecture, a universal platform that specialises the hardware for neural networks but allows flexibility in model choice. It integrates four key attributes: native parallelism, event-driven processing, incoherent memory and incremental reconfiguration, in a system combining an array of general-purpose processors with a configurable asynchronous interconnect. Making such a device usable in practice requires an environment for instantiating neural models on the chip that allows the user to focus on model characteristics rather than on hardware details. The central part of this system is a library of predesigned, 'drop-in' event-driven neural components that specify their specific implementation on SpiNNaker. Three exemplar models: two spiking networks and a multilayer perceptron network, illustrate techniques that provide a basis for the library and demonstrate a reference methodology that can be extended to support third-party library components not only on SpiNNaker but on any configurable neuromimetic platform. Experiments demonstrate the capability of the library model to implement efficient on-chip neural networks, but also reveal important hardware limitations, particularly with respect to communications, that require careful design. The ultimate goal is the creation of a library-based development system that allows neural modellers to work in the high-level environment of their choice, using an automated tool chain to create the appropriate SpiNNaker instantiation. Such a system would enable the use of the hardware to explore abstractions of biological neurodynamics that underpin a functional model of neural computation.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    A Method for Image Classification Using Low-Precision Analog Computing Arrays

    Get PDF
    Computing with analog micro electronics can offer several advantages over standard digital technology, most notably: Low space and power consumption and massive parallelization. On the other hand, analog computation lacks the exactness of digital calculations due to inevitable device variations introduced during the chip production, but also due to electric noise in the analog signals. Artificial neural networks are well suited for parallel analog implementations, first, because of their inherent parallelity and second, because they can adapt to device imperfections by training. This thesis evaluates the feasibility of implementing a convolutional neural network for image classification on a massively parallel low-power hardware system. A particular, mixed analogdigital, hardware model is considered, featuring simple threshold neurons. Appropriate, gradient-free, training algorithms, combining self-organization and supervised learning are developed and tested with two benchmark problems (MNIST hand-written digits and traffic signs). Software simulations evaluate the methods under various defined computation faults. A model-free closed-loop technique is shown to compensate for rather serious computation errors without the need for explicit error quantification. Last but not least, the developed networks and the training techniques are verified on a real prototype chip
    • …
    corecore