92 research outputs found

    NACU: A Non-Linear Arithmetic Unit for Neural Networks

    Reconfigurable architectures targeting neural networks are an attractive option. They allow multiple neural networks of different types to be hosted on the same hardware, in parallel or in sequence. Reconfigurability also grants the ability to morph into different micro-architectures to meet varying power-performance constraints. In this context, the need for a reconfigurable non-linear computational unit has not been widely researched. In this work, we present a formal and comprehensive method for selecting the optimal fixed-point representation that achieves the highest accuracy against a floating-point reference implementation. We also present a novel design of an optimised reconfigurable arithmetic unit for calculating non-linear functions. The unit can be dynamically configured to calculate the sigmoid, hyperbolic tangent, and exponential functions using the same underlying hardware. We compare our work with the state of the art and show that our unit can calculate all three functions without loss of accuracy.
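
    The fixed-point selection step can be pictured with a short sketch. The brute-force search below is an illustrative assumption, not the paper's actual procedure (helper names such as quantize and max_error are hypothetical): it scores candidate fraction-bit widths of a 16-bit word by their worst-case error against a floating-point reference for sigmoid, tanh and the exponential on a bounded domain.

```python
# Illustrative sketch: pick the number of fraction bits in a 16-bit signed fixed-point
# word that minimises the worst-case error of each non-linear function against float64.
import numpy as np

def quantize(x, frac_bits, total_bits=16):
    """Round to a signed fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi) / scale

def max_error(fn, frac_bits, xs):
    ref = fn(xs)                                             # floating-point benchmark
    approx = quantize(fn(quantize(xs, frac_bits)), frac_bits)
    return np.max(np.abs(ref - approx))

cases = {
    "sigmoid": (lambda x: 1 / (1 + np.exp(-x)), np.linspace(-8, 8, 10001)),
    "tanh":    (np.tanh,                        np.linspace(-4, 4, 10001)),
    "exp":     (np.exp,                         np.linspace(-8, 0, 10001)),  # bounded domain
}

for name, (fn, xs) in cases.items():
    best = min(range(4, 15), key=lambda fb: max_error(fn, fb, xs))
    print(f"{name}: best Q{15 - best}.{best} format, max error {max_error(fn, best, xs):.2e}")
```

    The search trades input range (integer bits) against resolution (fraction bits), which is why a single best split exists per function.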

    Personalized Health Monitoring Using Evolvable Block-based Neural Networks

    This dissertation presents personalized health monitoring using evolvable block-based neural networks. Personalized health monitoring plays an increasingly important role in modern society as the population enjoys longer life. Personalization in health monitoring considers physiological variations brought by temporal, personal or environmental differences, and demands solutions capable of reconfiguring and adapting to specific requirements. Block-based neural networks (BbNNs) consist of 2-D arrays of modular basic blocks that can be easily implemented using reconfigurable digital hardware such as field programmable gate arrays (FPGAs) that allow on-line partial reorganization. The modular structure of BbNNs enables easy expansion in size by adding more blocks. A computationally efficient evolutionary algorithm is developed that simultaneously optimizes the structure and weights of BbNNs. This evolutionary algorithm increases optimization speed by integrating a local search operator. An adaptive rate update scheme that removes manual tuning of operator rates improves the fitness trend compared to pre-determined fixed rates. A fitness scaling scheme with generalized disruptive pressure reduces the possibility of premature convergence. The BbNN platform promises an evolvable solution that changes structures and parameters for personalized health monitoring. A BbNN evolved with the proposed evolutionary algorithm, using Hermite transform coefficients and the time interval between two neighboring R peaks of the ECG signal as features, provides a patient-specific ECG heartbeat classification system. Experimental results using the MIT-BIH Arrhythmia database demonstrate the potential for significant performance enhancements over other major techniques.
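
    As a rough illustration of the feature vector described above, the sketch below projects a heartbeat segment onto the first few Hermite functions and appends the preceding R-R interval. The sampling window, number of coefficients and function names are assumptions made for the example, not values taken from the dissertation.

```python
# Hedged sketch of a Hermite-coefficient + R-R-interval feature vector for one beat.
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

def hermite_function(n, t):
    """n-th orthonormal Hermite function evaluated at t."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    norm = sqrt(2.0**n * factorial(n) * sqrt(pi))
    return hermval(t, coeffs) * np.exp(-t**2 / 2) / norm

def beat_features(beat, rr_interval, n_coeffs=6, width=3.0):
    t = np.linspace(-width, width, beat.size)
    dt = t[1] - t[0]
    coeffs = [np.sum(beat * hermite_function(n, t)) * dt for n in range(n_coeffs)]
    return np.array(coeffs + [rr_interval])        # 6 shape coefficients + timing feature

# Example: a synthetic Gaussian "beat" and a 0.8 s R-R interval
beat = np.exp(-np.linspace(-3, 3, 200) ** 2)
print(beat_features(beat, 0.8))
```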

    Reconfigurable Architectures and Systems for IoT Applications

    Internet of Things (IoT) has become a popular topic in industry in recent years; it describes an ecosystem of internet-connected devices or things that enrich everyday life by improving our productivity and efficiency. The primary components of the IoT ecosystem are hardware, software and services. While the software and services of an IoT system focus on data collection and processing to make decisions, the underlying hardware is responsible for sensing the information, preprocessing it and transmitting it to the servers. Since the IoT ecosystem is still in its infancy, there is a great need for rapid prototyping platforms that would help accelerate the hardware design process. However, depending on the target IoT application, different sensors are required to sense signals such as heart rate, temperature, pressure, acceleration, etc., and there is a great need for reconfigurable platforms that can prototype different sensor interfacing circuits. This thesis primarily focuses on two important hardware aspects of an IoT system: (a) an FPAA-based reconfigurable sensing front-end system and (b) an FPGA-based reconfigurable processing system. To enable reconfiguration capability for any sensor type, Programmable ANalog Device Array (PANDA), a transistor-level analog reconfigurable platform, is proposed, and the CAD tools required for implementation of front-end circuits on the platform are also developed. To demonstrate the capability of the platform on silicon, a small-scale array of 24×25 PANDA cells is fabricated in 65 nm technology. Several analog circuit building blocks including amplifiers, bias circuits and filters are prototyped on the platform, which demonstrates the effectiveness of the platform for rapid prototyping of IoT sensor interfaces. IoT systems typically use machine learning algorithms that run on the servers to process the data in order to make decisions. Recently, embedded processors have been used to preprocess the data at the energy-constrained sensor node or at the IoT gateway, which saves considerable transmission energy and bandwidth. Using conventional CPU-based systems to implement the machine learning algorithms is not energy-efficient. Hence, an FPGA-based hardware accelerator is proposed and an optimization methodology is developed to maximize the throughput of any convolutional neural network (CNN) based machine learning algorithm on a resource-constrained FPGA.
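
    One common way such a throughput-maximization problem can be framed (an illustrative sketch only, not necessarily the thesis's methodology) is as a search over loop-unrolling factors for a convolution layer under a DSP budget, as in the toy design-space exploration below; all dimensions and the one-DSP-per-MAC cost model are assumptions.

```python
# Toy design-space exploration: pick (Tm, Tn) parallelism factors that maximise the
# estimated throughput of one conv layer while staying within a DSP budget.
from math import ceil

def explore(M=64, N=32, R=28, C=28, K=3, dsp_budget=900, freq_mhz=200):
    """M/N: output/input channels, RxC: output map, KxK: kernel."""
    total_ops = 2 * M * N * R * C * K * K            # each MAC counted as 2 ops
    best = None
    for tm in range(1, M + 1):
        for tn in range(1, N + 1):
            if tm * tn > dsp_budget:                 # assume ~1 DSP per parallel MAC
                continue
            cycles = ceil(M / tm) * ceil(N / tn) * R * C * K * K
            gops = total_ops / cycles * freq_mhz * 1e6 / 1e9
            if best is None or gops > best[0]:
                best = (gops, tm, tn)
    return best

print(explore())   # (estimated GOP/s, Tm, Tn)
```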

    Recent Advances in Embedded Computing, Intelligence and Applications

    The recent proliferation of Internet of Things deployments and edge computing, combined with artificial intelligence, has led to exciting new application scenarios in which embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately foster the offloading of processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems.

    Harnessing Evolution in-Materio as an Unconventional Computing Resource

    This thesis illustrates the use and development of physical conductive analogue systems for unconventional computing using the Evolution in-Materio (EiM) paradigm. EiM uses an Evolutionary Algorithm to configure and exploit a physical material (or medium) for computation. While EiM processors show promise, fundamental questions and scaling issues remain. Additionally, their development is hindered by slow manufacturing and physical experimentation. This work addressed these issues by implementing simulated models to speed up research efforts, followed by investigations of physically implemented novel in-materio devices. Initial work leveraged simulated conductive networks as single-substrate ‘monolithic’ EiM processors, performing classification by formulating the system as an optimisation problem solved using Differential Evolution. Different material properties and algorithm parameters were isolated and investigated, which explained the capabilities of the configurable parameters and showed that the ideal nanomaterial choice depends upon problem complexity. Subsequently, drawing from concepts in the wider Machine Learning field, several enhancements to monolithic EiM processors were proposed and investigated. These ensured more efficient use of training data, better classification decision boundary placement, an independently optimised readout layer, and a smoother search space. Finally, scalability and performance issues were addressed by constructing in-Materio Neural Networks (iM-NNs), where several EiM processors were stacked in parallel and operated as physical realisations of Hidden Layer neurons. Greater flexibility in system implementation was achieved by re-using a single physical substrate recursively as several virtual neurons, but this sacrificed faster parallelised execution. These novel iM-NNs were first implemented using simulated in-materio neurons and trained for classification as Extreme Learning Machines, which were found to outperform artificial networks of a similar size. Physical iM-NNs were then implemented using a Raspberry Pi, a custom Hardware Interface and Lambda Diode based Physical in-Materio neurons, which were trained successfully with neuroevolution. A more complex AutoEncoder structure was then proposed and implemented physically to perform dimensionality reduction on a handwritten digits dataset, outperforming both Principal Component Analysis and artificial AutoEncoders. This work presents an approach to exploit systems with interesting physical dynamics and leverage them as a computational resource. Such systems could become low-power, high-speed, unconventional computing assets in the future.
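
    The EiM loop itself can be sketched in a few lines. In the toy example below (purely illustrative: the "material" is a fixed random non-linear map standing in for a real conductive network, and all names and settings are assumptions), Differential Evolution searches for control-voltage settings under which a simple threshold readout separates two classes.

```python
# Toy Evolution-in-Materio loop: DE configures control inputs of a simulated "material"
# so that its scalar output separates two classes of data points.
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))     # fixed "material" coupling to the 2 data inputs
V = rng.normal(size=(5, 3))     # coupling to the 3 configurable control voltages

def material_output(x, config):
    return np.tanh(x @ W.T + config @ V.T).sum(axis=1)   # one readout value per sample

X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)                  # XOR-like toy labels

def error(config):
    out = material_output(X, config)
    pred = (out > np.median(out)).astype(int)             # simple threshold readout
    return np.mean(pred != y)

result = differential_evolution(error, bounds=[(-5, 5)] * 3, seed=1, maxiter=50)
print("config voltages:", result.x, "training error:", result.fun)
```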

    Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars

    The discovery of neural architectures from simple building blocks is a long-standing goal of Neural Architecture Search (NAS). Hierarchical search spaces are a promising step towards this goal but lack a unifying search space design framework and typically only search over some limited aspect of architectures. In this work, we introduce a unifying search space design framework based on context-free grammars that can naturally and compactly generate expressive hierarchical search spaces that are 100s of orders of magnitude larger than common spaces from the literature. By enhancing and using their properties, we effectively enable search over the complete architecture and can foster regularity. Further, we propose an efficient hierarchical kernel design for a Bayesian Optimization search strategy to efficiently search over such huge spaces. We demonstrate the versatility of our search space design framework and show that our search strategy can be superior to existing NAS approaches. Code is available at https://github.com/automl/hierarchical_nas_construction
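
    The core idea of deriving architectures from a context-free grammar can be illustrated with a toy sketch; the grammar below is made up for the example, and the authors' actual search spaces live in the linked repository.

```python
# Toy context-free grammar whose derivations are nested architecture descriptions.
import random

GRAMMAR = {
    "ARCH":  [["Sequential", "BLOCK", "BLOCK"], ["Residual", "BLOCK"]],
    "BLOCK": [["Sequential", "OP", "OP"], ["Parallel", "OP", "OP"]],
    "OP":    [["conv3x3"], ["conv1x1"], ["maxpool"], ["identity"]],
}

def derive(symbol, rng):
    """Recursively expand a non-terminal; terminals are kept as-is."""
    production = rng.choice(GRAMMAR[symbol])
    return [tok if tok not in GRAMMAR else derive(tok, rng) for tok in production]

rng = random.Random(3)
print(derive("ARCH", rng))   # e.g. a nested ['Sequential', ['Parallel', ...], ...] tree
```

    Because every derivation is a tree over the same small set of productions, the space is hierarchical by construction yet astronomically large, which is the property the search strategy must cope with.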

    Autonomously Reconfigurable Artificial Neural Network on a Chip

    Artificial neural network (ANN), an established bio-inspired computing paradigm, has proved very effective in a variety of real-world problems and particularly useful for various emerging biomedical applications using specialized ANN hardware. Unfortunately, these ANN-based systems are increasingly vulnerable to both transient and permanent faults due to unrelenting advances in CMOS technology scaling, which can sometimes be catastrophic. The considerable resource and energy consumption and the lack of dynamic adaptability make conventional fault-tolerant techniques unsuitable for future portable medical solutions. Inspired by the self-healing and self-recovery mechanisms of the human nervous system, this research seeks to address reliability issues of ANN-based hardware by proposing an Autonomously Reconfigurable Artificial Neural Network (ARANN) architectural framework. Leveraging the homogeneous structural characteristics of neural networks, ARANN is capable of adapting its structures and operations, both algorithmically and microarchitecturally, to react to unexpected neuron failures. Specifically, we propose three key techniques (Distributed ANN, Decoupled Virtual-to-Physical Neuron Mapping, and Dual-Layer Synchronization) to achieve cost-effective structural adaptation and ensure accurate system recovery. Moreover, an ARANN-enabled self-optimizing workflow is presented to adaptively explore a "Pareto-optimal" neural network structure for a given application, on the fly. Implemented and demonstrated on a Virtex-5 FPGA, ARANN can cover and adapt 93% of the chip area (neurons) with less than 1% chip overhead and O(n) reconfiguration latency. A detailed performance analysis has been completed based on various recovery scenarios.
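
    The decoupled virtual-to-physical neuron mapping can be pictured with a small conceptual sketch (illustrative only, not ARANN's actual microarchitecture): the network algorithm addresses virtual neuron indices, a remappable table binds them to physical neuron slots, and a failed slot is swapped for a spare without touching the network itself.

```python
# Conceptual virtual-to-physical neuron mapping with spare slots for fault recovery.
class NeuronMapper:
    def __init__(self, n_virtual, n_physical):
        assert n_physical >= n_virtual, "need spare neurons to tolerate failures"
        self.map = {v: v for v in range(n_virtual)}        # virtual -> physical binding
        self.spares = list(range(n_virtual, n_physical))   # unused physical slots
        self.failed = set()

    def physical(self, virtual_id):
        return self.map[virtual_id]

    def report_failure(self, physical_id):
        """Rebind whichever virtual neuron used the failed slot to a spare."""
        self.failed.add(physical_id)
        for v, p in self.map.items():
            if p == physical_id:
                if not self.spares:
                    raise RuntimeError("no spare neurons left")
                self.map[v] = self.spares.pop(0)
                return self.map[v]

mapper = NeuronMapper(n_virtual=8, n_physical=10)
print("virtual 3 runs on physical", mapper.physical(3))
mapper.report_failure(3)                                    # physical slot 3 dies
print("virtual 3 now runs on physical", mapper.physical(3))
```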

    Genetic algorithm for Artificial Neural Network training for the purpose of Automated Part Recognition

    Object or part recognition is of major interest in industrial environments. Current methods implement expensive camera-based solutions, so there is a need for a cost-effective alternative. One proposed approach is to overcome the hardware (camera) problem by implementing a software solution. Artificial Neural Networks (ANNs) are used as the underlying intelligent software because they have a high tolerance for noise and the ability to generalize. A colleague has implemented a basic ANN-based system comprising an ANN and three cost-effective laser distance sensors. However, that system is only able to identify three different parts and required hard-coded changes made by trial and error. This is not practical for industrial use in a production environment where a large number of different parts, which change relatively regularly, must be identified; the ability to easily train more parts is required. Difficulties associated with traditional mathematically guided training methods are discussed, which leads to the development of a Genetic Algorithm (GA) based evolutionary training method that overcomes these difficulties and makes accurate part recognition possible. An ANN hybridised with GA training is introduced, along with a general solution encoding scheme used to encode the required ANN connection weights. Experimental tests were performed to determine the ideal GA performance and control parameters, as studies have indicated that different GA control parameters can lead to large differences in training accuracy. After these tests, training accuracy was analyzed through investigation of GA performance as well as hardware-based part recognition performance. This analysis identified the ideal GA control parameters for training an ANN for the purpose of part recognition and showed that the ANN generally trained well and could generalize well on data not presented to it during training.
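
    The weight-encoding idea can be sketched briefly. In the toy example below (network size, GA operators and data are illustrative assumptions, not values from the dissertation), every connection weight of a small feed-forward ANN is flattened into one real-valued chromosome and a simple GA evolves the population against classification accuracy.

```python
# Toy GA that evolves a flat chromosome of ANN connection weights.
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID, N_OUT = 3, 6, 2                    # e.g. three distance sensors, two parts
N_W = N_IN * N_HID + N_HID * N_OUT              # chromosome length

def decode(chrom):
    w1 = chrom[:N_IN * N_HID].reshape(N_IN, N_HID)
    w2 = chrom[N_IN * N_HID:].reshape(N_HID, N_OUT)
    return w1, w2

def predict(chrom, X):
    w1, w2 = decode(chrom)
    return np.argmax(np.tanh(X @ w1) @ w2, axis=1)

def fitness(chrom, X, y):
    return np.mean(predict(chrom, X) == y)       # classification accuracy

# Toy dataset: two clusters of "sensor readings"
X = np.vstack([rng.normal(0, 1, (50, N_IN)), rng.normal(3, 1, (50, N_IN))])
y = np.array([0] * 50 + [1] * 50)

pop = rng.normal(0, 1, (30, N_W))
for gen in range(100):
    scores = np.array([fitness(c, X, y) for c in pop])
    parents = pop[np.argsort(scores)[-10:]]                      # truncation selection
    children = []
    while len(children) < len(pop):
        a, b = parents[rng.integers(10, size=2)]
        mask = rng.random(N_W) < 0.5                             # uniform crossover
        children.append(np.where(mask, a, b) + rng.normal(0, 0.1, N_W))  # Gaussian mutation
    pop = np.array(children)

print("best accuracy:", max(fitness(c, X, y) for c in pop))
```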

    An efficient implementation of lattice-ladder multilayer perceptrons in field programmable gate arrays

    The implementation efficiency of electronic systems is a combination of conflicting requirements: growing volumes of computation, accelerating data exchange and, at the same time, increasing energy consumption force researchers not only to optimize the algorithm but also to implement it quickly in specialized hardware. Therefore, this work tackles the problem of efficient and straightforward implementation of real-time electronic intelligent systems on field-programmable gate arrays (FPGAs). The object of research is specialized FPGA intellectual property (IP) cores that operate in real time, and the main aspects investigated are implementation criteria and techniques. The aim of the thesis is to optimize the FPGA implementation process for a selected class of dynamic artificial neural networks. To solve the stated problem and reach this goal, the following main tasks are formulated: to justify the selection of the Lattice-Ladder Multi-Layer Perceptron (LLMLP) class and of its electronic intelligent system test-bed, a speaker-dependent Lithuanian speech recognizer, to be created and investigated; to develop a dedicated technique for implementing the LLMLP class on FPGA, based on specialized efficiency criteria for circuit synthesis; and to develop and experimentally confirm the efficiency of the optimized FPGA IP cores used in the Lithuanian speech recognizer. The dissertation contains an introduction, four chapters and general conclusions. The first chapter reviews fundamental knowledge on computer-aided design, artificial neural networks and speech recognition implementation on FPGAs. In the second chapter, the efficiency criteria and the technique for LLMLP IP core implementation are proposed in order to perform multi-objective optimization of throughput, LLMLP complexity and resource utilization; data flow graphs are applied to optimize LLMLP computations, and an optimized neuron processing element is proposed. The IP cores for feature extraction and comparison developed for the Lithuanian speech recognizer are analyzed in the third chapter. The fourth chapter is devoted to experimental verification of the numerous developed LLMLP IP cores; experiments on isolated-word recognition accuracy and speed were performed for different speakers, signal-to-noise ratios, feature extraction methods and accelerated comparison methods. The main results of the thesis were published in 12 scientific publications: eight were printed in peer-reviewed scientific journals, four of which are indexed in the Thomson Reuters Web of Science database, and four appeared in conference proceedings. The results were presented at 17 scientific conferences.
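
    For readers unfamiliar with the LLMLP building block, the sketch below implements one lattice-ladder IIR filter of the kind used as a synapse in such networks, following the standard textbook recursion rather than the dissertation's optimized FPGA data path; k holds the reflection coefficients and v the ladder taps.

```python
# Standard lattice-ladder IIR filter recursion (one LLMLP synapse, software reference).
import numpy as np

def lattice_ladder(x, k, v):
    M = len(k)
    b_prev = np.zeros(M + 1)          # delayed backward errors b_m(n-1)
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        f = np.zeros(M + 1)
        b = np.zeros(M + 1)
        f[M] = xn
        for m in range(M, 0, -1):     # top-down lattice recursion
            f[m - 1] = f[m] - k[m - 1] * b_prev[m - 1]
            b[m] = k[m - 1] * f[m - 1] + b_prev[m - 1]
        b[0] = f[0]
        y[n] = np.dot(v, b)           # ladder (tapped) output
        b_prev = b
    return y

# Example: second-order section driven by an impulse
print(lattice_ladder(np.r_[1.0, np.zeros(9)], k=[0.5, -0.3], v=[1.0, 0.4, 0.2]))
```

    The nested per-sample recursion is what makes a straightforward FPGA mapping expensive and motivates the data-flow-graph optimization of the neuron processing element described in the abstract.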