    Memory and information processing in neuromorphic systems

    A striking difference between brain-inspired neuromorphic processors and current von Neumann processor architectures is the way in which memory and processing are organized. As information and communication technologies continue to meet the demand for greater computational power by increasing the number of cores within a digital processor, neuromorphic engineers and scientists can complement this effort by building processor architectures in which memory is distributed with the processing. In this paper we present a survey of brain-inspired processor architectures that support models of cortical networks and deep neural networks. These architectures range from serial clocked implementations of multi-neuron systems to massively parallel asynchronous ones, and from purely digital systems to mixed analog/digital systems that implement more biologically realistic models of neurons and synapses, together with a suite of adaptation and learning mechanisms analogous to those found in biological nervous systems. We describe the advantages of the different approaches being pursued and present the challenges that must be addressed to build artificial neural processing systems that can display the richness of behaviors seen in biological systems.
    Comment: Submitted to Proceedings of IEEE; review of recently proposed neuromorphic computing platforms and systems

    Improvement Energy Efficiency for a Hybrid Multibank Memory in Energy Critical Applications

    A high-performance, low-power multiprocessor/multibank memory system requires a compiler that provides efficient data partitioning and mapping procedures. This paper introduces two compiler techniques for mapping data to multibank memory, since data mapping remains an open problem that needs a better solution. The multibank memory can consist of both volatile and non-volatile memory components to support ultra-low-power wearable devices. Such a hybrid memory system, combining volatile and non-volatile components, makes data mapping more complex. To solve this mapping problem efficiently, we formulate it as a simple decision problem. Based on this problem definition, we propose two efficient algorithms that determine the placement of data in the multibank memory. The proposed techniques account for a key characteristic of non-volatile memory: although it offers ultra-low operating power and nearly zero leakage current, its write operations consume more energy than the same operations on volatile memory. The proposed techniques mitigate this drawback through efficient data placement and the hybrid memory architecture. Experimental results show that the proposed techniques improve energy savings by up to 59.5% for the hybrid multibank memory architecture.
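
    The abstract does not spell out the two algorithms. As a rough illustration of the trade-off it describes (cheap non-volatile reads and near-zero leakage versus expensive non-volatile writes), here is a minimal greedy placement sketch. It is not the paper's decision-problem formulation: the DataObject fields, the per-access energy constants, the SRAM leakage charge, and the place function are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    name: str
    size: int    # bytes
    reads: int   # profiled read count
    writes: int  # profiled write count


# Hypothetical per-access energies (arbitrary units), for illustration only.
SRAM_READ, SRAM_WRITE = 1.0, 1.0
NVM_READ, NVM_WRITE = 0.8, 10.0   # NVM writes are assumed to be much costlier
SRAM_LEAK_PER_BYTE = 0.05         # leakage charged per resident SRAM byte; NVM leakage ~0


def energy(obj: DataObject, bank: str) -> float:
    """Estimated energy of keeping obj in the given bank ("sram" or "nvm")."""
    if bank == "sram":
        return (obj.reads * SRAM_READ + obj.writes * SRAM_WRITE
                + obj.size * SRAM_LEAK_PER_BYTE)
    return obj.reads * NVM_READ + obj.writes * NVM_WRITE


def place(objects, sram_capacity):
    """Greedy heuristic: fill SRAM with the objects that save the most energy per byte."""
    def saving_per_byte(o):
        return (energy(o, "nvm") - energy(o, "sram")) / o.size

    placement = {}
    free = sram_capacity
    for obj in sorted(objects, key=saving_per_byte, reverse=True):
        # Only use SRAM if it actually saves energy and the object still fits.
        if saving_per_byte(obj) > 0 and obj.size <= free:
            placement[obj.name] = "sram"
            free -= obj.size
        else:
            placement[obj.name] = "nvm"
    return placement
```

    With these made-up constants, a write-heavy buffer lands in SRAM while a read-only table goes to NVM, where its reads are cheaper and it leaks almost nothing. A greedy per-byte ranking is simple but can be suboptimal; an exact version could treat the SRAM side as a 0/1 knapsack problem.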

    Moving Learning Machine Towards Fast Real-Time Applications: A High-Speed FPGA-based Implementation of the OS-ELM Training Algorithm

    Some emerging online learning applications handle data streams in real time. The Online Sequential Extreme Learning Machine (OS-ELM) has been used successfully in real-time condition-prediction applications because of its good generalization performance at extremely high learning speed, but the number of training iterations per second (training frequency) achieved in these continuous-learning applications must be improved further. This paper proposes a performance-optimized implementation of the OS-ELM training algorithm for real-time applications. In this setting, the natural way to feed the training of the neural network is one-by-one, i.e., training the network on each new incoming training input vector. Applying this restriction drastically reduces the computational requirements. An FPGA-based implementation of the tailored OS-ELM algorithm is used to analyze, in a parameterized way, the level of optimization achieved. We observed that the tailored algorithm reduces the number of clock cycles consumed by the training execution to approximately 1% of the original. This performance enables high sequential training rates, such as a sequential training frequency of 14 kHz for a single-hidden-layer feedforward network (SLFN) with 40 hidden neurons, or 180 Hz for an SLFN with 500 hidden neurons. In practice, the proposed implementation computes the training about 100 times faster, or more, than other implementations reported in the literature. Moreover, the number of clock cycles follows quadratic complexity, O(N²), where N is the number of hidden neurons, and is only weakly influenced by the number of input neurons. However, the implementation shows a pronounced sensitivity to data-type precision even for small problems, which forces the use of double-precision floating-point data types to avoid finite-precision arithmetic effects.
    In addition, distributed memory was found to be the limiting resource; thus, current FPGA devices can support OS-ELM-based on-chip learning with up to 500 hidden neurons. In conclusion, the proposed hardware implementation of the OS-ELM offers great possibilities for on-chip learning in portable systems and real-time applications where frequent and fast training is required.
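
    The abstract attributes the speed-up to training one sample at a time. A minimal NumPy sketch of the standard OS-ELM recursive update with a chunk size of one suggests why this helps: the matrix inverse in the general sequential update collapses to a scalar division, and each step costs O(N²) operations for N hidden neurons, consistent with the quadratic clock-cycle trend reported above. This is a software sketch, not the authors' FPGA design; the class name OSELM, the sigmoid activation, the uniform weight initialization, and the small ridge term reg in the initial batch are assumptions.

```python
import numpy as np


class OSELM:
    """Minimal OS-ELM for a single-hidden-layer feedforward network (SLFN)."""

    def __init__(self, n_inputs, n_hidden, n_outputs, reg=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Random, fixed input weights and biases (never trained in an ELM).
        self.W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.reg = reg  # assumed small ridge term keeping the initial inverse well-conditioned
        self.P = None
        self.beta = np.zeros((n_hidden, n_outputs))

    def hidden(self, X):
        # Sigmoid hidden-layer activations, shape (n_samples, n_hidden).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def init_batch(self, X0, T0):
        # Initial phase: P0 = (H0^T H0 + reg*I)^-1, beta0 = P0 H0^T T0.
        H0 = self.hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0 + self.reg * np.eye(H0.shape[1]))
        self.beta = self.P @ H0.T @ T0

    def update_one(self, x, t):
        # Sequential phase with chunk size 1: no matrix inverse, only a scalar division.
        h = self.hidden(x[None, :])[0]                # shape (n_hidden,)
        Ph = self.P @ h                               # O(N^2)
        self.P -= np.outer(Ph, Ph) / (1.0 + h @ Ph)   # rank-1 update of P, O(N^2)
        err = t - h @ self.beta                       # prediction error before the update
        self.beta += np.outer(self.P @ h, err)        # uses the updated P, O(N^2)

    def predict(self, X):
        return self.hidden(X) @ self.beta
```

    Because the recursion is exact, after the last update beta equals the regularized batch least-squares solution over all samples seen so far, up to rounding error, which makes rounding behavior in lower-precision types easy to compare against the batch result.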