17 research outputs found

    Open the box of digital neuromorphic processor: Towards effective algorithm-hardware co-design

    Sparse and event-driven spiking neural network (SNN) algorithms are the ideal candidate solution for energy-efficient edge computing. Yet, with the growing complexity of SNN algorithms, it is not easy to properly benchmark and optimize their computational cost without hardware in the loop. Although digital neuromorphic processors have been widely adopted to benchmark SNN algorithms, their black-box nature is problematic for algorithm-hardware co-optimization. In this work, we open the black box of the digital neuromorphic processor for algorithm designers by presenting the neuron processing instruction set and detailed energy consumption of the SENeCA neuromorphic architecture. For convenient benchmarking and optimization, we provide the energy cost of the essential neuromorphic components in SENeCA, including neuron models and learning rules. Moreover, we exploit SENeCA's hierarchical memory and exhibit an advantage over existing neuromorphic processors. We show the energy efficiency of SNN algorithms for video processing and online learning, and demonstrate the potential of our work for optimizing algorithm designs. Overall, we present a practical approach to enable algorithm designers to accurately benchmark SNN algorithms and pave the way towards effective algorithm-hardware co-design.

    Optimizing event-based neural networks on digital neuromorphic architecture: a comprehensive design space exploration

    Neuromorphic processors promise low-latency and energy-efficient processing by adopting novel brain-inspired design methodologies. Yet, current neuromorphic solutions still struggle to rival conventional deep learning accelerators' performance and area efficiency in practical applications. Event-driven data-flow processing and near/in-memory computing are the two dominant design trends of neuromorphic processors. However, there remain challenges in reducing the overhead of event-driven processing and increasing the mapping efficiency of near/in-memory computing, which directly impact the performance and area efficiency. In this work, we discuss these challenges and present our exploration of optimizing event-based neural network inference on SENECA, a scalable and flexible neuromorphic architecture. To address the overhead of event-driven processing, we perform comprehensive design space exploration and propose spike-grouping to reduce the total energy and latency. Furthermore, we introduce the event-driven depth-first convolution to improve area efficiency and latency in convolutional neural networks (CNNs) on the neuromorphic processor. We benchmarked our optimized solution on keyword spotting, sensor fusion, digit recognition and high-resolution object detection tasks. Compared with other state-of-the-art large-scale neuromorphic processors, our proposed optimizations result in a 6× to 300× improvement in energy efficiency, a 3× to 15× improvement in latency, and a 3× to 100× improvement in area efficiency. Our optimizations for event-based neural networks can potentially be generalized to a wide range of event-based neuromorphic processors.

    A simple hardware implementation of the tabu search heuristic for DSP application

    The Tabu Search heuristic is an optimization technique suitable for many DSP applications, such as finite-word-length filter design or adaptive linear and non-linear filters. In this paper, a simple hardware implementation of the proposed algorithm is presented to tackle its main, well-known bottleneck: a high computational load. The presented system can be applied to many DSP applications that require solving an optimization task.
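
As a rough illustration of the heuristic named in this abstract, the following is a minimal software sketch of tabu search applied to a toy finite-word-length coefficient problem. It is not the paper's hardware implementation: the target coefficients, the 4-bit scale, the neighborhood (change one coefficient by one quantization step), and all function names are assumptions chosen for illustration.

```python
# Toy problem (assumed values): approximate real-valued filter
# coefficients with integers q, where the realized coefficient is q / SCALE.
TARGET = (0.3, -0.62, 0.11)
SCALE = 16  # 4 fractional bits


def cost(q):
    """Squared error between quantized and target coefficients."""
    return sum((qi / SCALE - t) ** 2 for qi, t in zip(q, TARGET))


def neighbors(q):
    """All solutions that differ from q by one step in one coefficient."""
    result = []
    for i in range(len(q)):
        for step in (-1, 1):
            n = list(q)
            n[i] += step
            result.append(tuple(n))
    return result


def tabu_search(cost, start, neighbors, iterations=200, tabu_size=10):
    """Move to the best non-tabu neighbor each iteration, even if it is
    worse than the current solution, while a short memory of recently
    visited solutions prevents cycling back."""
    current = start
    best, best_cost = start, cost(start)
    tabu = [start]
    for _ in range(iterations):
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)
        tabu.append(current)
        if len(tabu) > tabu_size:
            tabu.pop(0)  # forget the oldest entry
        current_cost = cost(current)
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost


best, best_cost = tabu_search(cost, (0, 0, 0), neighbors)
print(best, best_cost)
```

Every iteration evaluates the cost of every neighbor, which is the computational load the abstract identifies as the bottleneck.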

    Power-of-two adaptive filters using tabu search

    Digital filters with power-of-two or sum-of-power-of-two coefficients can be built using simple and fast shift registers instead of slower floating-point multipliers; such a strategy can reduce both the VLSI silicon area and the computational time. Due to the quantization and the nonuniform distribution of the coefficients through their domain, classical steepest-descent-based approaches cannot be successfully applied in the case of adaptive filters. Methods for adaptation processes, such as the least mean squares (LMS) error and other related adaptation algorithms, can actually lose their convergence properties. In this brief, we present a customized Tabu Search (TS) adaptive algorithm that works directly on the power-of-two filter coefficients domain, avoiding any rounding process. In particular, we propose TS for a time-varying environment, suitable for real-time adaptive signal processing. Several experimental results demonstrate the effectiveness of the proposed method. Index Terms: adaptive digital filter, finite precision design of adaptive digital filter, finite wordlength, global optimization, signed digit code, signed power-of-two, tabu search. quantized filter coefficients domain made of a sum of SPT’s. The proposed method avoids any coefficient rounding and is suitable for a simple hardware implementation [1], [2], and [18]. The TS algorithm has been widely employed in typical operational research problems and for static digital filter design [4], [5], [15]; here, we have exploited its capability in finding optimal points in a dynamical search space. In the case of SPT static filter design, some researchers have tried to use nonconventional optimization procedures, like the Genetic Algorith
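
The shift-based multiplication this abstract refers to can be sketched in a few lines. This is an illustrative fixed-point model only, not code from the paper: the `(sign, exponent)` term representation, the function names, and the example coefficient 0.375 = 2^-1 - 2^-3 are all assumptions.

```python
def spt_value(terms):
    """Real value of a signed power-of-two (SPT) coefficient.

    terms is a list of (sign, k) pairs, each standing for sign * 2**-k.
    """
    return sum(sign * 2.0 ** -k for sign, k in terms)


def spt_multiply(x, terms, frac_bits):
    """Multiply integer sample x by an SPT coefficient using only
    shifts and adds, as shift-register hardware would.

    frac_bits must be at least the largest exponent k in terms, so that
    every partial product is an exact left shift of x.
    """
    acc = 0
    for sign, k in terms:
        acc += sign * (x << (frac_bits - k))
    # Drop the fractional bits (arithmetic shift, rounds toward -infinity).
    return acc >> frac_bits


# Example coefficient (assumed): 0.375 = 2**-1 - 2**-3
coeff = [(+1, 1), (-1, 3)]
print(spt_value(coeff), spt_multiply(40, coeff, frac_bits=3))
```

Each term costs one shift and one add or subtract, so the number of SPT terms per coefficient directly sets the hardware cost of the filter.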

    Uncini A. Efficient allocation of power-of-two terms in FIR digital filter design using tabu search

    This paper concerns the design of FIR filters using quantized coefficients belonging to a sum of signed power-of-two terms (SPT) domain. Unlike other SPT filter design approaches, we added a global constraint that a priori fixes the total number of shift registers; thus each coefficient can be represented with a different precision. The difficult FIR filter design optimization task is solved by a specific Tabu Search (TS) method. Several experimental results and comparisons with other well-known approaches demonstrate the effectiveness of this filter architecture and the proposed optimization algorithm. The increasing demand for fast FIR digital filters brings us to look for designing solutions based on the use of shift registers, which perform multiplications ver

    A 33-ppm/°C 240-nW 40-nm CMOS Wakeup Timer Based on a Bang-Bang Digital-Intensive Frequency-Locked-Loop for IoT Applications

    This paper presents a wakeup timer in 40-nm CMOS for Internet-of-Things (IoT) applications based on a bang-bang Digital-intensive Frequency-Locked Loop (DFLL). A self-biased ΣΔ Digitally Controlled Oscillator (DCO) is locked to an RC time constant via a feedback loop consisting of a single-bit chopped comparator and a digital loop filter, thus maximizing the use of digital circuits while keeping the RC network and the comparator as the sole analog blocks. Analysis and behavior-level simulations of the DFLL have been carried out to guide the optimization of the long-term stability and frequency accuracy of the timer. High frequency accuracy and a 10× enhancement of long-term stability are achieved by the adoption of chopping to reduce the effect of comparator offset and 1/f noise and by the use of ΣΔ modulation to improve the DCO resolution. Such a highly digitized architecture fully exploits the advantages of advanced CMOS processes, thus enabling operation down to 0.7 V and a small area (0.07 mm²). The proposed timer achieves excellent energy efficiency (0.57 pJ/cycle at 417 kHz at 0.8-V supply) over prior art while keeping excellent on-par long-term stability (Allan deviation floor < 20 ppm) and temperature stability (33 ppm/°C at 0.8-V supply).

    A 0.7-V 0.43-pJ/cycle Wakeup Timer based on a Bang-bang Digital-Intensive frequency-Locked-Loop for IoT Applications

    A 40-nm CMOS wakeup timer employing a bang-bang digital-intensive frequency-locked loop for Internet-of-Things applications is presented. A self-biased ΣΔ digitally controlled oscillator (DCO) is locked to an RC time constant via a single-bit chopped comparator and a digital loop filter. Such a highly digitized architecture fully exploits the advantages of advanced CMOS processes, thus enabling operation down to 0.7 V and a small area (0.07 mm²). Most circuitry operates at 32× lower frequency than the DCO in order to reduce the total power consumption down to 181 nW. High frequency accuracy and a 10× enhancement of long-term stability are achieved by the adoption of chopping to reduce the effect of comparator offset and 1/f noise and by the use of ΣΔ modulation to improve the DCO resolution. The proposed timer achieves the best energy efficiency (0.43 pJ/cycle at 417 kHz) over prior art while keeping excellent on-par long-term stability (Allan deviation floor < 20 ppm) and temperature stability (106 ppm/°C).