Search CORE

17 research outputs found

Open the box of digital neuromorphic processor: Towards effective algorithm-hardware co-design

Author: Detterer Paul
Konijnenburg Mario
Safa Ali
Shidqi Kevin
Sifalakis Manolis
Tang Guangzhi
Traferro Stefano
van Schaik Gert-Jan
Yousefzadeh Amirreza
Publication date: 27/03/2023

Sparse and event-driven spiking neural network (SNN) algorithms are the ideal candidate solution for energy-efficient edge computing. Yet, with the growing complexity of SNN algorithms, it is difficult to properly benchmark and optimize their computational cost without hardware in the loop. Although digital neuromorphic processors have been widely adopted to benchmark SNN algorithms, their black-box nature is problematic for algorithm-hardware co-optimization. In this work, we open the black box of the digital neuromorphic processor for algorithm designers by presenting the neuron processing instruction set and detailed energy consumption of the SENeCA neuromorphic architecture. For convenient benchmarking and optimization, we provide the energy cost of the essential neuromorphic components in SENeCA, including neuron models and learning rules. Moreover, we exploit SENeCA's hierarchical memory and demonstrate an advantage over existing neuromorphic processors. We show the energy efficiency of SNN algorithms for video processing and online learning, and demonstrate the potential of our work for optimizing algorithm designs. Overall, we present a practical approach that enables algorithm designers to accurately benchmark SNN algorithms and paves the way towards effective algorithm-hardware co-design.

arXiv.org e-Print Archive
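
The abstract describes per-component energy costs that let algorithm designers estimate the energy of an SNN without hardware in the loop. A minimal sketch of that kind of accounting follows, assuming a simple layer model in which every input spike triggers one synaptic update per target neuron and every neuron is updated once per timestep. The `OpEnergy` class, the function name and the cost values are hypothetical placeholders, not SENeCA's instruction set or measured figures.

```python
from dataclasses import dataclass


@dataclass
class OpEnergy:
    """Per-operation energy costs in picojoules (hypothetical placeholders)."""
    synaptic_update_pj: float  # one weight fetch + accumulate per (spike, target)
    neuron_update_pj: float    # one neuron-state update (e.g. leak and threshold check)


def layer_energy_pj(input_spikes: int, fan_out: int, n_neurons: int,
                    timesteps: int, cost: OpEnergy) -> float:
    """Estimate layer energy under an assumed event-driven cost model.

    Synaptic work scales with the number of input spikes (sparsity helps),
    while neuron-state updates are assumed to run once per neuron per timestep.
    """
    synaptic_ops = input_spikes * fan_out
    neuron_ops = n_neurons * timesteps
    return (synaptic_ops * cost.synaptic_update_pj
            + neuron_ops * cost.neuron_update_pj)
```

For example, with hypothetical costs of 1 pJ per synaptic update and 2 pJ per neuron update, a layer receiving 100 spikes with a fan-out of 10 and holding 50 neurons over 4 timesteps costs 100 × 10 × 1 + 50 × 4 × 2 = 1400 pJ. Splitting the estimate into a spike-proportional term and a time-proportional term shows where sparsity pays off: only the first term shrinks when activity drops.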

Optimizing event-based neural networks on digital neuromorphic architecture: a comprehensive design space exploration

Author: Alexandra Dobrita
Amirreza Yousefzadeh
Anteneh Gebregiorgis
Cina Arjmand
Gert-Jan van Schaik
Guangzhi Tang
Kanishkan Vadivel
Kevin Shidqi
Manolis Sifalakis
Mario Konijnenburg
Paul Detterer
Pietro Martinello
Prithvish Nembhani
Refik Bilgic
Roy Meijer
Said Hamdioui
Shenqi Wang
Stefano Traferro
Yingfu Xu
Publication venue: Frontiers Media S.A.
Publication date: 01/03/2024

Neuromorphic processors promise low-latency and energy-efficient processing by adopting novel brain-inspired design methodologies. Yet, current neuromorphic solutions still struggle to rival the performance and area efficiency of conventional deep learning accelerators in practical applications. Event-driven data-flow processing and near/in-memory computing are the two dominant design trends of neuromorphic processors. However, challenges remain in reducing the overhead of event-driven processing and in increasing the mapping efficiency of near/in-memory computing, both of which directly affect performance and area efficiency. In this work, we discuss these challenges and present our exploration of optimizing event-based neural network inference on SENECA, a scalable and flexible neuromorphic architecture. To address the overhead of event-driven processing, we perform a comprehensive design space exploration and propose spike-grouping to reduce the total energy and latency. Furthermore, we introduce the event-driven depth-first convolution to improve area efficiency and latency in convolutional neural networks (CNNs) on the neuromorphic processor. We benchmarked our optimized solution on keyword spotting, sensor fusion, digit recognition, and high-resolution object detection tasks. Compared with other state-of-the-art large-scale neuromorphic processors, our proposed optimizations result in a 6× to 300× improvement in energy efficiency, a 3× to 15× improvement in latency, and a 3× to 100× improvement in area efficiency. Our optimizations for event-based neural networks can potentially be generalized to a wide range of event-based neuromorphic processors.

Directory of Open Access Journals
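
Spike-grouping targets the fixed overhead that an event-driven core pays for each event it handles. Below is a minimal cost-model sketch. It assumes that overhead can be paid once per group instead of once per spike, while the per-spike work stays the same. `group_spikes`, `processing_cost` and the cost parameters are hypothetical, not the SENECA implementation.

```python
import math
from typing import List, Sequence


def group_spikes(spikes: Sequence[int], group_size: int) -> List[List[int]]:
    """Split a stream of spike (neuron) indices into groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return [list(spikes[i:i + group_size])
            for i in range(0, len(spikes), group_size)]


def processing_cost(n_spikes: int, group_size: int,
                    overhead_per_group: float, work_per_spike: float) -> float:
    """Assumed cost model: fixed overhead once per group, work once per spike."""
    n_groups = math.ceil(n_spikes / group_size)
    return n_groups * overhead_per_group + n_spikes * work_per_spike
```

With a hypothetical overhead of 5 units and 1 unit of work per spike, 100 spikes cost 600 units when processed one by one and 150 units in groups of 10. Larger groups amortize the overhead further. In practice, the group size is presumably bounded by buffer memory and by how long events may be held back.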

A simple hardware implementation of the tabu search heuristic for DSP application

Author: Aurelio Uncini
Stefano Traferro
Publication date: 01/01/1999

The tabu search heuristic is an optimization technique suitable for many DSP applications, such as finite-wordlength filter design and adaptive, linear, or nonlinear filters. In this paper, a simple hardware implementation of the algorithm is presented to tackle its main, well-known bottleneck: a high computational load. The presented system can be applied to many DSP applications that require solving an optimization task.

CiteSeerX

Archivio della ricerca- Università di Roma La Sapienza
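
For readers unfamiliar with the heuristic, the core tabu search loop is sketched below in software. The paper's contribution is a hardware implementation, which this sketch does not model. It is a generic version over integer vectors (for example, quantized filter coefficients) and assumes a ±1 neighborhood, a fixed tabu tenure and the usual aspiration criterion. The function and parameter names are illustrative.

```python
from typing import Callable, Dict, List, Tuple


def tabu_search(cost: Callable[[List[int]], float], x0: List[int],
                lo: int, hi: int, iters: int = 200,
                tenure: int = 5) -> Tuple[List[int], float]:
    """Minimize cost over integer vectors whose entries lie in [lo, hi].

    Neighborhood: change one entry by +/-1. After a move, the reverse move
    is tabu for `tenure` iterations unless it beats the best cost found so
    far (aspiration criterion).
    """
    x = list(x0)
    best, best_cost = list(x), cost(x)
    tabu: Dict[Tuple[int, int], int] = {}  # (index, direction) -> last forbidden iteration
    for it in range(iters):
        candidates = []
        for i in range(len(x)):
            for d in (-1, 1):
                if not lo <= x[i] + d <= hi:
                    continue
                y = list(x)
                y[i] += d
                c = cost(y)
                is_tabu = tabu.get((i, d), -1) >= it
                if not is_tabu or c < best_cost:
                    candidates.append((c, i, d, y))
        if not candidates:
            break
        # Take the best admissible neighbor, even if it is worse than x.
        c, i, d, x = min(candidates, key=lambda t: t[0])
        tabu[(i, -d)] = it + tenure  # forbid undoing this move for a while
        if c < best_cost:
            best, best_cost = list(x), c
    return best, best_cost
```

Minimizing the squared distance to [3, -2, 5] from [0, 0, 0] returns [3, -2, 5] with cost 0. The tabu list separates this from plain hill climbing. After reaching a local minimum, the search must keep moving instead of stepping straight back, which lets it explore other regions.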

Power-of-two adaptive filters using tabu search

Author: Aurelio Uncini
Stefano Traferro

Abstract—Digital filters with power-of-two or sum-of-power-of-two coefficients can be built using simple, fast shift registers instead of slower floating-point multipliers; this strategy can reduce both the VLSI silicon area and the computational time. In adaptive filters, the coefficients are quantized and nonuniformly distributed over their domain, so classical steepest-descent approaches cannot be applied successfully: the least mean squares (LMS) algorithm and related adaptation algorithms can lose their convergence properties. In this brief, we present a customized Tabu Search (TS) adaptive algorithm that works directly on the power-of-two filter coefficient domain, avoiding any rounding process. In particular, we propose TS for a time-varying environment, suitable for real-time adaptive signal processing. Several experimental results demonstrate the effectiveness of the proposed method.

Index Terms—Adaptive digital filter, finite precision design of adaptive digital filter, finite wordlength, global optimization, signed digit code, signed power-of-two, tabu search.

quantized filter coefficients domain made of a sum of SPTs. The proposed method avoids any coefficient rounding and is suitable for a simple hardware implementation [1], [2], and [18]. The TS algorithm has been widely employed in typical operational research problems and for static digital filter design [4], [5], [15]; here, we have exploited its ability to find optimal points in a dynamic search space. In the case of SPT static filter design, some researchers have tried to use nonconventional optimization procedures, like the Genetic Algorith

CiteSeerX
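
The hardware saving behind power-of-two coefficients is that each term turns a multiplication into a shift. Below is a minimal sketch. It assumes integer samples and coefficients written as a list of (sign, k) terms, meaning the sum of sign × 2^-k. `spt_multiply` is a hypothetical helper, not code from the paper.

```python
from typing import List, Tuple

# An SPT coefficient: list of (sign, k) terms meaning sum(sign * 2**-k).
SPTCoeff = List[Tuple[int, int]]


def spt_multiply(x: int, coeff: SPTCoeff) -> int:
    """Multiply integer sample x by an SPT coefficient using shifts and adds only."""
    acc = 0
    for sign, k in coeff:
        # Right shift divides by 2**k; a negative k means multiply by 2**-k.
        term = x >> k if k >= 0 else x << -k
        acc += term if sign > 0 else -term
    return acc
```

For instance, 0.625 = 2^-1 + 2^-3, so `spt_multiply(64, [(1, 1), (1, 3)])` computes 32 + 8 = 40 without a multiplier. Python's right shift rounds toward negative infinity, so samples that are not multiples of 2^k lose their low-order bits.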

Efficient allocation of power-of-two terms in FIR digital filter design using tabu search

Author: Aurelio Uncini
Francesco Piazza
Stefano Traferro

Abstract- This paper concerns the design of FIR filters using quantized coefficients belonging to a sum of signed power-of-two terms (SPT) domain. Unlike other SPT filter design approaches, we add a global constraint that fixes the total number of shift registers a priori; thus, each coefficient can be represented with a different precision. The difficult FIR filter design optimization task is solved by a specific Tabu Search (TS) method. Several experimental results and comparisons with other well-known approaches demonstrate the effectiveness of this filter architecture and the proposed optimization algorithm. The increasing demand for fast FIR digital filters leads us to look for design solutions based on the use of shift registers, which perform multiplications ver

CiteSeerX
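
The global constraint in this abstract fixes the total number of terms shared by all coefficients, so precision can go to the coefficients that need it most. To illustrate the constraint only, the sketch below swaps in a simple greedy allocator instead of the paper's tabu search. At each step, it spends one term on the coefficient with the largest remaining error. The function names, the exponent range and the greedy rule are all assumptions.

```python
import math
from typing import List, Tuple


def nearest_power_of_two(r: float, min_exp: int, max_exp: int) -> Tuple[int, int]:
    """Return (sign, p) such that sign * 2**p is the signed power of two closest to r."""
    sign = 1 if r > 0 else -1
    _, e = math.frexp(abs(r))   # abs(r) = m * 2**e with 0.5 <= m < 1
    lo_exp, hi_exp = e - 1, e   # 2**lo_exp <= abs(r) < 2**hi_exp
    p = min((lo_exp, hi_exp), key=lambda q: abs(abs(r) - 2.0 ** q))
    return sign, max(min_exp, min(max_exp, p))  # clamp to the allowed range


def allocate_spt_terms(coeffs: List[float], budget: int, min_exp: int = -8,
                       max_exp: int = 0) -> Tuple[List[float], List[List[Tuple[int, int]]]]:
    """Greedily spend a global budget of SPT terms across all coefficients."""
    approx = [0.0] * len(coeffs)
    terms: List[List[Tuple[int, int]]] = [[] for _ in coeffs]
    for _ in range(budget):
        residuals = [c - a for c, a in zip(coeffs, approx)]
        i = max(range(len(coeffs)), key=lambda j: abs(residuals[j]))
        if residuals[i] == 0.0:
            break  # every coefficient is already exact
        sign, p = nearest_power_of_two(residuals[i], min_exp, max_exp)
        terms[i].append((sign, p))
        approx[i] += sign * 2.0 ** p
    return approx, terms
```

With a budget of three terms, the coefficients [0.625, 0.25] are represented exactly as 2^-1 + 2^-3 and 2^-2. With a budget of one, only the larger coefficient receives a term. A greedy rule like this is myopic and may miss better allocations, which is the kind of combinatorial search the paper addresses with TS.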

A 33-ppm/°C 240-nW 40-nm CMOS Wakeup Timer Based on a Bang-Bang Digital-Intensive Frequency-Locked-Loop for IoT Applications

Author: Bachmann Christian (author)
Ding Ming (author)
Liu Yao Hong (author)
Sebastiano F. (author)
Traferro Stefano (author)
Zhou Zhihao (author)
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

This paper presents a wakeup timer in 40-nm CMOS for Internet-of-Things (IoT) applications based on a bang-bang digital-intensive frequency-locked loop (DFLL). A self-biased ΣΔ digitally controlled oscillator (DCO) is locked to an RC time constant via a feedback loop consisting of a single-bit chopped comparator and a digital loop filter. This maximizes the use of digital circuits and leaves the RC network and the comparator as the only analog blocks. Analysis and behavior-level simulations of the DFLL have been carried out to guide the optimization of the long-term stability and frequency accuracy of the timer. High frequency accuracy and a 10× enhancement of long-term stability are achieved by adopting chopping to reduce the effect of comparator offset and 1/f noise, and by using ΣΔ modulation to improve the DCO resolution. Such a highly digitized architecture fully exploits the advantages of advanced CMOS processes, enabling operation down to 0.7 V and a small area (0.07 mm²). The proposed timer achieves excellent energy efficiency relative to prior art (0.57 pJ/cycle at 417 kHz at 0.8-V supply), while keeping on-par long-term stability (Allan deviation floor < 20 ppm) and temperature stability (33 ppm/°C at 0.8-V supply).

Accepted Author Manuscript; (OLD) Applied Quantum Architecture

TU Delft Repository
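
The bang-bang loop in this abstract can be illustrated with a behavioral model. In it, a one-bit decision drives an integrating digital loop filter that steps the DCO control code. This is a simplified sketch under stated assumptions: the DCO is linear in its code, the comparison against the RC time constant is abstracted as a direct frequency comparison, and chopping and ΣΔ modulation are not modeled. All names and numbers are hypothetical.

```python
from typing import List


def simulate_bang_bang_fll(f_ref_hz: float, f_min_hz: float, step_hz: float,
                           n_codes: int, n_iters: int) -> List[float]:
    """Behavioral model of a bang-bang frequency-locked loop.

    Each iteration, a single-bit decision (is the DCO slower than the
    reference?) drives an integrating loop filter that moves the DCO
    control code up or down by one LSB. Returns the DCO frequency trace.
    """
    code = 0
    trace = []
    for _ in range(n_iters):
        f_dco = f_min_hz + code * step_hz         # assumed linear DCO
        faster_needed = f_dco < f_ref_hz          # 1-bit comparator output
        code += 1 if faster_needed else -1        # integrator (digital loop filter)
        code = max(0, min(n_codes - 1, code))     # DCO code range limits
        trace.append(f_min_hz + code * step_hz)
    return trace
```

With a 1 kHz DCO step and a 417 kHz target, the loop ramps up from 300 kHz and then toggles indefinitely between adjacent codes (416 kHz and 417 kHz). That limit cycle is inherent to bang-bang control, and its size is set by the DCO step. This is why improving the DCO resolution, as the abstract does with ΣΔ modulation, improves frequency accuracy.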

A 0.7-V 0.43-pJ/cycle Wakeup Timer Based on a Bang-Bang Digital-Intensive Frequency-Locked-Loop for IoT Applications

Author: Bachmann Christian (author)
Ding Ming (author)
Liu Yao-Hong (author)
Philips Kathleen (author)
Sebastiano F. (author)
Traferro Stefano (author)
Zhou Zhihao (author)
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/02/2018

A 40-nm CMOS wakeup timer employing a bang-bang digital-intensive frequency-locked loop for Internet-of-Things applications is presented. A self-biased ΣΔ digitally controlled oscillator (DCO) is locked to an RC time constant via a single-bit chopped comparator and a digital loop filter. Such a highly digitized architecture fully exploits the advantages of advanced CMOS processes, enabling operation down to 0.7 V and a small area (0.07 mm²). Most of the circuitry operates at a frequency 32× lower than that of the DCO, reducing the total power consumption to 181 nW. High frequency accuracy and a 10× enhancement of long-term stability are achieved by adopting chopping to reduce the effect of comparator offset and 1/f noise, and by using ΣΔ modulation to improve the DCO resolution. The proposed timer achieves the best energy efficiency compared with prior art (0.43 pJ/cycle at 417 kHz), while keeping on-par long-term stability (Allan deviation floor < 20 ppm) and temperature stability (106 ppm/°C).

Accepted Author Manuscript; (OLD) Applied Quantum Architecture

TU Delft Repository
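
The energy-per-cycle figures in the two timer abstracts follow from dividing power by oscillation frequency, E = P / f. Below is a small helper for checking them. The function name is illustrative, and the sketch assumes the quoted power and the 417 kHz frequency refer to the same operating point.

```python
def energy_per_cycle_pj(power_w: float, freq_hz: float) -> float:
    """Energy per oscillator cycle in picojoules: E = P / f."""
    return power_w / freq_hz * 1e12
```

181 nW at 417 kHz gives about 0.434 pJ/cycle, matching the 0.43 pJ/cycle reported in the 2018 paper. 240 nW at 417 kHz gives about 0.576 pJ/cycle, close to the 0.57 pJ/cycle reported in the 2020 paper. The small gap is consistent with the 240-nW figure in that title being rounded. Similarly, if the DCO runs at 417 kHz, circuitry clocked 32× slower runs at roughly 13 kHz.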