DeFiNES: Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators through Analytical Modeling
DNN workloads can be scheduled onto DNN accelerators in many different ways:
from layer-by-layer scheduling to cross-layer depth-first scheduling (a.k.a.
layer fusion, or cascaded execution). This results in a very broad scheduling
space, with each schedule leading to varying hardware (HW) costs in terms of
energy and latency. To rapidly explore this vast space for a wide variety of
hardware architectures, analytical cost models are crucial to estimate
scheduling effects on the HW level. However, state-of-the-art cost models lack
support for exploring the complete depth-first scheduling space, for instance
focusing only on activations while ignoring weights, or modeling only
DRAM accesses while overlooking on-chip data movements. These limitations
prevent researchers from systematically and accurately understanding the
depth-first scheduling space.
After formalizing this design space, this work proposes a unified modeling
framework, DeFiNES, for layer-by-layer and depth-first scheduling to fill in
the gaps. DeFiNES enables analytically estimating the hardware cost for
possible schedules in terms of both energy and latency, while considering data
access at every memory level. This is done for each schedule and HW
architecture under study by optimally choosing the active part of the memory
hierarchy per unique combination of operand, layer, and feature map tile. The
hardware costs are estimated, taking into account both data computation and
data copy phases. The analytical cost model is validated against measured data
from a taped-out depth-first DNN accelerator, DepFiN, showing good modeling
accuracy at the end-to-end neural network level. A comparison with a generalized
state-of-the-art cost model demonstrates up to 10X better solutions found with DeFiNES.
Comment: Accepted by HPCA 2023
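A core quantity any depth-first cost model must track is how the output tile size grows into a required input tile size through a layer stack (the receptive-field effect the abstract's "feature map tile" combinations hinge on). The following is a minimal illustrative sketch of that back-propagation; the function name and `(kernel, stride)` layer encoding are assumptions for illustration, not DeFiNES's actual API.

```python
# Illustrative sketch: back-propagate an output tile size through a stack of
# conv layers to find the input tile each layer must consume. The (kernel,
# stride) tuple encoding is a simplifying assumption (no dilation or padding).
def input_tile_size(out_tile: int, layers: list[tuple[int, int]]) -> int:
    """layers: (kernel, stride) per layer, listed first-to-last."""
    size = out_tile
    for kernel, stride in reversed(layers):
        # Each layer needs (size - 1) strides plus one full kernel window.
        size = (size - 1) * stride + kernel
    return size

# A 4-wide output tile through three 3x3, stride-1 convs needs a 10-wide input tile.
print(input_tile_size(4, [(3, 1), (3, 1), (3, 1)]))  # -> 10
```

This growth is why depth-first schedules trade recomputation or tile-border caching against reduced off-chip traffic.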
TinyVers: A Tiny Versatile System-on-chip with State-Retentive eMRAM for ML Inference at the Extreme Edge
Extreme edge devices or Internet-of-thing nodes require both ultra-low power
always-on processing as well as the ability to do on-demand sampling and
processing. Moreover, support for IoT applications like voice recognition,
machine monitoring, etc., requires the ability to execute a wide range of ML
workloads. This brings challenges in hardware design to build flexible
processors operating in the ultra-low-power regime. This paper presents TinyVers, a
tiny versatile ultra-low power ML system-on-chip to enable enhanced
intelligence at the Extreme Edge. TinyVers exploits dataflow reconfiguration to
enable multi-modal support and aggressive on-chip power management for
duty-cycling to enable smart sensing applications. The SoC combines a RISC-V
host processor, a 17 TOPS/W dataflow reconfigurable ML accelerator, a 1.7 μW
deep sleep wake-up controller, and an eMRAM for boot code and ML
parameter retention. The SoC can perform up to 17.6 GOPS while achieving a
power consumption range from 1.7 μW to 20 mW. Multiple ML workloads aimed at
diverse applications are mapped on the SoC to showcase its flexibility and
efficiency. All the models achieve 1-2 TOPS/W of energy efficiency with power
consumption below 230 μW in continuous operation. In a duty-cycling use
case for machine monitoring, this power is reduced to below 10 μW.
Comment: Accepted in IEEE Journal of Solid-State Circuits
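The duty-cycling claim follows from simple weighted-average power arithmetic. The sketch below reproduces it with the abstract's sleep and active figures; the 3% duty cycle is an illustrative assumption, not a number from the paper.

```python
# Hedged arithmetic sketch: average power of a duty-cycled node that sleeps at
# the deep-sleep floor and periodically wakes for inference bursts.
def avg_power_uw(sleep_uw: float, active_uw: float, duty: float) -> float:
    """duty: fraction of time spent in the active (inference) state."""
    return sleep_uw * (1 - duty) + active_uw * duty

# With a 1.7 uW sleep floor and 230 uW active power (figures from the abstract),
# an assumed 3% duty cycle already lands below the quoted 10 uW.
print(avg_power_uw(1.7, 230.0, 0.03))
```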
Non-benzoquinone geldanamycin analogs trigger various forms of death in human breast cancer cells
White matter abnormalities in adolescents with generalized anxiety disorder: a diffusion tensor imaging study
Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing
status: Published online
The current trend for deep learning has come with an enormous computational need for billions of Multiply-Accumulate (MAC) operations per inference. Fortunately, reduced precision has demonstrated large benefits with low impact on accuracy, paving the way towards processing in mobile devices and IoT nodes. To this end, various precision-scalable MAC architectures optimized for neural networks have recently been proposed. Yet, it has been hard to comprehend their differences and make a fair judgment of their relative benefits, as they have been implemented with different technologies and performance targets. To overcome this, this work exhaustively reviews the state-of-the-art precision-scalable MAC architectures and unifies them in a new taxonomy. Subsequently, these different topologies are thoroughly benchmarked in a 28 nm commercial CMOS process, across a wide range of performance targets, and with precision ranging from 2 to 8 bits. Circuits are analyzed for each precision as well as jointly in practical use cases, highlighting the impact of architectures and scalability in terms of energy, throughput, area, and bandwidth, aiming to understand the key trends that reduce computation costs in neural-network processing.
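One common idea behind such architectures is sub-word parallelism: packing several low-precision operands into one wide word so a single multiplier yields multiple partial products. The sketch below demonstrates the packing principle in software; it is an illustrative unsigned example with assumed bit widths, not a model of any specific circuit from the review.

```python
# Illustrative sketch of sub-word parallel multiplication: two unsigned 4-bit
# activations share one wide multiply against a common weight. Guard bits keep
# the two partial products from overlapping. Widths are assumptions for the demo.
def packed_mac(a0: int, a1: int, w: int, bits: int = 4, guard: int = 8):
    shift = bits + guard                      # lane width: product must fit here
    packed = a0 | (a1 << shift)               # pack both activations in one word
    prod = packed * w                         # one multiply, two partial products
    mask = (1 << shift) - 1
    return prod & mask, (prod >> shift) & mask  # (a0*w, a1*w)

print(packed_mac(3, 5, 7))  # -> (21, 35)
```

Hardware versions gate and recombine sub-multiplier outputs instead of literally widening the datapath, which is where the energy/area trade-offs benchmarked above come from.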
SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators
To meet the growing need for computational power for DNNs, multiple specialized hardware architectures have been proposed. Each DNN layer should be mapped onto the hardware with the most efficient schedule; however, SotA schedulers struggle to consistently provide optimal schedules in a reasonable time across all DNN-HW combinations. This paper proposes SALSA, a fast dual-engine scheduler that generates optimal execution schedules for both even and uneven mapping. We introduce a new strategy, combining exhaustive search with simulated annealing, to address the dynamic nature of the loop-ordering design-space size across layers. SALSA is extensively benchmarked against two SotA schedulers, LOMA [1] and Timeloop [2], on 5 different DNNs. On average, SALSA finds schedules with 11.9% and 7.6% lower energy while speeding up the search by 1.7× and 24× compared to LOMA and Timeloop, respectively.
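The simulated-annealing engine mentioned above can be sketched as a swap-based search over loop-order permutations. This is a generic textbook annealer with a caller-supplied cost function, offered only to make the mechanism concrete; SALSA's actual hardware cost model and engine-selection heuristics are not reproduced here.

```python
# Minimal simulated-annealing sketch over loop orderings. The cost callable is
# a stand-in for a hardware cost model; all parameters are illustrative defaults.
import math
import random

def anneal(loops, cost, steps=2000, t0=1.0, alpha=0.995, seed=0):
    rng = random.Random(seed)           # seeded for reproducibility
    order = list(loops)
    best = list(order)
    t = t0
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        cand = list(order)
        cand[i], cand[j] = cand[j], cand[i]   # propose: swap two loop levels
        delta = cost(cand) - cost(order)
        # Accept improvements always; accept regressions with prob exp(-delta/t).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            order = cand
            if cost(order) < cost(best):
                best = list(order)
        t *= alpha                      # geometric cooling schedule
    return best
```

Annealing suits layers whose loop-order space is too large to enumerate, while small spaces fall back to exhaustive search, which is the dual-engine idea described in the abstract.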
Sub-Word Parallel Precision-Scalable MAC Engines for Efficient Embedded DNN Inference
status: Published online