
24 research outputs found

Grassroots Operator Search for Model Edge Adaptation

Author: Benmeziane Hadjer
Maghraoui Kaoutar El
Niar Smail
Ouarnoughi Hamza
Publication date: 20/09/2023

Hardware-aware Neural Architecture Search (HW-NAS) is increasingly being used to design efficient deep learning architectures. An efficient and flexible search space is crucial to the success of HW-NAS. Current approaches focus on designing a macro-architecture and searching for the architecture's hyperparameters based on a set of possible values. This approach is biased by the expertise of deep learning (DL) engineers and standard modeling approaches. In this paper, we present a Grassroots Operator Search (GOS) methodology. Our HW-NAS adapts a given model for edge devices by searching for efficient operator replacements. We express each operator as a set of mathematical instructions that capture its behavior. The mathematical instructions are then used as the basis for searching and selecting efficient replacement operators that maintain the accuracy of the original model while reducing computational complexity. Our approach is grassroots since it relies on the mathematical foundations to construct new and efficient operators for DL architectures. We demonstrate on various DL models that our method consistently outperforms the original models on two edge devices, namely Redmi Note 7S and Raspberry Pi3, with a minimum of 2.2x speedup while maintaining high accuracy. Additionally, we showcase a use case of our GOS approach in pulse rate estimation on wristband devices, where we achieve state-of-the-art performance while maintaining reduced computational complexity, demonstrating the effectiveness of our approach in practical applications.

arXiv.org e-Print Archive

A flexible and fast PyTorch toolkit for simulating training and inference on analog crossbar arrays

Author: Carta Fabio
Gallo Manuel Le
Gokmen Tayfun
Goldberg Cindy
Maghraoui Kaoutar El
Moreda Diego
Narayanan Vijay
Rasch Malte J.
Sebastian Abu
Publication date: 05/04/2021

We introduce the IBM Analog Hardware Acceleration Kit, a new and first-of-a-kind open-source toolkit to simulate analog crossbar arrays in a convenient fashion from within PyTorch (freely available at https://github.com/IBM/aihwkit). The toolkit is under active development and is centered around the concept of an "analog tile", which captures the computations performed on a crossbar array. Analog tiles are building blocks that can be used to extend existing network modules with analog components and to compose arbitrary artificial neural networks (ANNs) using the flexibility of the PyTorch framework. Analog tiles can be conveniently configured to emulate a plethora of different analog hardware characteristics and their non-idealities, such as device-to-device and cycle-to-cycle variations, resistive device response curves, and weight and output noise. Additionally, the toolkit makes it possible to design custom unit cell configurations and to use advanced analog optimization algorithms such as Tiki-Taka. Moreover, the backward and update behavior can be set to "ideal" to enable hardware-aware training features for chips that target inference acceleration only. To evaluate the inference accuracy of such chips over time, we provide statistical programming noise and drift models calibrated on phase-change memory hardware. Our new toolkit is fully GPU accelerated and can be used to conveniently estimate the impact of material properties and non-idealities of future analog technology on the accuracy of arbitrary ANNs. Comment: Submitted to AICAS202

arXiv.org e-Print Archive

Using the IBM Analog In-Memory Hardware Acceleration Kit for Neural Network Training and Inference

Author: Buechel Julian
Carta Fabio
Fagbohungbe Omobayode
Gallo Manuel Le
Lammie Corey
Mackin Charles
Maghraoui Kaoutar El
Narayanan Vijay
Rasch Malte J.
Sebastian Abu
Tsai Hsinyu
Publication date: 18/07/2023

Analog In-Memory Computing (AIMC) is a promising approach to reduce the latency and energy consumption of Deep Neural Network (DNN) inference and training. However, the noisy and non-linear device characteristics, and the non-ideal peripheral circuitry in AIMC chips, require adapting DNNs to be deployed on such hardware to achieve equivalent accuracy to digital computing. In this tutorial, we provide a deep dive into how such adaptations can be achieved and evaluated using the recently released IBM Analog Hardware Acceleration Kit (AIHWKit), freely available at https://github.com/IBM/aihwkit. The AIHWKit is a Python library that simulates inference and training of DNNs using AIMC. We present an in-depth description of the AIHWKit design, functionality, and best practices to properly perform inference and training. We also present an overview of the Analog AI Cloud Composer, which provides the benefits of using the AIHWKit simulation platform in a fully managed cloud setting. Finally, we show examples of how users can expand and customize AIHWKit for their own needs. This tutorial is accompanied by comprehensive Jupyter Notebook code examples that can be run using AIHWKit, which can be downloaded from https://github.com/IBM/aihwkit/tree/master/notebooks/tutorial.

arXiv.org e-Print Archive

Malleable applications for scalable high performance computing

Author: A. Szalay
C. Varela
C.A. Varela
Carlos A. Varela
D.P. Anderson
F. Berman
K.E. Maghraoui
Kaoutar El Maghraoui
Message Passing Interface Forum
O. Sievert
R. Wolski
S.S. Vadhiyar
Travis Desell
V. Pande
Z. Lan
Publication venue: Springer Science and Business Media LLC

Crossref

An architecture for reconfigurable iterative MPI applications in dynamic environments

Author: Boleslaw K. Szymanski
Carlos Varela
Kaoutar El Maghraoui
Publication venue: Springer Verlag
Publication date: 01/01/2005

With the proliferation of large-scale dynamic execution environments such as grids, the need has emerged for efficient and scalable application adaptation strategies for long-running parallel and distributed applications. Message passing interfaces were initially designed with a traditional machine model in mind, one that assumes homogeneous and static environments. It is inevitable that long-running message passing applications will require support for dynamic reconfiguration to maintain high performance under varying load conditions. In this paper we describe a framework that provides iterative MPI applications with reconfiguration capabilities. Our approach is based on integrating MPI applications with a middleware that supports process migration and large-scale distributed application reconfiguration. We present our architecture for reconfiguring MPI applications, and verify our design with a heat diffusion application in a dynamic setting.

CiteSeerX