Grassroots Operator Search for Model Edge Adaptation
Hardware-aware Neural Architecture Search (HW-NAS) is increasingly being used
to design efficient deep learning architectures. An efficient and flexible
search space is crucial to the success of HW-NAS. Current approaches focus on
designing a macro-architecture and searching for the architecture's
hyperparameters based on a set of possible values. This approach is biased by
the expertise of deep learning (DL) engineers and standard modeling approaches.
In this paper, we present a Grassroots Operator Search (GOS) methodology. Our
HW-NAS adapts a given model for edge devices by searching for efficient
operator replacements. We express each operator as a set of mathematical
instructions that capture its behavior. The mathematical instructions are then
used as the basis for searching and selecting efficient replacement operators
that maintain the accuracy of the original model while reducing computational
complexity. Our approach is grassroots since it relies on the mathematical
foundations to construct new and efficient operators for DL architectures. We
demonstrate on various DL models that our method consistently outperforms the
original models on two edge devices, namely Redmi Note 7S and Raspberry Pi3,
with a minimum of 2.2x speedup while maintaining high accuracy. Additionally,
we showcase a use case of our GOS approach in pulse rate estimation on
wristband devices, where we achieve state-of-the-art performance, while
maintaining reduced computational complexity, demonstrating the effectiveness
of our approach in practical applications.
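The operator-replacement idea in this abstract can be illustrated with a toy example. The sketch below is not the GOS search procedure: it assumes a single scalar operator (swish), two hand-picked candidate replacements, and hypothetical per-call costs. It keeps only the candidates whose worst-case error stays within a tolerance and picks the cheapest one.

```python
import math


def swish(x):
    """Original operator: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def hard_swish(x):
    """Piecewise-linear approximation of swish that avoids the exponential."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def relu(x):
    return max(x, 0.0)


# Hypothetical costs (rough count of primitive operations), for illustration only.
CANDIDATES = {"hard_swish": (hard_swish, 4), "relu": (relu, 1)}


def max_abs_error(f, g, points):
    """Worst-case pointwise deviation between two scalar operators."""
    return max(abs(f(x) - g(x)) for x in points)


def pick_replacement(original, candidates, points, tolerance):
    """Return the cheapest candidate whose error is within tolerance, or None."""
    feasible = [
        (cost, name)
        for name, (fn, cost) in candidates.items()
        if max_abs_error(original, fn, points) <= tolerance
    ]
    return min(feasible)[1] if feasible else None


points = [i / 10 for i in range(-60, 61)]
print(pick_replacement(swish, CANDIDATES, points, tolerance=0.15))
```

A tighter tolerance excludes the cheaper but less faithful candidate, which is the accuracy-versus-complexity trade-off the abstract describes.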
A flexible and fast PyTorch toolkit for simulating training and inference on analog crossbar arrays
We introduce the IBM Analog Hardware Acceleration Kit, a new, first-of-its-kind
open-source toolkit for conveniently simulating analog crossbar arrays
from within PyTorch (freely available at
https://github.com/IBM/aihwkit). The toolkit is under active development and is
centered around the concept of an "analog tile" which captures the computations
performed on a crossbar array. Analog tiles are building blocks that can be
used to extend existing network modules with analog components and compose
arbitrary artificial neural networks (ANNs) using the flexibility of the
PyTorch framework. Analog tiles can be conveniently configured to emulate a
plethora of different analog hardware characteristics and their non-idealities,
such as device-to-device and cycle-to-cycle variations, resistive device
response curves, and weight and output noise. Additionally, the toolkit makes
it possible to design custom unit cell configurations and to use advanced
analog optimization algorithms such as Tiki-Taka. Moreover, the backward and
update behavior can be set to "ideal" to enable hardware-aware training
features for chips that target inference acceleration only. To evaluate the
inference accuracy of such chips over time, we provide statistical programming
noise and drift models calibrated on phase-change memory hardware. Our new
toolkit is fully GPU accelerated and can be used to conveniently estimate the
impact of material properties and non-idealities of future analog technology on
the accuracy for arbitrary ANNs.
Comment: Submitted to AICAS202
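The "analog tile" concept can be illustrated with a toy model. The sketch below is not the aihwkit API: the class name `ToyAnalogTile` and its parameters are hypothetical, and it assumes plain Gaussian noise. It models a tile as a matrix-vector product whose weights are perturbed once at programming time and whose outputs are perturbed on every read.

```python
import random


class ToyAnalogTile:
    """Toy stand-in for an analog tile (hypothetical; not the aihwkit API)."""

    def __init__(self, weights, weight_noise=0.0, output_noise=0.0, seed=0):
        self.rng = random.Random(seed)
        # Programming noise: each weight is perturbed once when it is stored.
        self.weights = [
            [w + self.rng.gauss(0.0, weight_noise) for w in row]
            for row in weights
        ]
        self.output_noise = output_noise

    def forward(self, x):
        # Matrix-vector product, with fresh output noise added on every read.
        return [
            sum(w * xi for w, xi in zip(row, x))
            + self.rng.gauss(0.0, self.output_noise)
            for row in self.weights
        ]


W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
print(ToyAnalogTile(W).forward(x))  # noise-free reference
print(ToyAnalogTile(W, weight_noise=0.05, output_noise=0.01, seed=1).forward(x))
```

Comparing the noisy output against the noise-free reference gives a rough sense of how non-idealities propagate to a layer's result.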
Using the IBM Analog In-Memory Hardware Acceleration Kit for Neural Network Training and Inference
Analog In-Memory Computing (AIMC) is a promising approach to reduce the
latency and energy consumption of Deep Neural Network (DNN) inference and
training. However, the noisy and non-linear device characteristics and the
non-ideal peripheral circuitry of AIMC chips require DNNs to be adapted before
deployment on such hardware to achieve accuracy equivalent to digital computing.
In this tutorial, we provide a deep dive into how such adaptations can be
achieved and evaluated using the recently released IBM Analog Hardware
Acceleration Kit (AIHWKit), freely available at https://github.com/IBM/aihwkit.
The AIHWKit is a Python library that simulates inference and training of DNNs
using AIMC. We present an in-depth description of the AIHWKit design,
functionality, and best practices to properly perform inference and training.
We also present an overview of the Analog AI Cloud Composer, which provides the
benefits of using the AIHWKit simulation platform in a fully managed cloud
setting. Finally, we show examples of how users can expand and customize
AIHWKit for their own needs. This tutorial is accompanied by comprehensive
Jupyter Notebook code examples that can be run using AIHWKit, which can be
downloaded from https://github.com/IBM/aihwkit/tree/master/notebooks/tutorial
An architecture for reconfigurable iterative MPI applications in dynamic environments
Abstract. With the proliferation of large-scale dynamic execution environments such as grids, the need has emerged for efficient and scalable adaptation strategies for long-running parallel and distributed applications. Message-passing interfaces were initially designed with a traditional machine model in mind, one that assumes homogeneous and static environments. It is inevitable that long-running message-passing applications will require support for dynamic reconfiguration to maintain high performance under varying load conditions. In this paper, we describe a framework that provides iterative MPI applications with reconfiguration capabilities. Our approach is based on integrating MPI applications with a middleware that supports process migration and large-scale distributed application reconfiguration. We present our architecture for reconfiguring MPI applications and verify our design with a heat diffusion application in a dynamic setting.
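Why iterative applications suit reconfiguration can be shown with a small single-process sketch. It does not use MPI or the middleware described above: the helper names and the `workers_at` callback are hypothetical, and the problem is assumed to be 1D heat diffusion with fixed boundary values. The grid is repartitioned at iteration boundaries, where the full state is consistent, so changing the number of workers does not change the result.

```python
def partition(n_cells, n_workers):
    """Split cell indices 0..n_cells-1 into contiguous, near-equal blocks."""
    base, extra = divmod(n_cells, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def step(grid, blocks, alpha=0.25):
    """One Jacobi iteration; each block is updated as a separate worker would.

    Reading grid[i - 1] and grid[i + 1] across a block edge stands in for
    the halo exchange between neighbouring processes.
    """
    new = list(grid)
    for lo, hi in blocks:
        for i in range(max(lo, 1), min(hi, len(grid) - 1)):
            new[i] = grid[i] + alpha * (grid[i - 1] - 2 * grid[i] + grid[i + 1])
    return new


def run(grid, iterations, workers_at):
    """workers_at(it) gives the (hypothetical) worker count at iteration it."""
    for it in range(iterations):
        blocks = partition(len(grid), workers_at(it))  # reconfiguration point
        grid = step(grid, blocks)
    return grid


initial = [100.0] + [0.0] * 15
static = run(initial, 50, lambda it: 1)
dynamic = run(initial, 50, lambda it: 4 if it < 20 else 2)
print(static == dynamic)
```

In a real deployment the repartitioning step would involve migrating or redistributing data between processes. Here it only recomputes block boundaries.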