Search CORE

40 research outputs found

Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

Author: Lammie Corey
Publication venue
Publication date: 01/01/2022
Field of study

Corey Lammie designed mixed signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate arrays (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems; both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems

ResearchOnline at James Cook University

Doctor of Philosophy

Author: Ardestani Ali Shafiee
Publication venue: University of Utah
Publication date: 01/01/2018
Field of study

dissertationDeep Neural Networks (DNNs) are the state-of-art solution in a growing number of tasks including computer vision, speech recognition, and genomics. However, DNNs are computationally expensive as they are carefully trained to extract and abstract features from raw data using multiple layers of neurons with millions of parameters. In this dissertation, we primarily focus on inference, e.g., using a DNN to classify an input image. This is an operation that will be repeatedly performed on billions of devices in the datacenter, in self-driving cars, in drones, etc. We observe that DNNs spend a vast majority of their runtime to runtime performing matrix-by-vector multiplications (MVM). MVMs have two major bottlenecks: fetching the matrix and performing sum-of-product operations. To address these bottlenecks, we use in-situ computing, where the matrix is stored in programmable resistor arrays, called crossbars, and sum-of-product operations are performed using analog computing. In this dissertation, we propose two hardware units, ISAAC and Newton.In ISAAC, we show that in-situ computing designs can outperform DNN digital accelerators, if they leverage pipelining, smart encodings, and can distribute a computation in time and space, within crossbars, and across crossbars. In the ISAAC design, roughly half the chip area/power can be attributed to the analog-to-digital conversion (ADC), i.e., it remains the key design challenge in mixed-signal accelerators for deep networks. In spite of the ADC bottleneck, ISAAC is able to out-perform the computational efficiency of the state-of-the-art design (DaDianNao) by 8x. In Newton, we take advantage of a number of techniques to address ADC inefficiency. These techniques exploit matrix transformations, heterogeneity, and smart mapping of computation to the analog substrate. We show that Newton can increase the efficiency of in-situ computing by an additional 2x. Finally, we show that in-situ computing, unfortunately, cannot be easily adapted to handle training of deep networks, i.e., it is only suitable for inference of already-trained networks. By improving the efficiency of DNN inference with ISAAC and Newton, we move closer to low-cost deep learning that in turn will have societal impact through self-driving cars, assistive systems for the disabled, and precision medicine

The University of Utah: J. Willard Marriott Digital Library

Towards Efficient In-memory Computing Hardware for Quantized Neural Networks: State-of-the-art, Open Challenges and Perspectives

Author: Krestinskaya Olga
Salama Khaled Nabil
Zhang Li
Publication venue
Publication date: 08/07/2023
Field of study

The amount of data processed in the cloud, the development of Internet-of-Things (IoT) applications, and growing data privacy concerns force the transition from cloud-based to edge-based processing. Limited energy and computational resources on edge push the transition from traditional von Neumann architectures to In-memory Computing (IMC), especially for machine learning and neural network applications. Network compression techniques are applied to implement a neural network on limited hardware resources. Quantization is one of the most efficient network compression techniques allowing to reduce the memory footprint, latency, and energy consumption. This paper provides a comprehensive review of IMC-based Quantized Neural Networks (QNN) and links software-based quantization approaches to IMC hardware implementation. Moreover, open challenges, QNN design requirements, recommendations, and perspectives along with an IMC-based QNN hardware roadmap are provided

arXiv.org e-Print Archive

Modeling and simulating in-memory memristive deep learning systems: an overview of current efforts

Author: Lammie Corey
Rahimiazghadi Mostafa
Xiang Wei
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022
Field of study

Deep Learning (DL) systems have demonstrated unparalleled performance in many challenging engineering applications. As the complexity of these systems inevitably increase, they require increased processing capabilities and consume larger amounts of power, which are not readily available in resource-constrained processors, such as Internet of Things (IoT) edge devices. Memristive In-Memory Computing (IMC) systems for DL, entitled Memristive Deep Learning Systems (MDLSs), that perform the computation and storage of repetitive operations in the same physical location using emerging memory devices, can be used to augment the performance of traditional DL architectures; massively reducing their power consumption and latency. However, memristive devices, such as Resistive Random-Access Memory (RRAM) and Phase-Change Memory (PCM), are difficult and cost-prohibitive to fabricate in small quantities, and are prone to various device non-idealities that must be accounted for. Consequently, the popularity of simulation frameworks, used to simulate MDLS prior to circuit-level realization, is burgeoning. In this paper, we provide a survey of existing simulation frameworks and related tools used to model large-scale MDLS. Moreover, we perform direct performance comparisons of modernized open-source simulation frameworks, and provide insights into future modeling and simulation strategies and approaches. We hope that this treatise is beneficial to the large computers and electrical engineering community, and can help readers better understand available tools and techniques for MDLS development

ResearchOnline at James Cook University

Recommended from our members

ANALOG SIGNAL PROCESSING SOLUTIONS AND DESIGN OF MEMRISTOR-CMOS ANALOG CO-PROCESSOR FOR ACCELERATION OF HIGH-PERFORMANCE COMPUTING APPLICATIONS

Author: Athreyas Nihar
Publication venue: ScholarWorks@UMass Amherst
Publication date: 16/07/2018
Field of study

Emerging applications in the field of machine vision, deep learning and scientific simulation require high computational speed and are run on platforms that are size, weight and power constrained. With the transistor scaling coming to an end, existing digital hardware architectures will not be able to meet these ever-increasing demands. Analog computation with its rich set of primitives and inherent parallel architecture can be faster, more efficient and compact for some of these applications. The major contribution of this work is to show that analog processing can be a viable solution to this problem. This is demonstrated in the three parts of the dissertation. In the first part of the dissertation, we demonstrate that analog processing can be used to solve the problem of stereo correspondence. Novel modifications to the algorithms are proposed which improves the computational speed and makes them efficiently implementable in analog hardware. The analog domain implementation provides further speedup in computation and has lower power consumption than a digital implementation. In the second part of the dissertation, a prototype of an analog processor was developed using commercially available off-the-shelf components. The focus was on providing experimental results that demonstrate functionality and to show that the performance of the prototype for low-level and mid-level image processing tasks is equivalent to a digital implementation. To demonstrate improvement in speed and power consumption, an integrated circuit design of the analog processor was proposed, and it was shown that such an analog processor would be faster than state-of-the-art digital and other analog processors. In the third part of the dissertation, a memristor-CMOS analog co-processor that can perform floating point vector matrix multiplication (VMM) is proposed. VMM computation underlies some of the major applications. To demonstrate the working of the analog co-processor at a system level, a new tool called PSpice Systems Option is used. It is shown that the analog co-processor has a superior performance when compared to the projected performances of digital and analog processors. Using the new tool, various application simulations for image processing and solution to partial differential equations are performed on the co-processor model

ScholarWorks@UMass Amherst

The Landscape of Compute-near-memory and Compute-in-memory: A Research and Commercial Overview

Author: Castrillon Jeronimo
De Lima João Paulo C.
Farzaneh Hamid
Khan Asif Ali
Publication venue
Publication date: 24/01/2024
Field of study

In today's data-centric world, where data fuels numerous application domains, with machine learning at the forefront, handling the enormous volume of data efficiently in terms of time and energy presents a formidable challenge. Conventional computing systems and accelerators are continually being pushed to their limits to stay competitive. In this context, computing near-memory (CNM) and computing-in-memory (CIM) have emerged as potentially game-changing paradigms. This survey introduces the basics of CNM and CIM architectures, including their underlying technologies and working principles. We focus particularly on CIM and CNM architectures that have either been prototyped or commercialized. While surveying the evolving CIM and CNM landscape in academia and industry, we discuss the potential benefits in terms of performance, energy, and cost, along with the challenges associated with these cutting-edge computing paradigms

arXiv.org e-Print Archive