7 research outputs found

    Energy-Efficient In-Memory Architectures Leveraging Intrinsic Behaviors of Embedded MRAM Devices

    For decades, innovations to surmount the processor-memory gap and move beyond conventional von Neumann architectures have been sought and explored. Recent machine learning models still expend orders of magnitude more time and energy to access data in memory than to perform the computation itself. This phenomenon, referred to as the memory-wall bottleneck, is addressed herein via a completely fresh perspective on logic and memory technology design. The specific solutions developed in this dissertation focus on utilizing intrinsic switching behaviors of embedded MRAM devices to design cross-layer and energy-efficient Compute-in-Memory (CiM) architectures, accelerate the computationally intensive operations in various Artificial Neural Networks (ANNs), and achieve higher density and lower power consumption, which are crucial requirements in future Internet of Things (IoT) devices. The first cross-layer platform developed herein is an Approximate Generative Adversarial Network (ApGAN) designed to accelerate Generative Adversarial Networks from both algorithm and hardware implementation perspectives. In addition to binarizing the weights, further reduction in storage and computation resources is achieved by leveraging an in-memory addition scheme. Moreover, a memristor-based CiM accelerator for ApGAN is developed. The second design is a biologically inspired memory architecture. The Short-Term Memory and Long-Term Memory features in biology are realized in hardware via a beyond-CMOS-based learning approach derived from the repeated input information and retrieval of the encoded data. The third cross-layer architecture is a programmable energy-efficient hardware implementation for Recurrent Neural Networks with ultra-low-power, area-efficient spin-based activation functions. A novel CiM architecture is proposed to leverage data-level parallelism during the evaluation phase. Specifically, we employ an MRAM-based Adjustable Probabilistic Activation Function (APAF) via a low-power tunable activation mechanism, providing adjustable accuracy levels to mimic ideal sigmoid and tanh thresholding, along with a matching algorithm to regulate neuronal properties. Finally, the APAF design is utilized in the Long Short-Term Memory (LSTM) network to evaluate the network performance using binary and non-binary activation functions. The simulation results indicate up to 74.5× higher energy efficiency, a 35-fold speedup, and ~11× area reduction compared with similar baseline designs. These results can form a basis for future post-CMOS-based non-von Neumann architectures suitable for intermittently powered energy-harvesting devices capable of pushing intelligence towards the edge of the computing network.

    Fundamentals

    Volume 1 establishes the foundations of this new field. It goes through all the steps, from data collection, summarization, and clustering to the different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are examined with respect to their resource requirements and to how scalability can be enhanced on diverse computing architectures, ranging from embedded systems to large computing clusters.

    Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

    The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well as, and sometimes even better than, the original dense networks. Sparsity promises to reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever-growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation and the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model addresses face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of nonverbal social signals in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively, using statistical tools that measure how effective the recognition phase is. In this paper we apply this theory to the case where one of the interactants is a robot; in this case, the recognition phases performed by the robot and the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source low-cost robotic head platform, where gaze is the social signal considered.

    Actas de las XXXIV Jornadas de Automática

    Postprint (published version)