
    A Phase Change Memory and DRAM Based Framework For Energy-Efficient and High-Speed In-Memory Stochastic Computing

    Convolutional Neural Networks (CNNs) have proven highly effective across many fields of Artificial Intelligence (AI) and Machine Learning (ML), but their significant computational and memory requirements make their processing highly compute- and memory-intensive. In particular, the multiply-accumulate (MAC) operation, a fundamental building block of CNNs, requires an enormous number of arithmetic operations. As input dataset sizes grow, the traditional processor-centric von Neumann computing architecture becomes ill-suited for CNN-based applications, incurring steep latency and energy costs that make CNN processing highly challenging. To overcome these challenges, researchers have explored Processing-In-Memory (PIM), which places the processing unit inside or near the memory unit. This approach shortens data movement and exploits the internal memory bandwidth at the memory chip level. However, developing a reliable PIM-based system with minimal hardware modifications and design complexity remains a significant challenge. The solution proposed in this report combines different memory technologies, such as Dynamic RAM (DRAM) and phase change memory (PCM), with stochastic arithmetic and minimal add-on logic. Stochastic computing performs arithmetic on random bitstreams instead of conventional binary representations, which reduces the hardware required for CNN arithmetic and makes it possible to implement these operations with minimal add-on logic. The report details the workflow for the arithmetic operations used by CNNs, including MAC, activation, and floating-point functions. The proposed designs include a scalable Stochastic Number Generator (SNG), a DRAM-based CNN accelerator, a non-volatile memory (NVM) class PCRAM-based CNN accelerator, and DRAM-based stochastic-to-binary (StoB) conversion for in-situ deep learning. These designs use stochastic computing to reduce the hardware requirements of CNN arithmetic and enable energy- and time-efficient CNN processing. The report also identifies future research directions for the proposed designs, including an in-situ PCRAM-based SNG, ODIN (A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-Situ Neural Network Processing in Phase Change RAM), ATRIA (A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-DRAM CNN Processing), and AGNI (In-Situ, Iso-Latency Stochastic-to-Binary Number Conversion for In-DRAM Deep Learning), and presents initial findings for these ideas. In summary, the report offers a comprehensive approach to the challenges of CNN processing, and the proposed designs have the potential to significantly improve the energy and time efficiency of CNNs. Combining stochastic computing with different memory technologies enables reliable PIM-based systems with minimal hardware modifications and design complexity, providing a promising path for future CNN-based applications.
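
    As a rough, hedged illustration of the stochastic arithmetic these designs rely on (not code from the report; the function names and bitstream lengths below are hypothetical), the following Python sketch encodes values in [0, 1] as random bitstreams, multiplies them with a bitwise AND, and recovers a binary value by counting ones, which is the essence of the SNG, MAC, and StoB steps described above:

        import numpy as np

        def to_stochastic(x, length=1024, rng=None):
            """Unipolar stochastic number generator (SNG): encode x in [0, 1]
            as a bitstream whose probability of a 1 equals x."""
            if rng is None:
                rng = np.random.default_rng()
            return (rng.random(length) < x).astype(np.uint8)

        def stochastic_multiply(a_bits, b_bits):
            """In the unipolar encoding, a single AND gate per bit multiplies
            two stochastic numbers."""
            return a_bits & b_bits

        def to_binary(bits):
            """Stochastic-to-binary (StoB) conversion: the value is the fraction of 1s."""
            return bits.mean()

        def stochastic_mac(xs, ws, length=4096):
            """Approximate a multiply-accumulate by multiplying bitstreams
            and summing the recovered values."""
            rng = np.random.default_rng(0)
            return sum(
                to_binary(stochastic_multiply(to_stochastic(x, length, rng),
                                              to_stochastic(w, length, rng)))
                for x, w in zip(xs, ws)
            )

        print(stochastic_mac([0.5, 0.25, 0.8], [0.6, 0.9, 0.1]))  # approximately 0.605

    Longer bitstreams trade latency for accuracy, which is why the report's designs emphasize bit-parallel generation and in-memory conversion rather than long serial streams.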

    A Memory-Centric Customizable Domain-Specific FPGA Overlay for Accelerating Machine Learning Applications

    Low-latency inferencing is of paramount importance to a wide range of real-time and user-facing Machine Learning (ML) applications. Field Programmable Gate Arrays (FPGAs) offer unique advantages in delivering low-latency as well as energy-efficient accelerators for inferencing. Unfortunately, creating machine learning accelerators on FPGAs is not easy: it requires vendor-specific CAD tools and low-level digital and hardware microarchitecture design knowledge that most ML researchers do not possess. The continued refinement of High-Level Synthesis (HLS) tools can reduce, but not eliminate, the need for hardware-specific design knowledge, and the designs these tools produce can use FPGA resources inefficiently, ultimately limiting the performance of the neural network. This research investigated a new FPGA-based software-hardware co-designed overlay architecture that opens the advantages of FPGAs to the broader ML user community. As an overlay, the proposed design allows rapid coding and deployment of different ML network configurations and different data widths, eliminating the prior barrier of needing to resynthesize each design and bringing code portability across FPGA families. The proposed overlay is a Single-Instruction-Multiple-Data (SIMD) Processor-In-Memory (PIM) architecture developed as a programmable overlay for FPGAs. In contrast to point designs, it can be programmed to implement different types of machine learning algorithms. The overlay architecture integrates bit-serial Arithmetic Logic Units (ALUs) with distributed Block RAMs (BRAMs). The PIM design increases the size of arithmetic operations and on-chip storage capacity. User-visible inference latencies are reduced by exploiting concurrent accesses to network parameters (weights and biases) and partial results stored throughout the distributed BRAMs. Run-time performance comparisons show that the proposed design achieves a speedup over HLS-based and custom-tuned equivalent designs. Notably, the proposed design is programmable, allowing rapid design space exploration without resynthesis when changing ML algorithms on the FPGA.
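
    To make the bit-serial idea concrete, here is a minimal, hypothetical Python sketch (not the overlay's actual microarchitecture; all names are invented) of a full-adder-based bit-serial addition, plus a SIMD view in which every lane performs the same operation in lockstep, much as the overlay pairs a bit-serial ALU with each distributed BRAM:

        def bit_serial_add(a, b, width=8):
            """Add two unsigned integers one bit per 'cycle', LSB first,
            the way a bit-serial ALU would stream operands out of a BRAM."""
            carry, result = 0, 0
            for cycle in range(width):
                a_bit = (a >> cycle) & 1
                b_bit = (b >> cycle) & 1
                s = a_bit ^ b_bit ^ carry                          # full-adder sum bit
                carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))
                result |= s << cycle
            return result & ((1 << width) - 1)

        def simd_bit_serial_add(lane_a, lane_b, width=8):
            """SIMD view: every lane (one ALU plus its BRAM column) runs the
            same bit-serial addition in lockstep on its own operands."""
            return [bit_serial_add(a, b, width) for a, b in zip(lane_a, lane_b)]

        print(simd_bit_serial_add([3, 100, 250], [5, 27, 10]))  # [8, 127, 4] with 8-bit wrap-around

    The same lockstep structure extends to wider operands simply by running more cycles, which is how an overlay can support different data widths without resynthesis.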

    Applications in Electronics Pervading Industry, Environment and Society

    This book features the manuscripts accepted for the Special Issue “Applications in Electronics Pervading Industry, Environment and Society—Sensing Systems and Pervasive Intelligence” of the MDPI journal Sensors. Most of the papers come from a selection of the best papers of the 2019 edition of the “Applications in Electronics Pervading Industry, Environment and Society” (APPLEPIES) Conference, held in November 2019, and all have been significantly enhanced with novel experimental results. The papers give an overview of the trends in research and development activities concerning the pervasive application of electronics in industry, the environment, and society. Their focus is on cyber-physical systems (CPS), with research proposals for new sensor acquisition and analog-to-digital converter (ADC) methods, high-speed communication systems, cybersecurity, big data management, and data processing, including emerging machine learning techniques. Physical implementation aspects are discussed, as well as the trade-offs between functional performance and hardware/system costs.

    Model-based fleet deployment in the IoT–edge–cloud continuum

    With their increasing computing and networking capabilities, IoT devices and edge gateways have become part of a larger IoT–edge–cloud computing continuum, where processing and storage tasks are distributed across the whole network hierarchy rather than concentrated only in the cloud. At the same time, this has introduced continuous delivery practices to the development of software components for network-connected gateways and sensing/actuating nodes. These devices are placed on end users’ premises and are characterized by continuously changing cyber-physical contexts, forcing software developers to maintain multiple application versions and frequently redeploy them on a distributed fleet of devices according to their current contexts. Doing this correctly and efficiently goes beyond manual capabilities and requires an intelligent and reliable automated solution. This paper describes a model-based approach, implemented in collaboration with a smart healthcare application provider, to automatically assigning multiple software deployment plans to hundreds of edge gateways and connected IoT devices. From a platform-specific model of an existing edge computing platform, we extract a platform-independent model that describes a list of target devices and a pool of available deployment plans. Next, we use constraint solving to automatically assign deployment plans to all devices at once, with respect to their specific contexts. The result is transformed back into the platform-specific model and includes a suitable deployment plan for each device, which is then consumed by our engine to deploy software components not only on edge gateways but also on their downstream IoT devices with constrained resources and connectivity. We validate the approach with a fleet deployment prototype integrated into a DevOps toolchain used by the partner application provider. Initial experiments demonstrate the viability of the approach and its usefulness in supporting DevOps for edge and IoT software development.
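
    The paper relies on constraint solving over a platform-independent model; as a loose, hypothetical sketch of the underlying matching problem (plain Python with a made-up data model, not the paper's metamodel or its solver), one compatible deployment plan is chosen per device based on its context:

        # Hypothetical data model: each device exposes a context, each deployment
        # plan declares the contexts it supports, and we pick one compatible plan
        # per device by brute-force constraint checking.

        devices = [
            {"id": "gw-001", "context": {"os": "linux", "ram_mb": 512, "sensor": "ecg"}},
            {"id": "gw-002", "context": {"os": "linux", "ram_mb": 128, "sensor": "spo2"}},
        ]

        plans = [
            {"name": "full-analytics", "requires": {"os": "linux", "min_ram_mb": 256}},
            {"name": "lightweight-forwarder", "requires": {"os": "linux", "min_ram_mb": 64}},
        ]

        def satisfies(context, requires):
            """Constraint check: the device context must meet every requirement."""
            return (context["os"] == requires["os"]
                    and context["ram_mb"] >= requires["min_ram_mb"])

        def assign_plans(devices, plans):
            """Assign the first (most capable) compatible plan to each device."""
            assignment = {}
            for device in devices:
                for plan in plans:
                    if satisfies(device["context"], plan["requires"]):
                        assignment[device["id"]] = plan["name"]
                        break
            return assignment

        print(assign_plans(devices, plans))
        # {'gw-001': 'full-analytics', 'gw-002': 'lightweight-forwarder'}

    A real solver additionally handles global constraints across the whole fleet (for example capacity or version limits), which is why the paper assigns all plans in one solving step rather than device by device.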

    Three Studies on Model Transformations - Parsing, Generation and Ease of Use

    Transformations play an important part in both software development and the automatic processing of natural languages. We present three publications rooted in the multi-disciplinary research of Language Technology and Software Engineering and relate their contribution to the literature on syntactical transformations.

    Parsing Linear Context-Free Rewriting Systems. The first publication describes four different parsing algorithms for the mildly context-sensitive grammar formalism Linear Context-Free Rewriting Systems. The algorithms automatically transform a text into a chart. As a result, the parse chart contains the (possibly partial) analysis of the text according to a grammar with a lower level of abstraction than the original text. The uni-directional and endogenous transformations are described within the framework of parsing as deduction.

    Natural Language Generation from Class Diagrams. Using the framework of Model-Driven Architecture, we generate natural language from class diagrams. The transformation is done in two steps. In the first step we transform the class diagram, defined by Executable and Translatable UML, to grammars specified by the Grammatical Framework. The grammars are then used to generate the desired text. Overall, the transformation is uni-directional, automatic, and an example of a reverse engineering translation.

    Executable and Translatable UML - How Difficult Can it Be? Within Model-Driven Architecture there has been substantial research on the transformation from Platform-Independent Models (PIM) into Platform-Specific Models, less so on the transformation from Computationally Independent Models (CIM) into PIMs. This publication reflects on the outcomes of letting novice software developers transform CIMs specified by UML into PIMs defined in Executable and Translatable UML.

    Conclusion. The three publications show how model transformations can be used within both Language Technology and Software Engineering to tackle the challenges of natural language processing and software development.
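
    As a simplified, hypothetical illustration of "transforming a text into a chart" (using an ordinary context-free grammar in Chomsky normal form rather than the LCFRS formalism the thesis actually targets; the toy grammar is invented), a CYK-style chart parser in Python might look like this:

        # Toy CYK chart parser: chart[i][j] collects the nonterminals that
        # span words[i..j]; the sentence parses if the start symbol S spans it all.

        grammar = {                   # hypothetical toy grammar in CNF
            ("NP", "VP"): "S",
            ("Det", "N"): "NP",
            ("V", "NP"): "VP",
        }
        lexicon = {"the": "Det", "dog": "N", "cat": "N", "chased": "V"}

        def cyk_chart(words):
            n = len(words)
            chart = [[set() for _ in range(n)] for _ in range(n)]
            for i, w in enumerate(words):
                chart[i][i].add(lexicon[w])
            for span in range(2, n + 1):
                for i in range(n - span + 1):
                    j = i + span - 1
                    for k in range(i, j):
                        for left in chart[i][k]:
                            for right in chart[k + 1][j]:
                                if (left, right) in grammar:
                                    chart[i][j].add(grammar[(left, right)])
            return chart

        chart = cyk_chart("the dog chased the cat".split())
        print("S" in chart[0][4])  # True: the whole sentence parses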

    Model-Driven Methodology for Rapid Deployment of Smart Spaces based on Resource-Oriented Architectures

    Advances in electronics nowadays facilitate the design of smart spaces based on physical mash-ups of sensor and actuator devices. At the same time, software paradigms such as the Internet of Things (IoT) and Web of Things (WoT) are motivating the creation of technology to support the development and deployment of web-enabled embedded sensor and actuator devices, with two major objectives: (i) to integrate sensing and actuating functionalities into everyday objects, and (ii) to easily allow a diversity of devices to plug into the Internet. Currently, developers applying this Internet-oriented approach need a solid understanding of specific platforms and web technologies. To alleviate this development process, this research proposes a Resource-Oriented and Ontology-Driven Development (ROOD) methodology based on the Model Driven Architecture (MDA). The methodology aims to enable the development of smart spaces through a set of modeling tools and semantic technologies that support the definition of the smart space and the automatic generation of code at the hardware level. ROOD's feasibility is demonstrated by building an adaptive health monitoring service for a Smart Gym.
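
    As a toy, hypothetical sketch of the model-driven idea ROOD builds on (the field names and template below are invented, not the ROOD metamodel or its ontology-driven toolchain), a platform-independent description of a smart-space resource can be transformed into device-side boilerplate code:

        # Platform-independent model of one smart-space resource (hypothetical).
        smart_gym_model = {
            "resource": "heart_rate_monitor",
            "kind": "sensor",
            "unit": "bpm",
            "sample_period_s": 1,
        }

        TEMPLATE = '''\
        # Auto-generated handler for {resource} ({kind})
        def read_{resource}():
            """Return the latest {resource} reading in {unit}."""
            raise NotImplementedError  # filled in by the hardware-level backend

        SAMPLE_PERIOD_S = {sample_period_s}
        '''

        def generate_device_code(model):
            """Transform the platform-independent model into device-side source code."""
            return TEMPLATE.format(**model)

        print(generate_device_code(smart_gym_model))

    The value of the model-driven step is that the same platform-independent description can feed several such generators, one per target platform.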

    Enabling Edge-Intelligence in Resource-Constrained Autonomous Systems

    The objective of this research is to shift Machine Learning algorithms from resource-rich servers and the cloud to compute-limited edge nodes by designing energy-efficient ML systems. Multiple sub-areas of research in this domain are explored for the application of autonomous drone navigation. Our principal goal is to enable a UAV to navigate autonomously using Reinforcement Learning (RL), without incurring any additional hardware or sensor cost. Most lightweight UAVs are limited in resources such as compute capability and onboard energy, and conventional state-of-the-art ML algorithms cannot be implemented on them directly. This research addresses this issue by devising energy-efficient ML algorithms, modifying existing ML algorithms, designing energy-efficient ML accelerators, and leveraging hardware-algorithm co-design. RL is notorious for being data-hungry and requires trial and error to converge, so it cannot be implemented directly on real drones until the issues of safety, data limitations, and reward generation are addressed. Instead of learning a task from scratch, RL algorithms, just like humans, can benefit from prior knowledge, which can help them converge to their goals in less time and with less energy. Multiple drones can collectively help each other by sharing their locally learned knowledge. Such distributed systems can help agents learn their respective local tasks faster, but they may become vulnerable to attacks in the presence of adversarial agents, which needs to be addressed. Finally, the improvement in the energy efficiency of RL-based systems achievable through algorithmic approaches is limited by the underlying hardware and computing architectures; these need to be redesigned in an application-specific way, exploring and exploiting the nature of the most commonly used ML operators. This can be done by exploring new computing devices and considering the data reuse and dataflow of ML operators within the architectural design. This research discusses these issues and presents better alternatives. It is concluded that energy consumption must be addressed at multiple levels of the hierarchy by exploring algorithmic, hardware-based, and algorithm-hardware co-design approaches.
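
    As a minimal, hypothetical sketch of the kind of knowledge transfer argued for above (a toy tabular Q-learning loop on a made-up environment, not the drone navigation system itself), warm-starting the value table from previously learned knowledge lets a second agent start from a better policy than learning from scratch:

        import random

        def q_learning(env_step, n_states, n_actions, episodes=200,
                       alpha=0.1, gamma=0.95, epsilon=0.1, prior_q=None):
            """Tabular Q-learning. Passing prior_q (e.g., a table learned by another
            agent or in simulation) warm-starts learning instead of starting from scratch."""
            if prior_q is not None:
                q = [row[:] for row in prior_q]          # transfer prior knowledge
            else:
                q = [[0.0] * n_actions for _ in range(n_states)]
            for _ in range(episodes):
                state, done = 0, False
                while not done:
                    if random.random() < epsilon:        # explore
                        action = random.randrange(n_actions)
                    else:                                # exploit current knowledge
                        action = max(range(n_actions), key=lambda a: q[state][a])
                    next_state, reward, done = env_step(state, action)
                    q[state][action] += alpha * (reward + gamma * max(q[next_state])
                                                 - q[state][action])
                    state = next_state
            return q

        def corridor_step(state, action, goal=4):
            """Toy 1-D corridor: action 1 moves right, action 0 moves left."""
            next_state = min(max(state + (1 if action == 1 else -1), 0), goal)
            return next_state, (1.0 if next_state == goal else -0.01), next_state == goal

        q_scratch = q_learning(corridor_step, n_states=5, n_actions=2)
        q_warm = q_learning(corridor_step, n_states=5, n_actions=2, prior_q=q_scratch)

    In a distributed setting, sharing such tables (or network weights) between drones is what makes collective learning faster, and also what opens the attack surface for adversarial agents mentioned above.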