9,813 research outputs found

    Optimizing the flash-RAM energy trade-off in deeply embedded systems

    Full text link
    Deeply embedded systems often have the tightest constraints on energy consumption, requiring that they consume tiny amounts of current and run on batteries for years. However, they typically execute code directly from flash, instead of the more energy efficient RAM. We implement a novel compiler optimization that exploits the relative efficiency of RAM by statically moving carefully selected basic blocks from flash to RAM. Our technique uses integer linear programming, with an energy cost model to select a good set of basic blocks to place into RAM, without impacting stack or data storage. We evaluate our optimization on a common ARM microcontroller and succeed in reducing the average power consumption by up to 41% and reducing energy consumption by up to 22%, while increasing execution time. A case study is presented, where an application executes code then sleeps for a period of time. For this example we show that our optimization could allow the application to run on battery for up to 32% longer. We also show that for this scenario the total application energy can be reduced, even if the optimization increases the execution time of the code

    Dynamic Virtual Page-based Flash Translation Layer with Novel Hot Data Identification and Adaptive Parallelism Management

    Get PDF
    Solid-state disks (SSDs) tend to replace traditional motor-driven hard disks in high-end storage devices in past few decades. However, various inherent features, such as out-of-place update [resorting to garbage collection (GC)] and limited endurance (resorting to wear leveling), need to be reduced to a large extent before that day comes. Both the GC and wear leveling fundamentally depend on hot data identification (HDI). In this paper, we propose a hot data-aware flash translation layer architecture based on a dynamic virtual page (DVPFTL) so as to improve the performance and lifetime of NAND flash devices. First, we develop a generalized dual layer HDI (DL-HDI) framework, which is composed of a cold data pre-classifier and a hot data post-identifier. Those can efficiently follow the frequency and recency of information access. Then, we design an adaptive parallelism manager (APM) to assign the clustered data chunks to distinct resident blocks in the SSD so as to prolong its endurance. Finally, the experimental results from our realized SSD prototype indicate that the DVPFTL scheme has reliably improved the parallelizability and endurance of NAND flash devices with improved GC-costs, compared with related works.Peer reviewe

    On Benchmarking Embedded Linux Flash File Systems

    Full text link
    Due to its attractive characteristics in terms of performance, weight and power consumption, NAND flash memory became the main non volatile memory (NVM) in embedded systems. Those NVMs also present some specific characteristics/constraints: good but asymmetric I/O performance, limited lifetime, write/erase granularity asymmetry, etc. Those peculiarities are either managed in hardware for flash disks (SSDs, SD cards, USB sticks, etc.) or in software for raw embedded flash chips. When managed in software, flash algorithms and structures are implemented in a specific flash file system (FFS). In this paper, we present a performance study of the most widely used FFSs in embedded Linux: JFFS2, UBIFS,and YAFFS. We show some very particular behaviors and large performance disparities for tested FFS operations such as mounting, copying, and searching file trees, compression, etc.Comment: Embed With Linux, Lorient : France (2012

    Self-Learning Hot Data Prediction: Where Echo State Network Meets NAND Flash Memories

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Well understanding the access behavior of hot data is significant for NAND flash memory due to its crucial impact on the efficiency of garbage collection (GC) and wear leveling (WL), which respectively dominate the performance and life span of SSD. Generally, both GC and WL rely greatly on the recognition accuracy of hot data identification (HDI). However, in this paper, the first time we propose a novel concept of hot data prediction (HDP), where the conventional HDI becomes unnecessary. First, we develop a hybrid optimized echo state network (HOESN), where sufficiently unbiased and continuously shrunk output weights are learnt by a sparse regression based on L2 and L1/2 regularization. Second, quantum-behaved particle swarm optimization (QPSO) is employed to compute reservoir parameters (i.e., global scaling factor, reservoir size, scaling coefficient and sparsity degree) for further improving prediction accuracy and reliability. Third, in the test on a chaotic benchmark (Rossler), the HOESN performs better than those of six recent state-of-the-art methods. Finally, simulation results about six typical metrics tested on five real disk workloads and on-chip experiment outcomes verified from an actual SSD prototype indicate that our HOESN-based HDP can reliably promote the access performance and endurance of NAND flash memories.Peer reviewe

    An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

    Get PDF
    We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.Comment: To appear at the DSN 2020 conferenc

    Optimization and evaluation of variability in the programming window of a flash cell with molecular metal-oxide storage

    Get PDF
    We report a modeling study of a conceptual nonvolatile memory cell based on inorganic molecular metal-oxide clusters as a storage media embedded in the gate dielectric of a MOSFET. For the purpose of this paper, we developed a multiscale simulation framework that enables the evaluation of variability in the programming window of a flash cell with sub-20-nm gate length. Furthermore, we studied the threshold voltage variability due to random dopant fluctuations and fluctuations in the distribution of the molecular clusters in the cell. The simulation framework and the general conclusions of our work are transferrable to flash cells based on alternative molecules used for a storage media

    A machine learning-based approach to optimize repair and increase yield of embedded flash memories in automotive systems-on-chip

    Get PDF
    Nowadays, Embedded Flash Memory cores occupy a significant portion of Automotive Systems-on-Chip area, therefore strongly contributing to the final yield of the devices. Redundancy strategies play a key role in this context; in case of memory failures, a set of spare word- and bit-lines are allocated by a replacement algorithm that complements the memory testing procedure. In this work, we show that replacement algorithms, which are heavily constrained in terms of execution time, may be slightly inaccurate and lead to classify a repairable memory core as unrepairable. We denote this situation as Flash memory false fail. The proposed approach aims at identifying false fails by using a Machine Learning approach that exploits a feature extraction strategy based on shape recognition. Experimental results carried out on the manufacturing data show a high capability of predicting false fails

    Design and Verification of a Dual Port RAM Using UVM Methodology

    Get PDF
    Data-intensive applications such as Deep Learning, Big Data, and Computer Vision have resulted in more demand for on-chip memory storage. Hence, state of the art Systems on Chips (SOCs) have a memory that occupies somewhere between 50% to 90 % of the die space. Extensive Research is being done in the field of memory technology to improve the efficiency of memory packaging. This effort has not always been successful because densely packed memory structures can experience defects during the fabrication process. Thus, it is critical to test the embedded memory modules once they are taped out. Along with testing, functional verification of a module makes sure that the design works the way it has been intended to perform. This paper proposes a built-in self-test (BIST) to validate a Dual Port Static RAM module and a complete layered test bench to verify the module’s operation functionally. The BIST has been designed using a finite state machine and has been targeted against most of the general SRAM faults in a given linear time constraint of O(23n). The layered test bench has been designed using Universal Verification Methodology (UVM), a standardized class library which has increased the re-usability and automation to the existing design verification language, SystemVerilog

    Design Solutions For Modular Satellite Architectures

    Get PDF
    The cost-effective access to space envisaged by ESA would open a wide range of new opportunities and markets, but is still many years ahead. There is still a lack of devices, circuits, systems which make possible to develop satellites, ground stations and related services at costs compatible with the budget of academic institutions and small and medium enterprises (SMEs). As soon as the development time and cost of small satellites will fall below a certain threshold (e.g. 100,000 to 500,000 €), appropriate business models will likely develop to ensure a cost-effective and pervasive access to space, and related infrastructures and services. These considerations spurred the activity described in this paper, which is aimed at: - proving the feasibility of low-cost satellites using COTS (Commercial Off The Shelf) devices. This is a new trend in the space industry, which is not yet fully exploited due to the belief that COTS devices are not reliable enough for this kind of applications; - developing a flight model of a flexible and reliable nano-satellite with less than 25,000€; - training students in the field of avionics space systems: the design here described is developed by a team including undergraduate students working towards their graduation work. The educational aspects include the development of specific new university courses; - developing expertise in the field of low-cost avionic systems, both internally (university staff) and externally (graduated students will bring their expertise in their future work activity); - gather and cluster expertise and resources available inside the university around a common high-tech project; - creating a working group composed of both University and SMEs devoted to the application of commercially available technology to space environment. The first step in this direction was the development of a small low cost nano-satellite, started in the year 2004: the name of this project was PiCPoT (Piccolo Cubo del Politecnico di Torino, Small Cube of Politecnico di Torino). The project was carried out by some departments of the Politecnico, in particular Electronics and Aerospace. The main goal of the project was to evaluate the feasibility of using COTS components in a space project in order to greatly reduce costs; the design exploited internal subsystems modularity to allow reuse and further cost reduction for future missions. Starting from the PiCPoT experience, in 2006 we began a new project called ARaMiS (Speretta et al., 2007) which is the Italian acronym for Modular Architecture for Satellites. This work describes how the architecture of the ARaMiS satellite has been obtained from the lesson learned from our former experience. Moreover we describe satellite operations, giving some details of the major subsystems. This work is composed of two parts. The first one describes the design methodology, solutions and techniques that we used to develop the PiCPoT satellite; it gives an overview of its operations, with some details of the major subsystems. Details on the specifications can also be found in (Del Corso et al., 2007; Passerone et al, 2008). The second part, indeed exploits the experience achieved during the PiCPoT development and describes a proposal for a low-cost modular architecture for satellite
    • …
    corecore