    High-Level Technology Mapping for Memories

    In this paper, we consider memory-mapping problems in High-Level Synthesis. We focus on port mapping, bit-width mapping and word mapping. A 0-1 Integer Linear Programming (ILP) technique is used to solve the mapping problems, synthesizing the source memory at a higher level using one or more memory modules from a target memory library. This method not only performs bit-width mapping and word mapping, but also performs port mapping at the same time. Experimental results indicate that the ILP approach is an effective method for memory reuse in high-level synthesis.
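
    As a rough illustration of the kind of selection problem such a mapping solves (the abstract does not give the paper's actual variables or constraints), the sketch below enumerates arrangements of library modules as word banks and bit slices so that they cover a source memory's depth and width at minimum cost; a brute-force search stands in for the 0-1 ILP solver, and the library entries and costs are illustrative assumptions.

        # Hypothetical word/bit-width mapping sketch: pick a module type from the
        # library and a (rows x cols) arrangement of copies so that rows*words
        # covers the source depth and cols*bits covers the source width, at
        # minimum cost.  Brute force stands in for the paper's 0-1 ILP solver.
        SOURCE_WORDS, SOURCE_BITS = 1024, 24            # source memory to map (assumed)
        LIBRARY = [                                     # (name, words, bits, cost) per module
            ("M_512x8",   512,  8, 1.0),
            ("M_1024x8",  1024, 8, 1.8),
            ("M_1024x16", 1024, 16, 3.2),
        ]
        MAX_ROWS = MAX_COLS = 4                         # bound on instances (assumed)

        best = None
        for name, w, b, cost in LIBRARY:
            for rows in range(1, MAX_ROWS + 1):         # word banks
                for cols in range(1, MAX_COLS + 1):     # bit slices
                    if rows * w >= SOURCE_WORDS and cols * b >= SOURCE_BITS:
                        total = rows * cols * cost
                        if best is None or total < best[0]:
                            best = (total, name, rows, cols)

        print("best mapping:", best)    # (total cost, module type, word banks, bit slices)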

    Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories

    Modern computing systems are embracing hybrid memory comprising DRAM and non-volatile memory (NVM) to combine the best properties of both memory technologies, achieving low latency, high reliability, and high density. A prominent characteristic of DRAM-NVM hybrid memory is that NVM access latency is much higher than DRAM access latency. We call this inter-memory asymmetry. We observe that parasitic components on a long bitline are a major source of high latency in both DRAM and NVM, and a significant factor contributing to high-voltage operations in NVM, which impact its reliability. We propose an architectural change in which each long bitline in DRAM and NVM is split into two segments by an isolation transistor. One segment can be accessed with lower latency and operating voltage than the other. By introducing tiers, we enable non-uniform accesses within each memory type (which we call intra-memory asymmetry), leading to performance and reliability trade-offs in DRAM-NVM hybrid memory. We extend existing OS support for NVM-DRAM hybrid memory in three ways. First, we exploit both inter- and intra-memory asymmetries to allocate and migrate memory pages between the tiers in DRAM and NVM. Second, we improve the OS's page allocation decisions by predicting the access intensity of a newly referenced memory page in a program and placing it in a matching tier during its initial allocation. This minimizes page migrations during program execution, lowering the performance overhead. Third, we propose a solution to migrate pages between the tiers of the same memory without transferring data over the memory channel, minimizing channel occupancy and improving performance. Our overall approach, which we call MNEME, enables and exploits asymmetries in DRAM-NVM hybrid tiered memory and improves both performance and reliability for single-core and multi-programmed workloads. Comment: 15 pages, 29 figures, accepted at the ACM SIGPLAN International Symposium on Memory Management.
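
    A minimal sketch of the kind of tier-aware initial placement described above: pages predicted to be accessed intensely go to the fast DRAM tier, cooler pages fall back to slower tiers. The tier names, capacities, threshold, and the stub predictor are illustrative assumptions, not the actual MNEME policy.

        # Toy tier-aware page placement, assuming four tiers ordered fast -> slow.
        # The access-intensity predictor is a stub; MNEME's real predictor and
        # thresholds are not specified in the abstract.
        TIERS = ["DRAM_fast", "DRAM_slow", "NVM_fast", "NVM_slow"]   # assumed names
        CAPACITY = {"DRAM_fast": 2, "DRAM_slow": 4, "NVM_fast": 8, "NVM_slow": 16}
        occupancy = {t: 0 for t in TIERS}

        def predict_intensity(page_id: int) -> float:
            """Stub predictor: pretend low page ids are hot (placeholder heuristic)."""
            return 1.0 / (1 + page_id)

        def place_page(page_id: int) -> str:
            """Place a newly referenced page in the fastest tier its predicted
            intensity warrants and that still has room, falling back to slower tiers."""
            hot = predict_intensity(page_id)
            # Hotter pages are tried against faster tiers first.
            preferred = TIERS if hot > 0.1 else list(reversed(TIERS))
            for tier in preferred:
                if occupancy[tier] < CAPACITY[tier]:
                    occupancy[tier] += 1
                    return tier
            raise MemoryError("all tiers full")

        for pid in range(8):
            print(pid, "->", place_page(pid))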

    Rewriting Codes for Joint Information Storage in Flash Memories

    Memories whose storage cells transit irreversibly between states have been common since the start of data storage technology. In recent years, flash memories have become a very important family of such memories. A flash memory cell has q states, 0, 1, ..., q-1, and can only transit from a lower state to a higher state before the expensive erasure operation takes place. We study rewriting codes that enable the data stored in a group of cells to be rewritten by only shifting the cells to higher states. Since the considered state transitions are irreversible, the number of rewrites is bounded. Our objective is to maximize the number of times the data can be rewritten. We focus on the joint storage of data in flash memories, and study two rewriting codes for two different scenarios. The first code, called the floating code, is for the joint storage of multiple variables, where every rewrite changes one variable. The second code, called the buffer code, is for remembering the most recent data in a data stream. Many of the codes presented here are either optimal or asymptotically optimal. We also present bounds on the performance of general codes. The results show that rewriting codes can integrate a flash memory's rewriting capabilities for different variables to a high degree.
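
    A worked toy example in the spirit of such rewriting codes (not one of the paper's constructions): a single bit is stored in n q-level cells as the parity of the sum of the cell levels, so each rewrite only raises one cell by one level and the bit can be rewritten n(q-1) times before an erasure is needed. The parameters and names are illustrative.

        # Toy rewriting code: one binary variable in n cells with levels 0..q-1.
        # The stored bit is (sum of levels) mod 2; a rewrite that flips the bit
        # increments the lowest cell that still has headroom, so cells only move
        # to higher states and the erase is deferred for n*(q-1) rewrites.
        def decode(cells):
            return sum(cells) % 2

        def rewrite(cells, new_bit):
            """Raise one cell level so the decoded bit becomes new_bit.
            Returns False when no rewrite is possible without an erase."""
            if decode(cells) == new_bit:
                return True                       # nothing to change
            for i, level in enumerate(cells):
                if level < Q - 1:
                    cells[i] = level + 1          # irreversible: only upward
                    return True
            return False                          # all cells at q-1: erase required

        Q, N = 4, 3                               # q-level cells, n cells (assumed)
        cells = [0] * N
        writes, bit = 0, 0
        while True:
            bit ^= 1                              # alternate the stored value
            if not rewrite(cells, bit):
                break
            writes += 1
        print("rewrites before erase:", writes, "final cells:", cells)   # 9 = n*(q-1)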

    Non-power-of-Two FFTs: Exploring the Flexibility of the Montium TP

    Coarse-grain reconfigurable architectures, like the Montium TP, have proven to be a very successful approach for low-power and high-performance computation of regular digital signal processing algorithms. This paper presents the implementation of a class of non-power-of-two FFTs to explore the limitations and flexibility of the Montium TP for less regular algorithms. A non-power-of-two FFT is less regular than a traditional power-of-two FFT. The results of the implementation show the processing time, accuracy, energy consumption and flexibility of the implementation.
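
    For reference, non-power-of-two lengths are usually handled with mixed-radix Cooley-Tukey: a composite length N = n1*n2 is split into n1 sub-DFTs of length n2 plus twiddle factors, with a direct DFT as the fallback for prime lengths. The sketch below is a generic reference model of that decomposition, not the Montium TP mapping from the paper.

        # Mixed-radix DFT sketch for composite (non-power-of-two) lengths.
        # Split N = n1 * n2, compute n1 sub-DFTs of length n2 on decimated
        # sequences, then combine with twiddle factors.  Falls back to the
        # direct O(N^2) DFT for prime lengths.  No Montium specifics.
        import cmath

        def dft(x):
            n = len(x)
            return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
                    for k in range(n)]

        def smallest_factor(n):
            f = 2
            while f * f <= n:
                if n % f == 0:
                    return f
                f += 1
            return n

        def fft_mixed_radix(x):
            n = len(x)
            n1 = smallest_factor(n)
            if n1 == n:                       # prime length: direct DFT
                return dft(x)
            n2 = n // n1
            cols = [fft_mixed_radix(x[r::n1]) for r in range(n1)]   # sub-DFTs
            out = [0j] * n
            for k1 in range(n1):
                for k2 in range(n2):
                    s = 0j
                    for r in range(n1):
                        tw = cmath.exp(-2j * cmath.pi * r * (k2 + n2 * k1) / n)
                        s += cols[r][k2] * tw
                    out[k2 + n2 * k1] = s
            return out

        # Quick check against the direct DFT for a non-power-of-two length (N = 12).
        data = [complex(i, 0) for i in range(12)]
        ref, got = dft(data), fft_mixed_radix(data)
        print(max(abs(a - b) for a, b in zip(ref, got)))   # should be ~1e-12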

    A Cache Management Strategy to Replace Wear Leveling Techniques for Embedded Flash Memory

    Prices of NAND flash memories are falling drastically due to market growth and the maturing of fabrication processes, while research efforts on the technology itself, in terms of endurance and density, remain very active. NAND flash memories are becoming the most important storage media in mobile computing and tend to be less and less confined to this area. The major constraint of this technology is the limited number of erase operations possible per block, which quickly wears the memory out. To cope with this issue, state-of-the-art solutions implement wear leveling policies that even out the wear of the memory and so increase its lifetime. These policies are integrated into the Flash Translation Layer (FTL) and contribute greatly to decreasing write performance. In this paper, we propose to reduce the flash memory wear-out problem and improve performance by absorbing erase operations through a dual-cache system that replaces the FTL wear leveling and garbage collection services. We justify this idea by presenting a first performance evaluation of an exclusively cache-based system for embedded flash memories. Unlike wear leveling schemes, the proposed cache solution reduces the total number of erase operations issued to the media by absorbing them in the cache, for workloads exhibiting a minimal global sequential rate. Comment: This paper received the Best Paper Award in the Computer System track; 8 pages; International Symposium on Performance Evaluation of Computer & Telecommunication Systems, The Hague, Netherlands (2011).
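
    As a rough sketch of absorbing erases in a cache (not the paper's dual-cache design, whose structure and policies the abstract only names), the snippet below buffers page writes per flash block and erases/programs a block only when its buffer is evicted or flushed, so repeated writes to the same pages cost one erase instead of many. Block size, cache size, and the eviction policy are assumptions.

        # Toy write cache in front of a block-erase flash model.  Page writes are
        # absorbed in an in-RAM buffer keyed by block; a block is erased and
        # reprogrammed only when its buffer is evicted (least-recently-written
        # block goes first) or explicitly flushed.
        from collections import OrderedDict

        PAGES_PER_BLOCK = 4          # assumed flash geometry
        CACHE_BLOCKS = 2             # assumed cache capacity, in blocks
        erase_count = 0
        cache = OrderedDict()        # block_id -> {page_offset: data}

        def flush(block_id):
            """Erase the block once and program all buffered pages."""
            global erase_count
            erase_count += 1
            cache.pop(block_id)

        def write_page(page_id, data):
            block_id = page_id // PAGES_PER_BLOCK
            if block_id not in cache and len(cache) >= CACHE_BLOCKS:
                victim, _ = next(iter(cache.items()))
                flush(victim)                       # one erase per evicted block
            cache.setdefault(block_id, {})[page_id % PAGES_PER_BLOCK] = data
            cache.move_to_end(block_id)             # keep recently written blocks

        # Repeatedly rewriting pages of one block costs a single erase on flush,
        # instead of one erase per rewrite as in an uncached model.
        for i in range(100):
            write_page(i % 4, b"x")                 # all writes hit block 0
        for b in list(cache):
            flush(b)
        print("erase operations:", erase_count)     # 1 with the cache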

    The Chameleon Architecture for Streaming DSP Applications

    We focus on architectures for streaming DSP applications such as wireless baseband processing and image processing. We aim at a single generic architecture that is capable of dealing with different DSP applications. This architecture has to be energy efficient and fault tolerant. We introduce a heterogeneous tiled architecture and present the details of a domain-specific reconfigurable tile processor called Montium. This reconfigurable processor has a small footprint (1.8 mm^2 in a 130 nm process), is power efficient and exploits the locality-of-reference principle. Reconfiguring the device is very fast; for example, loading the coefficients for a 200-tap FIR filter is done within 80 clock cycles. The tiles of the tiled architecture are connected to a Network-on-Chip (NoC) via a network interface (NI). Two NoCs have been developed: a packet-switched and a circuit-switched version. Both provide two types of services: guaranteed throughput (GT) and best effort (BE). Estimates of power consumption are presented for both NoCs. The NI synchronizes data transfers, and configures and starts/stops the tile processor. For dynamically mapping applications onto the tiled architecture, we introduce a run-time mapping tool.
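
    As a minimal illustration of what one run-time mapping step could look like (the actual tool, cost model, and tile names are not given in the abstract and are assumed here), the sketch greedily assigns streaming tasks to heterogeneous tiles, preferring tiles that support the task's kernel and penalizing NoC hop distance to the task's already-mapped predecessor.

        # Greedy run-time mapping sketch: assign each task to a free tile that can
        # run it, minimizing Manhattan hop distance to its mapped predecessor.
        # Tile grid, task graph, and capabilities are illustrative assumptions.
        TILES = {                      # tile_id -> (x, y, supported kernels)
            "T0": (0, 0, {"FIR", "FFT"}),
            "T1": (1, 0, {"FFT"}),
            "T2": (0, 1, {"FIR"}),
            "T3": (1, 1, {"GPP"}),
        }
        TASKS = [("src", "GPP", None), ("fft", "FFT", "src"), ("fir", "FIR", "fft")]

        def hops(a, b):
            ax, ay, _ = TILES[a]
            bx, by, _ = TILES[b]
            return abs(ax - bx) + abs(ay - by)

        mapping, free = {}, set(TILES)
        for task, kernel, pred in TASKS:
            candidates = [t for t in free if kernel in TILES[t][2]]
            if not candidates:
                raise RuntimeError(f"no free tile supports {kernel}")
            if pred is None:
                best = candidates[0]
            else:
                best = min(candidates, key=lambda t: hops(t, mapping[pred]))
            mapping[task] = best
            free.remove(best)

        print(mapping)   # e.g. {'src': 'T3', 'fft': 'T1', 'fir': 'T0'}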

    Configurable unitary transformations and linear logic gates using quantum memories

    We show that a set of optical memories can act as a configurable linear optical network operating on frequency-multiplexed optical states. Our protocol is applicable to any quantum memory that employs off-resonant Raman transitions to store optical information in atomic spins. In addition to the configurability, the protocol also offers favourable scaling with an increasing number of modes, where N memories can be configured to implement an arbitrary N-mode unitary operation during storage and readout. We demonstrate the versatility of this protocol by showing an example where cascaded memories are used to implement a conditional CZ gate. Comment: 5 pages, 2 figures.
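
    At the level of mode amplitudes, a configurable linear optical network is described by an N x N unitary acting on the frequency modes, and such a unitary can be composed from two-mode (Givens-type) rotations, the standard construction behind reconfigurable interferometer meshes. The sketch below builds such a mesh numerically and checks unitarity; it models only the linear algebra on mode amplitudes, not the Raman-memory protocol or the paper's resource counting.

        # Numerical sketch: compose two-mode rotations into an N-mode linear
        # transformation on frequency-mode amplitudes and verify it is unitary.
        # Angles and mode count are arbitrary illustrative values.
        import numpy as np

        def two_mode_rotation(n, i, j, theta, phi):
            """N x N identity with a 2x2 beam-splitter-like block on modes (i, j)."""
            u = np.eye(n, dtype=complex)
            u[i, i] = np.cos(theta)
            u[i, j] = -np.exp(1j * phi) * np.sin(theta)
            u[j, i] = np.exp(-1j * phi) * np.sin(theta)
            u[j, j] = np.cos(theta)
            return u

        N = 4
        rng = np.random.default_rng(0)
        network = np.eye(N, dtype=complex)
        # Cascade rotations over all mode pairs (a mesh in the style of Reck et al.).
        for i in range(N):
            for j in range(i + 1, N):
                theta, phi = rng.uniform(0, np.pi, size=2)
                network = two_mode_rotation(N, i, j, theta, phi) @ network

        print("unitary:", np.allclose(network.conj().T @ network, np.eye(N)))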