    High-Level Technology Mapping for Memories

    In this paper, we consider memory-mapping problems in High-Level Synthesis. We focus on port mapping, bit-width mapping and word mapping. A 0-1 Integer Linear Programming (ILP) technique is used to solve the mapping problems, synthesizing the source memory at a higher level using one or more memory modules from a target memory library. This method not only performs bit-width mapping and word mapping, but also performs port mapping at the same time. Experimental results indicate that the ILP approach is an effective method for memory reuse in high-level synthesis.
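
    As a rough illustration of the kind of selection problem such a mapping solves (the abstract does not give the paper's actual variables or constraints), the sketch below enumerates arrangements of library modules as word banks and bit slices so that they cover a source memory's depth and width at minimum cost; a brute-force search stands in for the 0-1 ILP solver, and the library entries and costs are illustrative assumptions.

        # Hypothetical word/bit-width mapping sketch: pick a module type from the
        # library and a (rows x cols) arrangement of copies so that rows*words
        # covers the source depth and cols*bits covers the source width, at
        # minimum cost.  Brute force stands in for the paper's 0-1 ILP solver.
        SOURCE_WORDS, SOURCE_BITS = 1024, 24            # source memory to map (assumed)
        LIBRARY = [                                     # (name, words, bits, cost) per module
            ("M_512x8",   512,  8, 1.0),
            ("M_1024x8",  1024, 8, 1.8),
            ("M_1024x16", 1024, 16, 3.2),
        ]
        MAX_ROWS = MAX_COLS = 4                         # bound on instances (assumed)

        best = None
        for name, w, b, cost in LIBRARY:
            for rows in range(1, MAX_ROWS + 1):         # word banks
                for cols in range(1, MAX_COLS + 1):     # bit slices
                    if rows * w >= SOURCE_WORDS and cols * b >= SOURCE_BITS:
                        total = rows * cols * cost
                        if best is None or total < best[0]:
                            best = (total, name, rows, cols)

        print("best mapping:", best)    # (total cost, module type, word banks, bit slices)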

    Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories

    Modern computing systems are embracing hybrid memory comprising DRAM and non-volatile memory (NVM) to combine the best properties of both memory technologies, achieving low latency, high reliability, and high density. A prominent characteristic of DRAM-NVM hybrid memory is that NVM access latency is much higher than DRAM access latency. We call this inter-memory asymmetry. We observe that parasitic components on a long bitline are a major source of high latency in both DRAM and NVM, and a significant factor contributing to high-voltage operations in NVM, which impact its reliability. We propose an architectural change in which each long bitline in DRAM and NVM is split into two segments by an isolation transistor. One segment can be accessed with lower latency and operating voltage than the other. By introducing tiers, we enable non-uniform accesses within each memory type (which we call intra-memory asymmetry), leading to performance and reliability trade-offs in DRAM-NVM hybrid memory. We extend existing OS support for NVM-DRAM hybrid memory in three ways. First, we exploit both inter- and intra-memory asymmetries to allocate and migrate memory pages between the tiers in DRAM and NVM. Second, we improve the OS's page allocation decisions by predicting the access intensity of a newly referenced memory page in a program and placing it in a matching tier during its initial allocation. This minimizes page migrations during program execution, lowering the performance overhead. Third, we propose a solution to migrate pages between the tiers of the same memory without transferring data over the memory channel, minimizing channel occupancy and improving performance. Our overall approach, which we call MNEME, enables and exploits asymmetries in DRAM-NVM hybrid tiered memory and improves both performance and reliability for single-core and multi-programmed workloads. Comment: 15 pages, 29 figures, accepted at the ACM SIGPLAN International Symposium on Memory Management.
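
    A minimal sketch of the kind of tier-aware initial placement described above: pages predicted to be accessed intensely go to the fast DRAM tier, cooler pages fall back to slower tiers. The tier names, capacities, threshold, and the stub predictor are illustrative assumptions, not the actual MNEME policy.

        # Toy tier-aware page placement, assuming four tiers ordered fast -> slow.
        # The access-intensity predictor is a stub; MNEME's real predictor and
        # thresholds are not specified in the abstract.
        TIERS = ["DRAM_fast", "DRAM_slow", "NVM_fast", "NVM_slow"]   # assumed names
        CAPACITY = {"DRAM_fast": 2, "DRAM_slow": 4, "NVM_fast": 8, "NVM_slow": 16}
        occupancy = {t: 0 for t in TIERS}

        def predict_intensity(page_id: int) -> float:
            """Stub predictor: pretend low page ids are hot (placeholder heuristic)."""
            return 1.0 / (1 + page_id)

        def place_page(page_id: int) -> str:
            """Place a newly referenced page in the fastest tier its predicted
            intensity warrants and that still has room, falling back to slower tiers."""
            hot = predict_intensity(page_id)
            # Hotter pages are tried against faster tiers first.
            preferred = TIERS if hot > 0.1 else list(reversed(TIERS))
            for tier in preferred:
                if occupancy[tier] < CAPACITY[tier]:
                    occupancy[tier] += 1
                    return tier
            raise MemoryError("all tiers full")

        for pid in range(8):
            print(pid, "->", place_page(pid))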

    Rewriting Codes for Joint Information Storage in Flash Memories

    Memories whose storage cells transit irreversibly between states have been common since the start of data storage technology. In recent years, flash memories have become a very important family of such memories. A flash memory cell has q states, 0, 1, ..., q-1, and can only transit from a lower state to a higher state before the expensive erasure operation takes place. We study rewriting codes that enable the data stored in a group of cells to be rewritten by only shifting the cells to higher states. Since the considered state transitions are irreversible, the number of rewrites is bounded. Our objective is to maximize the number of times the data can be rewritten. We focus on the joint storage of data in flash memories, and study two rewriting codes for two different scenarios. The first code, called the floating code, is for the joint storage of multiple variables, where every rewrite changes one variable. The second code, called the buffer code, is for remembering the most recent data in a data stream. Many of the codes presented here are either optimal or asymptotically optimal. We also present bounds on the performance of general codes. The results show that rewriting codes can integrate a flash memory's rewriting capabilities for different variables to a high degree.
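
    A worked toy example in the spirit of such rewriting codes (not one of the paper's constructions): a single bit is stored in n q-level cells as the parity of the sum of the cell levels, so each rewrite only raises one cell by one level and the bit can be rewritten n(q-1) times before an erasure is needed. The parameters and names are illustrative.

        # Toy rewriting code: one binary variable in n cells with levels 0..q-1.
        # The stored bit is (sum of levels) mod 2; a rewrite that flips the bit
        # increments the lowest cell that still has headroom, so cells only move
        # to higher states and the erase is deferred for n*(q-1) rewrites.
        def decode(cells):
            return sum(cells) % 2

        def rewrite(cells, new_bit):
            """Raise one cell level so the decoded bit becomes new_bit.
            Returns False when no rewrite is possible without an erase."""
            if decode(cells) == new_bit:
                return True                       # nothing to change
            for i, level in enumerate(cells):
                if level < Q - 1:
                    cells[i] = level + 1          # irreversible: only upward
                    return True
            return False                          # all cells at q-1: erase required

        Q, N = 4, 3                               # q-level cells, n cells (assumed)
        cells = [0] * N
        writes, bit = 0, 0
        while True:
            bit ^= 1                              # alternate the stored value
            if not rewrite(cells, bit):
                break
            writes += 1
        print("rewrites before erase:", writes, "final cells:", cells)   # 9 = n*(q-1)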

    Non-power-of-Two FFTs: Exploring the Flexibility of the Montium TP

    Coarse-grain reconfigurable architectures, like the Montium TP, have proven to be a very successful approach for low-power and high-performance computation of regular digital signal processing algorithms. This paper presents the implementation of a class of non-power-of-two FFTs to explore the limitations and flexibility of the Montium TP for less regular algorithms. A non-power-of-two FFT is less regular than a traditional power-of-two FFT. The results of the implementation show the processing time, accuracy, energy consumption and flexibility of the implementation.
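
    For reference, non-power-of-two lengths are usually handled with mixed-radix Cooley-Tukey: a composite length N = n1*n2 is split into n1 sub-DFTs of length n2 plus twiddle factors, with a direct DFT as the fallback for prime lengths. The sketch below is a generic reference model of that decomposition, not the Montium TP mapping from the paper.

        # Mixed-radix DFT sketch for composite (non-power-of-two) lengths.
        # Split N = n1 * n2, compute n1 sub-DFTs of length n2 on decimated
        # sequences, then combine with twiddle factors.  Falls back to the
        # direct O(N^2) DFT for prime lengths.  No Montium specifics.
        import cmath

        def dft(x):
            n = len(x)
            return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
                    for k in range(n)]

        def smallest_factor(n):
            f = 2
            while f * f <= n:
                if n % f == 0:
                    return f
                f += 1
            return n

        def fft_mixed_radix(x):
            n = len(x)
            n1 = smallest_factor(n)
            if n1 == n:                       # prime length: direct DFT
                return dft(x)
            n2 = n // n1
            cols = [fft_mixed_radix(x[r::n1]) for r in range(n1)]   # sub-DFTs
            out = [0j] * n
            for k1 in range(n1):
                for k2 in range(n2):
                    s = 0j
                    for r in range(n1):
                        tw = cmath.exp(-2j * cmath.pi * r * (k2 + n2 * k1) / n)
                        s += cols[r][k2] * tw
                    out[k2 + n2 * k1] = s
            return out

        # Quick check against the direct DFT for a non-power-of-two length (N = 12).
        data = [complex(i, 0) for i in range(12)]
        ref, got = dft(data), fft_mixed_radix(data)
        print(max(abs(a - b) for a, b in zip(ref, got)))   # should be ~1e-12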

    A Cache Management Strategy to Replace Wear Leveling Techniques for Embedded Flash Memory

    Prices of NAND flash memories are falling drastically due to market growth and the maturing of fabrication processes, while research efforts on the technology itself, in terms of endurance and density, remain very active. NAND flash memories are becoming the most important storage media in mobile computing and tend to be less and less confined to this area. The major constraint of this technology is the limited number of erase operations possible per block, which quickly wears the memory out. To cope with this issue, state-of-the-art solutions implement wear leveling policies that even out the wear of the memory and so increase its lifetime. These policies are integrated into the Flash Translation Layer (FTL) and contribute greatly to decreasing write performance. In this paper, we propose to reduce the flash memory wear-out problem and improve performance by absorbing erase operations through a dual-cache system that replaces the FTL wear leveling and garbage collection services. We justify this idea by presenting a first performance evaluation of an exclusively cache-based system for embedded flash memories. Unlike wear leveling schemes, the proposed cache solution reduces the total number of erase operations issued to the media by absorbing them in the cache, for workloads exhibiting a minimal global sequential rate. Comment: This paper received the Best Paper Award in the Computer System track; 8 pages; International Symposium on Performance Evaluation of Computer & Telecommunication Systems, The Hague, Netherlands (2011).
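
    As a rough sketch of absorbing erases in a cache (not the paper's dual-cache design, whose structure and policies the abstract only names), the snippet below buffers page writes per flash block and erases/programs a block only when its buffer is evicted or flushed, so repeated writes to the same pages cost one erase instead of many. Block size, cache size, and the eviction policy are assumptions.

        # Toy write cache in front of a block-erase flash model.  Page writes are
        # absorbed in an in-RAM buffer keyed by block; a block is erased and
        # reprogrammed only when its buffer is evicted (least-recently-written
        # block goes first) or explicitly flushed.
        from collections import OrderedDict

        PAGES_PER_BLOCK = 4          # assumed flash geometry
        CACHE_BLOCKS = 2             # assumed cache capacity, in blocks
        erase_count = 0
        cache = OrderedDict()        # block_id -> {page_offset: data}

        def flush(block_id):
            """Erase the block once and program all buffered pages."""
            global erase_count
            erase_count += 1
            cache.pop(block_id)

        def write_page(page_id, data):
            block_id = page_id // PAGES_PER_BLOCK
            if block_id not in cache and len(cache) >= CACHE_BLOCKS:
                victim, _ = next(iter(cache.items()))
                flush(victim)                       # one erase per evicted block
            cache.setdefault(block_id, {})[page_id % PAGES_PER_BLOCK] = data
            cache.move_to_end(block_id)             # keep recently written blocks

        # Repeatedly rewriting pages of one block costs a single erase on flush,
        # instead of one erase per rewrite as in an uncached model.
        for i in range(100):
            write_page(i % 4, b"x")                 # all writes hit block 0
        for b in list(cache):
            flush(b)
        print("erase operations:", erase_count)     # 1 with the cache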

    The Chameleon Architecture for Streaming DSP Applications

    We focus on architectures for streaming DSP applications such as wireless baseband processing and image processing. We aim at a single generic architecture that is capable of dealing with different DSP applications. This architecture has to be energy efficient and fault tolerant. We introduce a heterogeneous tiled architecture and present the details of a domain-specific reconfigurable tile processor called Montium. This reconfigurable processor has a small footprint (1.8 mm^2 in a 130 nm process), is power efficient and exploits the locality-of-reference principle. Reconfiguring the device is very fast; for example, loading the coefficients for a 200-tap FIR filter is done within 80 clock cycles. The tiles of the tiled architecture are connected to a Network-on-Chip (NoC) via a network interface (NI). Two NoCs have been developed: a packet-switched and a circuit-switched version. Both provide two types of services: guaranteed throughput (GT) and best effort (BE). Estimates of power consumption are presented for both NoCs. The NI synchronizes data transfers, and configures and starts/stops the tile processor. For dynamically mapping applications onto the tiled architecture, we introduce a run-time mapping tool.
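
    As a minimal illustration of what one run-time mapping step could look like (the actual tool, cost model, and tile names are not given in the abstract and are assumed here), the sketch greedily assigns streaming tasks to heterogeneous tiles, preferring tiles that support the task's kernel and penalizing NoC hop distance to the task's already-mapped predecessor.

        # Greedy run-time mapping sketch: assign each task to a free tile that can
        # run it, minimizing Manhattan hop distance to its mapped predecessor.
        # Tile grid, task graph, and capabilities are illustrative assumptions.
        TILES = {                      # tile_id -> (x, y, supported kernels)
            "T0": (0, 0, {"FIR", "FFT"}),
            "T1": (1, 0, {"FFT"}),
            "T2": (0, 1, {"FIR"}),
            "T3": (1, 1, {"GPP"}),
        }
        TASKS = [("src", "GPP", None), ("fft", "FFT", "src"), ("fir", "FIR", "fft")]

        def hops(a, b):
            ax, ay, _ = TILES[a]
            bx, by, _ = TILES[b]
            return abs(ax - bx) + abs(ay - by)

        mapping, free = {}, set(TILES)
        for task, kernel, pred in TASKS:
            candidates = [t for t in free if kernel in TILES[t][2]]
            if not candidates:
                raise RuntimeError(f"no free tile supports {kernel}")
            if pred is None:
                best = candidates[0]
            else:
                best = min(candidates, key=lambda t: hops(t, mapping[pred]))
            mapping[task] = best
            free.remove(best)

        print(mapping)   # e.g. {'src': 'T3', 'fft': 'T1', 'fir': 'T0'}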

    Configurable unitary transformations and linear logic gates using quantum memories

    We show that a set of optical memories can act as a configurable linear optical network operating on frequency-multiplexed optical states. Our protocol is applicable to any quantum memory that employs off-resonant Raman transitions to store optical information in atomic spins. In addition to the configurability, the protocol also offers favourable scaling with an increasing number of modes, where N memories can be configured to implement an arbitrary N-mode unitary operation during storage and readout. We demonstrate the versatility of this protocol by showing an example where cascaded memories are used to implement a conditional CZ gate. Comment: 5 pages, 2 figures.
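
    At the level of mode amplitudes, a configurable linear optical network is described by an N x N unitary acting on the frequency modes, and such a unitary can be composed from two-mode (Givens-type) rotations, the standard construction behind reconfigurable interferometer meshes. The sketch below builds such a mesh numerically and checks unitarity; it models only the linear algebra on mode amplitudes, not the Raman-memory protocol or the paper's resource counting.

        # Numerical sketch: compose two-mode rotations into an N-mode linear
        # transformation on frequency-mode amplitudes and verify it is unitary.
        # Angles and mode count are arbitrary illustrative values.
        import numpy as np

        def two_mode_rotation(n, i, j, theta, phi):
            """N x N identity with a 2x2 beam-splitter-like block on modes (i, j)."""
            u = np.eye(n, dtype=complex)
            u[i, i] = np.cos(theta)
            u[i, j] = -np.exp(1j * phi) * np.sin(theta)
            u[j, i] = np.exp(-1j * phi) * np.sin(theta)
            u[j, j] = np.cos(theta)
            return u

        N = 4
        rng = np.random.default_rng(0)
        network = np.eye(N, dtype=complex)
        # Cascade rotations over all mode pairs (a mesh in the style of Reck et al.).
        for i in range(N):
            for j in range(i + 1, N):
                theta, phi = rng.uniform(0, np.pi, size=2)
                network = two_mode_rotation(N, i, j, theta, phi) @ network

        print("unitary:", np.allclose(network.conj().T @ network, np.eye(N)))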