102,704 research outputs found
High-Level Technology Mapping for Memories
In this paper, we consider memory-mapping problems in High-Level Synthesis. We focus on the port mapping, bit-width mapping and word mapping, respectively. A 0-1 Integer Linear Programming (ILP) technique is used to solve the mapping problems, which synthesizes the source memory using one or more memory modules from a target memory library at a higher level. This method can not only perform bit-width mapping and word mapping, but it can also perform port mapping at the same time. Experimental results indicate that ILP approach is an effective method for memory reuse in high-level synthesis
Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories
Modern computing systems are embracing hybrid memory comprising of DRAM and
non-volatile memory (NVM) to combine the best properties of both memory
technologies, achieving low latency, high reliability, and high density. A
prominent characteristic of DRAM-NVM hybrid memory is that it has NVM access
latency much higher than DRAM access latency. We call this inter-memory
asymmetry. We observe that parasitic components on a long bitline are a major
source of high latency in both DRAM and NVM, and a significant factor
contributing to high-voltage operations in NVM, which impact their reliability.
We propose an architectural change, where each long bitline in DRAM and NVM is
split into two segments by an isolation transistor. One segment can be accessed
with lower latency and operating voltage than the other. By introducing tiers,
we enable non-uniform accesses within each memory type (which we call
intra-memory asymmetry), leading to performance and reliability trade-offs in
DRAM-NVM hybrid memory. We extend existing NVM-DRAM OS in three ways. First, we
exploit both inter- and intra-memory asymmetries to allocate and migrate memory
pages between the tiers in DRAM and NVM. Second, we improve the OS's page
allocation decisions by predicting the access intensity of a newly-referenced
memory page in a program and placing it to a matching tier during its initial
allocation. This minimizes page migrations during program execution, lowering
the performance overhead. Third, we propose a solution to migrate pages between
the tiers of the same memory without transferring data over the memory channel,
minimizing channel occupancy and improving performance. Our overall approach,
which we call MNEME, to enable and exploit asymmetries in DRAM-NVM hybrid
tiered memory improves both performance and reliability for both single-core
and multi-programmed workloads.Comment: 15 pages, 29 figures, accepted at ACM SIGPLAN International Symposium
on Memory Managemen
Rewriting Codes for Joint Information Storage in Flash Memories
Memories whose storage cells transit irreversibly between
states have been common since the start of the data storage
technology. In recent years, flash memories have become a very
important family of such memories. A flash memory cell has q
states—state 0.1.....q-1 - and can only transit from a lower
state to a higher state before the expensive erasure operation takes
place. We study rewriting codes that enable the data stored in a
group of cells to be rewritten by only shifting the cells to higher
states. Since the considered state transitions are irreversible, the
number of rewrites is bounded. Our objective is to maximize the
number of times the data can be rewritten. We focus on the joint
storage of data in flash memories, and study two rewriting codes
for two different scenarios. The first code, called floating code, is for
the joint storage of multiple variables, where every rewrite changes
one variable. The second code, called buffer code, is for remembering
the most recent data in a data stream. Many of the codes
presented here are either optimal or asymptotically optimal. We
also present bounds to the performance of general codes. The results
show that rewriting codes can integrate a flash memory’s
rewriting capabilities for different variables to a high degree
Non-power-of-Two FFTs: Exploring the Flexibility of the Montium TP
Coarse-grain reconfigurable architectures, like the Montium TP, have proven to be a very successful approach for low-power and high-performance computation of regular digital signal processing algorithms. This paper presents the implementation of a class of non-power-of-two FFTs to discover the limitations and Flexibility of the Montium TP for less regular algorithms. A non-power-of-two FFT is less regular compared to a traditional power-of-two FFT. The results of the implementation show the processing time, accuracy, energy consumption and Flexibility of the implementation
A Cache Management Strategy to Replace Wear Leveling Techniques for Embedded Flash Memory
Prices of NAND flash memories are falling drastically due to market growth
and fabrication process mastering while research efforts from a technological
point of view in terms of endurance and density are very active. NAND flash
memories are becoming the most important storage media in mobile computing and
tend to be less confined to this area. The major constraint of such a
technology is the limited number of possible erase operations per block which
tend to quickly provoke memory wear out. To cope with this issue,
state-of-the-art solutions implement wear leveling policies to level the wear
out of the memory and so increase its lifetime. These policies are integrated
into the Flash Translation Layer (FTL) and greatly contribute in decreasing the
write performance. In this paper, we propose to reduce the flash memory wear
out problem and improve its performance by absorbing the erase operations
throughout a dual cache system replacing FTL wear leveling and garbage
collection services. We justify this idea by proposing a first performance
evaluation of an exclusively cache based system for embedded flash memories.
Unlike wear leveling schemes, the proposed cache solution reduces the total
number of erase operations reported on the media by absorbing them in the cache
for workloads expressing a minimal global sequential rate.Comment: Ce papier a obtenu le "Best Paper Award" dans le "Computer System
track" nombre de page: 8; International Symposium on Performance Evaluation
of Computer & Telecommunication Systems, La Haye : Netherlands (2011
The Chameleon Architecture for Streaming DSP Applications
We focus on architectures for streaming DSP applications such as wireless baseband processing and image processing. We aim at a single generic architecture that is capable of dealing with different DSP applications. This architecture has to be energy efficient and fault tolerant. We introduce a heterogeneous tiled architecture and present the details of a domain-specific reconfigurable tile processor called Montium. This reconfigurable processor has a small footprint (1.8 mm in a 130 nm process), is power efficient and exploits the locality of reference principle. Reconfiguring the device is very fast, for example, loading the coefficients for a 200 tap FIR filter is done within 80 clock cycles. The tiles on the tiled architecture are connected to a Network-on-Chip (NoC) via a network interface (NI). Two NoCs have been developed: a packet-switched and a circuit-switched version. Both provide two types of services: guaranteed throughput (GT) and best effort (BE). For both NoCs estimates of power consumption are presented. The NI synchronizes data transfers, configures and starts/stops the tile processor. For dynamically mapping applications onto the tiled architecture, we introduce a run-time mapping tool
Configurable unitary transformations and linear logic gates using quantum memories
We show that a set of optical memories can act as a configurable linear
optical network operating on frequency-multiplexed optical states. Our protocol
is applicable to any quantum memories that employ off-resonant Raman
transitions to store optical information in atomic spins. In addition to the
configurability, the protocol also offers favourable scaling with an increasing
number of modes where N memories can be configured to implement an arbitrary
N-mode unitary operations during storage and readout. We demonstrate the
versatility of this protocol by showing an example where cascaded memories are
used to implement a conditional CZ gate.Comment: 5 pages, 2 figure
- …