12,604 research outputs found
Coding for Phase Change Memory Performance Optimization
Over the past several decades, memory technologies have exploited
continual scaling of CMOS to drastically improve performance and
cost. Unfortunately, charge-based memories become unreliable beyond
20 nm feature sizes. A promising alternative is Phase-Change-Memory
(PCM) which leverages scalable resistive thermal mechanisms. To
realize PCM's potential, a number of challenges, including the
limited wear-endurance and costly writes, need to be addressed. This
thesis introduces novel methodologies for encoding data on PCM which exploit asymmetries in read/write performance to minimize memory's wear/energy consumption. First, we map the problem to a
distance-based graph clustering problem and prove it is NP-hard.
Next, we propose two different approaches: an optimal solution
based on Integer-Linear-Programming, and an approximately-optimal solution based on Dynamic-Programming. Our methods target both single-level and multi-level cell PCM and provide further
optimizations for stochastically-distributed data. We devise a low
overhead hardware architecture for the encoder. Evaluations
demonstrate significant performance gains of our framework
The financial clouds review
This paper demonstrates financial enterprise portability, which involves moving entire application services from desktops to clouds and between different clouds, and is transparent to users who can work as if on their familiar systems. To demonstrate portability, reviews for several financial models are studied, where Monte Carlo Methods (MCM) and Black Scholes Model (BSM) are chosen. A special technique in MCM, Least Square Methods, is used to reduce errors while performing accurate calculations. The coding algorithm for MCM written in MATLAB is explained. Simulations for MCM are performed on different types of Clouds. Benchmark and experimental results are presented for discussion. 3D Black Scholes are used to explain the impacts and added values for risk analysis, and three different scenarios with 3D risk analysis are explained. We also discuss implications for banking and ways to track risks in order to improve accuracy. We have used a conceptual Cloud platform to explain our contributions in Financial Software as a Service (FSaaS) and the IBM Fined Grained Security Framework. Our objective is to demonstrate portability, speed, accuracy and reliability of applications in the clouds, while demonstrating portability for FSaaS and the Cloud Computing Business Framework (CCBF), which is proposed to deal with cloud portability
High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory
In this dissertation new computing paradigms, architectures and design philosophy are proposed and evaluated for adopting the STT-MRAM technology as highly reliable, energy efficient and fast memory. For this purpose, a novel cross-layer framework from the cell-level all the way up to the system- and application-level has been developed. In these framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rates of the entire memory and its Mean Time To Failure (MTTF) along with considering the temperature and process variation effects. Design-time, compile-time and run-time solutions have been provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements in comparison to state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design
Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories
Modern computing systems are embracing hybrid memory comprising of DRAM and
non-volatile memory (NVM) to combine the best properties of both memory
technologies, achieving low latency, high reliability, and high density. A
prominent characteristic of DRAM-NVM hybrid memory is that it has NVM access
latency much higher than DRAM access latency. We call this inter-memory
asymmetry. We observe that parasitic components on a long bitline are a major
source of high latency in both DRAM and NVM, and a significant factor
contributing to high-voltage operations in NVM, which impact their reliability.
We propose an architectural change, where each long bitline in DRAM and NVM is
split into two segments by an isolation transistor. One segment can be accessed
with lower latency and operating voltage than the other. By introducing tiers,
we enable non-uniform accesses within each memory type (which we call
intra-memory asymmetry), leading to performance and reliability trade-offs in
DRAM-NVM hybrid memory. We extend existing NVM-DRAM OS in three ways. First, we
exploit both inter- and intra-memory asymmetries to allocate and migrate memory
pages between the tiers in DRAM and NVM. Second, we improve the OS's page
allocation decisions by predicting the access intensity of a newly-referenced
memory page in a program and placing it to a matching tier during its initial
allocation. This minimizes page migrations during program execution, lowering
the performance overhead. Third, we propose a solution to migrate pages between
the tiers of the same memory without transferring data over the memory channel,
minimizing channel occupancy and improving performance. Our overall approach,
which we call MNEME, to enable and exploit asymmetries in DRAM-NVM hybrid
tiered memory improves both performance and reliability for both single-core
and multi-programmed workloads.Comment: 15 pages, 29 figures, accepted at ACM SIGPLAN International Symposium
on Memory Managemen
A Construction Kit for Efficient Low Power Neural Network Accelerator Designs
Implementing embedded neural network processing at the edge requires
efficient hardware acceleration that couples high computational performance
with low power consumption. Driven by the rapid evolution of network
architectures and their algorithmic features, accelerator designs are
constantly updated and improved. To evaluate and compare hardware design
choices, designers can refer to a myriad of accelerator implementations in the
literature. Surveys provide an overview of these works but are often limited to
system-level and benchmark-specific performance metrics, making it difficult to
quantitatively compare the individual effect of each utilized optimization
technique. This complicates the evaluation of optimizations for new accelerator
designs, slowing-down the research progress. This work provides a survey of
neural network accelerator optimization approaches that have been used in
recent works and reports their individual effects on edge processing
performance. It presents the list of optimizations and their quantitative
effects as a construction kit, allowing to assess the design choices for each
building block separately. Reported optimizations range from up to 10'000x
memory savings to 33x energy reductions, providing chip designers an overview
of design choices for implementing efficient low power neural network
accelerators
- …