Search CORE

5,767 research outputs found

Recommended from our members

Testability considerations for implementing an embedded memory subsystem

Author: Seok Geewhun
Publication venue
Publication date: 01/02/2012
Field of study

textThere are a number of testability considerations for VLSI design, but test coverage, test time, accuracy of test patterns and correctness of design information for DFD (Design for debug) are the most important ones in design with embedded memories. The goal of DFT (Design-for-Test) is to achieve zero defects. When it comes to the memory subsystem in SOCs (system on chips), many flavors of memory BIST (built-in self test) are able to get high test coverage in a memory, but often, no proper attention is given to the memory interface logic (shadow logic). Functional testing and BIST are the most prevalent tests for this logic, but functional testing is impractical for complicated SOC designs. As a result, industry has widely used at-speed scan testing to detect delay induced defects. Compared with functional testing, scan-based testing for delay faults reduces overall pattern generation complexity and cost by enhancing both controllability and observability of flip-flops. However, without proper modeling of memory, Xs are generated from memories. Also, when the design has chip compression logic, the number of ATPG patterns is increased significantly due to Xs from memories. In this dissertation, a register based testing method and X prevention logic are presented to tackle these problems. An important design stage for scan based testing with memory subsystems is the step to create a gate level model and verify with this model. The flow needs to provide a robust ATPG netlist model. Most industry standard CAD tools used to analyze fault coverage and generate test vectors require gate level models. However, custom embedded memories are typically designed using a transistor-level flow, there is a need for an abstraction step to generate the gate models, which must be equivalent to the actual design (transistor level). The contribution of the research is a framework to verify that the gate level representation of custom designs is equivalent to the transistor-level design. Compared to basic stuck-at fault testing, the number of patterns for at-speed testing is much larger than for basic stuck-at fault testing. So reducing test and data volume are important. In this desertion, a new scan reordering method is introduced to reduce test data with an optimal routing solution. With in depth understanding of embedded memories and flows developed during the study of custom memory DFT, a custom embedded memory Bit Mapping method using a symbolic simulator is presented in the last chapter to achieve high yield for memories.Electrical and Computer Engineerin

Texas ScholarWorks

Lossy and Lossless Compression Techniques to Improve the Utilization of Memory Bandwidth and Capacity

Author: Eldst\ue5l-Ahrens Albin
Publication venue
Publication date: 01/01/2022
Field of study

Main memory is a critical resource in modern computer systems and is in increasing demand. An increasing number of on-chip cores and specialized accelerators improves the potential processing throughput but also calls for higher data rates and greater memory capacity. In addition, new emerging data-intensive applications further increase memory traffic and footprint. On the other hand, memory bandwidth is pin limited and power constrained and is therefore more difficult to scale. Memory capacity is limited by cost and energy considerations.This thesis proposes a variety of memory compression techniques as a means to reduce the memory bottleneck. These techniques target two separate problems in the memory hierarchy: memory bandwidth and memory capacity. In order to reduce transferred data volumes, lossy compression is applied which is able to reach more aggressive compression ratios. A reduction of off-chip memory traffic leads to reduced memory latency, which in turn improves the performance and energy efficiency of the system. To improve memory capacity, a novel approach to memory compaction is presented.The first part of this thesis introduces Approximate Value Reconstruction (AVR), which combines a low-complexity downsampling compressor with an LLC design able to co-locate compressed and uncompressed data. Two separate thresholds limit the error introduced by approximation. For applications that tolerate aggressive approximation in large fractions of their data, in a system with 1GB of 1600MHz DDR4 per core and 1MB of LLC space per core, AVR reduces memory traffic by up to 70%, execution time by up to 55%, and energy costs by up to 20% introducing at most 1.2% error in the application output.The second part of this thesis proposes Memory Squeeze (MemSZ), introducing a parallelized implementation of the more advanced Squeeze (SZ) compression method. Furthermore, MemSZ improves on the error limiting capability of AVR by keeping track of life-time accumulated error. An alternate memory compression architecture is also proposed, which utilizes 3D-stacked DRAM as a last-level cache. In a system with 1GB of 800MHz DDR4 per core and 1MB of LLC space per core, MemSZ improves execution time, energy and memory traffic over AVR by up to 15%, 9%, and 64%, respectively.The third part of the thesis describes L2C, a hybrid lossy and lossless memory compression scheme. L2C applies lossy compression to approximable data, and falls back to lossless if an error threshold is exceeded. In a system with 4GB of 800MHz DDR4 per core and 1MB of LLC space per core, L2C improves on the performance of MemSZ by 9%, and energy consumption by 3%.The fourth and final contribution is FlatPack, a novel memory compaction scheme. FlatPack is able to reduce the traffic overhead compared to other memory compaction systems, thus retaining the bandwidth benefits of compression. Furthermore, FlatPack is flexible to changes in block compressibility both over time and between adjacent blocks. When available memory corresponds to 50% of the application footprint, in a system with 4GB of 800MHz DDR4 per core and 1MB of LLC space per core, FlatPack increases system performance compared to current state-of-the-art designs by 36%, while reducing system energy consumption by 12%

Chalmers Research