5,767 research outputs found
Recommended from our members
Testability considerations for implementing an embedded memory subsystem
textThere are a number of testability considerations for VLSI design,
but test coverage, test time, accuracy of test patterns and
correctness of design information for DFD (Design for debug) are
the most important ones in design with embedded memories. The goal
of DFT (Design-for-Test) is to achieve zero defects. When it comes
to the memory subsystem in SOCs (system on chips), many flavors of
memory BIST (built-in self test) are able to get high test
coverage in a memory, but often, no proper attention is given to
the memory interface logic (shadow logic). Functional testing and
BIST are the most prevalent tests for this logic, but functional
testing is impractical for complicated SOC designs. As a result,
industry has widely used at-speed scan testing to detect delay
induced defects. Compared with functional testing, scan-based
testing for delay faults reduces overall pattern generation
complexity and cost by enhancing both controllability and
observability of flip-flops. However, without proper modeling of
memory, Xs are generated from memories. Also, when the design has
chip compression logic, the number of ATPG patterns is increased
significantly due to Xs from memories. In this dissertation, a
register based testing method and X prevention logic are presented
to tackle these problems.
An important design stage for scan based testing with memory
subsystems is the step to create a gate level model and verify
with this model. The flow needs to provide a robust ATPG netlist
model. Most industry standard CAD tools used to analyze fault
coverage and generate test vectors require gate level models.
However, custom embedded memories are typically designed using a
transistor-level flow, there is a need for an abstraction step to
generate the gate models, which must be equivalent to the actual
design (transistor level). The contribution of the research is a
framework to verify that the gate level representation of custom
designs is equivalent to the transistor-level design.
Compared to basic stuck-at fault testing, the number of patterns
for at-speed testing is much larger than for basic stuck-at fault
testing. So reducing test and data volume are important. In this
desertion, a new scan reordering method is introduced to reduce
test data with an optimal routing solution. With in depth
understanding of embedded memories and flows developed during the
study of custom memory DFT, a custom embedded memory Bit Mapping
method using a symbolic simulator is presented in the last chapter
to achieve high yield for memories.Electrical and Computer Engineerin
Lossy and Lossless Compression Techniques to Improve the Utilization of Memory Bandwidth and Capacity
Main memory is a critical resource in modern computer systems and is in increasing demand. An increasing number of on-chip cores and specialized accelerators improves the potential processing throughput but also calls for higher data rates and greater memory capacity. In addition, new emerging data-intensive applications further increase memory traffic and footprint. On the other hand, memory bandwidth is pin limited and power constrained and is therefore more difficult to scale. Memory capacity is limited by cost and energy considerations.This thesis proposes a variety of memory compression techniques as a means to reduce the memory bottleneck. These techniques target two separate problems in the memory hierarchy: memory bandwidth and memory capacity. In order to reduce transferred data volumes, lossy compression is applied which is able to reach more aggressive compression ratios. A reduction of off-chip memory traffic leads to reduced memory latency, which in turn improves the performance and energy efficiency of the system. To improve memory capacity, a novel approach to memory compaction is presented.The first part of this thesis introduces Approximate Value Reconstruction (AVR), which combines a low-complexity downsampling compressor with an LLC design able to co-locate compressed and uncompressed data. Two separate thresholds limit the error introduced by approximation. For applications that tolerate aggressive approximation in large fractions of their data, in a system with 1GB of 1600MHz DDR4 per core and 1MB of LLC space per core, AVR reduces memory traffic by up to 70%, execution time by up to 55%, and energy costs by up to 20% introducing at most 1.2% error in the application output.The second part of this thesis proposes Memory Squeeze (MemSZ), introducing a parallelized implementation of the more advanced Squeeze (SZ) compression method. Furthermore, MemSZ improves on the error limiting capability of AVR by keeping track of life-time accumulated error. An alternate memory compression architecture is also proposed, which utilizes 3D-stacked DRAM as a last-level cache. In a system with 1GB of 800MHz DDR4 per core and 1MB of LLC space per core, MemSZ improves execution time, energy and memory traffic over AVR by up to 15%, 9%, and 64%, respectively.The third part of the thesis describes L2C, a hybrid lossy and lossless memory compression scheme. L2C applies lossy compression to approximable data, and falls back to lossless if an error threshold is exceeded. In a system with 4GB of 800MHz DDR4 per core and 1MB of LLC space per core, L2C improves on the performance of MemSZ by 9%, and energy consumption by 3%.The fourth and final contribution is FlatPack, a novel memory compaction scheme. FlatPack is able to reduce the traffic overhead compared to other memory compaction systems, thus retaining the bandwidth benefits of compression. Furthermore, FlatPack is flexible to changes in block compressibility both over time and between adjacent blocks. When available memory corresponds to 50% of the application footprint, in a system with 4GB of 800MHz DDR4 per core and 1MB of LLC space per core, FlatPack increases system performance compared to current state-of-the-art designs by 36%, while reducing system energy consumption by 12%
- …