6 research outputs found
Memory system architecture for real-time multitasking systems
by Scott Rixner. Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (p. 119-120).
Energy efficient cache architectures for single, multi and many core processors
With each technology generation, we get more transistors per chip. While processor
frequencies have increased over the past few decades, memory speeds have not kept
pace. Therefore, more and more transistors are devoted to on-chip caches to reduce
data-access latency and help achieve high performance.
On-chip caches consume a significant fraction of the processor energy budget but
must still deliver high performance, so cache resources should be tailored to the
requirements of the running applications. Fixed-configuration caches are designed to
deliver low average memory access times across a wide range of potential applications.
However, this can lead to excessive energy consumption for applications that do not
require the full capacity or associativity of the cache at all times. Furthermore, in
systems where the clock period is constrained by the access times of the level-1
caches, the clock frequency for all applications is effectively limited by the cache
requirements of the most demanding phase within the most demanding application. This
motivates dynamic adaptation of cache configurations to optimize performance while
minimizing energy consumption, on a per-application basis.
First, this thesis proposes an energy-efficient cache architecture for a single-core
system, along with a run-time support framework that dynamically adapts cache size
and associativity using machine learning. The machine learning model, trained offline,
profiles the application's cache usage and then reconfigures the cache according to
the program's requirements. The proposed cache architecture achieves, on average, an
18% better energy-delay product than prior state-of-the-art cache architectures in
the literature.
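The reconfiguration decision described above can be illustrated with a small sketch. The cost model, constants, and function names below are purely illustrative assumptions, not the thesis's trained model; the sketch only shows how an energy-delay product estimate (EDP = energy x delay) can drive the choice among candidate (size, associativity) configurations profiled offline.

```python
def edp(config, profile):
    """Estimate the energy-delay product of one cache configuration.

    config:  (size_kb, ways) pair under consideration
    profile: per-config miss rate gathered by offline profiling
    """
    size_kb, ways = config
    miss_rate = profile[config]
    # Simple linear cost model (illustrative constants, not from the thesis):
    access_energy = 0.1 * size_kb * ways   # per-access energy grows with size/ways
    miss_penalty_energy = 5.0              # extra energy per miss (off-chip access)
    energy = access_energy + miss_rate * miss_penalty_energy
    delay = 1.0 + miss_rate * 20.0         # cycles per access, 20-cycle miss penalty
    return energy * delay

def pick_config(profile):
    """Return the candidate configuration with the lowest estimated EDP."""
    return min(profile, key=lambda c: edp(c, profile))

# Hypothetical offline profile: miss rate per (size_kb, ways) candidate.
profile = {
    (16, 2): 0.10,   # small cache: cheap accesses, more misses
    (32, 4): 0.04,
    (64, 8): 0.03,   # large cache: few misses, but costly per access
}
print(pick_config(profile))
```

With these illustrative numbers the small configuration wins: its per-access energy is low enough to outweigh the extra misses, which is exactly the trade-off that makes per-application adaptation worthwhile.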
Next, this thesis proposes cooperative partitioning, an energy-efficient cache
partitioning scheme for multi-core systems that share the Last Level Cache (LLC),
with a core-to-LLC-way ratio of 1:4. The scheme uses small auxiliary tags to capture
each core's cache requirements and partitions the LLC according to the individual
cores' requirements. The partitioning is way-aligned, which helps reduce both dynamic
and static energy. On average, the scheme reduces dynamic energy by 70% and static
energy by 30%, while maintaining performance on par with state-of-the-art cache
partitioning schemes.
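A minimal sketch of way-aligned partitioning, assuming a simple demand-proportional policy (the function name and allocation rule are illustrative, not the thesis's exact algorithm). Because each core receives whole ways, unallocated or idle ways can be power-gated for static-energy savings, and a lookup need only probe the owning core's ways for dynamic-energy savings.

```python
def partition_ways(demands, total_ways):
    """Allocate whole LLC ways to cores in rough proportion to demand.

    demands:    per-core way demand, e.g. measured via auxiliary tags
    total_ways: ways in the shared LLC (must be >= number of cores)

    Illustrative greedy policy: every core gets at least one way, then
    each remaining way goes to the core with the largest unmet demand.
    """
    n = len(demands)
    assert total_ways >= n, "need at least one way per core"
    alloc = [1] * n
    for _ in range(total_ways - n):
        deficit = [d - a for d, a in zip(demands, alloc)]
        alloc[deficit.index(max(deficit))] += 1
    return alloc

# Four cores sharing a 16-way LLC (the 1:4 ratio from the abstract).
print(partition_ways([8, 4, 2, 1], 16))
```

The way-aligned constraint (whole ways, never fractions of a way) is what makes the energy savings possible in hardware: a way that belongs entirely to one core can be gated or skipped as a unit.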
Finally, when the number of Last Level Cache (LLC) ways is less than or equal to the
number of cores, as in many-core systems, cooperative partitioning cannot be used to
partition the LLC. This thesis therefore proposes a region-aware cache partitioning
scheme as an energy-efficient approach for many-core systems that share the LLC, with
core-to-LLC-way ratios of 1:2 and 1:1. On average, the proposed partitioning reduces
dynamic energy by 68% and static energy by 33%, while again maintaining performance
on par with state-of-the-art LLC cache management techniques.
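When ways are too few to split among cores, the cache can instead be divided along its sets. A minimal sketch, assuming equal contiguous set regions per core (an illustrative mapping only; the thesis's region-aware policy is more sophisticated):

```python
def region_owner(set_index, num_cores, num_sets):
    """Map an LLC set index to the core owning that region.

    Illustrative policy: the set space is split into equal contiguous
    regions, one per core. Partitioning by sets rather than by ways
    still works when the core-to-way ratio is 1:2 or 1:1, where
    way-aligned partitioning is impossible.
    """
    sets_per_region = num_sets // num_cores
    return min(set_index // sets_per_region, num_cores - 1)

# With 1024 sets and 4 cores, each core owns a 256-set region.
print(region_owner(0, 4, 1024), region_owner(512, 4, 1024))
```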
Multipurpose short-term memory structures.
by Yung, Chan. Thesis (M.Phil.)--Chinese University of Hong Kong, 1995. Includes bibliographical references (leaves 107-110).
Contents: Abstract; Acknowledgement
1 Introduction: 1.1 Cache (1.1.1 Introduction; 1.1.2 Data Prefetching); 1.2 Register; 1.3 Problems and Challenges (1.3.1 Overhead of registers; 1.3.2 EReg); 1.4 Organization of the Thesis
2 Previous Studies: 2.1 Introduction; 2.2 Data aliasing; 2.3 Data prefetching (2.3.1 Introduction; 2.3.2 Hardware Prefetching; 2.3.3 Prefetching with Software Support; 2.3.4 Reducing Cache Pollution)
3 BASIC and ADM Models: 3.1 Introduction of Basic Model; 3.2 Architectural and Operational Detail of Basic Model; 3.3 Discussion (3.3.1 Implicit Storing; 3.3.2 Associative Logic); 3.4 Example for Basic Model; 3.5 Simulation Results; 3.6 Temporary Storage Problem in Basic Model (3.6.1 Introduction; 3.6.2 Discussion on the Solutions); 3.7 Introduction of ADM Model; 3.8 Architectural and Operational Detail of ADM Model; 3.9 Discussion (3.9.1 File Partition; 3.9.2 STORE Instruction); 3.10 Example for ADM Model; 3.11 Simulation Results; 3.12 Temporary Storage Problem of ADM Model (3.12.1 Introduction; 3.12.2 Discussion on the Solutions)
4 ADS Model and ADSM Model: 4.1 Introduction of ADS Model; 4.2 Architectural and Operational Detail of ADS Model; 4.3 Discussion (4.3.1 Prefetching Priority; 4.3.2 Data Prefetching; 4.3.3 EReg File Splitting; 4.3.4 Compiling Procedure); 4.4 Example for ADS Model; 4.5 Simulation Results; 4.6 Discussion on the Architectural and Operational Variations for ADS Model (4.6.1 Temporary Storage Problem; 4.6.2 Operational Variation for Data Prefetching); 4.7 Introduction of ADSM Model; 4.8 Architectural and Operational Detail of ADSM Model; 4.9 Discussion; 4.10 Example for ADSM Model; 4.11 Simulation Results; 4.12 Discussion on the Architectural and Operational Variations for ADSM Model (4.12.1 Temporary Storage Problem; 4.12.2 Operational Variation for Data Prefetching)
5 IADSM Model and IADSMC&IDLC Model: 5.1 Introduction of IADSM Model; 5.2 Architectural and Operational Detail of IADSM Model; 5.3 Discussion (5.3.1 Implicit Loading; 5.3.2 Compiling Procedure); 5.4 Example for IADSM Model; 5.5 Simulation Results; 5.6 Temporary Storage Problem of IADSM Model; 5.7 Introduction of IADSMC&IDLC Model; 5.8 Architectural and Operational Detail of IADSMC&IDLC Model; 5.9 Discussion (5.9.1 Additional Operations; 5.9.2 Compiling Procedure); 5.10 Example for IADSMC&IDLC Model; 5.11 Simulation Results; 5.12 Temporary Storage Problem of IADSMC&IDLC Model
6 Compiler and Memory System Support for EReg: 6.1 Impact on Compiler (6.1.1 Register Usage; 6.1.2 Effect of Unrolling; 6.1.3 Code Scheduling Algorithm); 6.2 Impact on Memory System (6.2.1 Memory Bottleneck; 6.2.2 Size of EReg Files)
7 Conclusions: 7.1 Summary; 7.2 Future Research
Bibliography
Appendix A Source code of the Kernels; Appendix B Program Analysis (B.1-B.6: programs analysed by the Basic, ADM, ADS, ADSM, IADSM and IADSMC&IDLC Models); Appendix C Cache Simulation on Prefetching of ADS Model
Data prefetching using hardware register value predictable table.
by Cheung, Chin-Ming. Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. Includes bibliographical references (leaves 95-97).
Contents: Abstract; Acknowledgement
1 Introduction: 1.1 Overview; 1.2 Objective; 1.3 Organization of the dissertation
2 Related Works: 2.1 Previous Cache Works; 2.2 Data Prefetching Techniques (2.2.1 Hardware vs Software Assisted; 2.2.2 Non-selective vs Highly Selective; 2.2.3 Summary on Previous Data Prefetching Schemes)
3 Program Data Mapping: 3.1 Regular and Irregular Data Access; 3.2 Propagation of Data Access Regularity (3.2.1 Data Access Regularity in High Level Program; 3.2.2 Data Access Regularity in Machine Code; 3.2.3 Data Access Regularity in Memory Address Sequence; 3.2.4 Implication)
4 Register Value Prediction Table (RVPT): 4.1 Predictability of Register Values; 4.2 Register Value Prediction Table; 4.3 Control Scheme of RVPT (4.3.1 Details of RVPT Mechanism; 4.3.2 Explanation of the Register Prediction Mechanism); 4.4 Examples of RVPT (4.4.1 Linear Array Example; 4.4.2 Linked List Example)
5 Program Register Dependency: 5.1 Register Dependency; 5.2 Generalized Concept of Register (5.2.1 Cyclic Dependent Register (CDR); 5.2.2 Acyclic Dependent Register (ADR)); 5.3 Program Register Overview
6 Generalized RVPT Model: 6.1 Level N RVPT Model (6.1.1 Identification of Level N CDR; 6.1.2 Recording CDR instructions of Level N CDR; 6.1.3 Prediction of Level N CDR); 6.2 Level 2 Register Value Prediction Table (6.2.1 Level 2 RVPT Structure; 6.2.2 Identification of Level 2 CDR; 6.2.3 Control Scheme of Level 2 RVPT; 6.2.4 Example of Index Array)
7 Performance Evaluation: 7.1 Evaluation Methodology (7.1.1 Trace-Driven Simulation; 7.1.2 Architectural Method; 7.1.3 Benchmarks and Metrics); 7.2 General Result (7.2.1 Constant Stride or Regular Data Access Applications; 7.2.2 Non-constant Stride or Irregular Data Access Applications); 7.3 Effect of Design Variations (7.3.1 Effect of Cache Size; 7.3.2 Effect of Block Size; 7.3.3 Effect of Set Associativity); 7.4 Summary
8 Conclusion and Future Research: 8.1 Conclusion; 8.2 Future Research
Bibliography
Appendices A-F: MCPI and MCPI reduction percentage vs. cache size, block size and set-associativity