6 research outputs found

    Memory system architecture for real-time multitasking systems

    By Scott Rixner. Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (p. 119-120).

    Energy efficient cache architectures for single, multi and many core processors

    With each technology generation we get more transistors per chip. While processor frequencies have increased over the past few decades, memory speeds have not kept pace, so more and more transistors are devoted to on-chip caches to reduce data latency and help achieve high performance. On-chip caches consume a significant fraction of the processor energy budget yet must deliver high performance, so cache resources should be optimized to meet the requirements of the running applications. Fixed-configuration caches are designed to deliver low average memory access times across a wide range of potential applications, but this can lead to excessive energy consumption for applications that do not require the full capacity or associativity of the cache at all times. Furthermore, in systems where the clock period is constrained by the access time of the level-1 caches, the clock frequency for all applications is effectively limited by the cache requirements of the most demanding phase of the most demanding application. This motivates dynamic adaptation of cache configurations to optimize performance while minimizing energy consumption on a per-application basis.

    First, this thesis proposes an energy-efficient cache architecture for a single-core system, along with a run-time framework that dynamically adapts cache size and associativity using machine learning. The machine learning model, trained offline, profiles the application's cache usage and reconfigures the cache according to the program's requirements. The proposed cache architecture achieves, on average, an 18% better energy-delay product than prior state-of-the-art cache architectures in the literature.

    Next, this thesis proposes cooperative partitioning, an energy-efficient cache partitioning scheme for multi-core systems that share the Last Level Cache (LLC), with a core-to-LLC-way ratio of 1:4. The scheme uses small auxiliary tags to capture each core's cache requirements and partitions the LLC according to each core's individual demand. Partitioning is way-aligned, which reduces both dynamic and static energy: on average the scheme cuts dynamic energy by 70% and static energy by 30% while maintaining performance on par with state-of-the-art cache partitioning schemes.

    Finally, when the number of LLC ways is equal to or less than the number of cores, as in many-core systems, cooperative partitioning cannot be used to partition the LLC. This thesis therefore proposes region-aware cache partitioning, an energy-efficient approach for many-core systems that share the LLC with core-to-way ratios of 1:2 and 1:1. On average it reduces dynamic energy by 68% and static energy by 33%, again maintaining performance on par with state-of-the-art LLC cache management techniques.
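The abstract's cooperative-partitioning idea — auxiliary tags estimate each core's miss curve, and ways are then allocated to match per-core demand — can be illustrated with a small sketch. This is not the thesis's algorithm; it is a hypothetical greedy way allocator under the assumption that the auxiliary tag directory already yields, for each core, estimated misses as a function of ways allocated (all names and miss numbers below are invented for illustration).

```python
# Hypothetical sketch: way-aligned LLC partitioning guided by per-core miss curves.
# miss_curves[c][w] = estimated misses for core c when given w ways (w = 0..total_ways),
# as an auxiliary tag directory might report. A greedy allocator hands out ways one
# at a time to whichever core saves the most misses from one extra way.

def partition_ways(miss_curves, total_ways):
    n = len(miss_curves)
    alloc = [1] * n                      # every core keeps at least one way
    for _ in range(total_ways - n):
        # marginal utility: misses saved by granting one more way
        gains = [miss_curves[c][alloc[c]] - miss_curves[c][alloc[c] + 1]
                 for c in range(n)]
        best = max(range(n), key=lambda c: gains[c])
        alloc[best] += 1
    return alloc

# Example: 4 cores sharing a 16-way LLC (the 1:4 core-to-way ratio from the abstract).
# Cores 0 and 2 are cache-hungry; cores 1 and 3 are streaming and gain little.
curves = [
    [900, 500, 300, 200, 150, 120, 100, 90, 85, 82, 80, 79, 78, 77, 76, 75, 75],
    [400, 350, 330, 320, 315, 312, 310, 309, 308, 308, 308, 308, 308, 308, 308, 308, 308],
    [800, 400, 250, 180, 140, 110, 95, 85, 80, 78, 77, 76, 75, 75, 75, 75, 75],
    [300, 290, 285, 282, 280, 279, 278, 278, 278, 278, 278, 278, 278, 278, 278, 278, 278],
]
print(partition_ways(curves, 16))
```

The cache-hungry cores end up with most of the ways, while the streaming cores are clamped to a small share — the behaviour that lets unused ways be power-gated for static-energy savings.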

    Multipurpose short-term memory structures.

    By Yung, Chan. Thesis (M.Phil.), Chinese University of Hong Kong, 1995. Includes bibliographical references (leaves 107-110). Chapter outline: 1. Introduction (cache and data prefetching; registers; overhead of registers; EReg); 2. Previous Studies (data aliasing; data prefetching: hardware prefetching, prefetching with software support, reducing cache pollution); 3. BASIC and ADM Models (architecture and operation; implicit storing; associative logic; simulation results; the temporary storage problem); 4. ADS and ADSM Models (architecture and operation; prefetching priority; EReg file splitting; compiling procedure; simulation results; architectural and operational variations); 5. IADSM and IADSMC&IDLC Models (architecture and operation; implicit loading; compiling procedure; simulation results; the temporary storage problem); 6. Compiler and Memory System Support for EReg (register usage; effect of unrolling; code scheduling; memory bottleneck; size of EReg files); 7. Conclusions and future research. Appendices: A. Source code of the kernels; B. Program analysis under each model; C. Cache simulation on prefetching of the ADS model.

    Data prefetching using hardware register value predictable table.

    By Cheung, Chin-Ming. Thesis (M.Phil.), Chinese University of Hong Kong, 1996. Includes bibliographical references (leaves 95-97). Chapter outline: 1. Introduction (overview; objective); 2. Related Works (previous cache works; data prefetching techniques: hardware vs. software assisted, non-selective vs. highly selective); 3. Program Data Mapping (regular and irregular data access; propagation of data access regularity from high-level program through machine code to the memory address sequence); 4. Register Value Prediction Table (RVPT) (predictability of register values; RVPT structure and control scheme; linear array and linked list examples); 5. Program Register Dependency (cyclic and acyclic dependent registers; program register overview); 6. Generalized RVPT Model (level N RVPT; level 2 RVPT with an index-array example); 7. Performance Evaluation (trace-driven simulation methodology; benchmarks and metrics; results for constant-stride/regular and non-constant-stride/irregular applications; effects of cache size, block size, and set associativity); 8. Conclusion and Future Research. Appendices: MCPI and MCPI-reduction plots versus cache size, block size, and set associativity.
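The core idea indicated by the outline — a table indexed by load instruction that tracks register values and predicts the next one, covering the constant-stride case — can be sketched as follows. This is a hypothetical illustration in the spirit of a register value prediction table, not the thesis's actual mechanism; the class name, table size, and update policy are assumptions.

```python
# Hypothetical sketch of a stride-based Register Value Prediction Table (RVPT):
# a small direct-mapped table indexed by load-instruction PC that remembers the
# last register value and its stride, and predicts the next value once the same
# stride has been observed twice in a row (the constant-stride case).

class RVPT:
    def __init__(self, entries=64):
        self.entries = entries
        self.table = {}            # index -> (last_value, stride, confident)

    def observe(self, pc, value):
        """Update on each executed load; return a predicted next value or None."""
        idx = pc % self.entries                      # direct-mapped lookup
        last, stride, _ = self.table.get(idx, (None, None, False))
        if last is None:                             # first sighting: just record
            self.table[idx] = (value, None, False)
            return None
        new_stride = value - last
        confident = (new_stride == stride)           # same stride seen twice
        self.table[idx] = (value, new_stride, confident)
        return value + new_stride if confident else None

rvpt = RVPT()
# A linear array walk (constant stride of 8 bytes) at a load with PC 0x400:
pred = None
for addr in (0x1000, 0x1008, 0x1010, 0x1018):
    pred = rvpt.observe(0x400, addr)
print(hex(pred))   # prints 0x1020: after two matching strides, the next address is predicted
```

A predicted value would then drive a prefetch of the corresponding cache block; pointer-chasing (linked-list) loads need the value itself rather than an address stride, which is what motivates generalizing beyond constant strides.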