    High Performance Hybrid Memory Systems with 3D-stacked DRAM

    The bandwidth of traditional DRAM is pin limited and so does not scale well with the increasing demand of data-intensive workloads, limiting performance. 3D-stacked DRAM can alleviate this problem, providing substantially higher bandwidth to a processor chip. However, the capacity of 3D-stacked DRAM is not enough to replace the bulk of the memory and therefore it is used either as a DRAM cache or as part of a flat address space with support for data migration. The performance of both of the above alternative designs is limited by their particular overheads. In this thesis we propose designs that improve the performance of hybrid memory systems in which 3D-stacked DRAM is used either as a cache or as part of a flat address space with data migration. DRAM caches have shown excellent potential in capturing the spatial and temporal data locality of applications; however, they are still far from their ideal performance. Besides the unavoidable DRAM access to fetch the requested data, tag access is on the critical path, adding significant latency and energy costs. Existing approaches are not able to remove these overheads and in some cases limit DRAM cache design options. To alleviate the tag access overheads of DRAM caches, this thesis proposes the Decoupled Fused Cache (DFC), a DRAM cache design that fuses DRAM cache tags with the tags of the on-chip Last Level Cache (LLC) to access the DRAM cache data directly on LLC misses. Compared to current state-of-the-art DRAM caches, DFC improves system performance by 6% on average and by 16-18% for large cacheline sizes. Finally, DFC reduces DRAM cache traffic by 18% and DRAM cache energy consumption by 7%. Data migration schemes have significant performance potential, but also entail overheads which may diminish migration benefits or even lead to performance degradation. These overheads are mainly due to the high cost of swapping data between memories, which also makes selecting which data to migrate critical to performance.
    To address these challenges of data migration, this thesis proposes LLC guided Data Migration (LGM). LGM uses the LLC to predict future reuse and select memory segments for migration. Furthermore, LGM reduces the data migration traffic overheads by not migrating the cache lines of memory segments which are present in the LLC. LGM outperforms current state-of-the-art migration designs, improving system performance by 12.1% and reducing memory system dynamic energy by 13.2%.
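The LGM idea described above can be illustrated with a small sketch: segments whose lines see reuse in the LLC become migration candidates, and lines already resident in the LLC are not copied when a segment is swapped in. The segment size, reuse threshold, and data structures below are hypothetical simplifications, not the thesis's actual mechanism.

```python
# Illustrative sketch of LLC-guided migration selection. All names and
# thresholds here are assumptions for the sake of the example.

LINES_PER_SEGMENT = 32   # hypothetical segment granularity, in cache lines
REUSE_THRESHOLD = 4      # hypothetical minimum LLC reuse hits to migrate

def select_and_migrate(segment_reuse, llc_resident):
    """Pick segments predicted to be reused and count transferred lines.

    segment_reuse: dict segment -> LLC reuse hits observed for its lines
    llc_resident:  dict segment -> set of line offsets currently in the LLC
    Returns (selected segments, total cache lines actually transferred).
    """
    selected = [seg for seg, hits in segment_reuse.items()
                if hits >= REUSE_THRESHOLD]
    transferred = 0
    for seg in selected:
        # Lines already present in the LLC need not be copied with the
        # segment, reducing migration traffic.
        transferred += LINES_PER_SEGMENT - len(llc_resident.get(seg, set()))
    return selected, transferred

reuse = {"A": 6, "B": 1, "C": 5}            # "B" sees too little reuse
resident = {"A": {0, 1, 2, 3}, "C": set()}  # 4 lines of "A" already in LLC
selected, lines_moved = select_and_migrate(reuse, resident)
```

In this toy run, segment "B" is filtered out and the four LLC-resident lines of "A" are skipped, so only 60 of a possible 64 lines are transferred.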

    Performance and Energy Trade-Offs for Parallel Applications on Heterogeneous Multi-Processing Systems

    This work proposes a methodology to find performance and energy trade-offs for parallel applications running on Heterogeneous Multi-Processing systems with a single instruction-set architecture. Such systems offer flexibility in the form of different core types and voltage and frequency pairings, defining a vast design space to explore. Therefore, for a given application, choosing a configuration that optimizes performance and energy consumption is not straightforward. Our method proposes novel analytical models for performance and power consumption whose parameters can be fitted using only a few strategically sampled offline measurements. These models are then used to estimate an application's performance and energy consumption over the whole configuration space. In turn, these offline predictions define the model's estimated Pareto-optimal configurations, which are used to inform the selection of the configuration on which the application should be executed. The methodology was validated on an ODROID-XU3 board for eight programs from the PARSEC benchmark suite, the Phoronix Test Suite and the Rodinia applications. The generated Pareto-optimal configuration space represented a 99% reduction of the universe of all available configurations. Energy savings of up to 59.77%, 61.38% and 17.7% were observed when compared to the performance, ondemand and powersave Linux governors, respectively, with higher or similar performance.
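The selection step described above can be sketched as a standard Pareto-front computation over the model's predictions: a configuration survives if no other configuration is at least as good in both runtime and energy and strictly better in one. The configuration names and predicted numbers below are illustrative placeholders, not measurements from this work.

```python
# Hypothetical sketch: choosing estimated Pareto-optimal configurations
# from offline model predictions of runtime (s) and energy (J).

def pareto_front(configs):
    """Return configurations not dominated in (time, energy).

    A configuration dominates another if it is no worse in both
    objectives and strictly better in at least one.
    """
    front = []
    for name, time, energy in configs:
        dominated = any(
            t <= time and e <= energy and (t < time or e < energy)
            for n, t, e in configs if n != name
        )
        if not dominated:
            front.append((name, time, energy))
    return front

# Illustrative (core type, frequency) configurations, in the style of a
# big.LITTLE board, with predicted runtime and energy from fitted models.
predictions = [
    ("4xA15@2.0GHz", 10.0, 80.0),  # fastest, most energy
    ("4xA15@1.4GHz", 13.0, 55.0),
    ("4xA7@1.4GHz",  25.0, 30.0),  # slowest, least energy
    ("4xA7@1.0GHz",  26.0, 35.0),  # dominated: slower AND costlier than A7@1.4
]

front = pareto_front(predictions)
```

Only the first three configurations survive; the last is dominated, so it would never be offered to the user, mirroring the large reduction of the configuration universe reported above.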

    High Performance Hybrid Memory Systems with 3D-stacked DRAM

    The bandwidth of traditional DRAM is pin limited and so does not scale well with the increasing demand of data-intensive workloads. 3D-stacked DRAM can alleviate this problem, providing substantially higher bandwidth to a processor chip. However, the capacity of 3D-stacked DRAM is not enough to replace the bulk of the memory and therefore it is used together with off-chip DRAM in a hybrid memory system, either as a DRAM cache or as part of a flat address space with support for data migration. The performance of both of the above alternative designs is limited by their particular overheads. This thesis proposes new designs that improve the performance of hybrid memory systems. It does so first by alleviating the overheads of current approaches and second, by proposing a new design that combines the best attributes of DRAM caching and data migration while addressing their respective weaknesses. The first part of this thesis focuses on improving the performance of DRAM caches. Besides the unavoidable DRAM access to fetch the requested data, tag access is on the critical path, adding significant latency and energy costs. Existing approaches are not able to remove these overheads and in some cases limit DRAM cache design options. To alleviate the tag access overheads of DRAM caches, this thesis proposes the Decoupled Fused Cache (DFC), a DRAM cache design that fuses DRAM cache tags with the tags of the on-chip Last Level Cache (LLC) to access the DRAM cache data directly on LLC misses. Compared to current state-of-the-art DRAM caches, DFC improves system performance by 11% on average. Finally, DFC reduces DRAM cache traffic by 25% and DRAM cache energy consumption by 24.5%. The second part of this thesis focuses on improving the performance of data migration. Data migration has significant performance potential, but also entails overheads which may diminish its benefits or even degrade performance.
    These overheads are mainly due to the high cost of swapping data between memories, which also makes selecting which data to migrate critical to performance. To address these challenges of data migration, this thesis proposes LLC guided Data Migration (LGM). LGM uses the LLC to predict future reuse and select memory segments for migration. Furthermore, LGM reduces the data migration traffic overheads by not migrating the cache lines of memory segments which are present in the LLC. LGM outperforms current state-of-the-art data migration, improving system performance by 12.1% and reducing memory system dynamic energy by 13.2%. DRAM caches and data migration offer different trade-offs for the utilization of 3D-stacked DRAM but also share some similar challenges. The third part of this thesis aims to provide an alternative approach to the utilization of 3D-stacked DRAM, combining the strengths of both DRAM caches and data migration while eliminating their weaknesses. To that end, this thesis proposes Hybrid2, a hybrid memory system design which uses only a small fraction of the 3D-stacked DRAM as a cache and thus does not deny valuable capacity from the memory system. It further leverages the DRAM cache as a staging area to select the data most suitable for migration. Finally, Hybrid2 alleviates the metadata overheads of both DRAM caches and migration using a common mechanism. Depending on the system configuration, Hybrid2 on average outperforms state-of-the-art migration schemes by 6.4% to 9.1%. Compared to DRAM caches, Hybrid2 gives away on average only 0.3% to 5.3% of performance while offering up to 24.6% more main memory capacity.
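The DFC mechanism described in the abstracts above can be illustrated with a minimal sketch: if each LLC tag entry also carries the DRAM-cache location of its enclosing memory segment, an LLC miss whose set holds a sibling entry of the same segment can go straight to the DRAM-cache data, skipping the in-DRAM tag probe of a conventional DRAM cache. Every structure and field name below is an illustrative simplification, not the actual DFC design.

```python
# Hypothetical sketch of fused LLC / DRAM-cache tag lookup. A real
# design would track segments explicitly; here a shared LLC set stands
# in for "sibling lines of the same segment" to keep the example small.

class LLCEntry:
    def __init__(self, tag, dram_cache_way):
        self.tag = tag
        self.dram_cache_way = dram_cache_way  # fused DRAM-cache tag info

class FusedLookup:
    def __init__(self):
        self.llc = {}            # llc_set -> list[LLCEntry]
        self.dram_tag_probes = 0  # counts the slow, in-DRAM tag accesses

    def access(self, llc_set, tag):
        # 1. Normal LLC lookup.
        for entry in self.llc.get(llc_set, []):
            if entry.tag == tag:
                return "llc_hit"
        # 2. LLC miss: a sibling entry's fused DRAM-cache way lets us
        #    fetch the data directly, with no separate tag access.
        for entry in self.llc.get(llc_set, []):
            if entry.dram_cache_way is not None:
                return "dram_cache_direct"
        # 3. No fused information: fall back to probing tags in DRAM.
        self.dram_tag_probes += 1
        return "dram_tag_probe"

fl = FusedLookup()
fl.llc[0] = [LLCEntry(tag=7, dram_cache_way=2)]
hit = fl.access(0, 7)     # tag matches in the LLC
direct = fl.access(0, 9)  # LLC miss, but fused tag avoids the DRAM probe
slow = fl.access(1, 5)    # no fused info: conventional in-DRAM tag probe
```

The point of the sketch is the middle path: only the access with no fused information pays the in-DRAM tag probe, which is the latency and energy overhead DFC targets.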