7 research outputs found
Exploiting memory allocations in clusterized many-core architectures
Power-efficient architectures have become the most important feature required for future embedded systems. Modern
designs, like those released on mobile devices, reveal that clusterization is the way to improve energy efficiency. However, such
architectures are still limited by the memory subsystem (i.e., memory latency problems). This work investigates an alternative
approach that exploits on-chip data locality to a large extent, through distributed shared memory systems that permit efficient
reuse of on-chip mapped data in clusterized many-core architectures. First, this work reviews the current literature on memory
allocations and explore the limitations of cluster-based many-core architectures. Then, several memory allocations are introduced
and benchmarked scalability, performance and energy-wise, compared to the conventional centralized shared memory solution to
reveal which memory allocation is the most appropriate for future mobile architectures. Our results show that distributed shared
memory allocations bring performance gains and opportunities to reduce energy consumption
Recommended from our members
Bridging the gap between mobile CPU design and user satisfaction via crowdsourcing
This report aims to provide an understanding of how the mobile CPU designs have evolved and its influence on end-user satisfaction. To that end, a quantitative performance analysis is conducted across ten cutting-edge mobile CPU designs studied within top-selling off-the-shelf smartphones released over the past seven years. This analysis is then used to guide a large-scale user study spanning over 25,000 participants via crowdsourcing on the Amazon Mechanical Turk service. The user study asks participants to assess the responsiveness of interactive application use cases for a set of current-generation applications (e.g. Angry Birds and FaceBook) and next-generation applications (i.e. face recognition and augmented reality) relative to the performance capabilities of the devices studied. This framework allows us to quantitatively link how the mobile CPU designs studied impacted end-user satisfaction. The study results indicate that mobile CPU designs have exhibited signifiant performance improvements through aggressive core scaling techniques prevalent in desktop CPUs. Just as was observed in desktop CPU design, these same techniques have lead to excessive mobile CPU power consumption. However, from an end-user perspective this power consumption was not without success. Mobile CPUs have evolved to provide satisfactory experiences for the studied current- generation applications. The reason is that many of these applications rely heavily on single-threaded performance. Other, more recent applications, actually multi-thread user-critical parts of the applications, which also demonstrates that multi- core mobile CPUs are an important design consideration – contrary to conventional wisdom. However, looking ahead, the same mobile CPUs where not able to provide satisfactory experiences for many of the next-generation applications studied, questioning the sustainability of these power-hungry design techniques in future mobile CPU designs.Electrical and Computer Engineerin
Coordinated management of the processor and memory for optimizing energy efficiency
Energy efficiency is a key design goal for future computing systems. With diverse components interacting with each other on the System-on-Chip (SoC), dynamically managing performance, energy and temperature is a challenge in 2D architectures and more so in a 3D stacked environment. Temperature has emerged as the parameter of primary concern. Heuristics based schemes have been employed so far to address these issues. Looking ahead into the future, complex multiphysics interactions between performance, energy and temperature reveal the limitations of such approaches. Therefore in this thesis, first, a comprehensive characterization of existing methods is carried out to identify causes for their inefficiency. Managing different components in an independent and isolated fashion using heuristics is seen to be the primary drawback. Following this, techniques based on feedback control theory to optimize the energy efficiency of the processor and memory in a coordinated fashion are developed. They are evaluated on a real physical system and a cycle-level simulator demonstrating significant improvements over prior schemes. The two main messages of this thesis are, (i) coordination between multiple components is paramount for next generation computing systems and (ii) temperature ought to be treated as a resource like compute or memory cycles.Ph.D
Recommended from our members
Energy-efficient mobile Web computing
Next-generation Web services will be primarily accessed through mobile devices. However, mobile devices are low-performance and stringently energy-constrained. In my dissertation, I propose the design of a high-performance and energy-efficient mobile Web computing substrate. It is a hardware/software co-designed system that delivers satisfactory user quality-of-service (QoS) experience on a mobile energy budget. The key insight is that the traditional interfaces between different Web stacks need to be enhanced with new abstractions that express user QoS experience and that expose architectural-level complexities. On the basis of the enhanced interfaces, I propose synergistic cross-layer optimizations across the processor architecture, Web runtime, programming language, and application layers to maximize the whole system efficiency. The contributions made in this dissertation will likely have a long-term impact because the target application domain, the Web, is becoming a universal mobile development platform, and because our solutions target the fundamental computation layers of the Web domain.Electrical and Computer Engineerin
Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study
The number and diversity of consumer devices are growing rapidly, alongside
their target applications' memory consumption. Unfortunately, DRAM scalability
is becoming a limiting factor to the available memory capacity in consumer
devices. As a potential solution, manufacturers have introduced emerging
non-volatile memories (NVMs) into the market, which can be used to increase the
memory capacity of consumer devices by augmenting or replacing DRAM. Since
entirely replacing DRAM with NVM in consumer devices imposes large system
integration and design challenges, recent works propose extending the total
main memory space available to applications by using NVM as swap space for
DRAM. However, no prior work analyzes the implications of enabling a real
NVM-based swap space in real consumer devices.
In this work, we provide the first analysis of the impact of extending the
main memory space of consumer devices using off-the-shelf NVMs. We extensively
examine system performance and energy consumption when the NVM device is used
as swap space for DRAM main memory to effectively extend the main memory
capacity. For our analyses, we equip real web-based Chromebook computers with
the Intel Optane SSD, which is a state-of-the-art low-latency NVM-based SSD
device. We compare the performance and energy consumption of interactive
workloads running on our Chromebook with NVM-based swap space, where the Intel
Optane SSD capacity is used as swap space to extend main memory capacity,
against two state-of-the-art systems: (i) a baseline system with double the
amount of DRAM than the system with the NVM-based swap space; and (ii) a system
where the Intel Optane SSD is naively replaced with a state-of-the-art (yet
slower) off-the-shelf NAND-flash-based SSD, which we use as a swap space of
equivalent size as the NVM-based swap space
An Intelligent Framework for Energy-Aware Mobile Computing Subject to Stochastic System Dynamics
abstract: User satisfaction is pivotal to the success of mobile applications. At the same time, it is imperative to maximize the energy efficiency of the mobile device to ensure optimal usage of the limited energy source available to mobile devices while maintaining the necessary levels of user satisfaction. However, this is complicated due to user interactions, numerous shared resources, and network conditions that produce substantial uncertainty to the mobile device's performance and power characteristics. In this dissertation, a new approach is presented to characterize and control mobile devices that accurately models these uncertainties. The proposed modeling framework is a completely data-driven approach to predicting power and performance. The approach makes no assumptions on the distributions of the underlying sources of uncertainty and is capable of predicting power and performance with over 93% accuracy.
Using this data-driven prediction framework, a closed-loop solution to the DEM problem is derived to maximize the energy efficiency of the mobile device subject to various thermal, reliability and deadline constraints. The design of the controller imposes minimal operational overhead and is able to tune the performance and power prediction models to changing system conditions. The proposed controller is implemented on a real mobile platform, the Google Pixel smartphone, and demonstrates a 19% improvement in energy efficiency over the standard frequency governor implemented on all Android devices.Dissertation/ThesisDoctoral Dissertation Computer Engineering 201
DSPatch: Dual Spatial Pattern Prefetcher
High main memory latency continues to limit performance of modern
high-performance out-of-order cores. While DRAM latency has remained nearly the
same over many generations, DRAM bandwidth has grown significantly due to
higher frequencies, newer architectures (DDR4, LPDDR4, GDDR5) and 3D-stacked
memory packaging (HBM). Current state-of-the-art prefetchers do not do well in
extracting higher performance when higher DRAM bandwidth is available.
Prefetchers need the ability to dynamically adapt to available bandwidth,
boosting prefetch count and prefetch coverage when headroom exists and
throttling down to achieve high accuracy when the bandwidth utilization is
close to peak. To this end, we present the Dual Spatial Pattern Prefetcher
(DSPatch) that can be used as a standalone prefetcher or as a lightweight
adjunct spatial prefetcher to the state-of-the-art delta-based Signature
Pattern Prefetcher (SPP). DSPatch builds on a novel and intuitive use of
modulated spatial bit-patterns. The key idea is to: (1) represent program
accesses on a physical page as a bit-pattern anchored to the first "trigger"
access, (2) learn two spatial access bit-patterns: one biased towards coverage
and another biased towards accuracy, and (3) select one bit-pattern at run-time
based on the DRAM bandwidth utilization to generate prefetches. Across a
diverse set of workloads, using only 3.6KB of storage, DSPatch improves
performance over an aggressive baseline with a PC-based stride prefetcher at
the L1 cache and the SPP prefetcher at the L2 cache by 6% (9% in
memory-intensive workloads and up to 26%). Moreover, the performance of
DSPatch+SPP scales with increasing DRAM bandwidth, growing from 6% over SPP to
10% when DRAM bandwidth is doubled.Comment: This work is to appear in MICRO 201