315 research outputs found
Efficient Computation of Map-scale Continuous Mutual Information on Chip in Real Time
Exploration tasks are essential to many emerging robotics applications,
ranging from search and rescue to space exploration. The planning problem for
exploration requires determining the best locations for future measurements
that will enhance the fidelity of the map, for example, by reducing its total
entropy. A widely-studied technique involves computing the Mutual Information
(MI) between the current map and future measurements, and utilizing this MI
metric to decide the locations for future measurements. However, computing MI
for reasonably-sized maps is slow and power hungry, which has been a bottleneck
towards fast and efficient robotic exploration. In this paper, we introduce a
new hardware accelerator architecture for MI computation that features a
low-latency, energy-efficient MI compute core and an optimized memory subsystem
that provides sufficient bandwidth to keep the cores fully utilized. The core
employs interleaving to counter the recursive algorithm, and workload balancing
and numerical approximations to reduce latency and energy consumption. We
demonstrate this optimized architecture with a Field-Programmable Gate Array
(FPGA) implementation, which can compute MI for all cells in an entire
201-by-201 occupancy grid ({\em e.g.}, representing a 20.1m-by-20.1m map at
0.1m resolution) in 1.55 ms while consuming 1.7 mJ of energy, thus finally
rendering MI computation for the whole map real time and at a fraction of the
energy cost of traditional compute platforms. For comparison, this particular
FPGA implementation running on the Xilinx Zynq-7000 platform is two orders of
magnitude faster and consumes three orders of magnitude less energy per MI map
compute, when compared to a baseline GPU implementation running on an NVIDIA
GeForce GTX 980 platform. The improvements are more pronounced when compared to
CPU implementations of equivalent algorithms
Intelligent systems for efficiency and security
As computing becomes ubiquitous and personalized, resources like energy, storage and time are becoming increasingly scarce and, at the same time, computing systems must deliver in multiple dimensions, such as high performance, quality of service, reliability, security and low power. Building such computers is hard, particularly when the operating environment is becoming more dynamic, and systems are becoming heterogeneous and distributed.
Unfortunately, computers today manage resources with many ad hoc heuristics that are suboptimal, unsafe, and cannot be composed across the computer’s subsystems. Continuing this approach has severe consequences: underperforming systems, resource waste, information loss, and even life endangerment.
This dissertation research develops computing systems which, through intelligent adaptation, deliver efficiency along multiple dimensions. The key idea is to manage computers with principled methods from formal control. It is with these methods that the multiple subsystems of a computer sense their environment and configure themselves to meet system-wide goals.
To achieve the goal of intelligent systems, this dissertation makes a series of contributions, each building on the previous. First, it introduces the use of formal MIMO (Multiple Input Multiple Output) control for processors, to simultaneously optimize many goals like performance, power, and temperature. Second, it develops the Yukta control system, which uses coordinated formal controllers in different layers of the stack (hardware and operating system). Third, it uses robust control to develop a fast, globally coordinated and decentralized control framework called Tangram, for heterogeneous computers. Finally, it presents Maya, a defense against power side-channel attacks that uses formal control to reshape the power dissipated by a computer, confusing the attacker. The ideas in the dissertation have been demonstrated successfully with several prototypes, including one built along with AMD (Advanced Micro Devices, Inc.) engineers. These designs significantly outperformed the state of the art.
The research in this dissertation brought formal control closer to computer architecture and has been well-received in both domains. It has the first application of full-fledged MIMO control for processors, the first use of robust control in computer systems, and the first application of formal control for side-channel defense. It makes a significant stride towards intelligent systems that are efficient, secure and reliable
Power-Performance Modeling and Adaptive Management of Heterogeneous Mobile Platforms​
abstract: Nearly 60% of the world population uses a mobile phone, which is typically powered by a system-on-chip (SoC). While the mobile platform capabilities range widely, responsiveness, long battery life and reliability are common design concerns that are crucial to remain competitive. Consequently, state-of-the-art mobile platforms have become highly heterogeneous by combining a powerful SoC with numerous other resources, including display, memory, power management IC, battery and wireless modems. Furthermore, the SoC itself is a heterogeneous resource that integrates many processing elements, such as CPU cores, GPU, video, image, and audio processors. Therefore, CPU cores do not dominate the platform power consumption under many application scenarios.
Competitive performance requires higher operating frequency, and leads to larger power consumption. In turn, power consumption increases the junction and skin temperatures, which have adverse effects on the device reliability and user experience. As a result, allocating the power budget among the major platform resources and temperature control have become fundamental consideration for mobile platforms. Dynamic thermal and power management algorithms address this problem by putting a subset of the processing elements or shared resources to sleep states, or throttling their frequencies. However, an adhoc approach could easily cripple the performance, if it slows down the performance-critical processing element. Furthermore, mobile platforms run a wide range of applications with time varying workload characteristics, unlike early generations, which supported only limited functionality. As a result, there is a need for adaptive power and performance management approaches that consider the platform as a whole, rather than focusing on a subset. Towards this need, our specific contributions include (a) a framework to dynamically select the Pareto-optimal frequency and active cores for the heterogeneous CPUs, such as ARM big.Little architecture, (b) a dynamic power budgeting approach for allocating optimal power consumption to the CPU and GPU using performance sensitivity models for each PE, (c) an adaptive GPU frame time sensitivity prediction model to aid power management algorithms, and (d) an online learning algorithm that constructs adaptive run-time models for non-stationary workloads.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201
Power And Hotspot Modeling For Modern GPUs
As General Purpose GPUs (GPGPU) are increasingly becoming a prominent component of high performance computing platforms, power and thermal dissipation are getting more attention. The trade-offs among performance, power, and heat must be well modeled and evaluated from the early stage of GPU design. This necessitates a tool that allows GPU architects to quickly and accurately evaluate their design. There are a few models for GPU power but most of them estimate power at a higher level than architecture, which are therefore missing hardware reconfigurability. In this thesis, we propose a framework that models power and heat dissipation at the hardware architecture level, which allows for configuring and investigating individual hardware components. Our framework is also capable of visualizing the heat map of the processor over different clock cycles. To the best of our knowledge, this is the first comprehensive framework that integrates and visualizes power consumption and heat dissipation of GPUs
Intelligent Management of Mobile Systems through Computational Self-Awareness
Runtime resource management for many-core systems is increasingly complex.
The complexity can be due to diverse workload characteristics with conflicting
demands, or limited shared resources such as memory bandwidth and power.
Resource management strategies for many-core systems must distribute shared
resource(s) appropriately across workloads, while coordinating the high-level
system goals at runtime in a scalable and robust manner.
To address the complexity of dynamic resource management in many-core
systems, state-of-the-art techniques that use heuristics have been proposed.
These methods lack the formalism in providing robustness against unexpected
runtime behavior. One of the common solutions for this problem is to deploy
classical control approaches with bounds and formal guarantees. Traditional
control theoretic methods lack the ability to adapt to (1) changing goals at
runtime (i.e., self-adaptivity), and (2) changing dynamics of the modeled
system (i.e., self-optimization).
In this chapter, we explore adaptive resource management techniques that
provide self-optimization and self-adaptivity by employing principles of
computational self-awareness, specifically reflection. By supporting these
self-awareness properties, the system can reason about the actions it takes by
considering the significance of competing objectives, user requirements, and
operating conditions while executing unpredictable workloads
Efficient runtime management for enabling sustainable performance in real-world mobile applications
Mobile devices have become integral parts of our society. They handle our diverse computing needs from simple daily tasks (i.e., text messaging, e-mail) to complex graphics and media processing under a limited battery budget. Mobile system-on-chip (SoC) designs have become increasingly sophisticated to handle performance needs of diverse workloads and to improve user experience. Unfortunately, power and thermal constraints have also emerged as major concerns. Increased power densities and temperatures substantially impair user experience due to frequent throttling as well as diminishing device reliability and battery life. Addressing these concerns becomes increasingly challenging due to increased complexities at both hardware (e.g., heterogeneous CPUs, accelerators) and software (e.g., vast number of applications, multi-threading). Enabling sustained user experience in face of these challenges requires (1) practical runtime management solutions that can reason about the performance needs of users and applications while optimizing power and temperature; (2) tools for analyzing real-world mobile application behavior and performance.
This thesis aims at improving sustained user experience under thermal limitations by incorporating insights from real-world mobile applications into runtime management. This thesis first proposes thermally-efficient and Quality-of-Service (QoS) aware runtime management techniques to enable sustained performance. Our work leverages inherent QoS tolerance of users in real-world applications and introduces QoS-temperature tradeoff as a viable control knob to improve user experience under thermal constraints. We present a runtime control framework, QScale, which manages CPU power and scheduling decisions to optimize temperature while strictly adhering to given QoS targets. We also design a framework, Maestro, which provides autonomous and application-aware management of QoS-temperature tradeoffs. Maestro uses our thermally-efficient QoS control framework, QScale, as its foundation.
This thesis also presents tools to facilitate studies of real-world mobile applications. We design a practical record and replay system, RandR, to generate repeatable executions of mobile applications. RandR provides this capability by automatically reproducing non-deterministic input sources in mobile applications such as user inputs and network events. Finally, we focus on the non-deterministic executions in Android malware which seek to evade analysis environments. We propose the Proteus system to identify the instruction-level inputs that reveal analysis environments
- …