Estimation and Control of Dynamical Systems with Applications to Multi-Processor Systems

Author: Zhang Haotian
Publication venue: University of Waterloo
Publication date: 19/08/2016

System and control theory is playing an increasingly important role in the design and analysis of computing systems. This thesis investigates a set of estimation and control problems that are driven by new challenges presented by next-generation Multi-Processor Systems on Chips (MPSoCs). Specifically, we consider problems related to state norm estimation, state estimation for positive systems, sensor selection, and nonlinear output tracking. Although these problems are motivated by applications to multi-processor systems, the corresponding theory and algorithms are developed for general dynamical systems. We first study state norm estimation for linear systems with unknown inputs. Specifically, we consider a formulation where the unknown inputs and initial condition of the system are bounded in magnitude, and the objective is to construct an unknown input norm-observer which estimates an upper bound for the norm of the states. This class of problems is motivated by the need to estimate the maximum temperature across a multi-core processor, based on a given model of the thermal dynamics. In order to characterize the existence of the norm observer, we propose a notion of bounded-input-bounded-output-bounded-state (BIBOBS) stability; this concept supplements various system properties, including bounded-input-bounded-output (BIBO) stability, bounded-input-bounded-state (BIBS) stability, and input-output-to-state stability (IOSS). We provide necessary and sufficient conditions on the system matrices under which a linear system is BIBOBS stable, and show that the set of modes of the system with magnitude 1 plays a key role. A construction for the unknown input norm-observer follows as a byproduct. Then we investigate the state estimation problem for positive linear systems with unknown inputs.
This problem is also motivated by the need to monitor the temperature of a multi-processor system and the property of positivity arises due to the physical nature of the thermal model. We extend the concept of strong observability to positive systems and as a negative result, we show that the additional information on positivity does not help in state estimation. Since the states of the system are always positive, negative state estimates are meaningless and the positivity of the observers themselves may be desirable in certain applications. Moreover, positive systems possess certain desired robustness properties. Thus, for positive systems where state estimation with unknown inputs is possible, we provide a linear programming based design procedure for delayed positive observers. Next we consider the problem of selecting an optimal set of sensors to estimate the states of linear dynamical systems; in the context of multi-core processors, this problem arises due to the need to place thermal sensors in order to perform state estimation. The goal is to choose (at design-time) a subset of sensors (satisfying certain budget constraints) from a given set in order to minimize the trace of the steady state a priori or a posteriori error covariance produced by a Kalman filter. We show that the a priori and a posteriori error covariance-based sensor selection problems are both NP-hard, even under the additional assumption that the system is stable. We then provide bounds on the worst-case performance of sensor selection algorithms based on the system dynamics, and show that certain greedy algorithms are optimal for two classes of systems. However, as a negative result, we show that certain typical objective functions are not submodular or supermodular in general. While this makes it difficult to evaluate the performance of greedy algorithms for sensor selection (outside of certain special cases), we show via simulations that these greedy algorithms perform well in practice. 
Finally, we study the output tracking problem for nonlinear systems with constraints. This class of problems arises due to the need to optimize the energy consumption of the CPU-GPU subsystem in multi-processor systems while satisfying certain Quality of Service (QoS) requirements. In order for the system output to track a class of bounded reference signals with limited online computational resources, we propose a sampling-based explicit nonlinear model predictive control (ENMPC) approach, where only a bound on the admissible references is known to the designer a priori. The basic idea of sampling-based ENMPC is to sample the state and reference signal space using deterministic sampling and construct the ENMPC by using regression methods. The proposed approach guarantees feasibility and stability for all admissible references and ensures asymptotic convergence to the set-point. Furthermore, robustness through the use of an ancillary controller is added to the nominal ENMPC for a class of nonlinear systems with additive disturbances, where the robust controller keeps the system output close to the desired nominal trajectory.
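The sensor selection problem in the abstract above (choose a budgeted subset of sensors to minimize the trace of the steady-state a priori Kalman error covariance) can be illustrated with a minimal greedy sketch. This is not the thesis's algorithm: the function names, the two-state system, and the noise values are assumptions made for illustration, and the steady-state covariance is approximated by iterating the standard Riccati recursion, which assumes a stable system matrix. As the abstract notes, greedy selection is only shown to be optimal for particular classes of systems.

```python
import numpy as np

def prior_cov(A, C, Q, R, iters=5000, tol=1e-12):
    """Approximate the steady-state a priori error covariance by iterating
    P <- A (P - P C^T (C P C^T + R)^-1 C P) A^T + Q until it stops changing."""
    P = Q.copy()
    for _ in range(iters):
        if C.shape[0] == 0:
            # No sensors selected: pure prediction (Lyapunov recursion).
            P_next = A @ P @ A.T + Q
        else:
            S = C @ P @ C.T + R
            K = P @ C.T @ np.linalg.inv(S)
            P_next = A @ (P - K @ C @ P) @ A.T + Q
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    return P

def selection_cost(A, C_all, Q, r_all, idx):
    """Trace of the steady-state a priori covariance for sensor subset idx."""
    idx = list(idx)
    return float(np.trace(prior_cov(A, C_all[idx], Q, np.diag(r_all[idx]))))

def greedy_select(A, C_all, Q, r_all, budget):
    """Add, one at a time, the sensor that lowers the cost the most."""
    chosen = []
    for _ in range(budget):
        candidates = [i for i in range(C_all.shape[0]) if i not in chosen]
        if not candidates:
            break
        best = min(candidates,
                   key=lambda i: selection_cost(A, C_all, Q, r_all, chosen + [i]))
        chosen.append(best)
    return chosen

# Hypothetical two-state system; each candidate sensor measures one state.
A = np.diag([0.9, 0.5])        # stable dynamics (assumed values)
C_all = np.eye(2)              # row i is the measurement row of sensor i
Q = np.eye(2)                  # process noise covariance (assumed)
r_all = np.array([0.1, 0.1])   # per-sensor measurement noise variance (assumed)

chosen = greedy_select(A, C_all, Q, r_all, budget=1)
print(chosen, selection_cost(A, C_all, Q, r_all, chosen))
```

In this toy case the greedy step picks the sensor on the slower-decaying state, whose uncertainty would otherwise dominate the trace.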

University of Waterloo's Institutional Repository

Reducing redundancy of real time computer graphics in mobile systems

Author: De Lucas Enrique
Publication venue: Universitat Politècnica de Catalunya
Publication date: 20/04/2018

The goal of this thesis is to propose novel and effective techniques to eliminate redundant, energy-wasting computations performed in real-time computer graphics applications, with special focus on mobile GPU micro-architecture. Improving the energy efficiency of CPU/GPU systems is not only key to extending battery life, but also allows their performance to increase because, to avoid overheating above thermal limits, SoCs tend to be throttled when the load is high for a long period of time. Prior studies pointed out that the CPU and especially the GPU are the principal energy consumers in the graphics subsystem, with off-chip main memory accesses and the processors inside the GPU being the primary contributors. First, we focus on reducing redundant fragment processing computations by means of improving the culling of hidden surfaces. During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image. When the GPU realizes that an object or part of it is not going to be visible, all activity required to compute its color and store it has already been performed. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware to maximize the culling effectiveness of the GPU and minimize overshading, hence reducing execution time and energy consumption. VRO exploits the fact that the objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence) to provide the feeling of smooth transition. VRO keeps visibility information of a frame, and uses it to reorder the objects of the following frame.
VRO just requires adding a small hardware unit to capture the visibility information and use it later to guide the rendering of the following frame. Moreover, VRO works in parallel with the graphics pipeline, so negligible performance overheads are incurred. We illustrate the benefits of VRO using various unmodified commercial 3D applications for which VRO achieves 27% speed-up and 14.8% energy reduction on average. Then, we focus on avoiding redundant computations related to CPU Collision Detection (CD). Graphics applications such as 3D games represent a large percentage of downloaded applications for mobile devices and the trend is towards more complex and realistic scenes with accurate 3D physics simulations. CD is one of the most important algorithms in any physics kernel since it identifies the contact points between the objects of a scene and determines when they collide. However, real-time accurate CD is very expensive in terms of energy consumption. We propose Render Based Collision Detection (RBCD), a novel energy-efficient high-fidelity CD scheme that leverages some intermediate results of the rendering pipeline to perform CD, so that redundant tasks are done just once. Comparing RBCD with a conventional CD executed entirely on the CPU, we show that its execution time is reduced by almost three orders of magnitude (600x speedup), because most of the CD task of our model comes for free by reusing the image rendering intermediate results. Such a dramatic time improvement may, though not necessarily, result in better frames per second if physics simulation stays in the critical path. However, the most important advantage of our technique is the enormous energy savings that result from eliminating a long and costly CPU computation and converting it into a few simple operations executed by specialized hardware within the GPU. Our results show that the energy consumed by CD is reduced on average by a factor of 448x (i.e., by 99.8%).
These dramatic benefits are accompanied by a higher fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application.

The goal of this thesis is to propose effective and original techniques to eliminate useless computations that appear in graphics applications, with special emphasis on GPU micro-architecture. Improving the energy efficiency of CPU/GPU systems is key not only to extending battery life but also to increasing their performance. Previous studies have pointed out that the CPU and especially the GPU are the main energy consumers in the graphics subsystem, with off-chip memory accesses and the processors inside the GPU being the main contributors. First, we focus on reducing redundant computations in the fragment processing stage by improving hidden surface removal. During real-time graphics rendering, objects are processed by the GPU in the order in which they are sent by the CPU, and hidden surfaces are often processed even if they do not end up forming part of the final image. When the GPU finds out that an object or part of it is not visible, all the activity required to compute its color and store it has already been performed. We propose an original architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware to maximize the effectiveness of the GPU's culling and thus minimize overshading, thereby reducing execution time and energy consumption. VRO exploits the fact that objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence) to provide animations with smooth transitions.
Since the depth-order relations between objects are tested in the GPU, VRO introduces minimal energy costs. It only requires adding a small hardware unit to capture the visibility information. Moreover, VRO works in parallel with the graphics pipeline, so it introduces negligible time costs. We illustrate the benefits of VRO using several commercial 3D applications, for which VRO achieves a 27% speed-up and a 14.8% energy reduction on average. Second, we avoid redundant computations related to Collision Detection (CD) on the CPU. Animated graphics applications such as 3D games represent a high percentage of the applications downloaded on mobile devices, and the trend is towards more complex and realistic scenes with accurate 3D physics simulations. CD is one of the most important algorithms among physics kernels, since it identifies the contact points between the objects of a scene. However, accurate real-time CD is very costly in terms of energy consumption. We propose Render Based Collision Detection (RBCD), an energy-efficient and accurate CD technique that uses intermediate results of the rendering pipeline to perform CD. Comparing RBCD with a conventional CD executed entirely on the CPU, we show that execution time is reduced by almost three orders of magnitude (600x speedup), because most of the CD in our model reuses intermediate results of the image rendering. Although this is not necessarily the case, this spectacular time improvement can result in better frames per second if the physics simulation is on the critical path. However, the most important advantage of our technique is the enormous energy saving that results from eliminating the long and costly computations on the CPU, replacing them with a few operations executed on specialized hardware within the GPU.
Our results show that the energy consumed by CD is reduced on average by a factor of 448x. These dramatic benefits are accompanied by higher fidelity in CD (i.e., with finer granularity).

Postprint (published version)
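The reordering idea behind VRO can be illustrated in software with a deliberately simplified sketch. This is not the hardware mechanism the thesis proposes: it assumes a one-dimensional "screen", one constant depth per object, and a hypothetical `prev_depth` table standing in for the visibility information captured from the previous frame. It only shows why drawing front-to-back lets an early depth test reject more fragments before they are shaded.

```python
import math

def reorder_front_to_back(submitted, prev_depth):
    """Sort object ids by the depth seen in the previous frame.
    Objects with no recorded depth keep their submission order at the end
    (sorted() is stable)."""
    return sorted(submitted, key=lambda oid: prev_depth.get(oid, math.inf))

def shaded_fragments(objects, order, width):
    """Count fragments that pass the depth test when drawing in `order`.
    objects maps id -> (depth, first_pixel, end_pixel_exclusive)."""
    zbuf = [math.inf] * width
    shaded = 0
    for oid in order:
        depth, start, end = objects[oid]
        for x in range(start, end):
            if depth < zbuf[x]:   # fragment is visible so far: shade it
                zbuf[x] = depth
                shaded += 1
    return shaded

# Hypothetical scene: smaller depth means closer to the camera.
objects = {
    "wall":  (10.0, 0, 8),
    "crate": (5.0, 2, 6),
    "hud":   (1.0, 0, 3),
}
submitted = ["wall", "crate", "hud"]   # back-to-front, the worst case

# Temporal coherence assumption: last frame's depths still describe the order.
prev_depth = {oid: depth for oid, (depth, _, _) in objects.items()}
reordered = reorder_front_to_back(submitted, prev_depth)

print(shaded_fragments(objects, submitted, 8))   # 15 fragments shaded
print(shaded_fragments(objects, reordered, 8))   # 8 fragments shaded
```

Only 8 pixels are visible in the final image, so the 7 extra fragments shaded in submission order are pure overshading; the reordered pass shades each visible pixel exactly once.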

UPCommons. Portal del coneixement obert de la UPC

Thermal and QoS-Aware Embedded Systems

Author: Lee Youngmoon
Publication date: 01/01/2019

While embedded systems such as smartphones and smart cars become essential parts of our lives, they face urgent thermal challenges. Extreme thermal conditions (i.e., both high and low temperatures) degrade system reliability, even risking safety; devices in cold environments unexpectedly go offline, whereas extremely high device temperatures can cause device failures or battery explosions. These thermal limits are becoming close to the norm because of ever-increasing chip power densities and application complexities. Embedded systems in the wild, however, lack adaptive and effective solutions to overcome such thermal challenges. An adaptive thermal management solution must cope with various runtime thermal scenarios under a changing ambient temperature. An effective solution requires an understanding of the dynamic thermal behaviors of the underlying hardware and application workloads to ensure thermal and application quality-of-service (QoS) requirements. This thesis proposes a suite of adaptive and effective thermal management solutions to address different aspects of real-world thermal challenges faced by modern embedded systems. First, we present BPM, a battery-aware power management framework for mobile devices to address unexpected device shutoffs in cold environments. We develop BPM as a background service that characterizes and controls real-time battery behaviors to maintain operable conditions even in cold environments. We then propose eTEC, building on the thermoelectric cooling solution, which adaptively controls cooling and computational power to keep mobile devices from overheating. For real-time embedded systems such as cars, we present RT-TRM, a thermal-aware resource management framework that monitors changing ambient temperatures and allocates system resources to individual tasks.
Next, we target in-vehicle vision systems running on CPU–GPU system-on-chips and develop CPU–GPU co-scheduling to tackle thermal imbalance across CPUs caused by GPU heat. We evaluate all of these solutions using representative mobile/automotive platforms and workloads, demonstrating their effectiveness in meeting thermal and QoS requirements.

PHD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/153350/1/ymoonlee_1.pd

Deep Blue Documents at the University of Michigan

Efficient runtime management for enabling sustainable performance in real-world mobile applications

Author: Sahin Onur
Publication date: 29/09/2019

Mobile devices have become integral parts of our society. They handle our diverse computing needs from simple daily tasks (e.g., text messaging, e-mail) to complex graphics and media processing under a limited battery budget. Mobile system-on-chip (SoC) designs have become increasingly sophisticated to handle performance needs of diverse workloads and to improve user experience. Unfortunately, power and thermal constraints have also emerged as major concerns. Increased power densities and temperatures substantially impair user experience through frequent throttling, and also diminish device reliability and battery life. Addressing these concerns becomes increasingly challenging due to increased complexities in both hardware (e.g., heterogeneous CPUs, accelerators) and software (e.g., vast number of applications, multi-threading). Enabling sustained user experience in the face of these challenges requires (1) practical runtime management solutions that can reason about the performance needs of users and applications while optimizing power and temperature; (2) tools for analyzing real-world mobile application behavior and performance. This thesis aims at improving sustained user experience under thermal limitations by incorporating insights from real-world mobile applications into runtime management. This thesis first proposes thermally-efficient and Quality-of-Service (QoS) aware runtime management techniques to enable sustained performance. Our work leverages inherent QoS tolerance of users in real-world applications and introduces QoS-temperature tradeoff as a viable control knob to improve user experience under thermal constraints. We present a runtime control framework, QScale, which manages CPU power and scheduling decisions to optimize temperature while strictly adhering to given QoS targets. We also design a framework, Maestro, which provides autonomous and application-aware management of QoS-temperature tradeoffs.
Maestro uses our thermally-efficient QoS control framework, QScale, as its foundation. This thesis also presents tools to facilitate studies of real-world mobile applications. We design a practical record and replay system, RandR, to generate repeatable executions of mobile applications. RandR provides this capability by automatically reproducing non-deterministic input sources in mobile applications such as user inputs and network events. Finally, we focus on the non-deterministic executions in Android malware which seek to evade analysis environments. We propose the Proteus system to identify the instruction-level inputs that reveal analysis environments.

Boston University Institutional Repository (OpenBU)