48 research outputs found

    A Control-based Methodology for Power-performance Optimization in NoCs Exploiting DVFS

    Get PDF
    Networks-on-Chip (NoCs) are considered a viable solution to fully exploit the computational power of multi- and many-cores, but their non negligible power consumption requires ad hoc power-performance design methodologies. In this perspective, several proposals exploited the possibility to dynamically tune voltage and frequency for the interconnect, taking steps from traditional CPU-based power management solutions. However, the impact of the actuators, i.e. the limited range of frequencies for a PLL (Phase Locked Loop) or the time to increase voltage and frequency for a Dynamic Voltage and Frequency Scaling (DVFS) modules, are often not carefully accounted for, thus overestimating the benefits. This paper presents a control-based methodology for the NoC power-performance optimization exploiting the Dynamic Frequency Scaling (DFS). Both timing and power overheads of the actuators are considered, thanks to an ad hoc simulation framework. Moreover the proposed methodology eventually allows for user and/or OS interactions to change between different high level power-performance modes, i.e. to trigger performance oriented or power saving system behaviors. Experimental validation considered a 16-core architecture comparing our proposal with different settings of threshold-based policies. We achieved a speedup up to 3 for the timing and a reduction up to 33.17% of the power ∗ time product against the best threshold-based policy. Moreover, our best control-based scheme provides an averaged power-performance product improvement of 16.50% and 34.79% against the best and the second considered threshold-based policy setting

    Investigation into scalable energy and performance models for many-core systems

    Get PDF
    PhD ThesisIt is likely that many-core processor systems will continue to penetrate emerging embedded and high-performance applications. Scalable energy and performance models are two critical aspects that provide insights into the conflicting trade-offs between them with growing hardware and software complexity. Traditional performance models, such as Amdahl’s Law, Gustafson’s and Sun-Ni’s, have helped the research community and industry to better understand the system performance bounds with given processing resources, which is otherwise known as speedup. However, these models and their existing extensions have limited applicability for energy and/or performance-driven system optimization in practical systems. For instance, these are typically based on software characteristics, assuming ideal and homogeneous hardware platforms or limited forms of processor heterogeneity. In addition, the measurement of speedup and parallelization factors of an application running on a specific hardware platform require instrumenting the original software codes. Indeed, practical speedup and parallelizability models of application workloads running on modern heterogeneous hardware are critical for energy and performance models, as they can be used to inform design and control decisions with an aim to improve system throughput and energy efficiency. This thesis addresses the limitations by firstly developing novel and scalable speedup and energy consumption models based on a more general representation of heterogeneity, referred to as the normal form heterogeneity. A method is developed whereby standard performance counters found in modern many-core platforms can be used to derive speedup, and therefore the parallelizability of the software, without instrumenting applications. This extends the usability of the new models to scenarios where the parallelizability of software is unknown, leading to potentially Run-Time Management (RTM) speedup and/or energy efficiency optimization. The models and optimization methods presented in this thesis are validated through extensive experimentation, by running a number of different applications in wide-ranging concurrency scenarios on a number of different homogeneous and heterogeneous Multi/Many Core Processor (M/MCP) systems. These include homogeneous and heterogeneous architectures and viii range from existing off-the-shelf platforms to potential future system extensions. The practical use of these models and methods is demonstrated through real examples such as studying the effectiveness of the system load balancer. The models and methodologies proposed in this thesis provide guidance to a new opportunities for improving the energy efficiency of M/MCP systemsHigher Committee of Education Development (HCED) in Ira

    Zuverlässige und Energieeffiziente gemischt-kritische Echtzeit On-Chip Systeme

    Get PDF
    Multi- and many-core embedded systems are increasingly becoming the target for many applications that require high performance under varying conditions. A resulting challenge is the control, and reliable operation of such complex multiprocessing architectures under changes, e.g., high temperature and degradation. In mixed-criticality systems where many applications with varying criticalities are consolidated on the same execution platform, fundamental isolation requirements to guarantee non-interference of critical functions are crucially important. While Networks-on-Chip (NoCs) are the prevalent solution to provide scalable and efficient interconnects for the multiprocessing architectures, their associated energy consumption has immensely increased. Specifically, hard real-time NoCs must manifest limited energy consumption as thermal runaway in such a core shared resource jeopardizes the whole system guarantees. Thus, dynamic energy management of NoCs, as opposed to the related work static solutions, is highly necessary to save energy and decrease temperature, while preserving essential temporal requirements. In this thesis, we introduce a centralized management to provide energy-aware NoCs for hard real-time systems. The design relies on an energy control network, developed on top of an existing switch arbitration network to allow isolation between energy optimization and data transmission. The energy control layer includes local units called Power-Aware NoC controllers that dynamically optimize NoC energy depending on the global state and applications’ temporal requirements. Furthermore, to adapt to abnormal situations that might occur in the system due to degradation, we extend the concept of NoC energy control to include the entire system scope. That is, online resource management employing hierarchical control layers to treat system degradation (imminent core failures) is supported. The mechanism applies system reconfiguration that involves workload migration. For mixed-criticality systems, it allows flexible boundaries between safety-critical and non-critical subsystems to safely apply the reconfiguration, preserving fundamental safety requirements and temporal predictability. Simulation and formal analysis-based experiments on various realistic usecases and benchmarks are conducted showing significant improvements in NoC energy-savings and in treatment of system degradation for mixed-criticality systems improving dependability over the status quo.Eingebettete Many- und Multi-core-Systeme werden zunehmend das Ziel für Anwendungen, die hohe Anfordungen unter unterschiedlichen Bedinungen haben. Für solche hochkomplexed Multi-Prozessor-Systeme ist es eine grosse Herausforderung zuverlässigen Betrieb sicherzustellen, insbesondere wenn sich die Umgebungseinflüsse verändern. In Systeme mit gemischter Kritikalität, in denen viele Anwendungen mit unterschiedlicher Kritikalität auf derselben Ausführungsplattform bedient werden müssen, sind grundlegende Isolationsanforderungen zur Gewährleistung der Nichteinmischung kritischer Funktionen von entscheidender Bedeutung. Während On-Chip Netzwerke (NoCs) häufig als skalierbare Verbindung für die Multiprozessor-Architekturen eingesetzt werden, ist der damit verbundene Energieverbrauch immens gestiegen. Daher sind dynamische Plattformverwaltungen, im Gegensatz zu den statischen, zwingend notwendig, um ein System an die oben genannten Veränderungen anzupassen und gleichzeitig Timing zu gewährleisten. In dieser Arbeit entwickeln wir energieeffiziente NoCs für harte Echtzeitsysteme. Das Design basiert auf einem Energiekontrollnetzwerk, das auf einem bestehenden Switch-Arbitration-Netzwerk entwickelt wurde, um eine Isolierung zwischen Energieoptimierung und Datenübertragung zu ermöglichen. Die Energiesteuerungsschicht umfasst lokale Einheiten, die als Power-Aware NoC-Controllers bezeichnet werden und die die NoC-Energie in Abhängigkeit vom globalen Zustand und den zeitlichen Anforderungen der Anwendungen optimieren. Darüber hinaus wird das Konzept der NoC-Energiekontrolle zur Anpassung an Anomalien, die aufgrund von Abnutzung auftreten können, auf den gesamten Systemumfang ausgedehnt. Online- Ressourcenverwaltungen, die hierarchische Kontrollschichten zur Behandlung Abnutzung (drohender Kernausfälle) einsetzen, werden bereitgestellt. Bei Systemen mit gemischter Kritikalität erlaubt es flexible Grenzen zwischen sicherheitskritischen und unkritischen Subsystemen, um die Rekonfiguration sicher anzuwenden, wobei grundlegende Sicherheitsanforderungen erhalten bleiben und Timing Vorhersehbarkeit. Experimente werden auf der Basis von Simulationen und formalen Analysen zu verschiedenen realistischen Anwendungsfallen und Benchmarks durchgeführt, die signifikanten Verbesserungen bei On-Chip Netzwerke-Energieeinsparungen und bei der Behandlung von Abnutzung für Systeme mit gemischter Kritikalität zur Verbesserung die Systemstabilität gegenüber dem bisherigen Status quo zeigen

    Run-time Resource Management in CMPs Handling Multiple Aging Mechanisms

    Get PDF
    Abstract—Run-time resource management is fundamental for efficient execution of workloads on Chip Multiprocessors. Application- and system-level requirements (e.g. on performance vs. power vs. lifetime reliability) are generally conflicting each other, and any decision on resource assignment, such as core allocation or frequency tuning, may positively affect some of them while penalizing some others. Resource assignment decisions can be perceived in few instants of time on performance and power consumption, but not on lifetime reliability. In fact, this latter changes very slowly based on the accumulation of effects of various decisions over a long time horizon. Moreover, aging mechanisms are various and have different causes; most of them, such as Electromigration (EM), are subject to temperature levels, while Thermal Cycling (TC) is caused mainly by temperature variations (both amplitude and frequency). Mitigating only EM may negatively affect TC and vice versa. We propose a resource orchestration strategy to balance the performance and power consumption constraints in the short-term and EM and TC aging in the long-term. Experimental results show that the proposed approach improves the average Mean Time To Failure at least by 17% and 20% w.r.t. EM and TC, respectively, while providing same performance level of the nominal counterpart and guaranteeing the power budget

    Efficient and Scalable Computing for Resource-Constrained Cyber-Physical Systems: A Layered Approach

    Get PDF
    With the evolution of computing and communication technology, cyber-physical systems such as self-driving cars, unmanned aerial vehicles, and mobile cognitive robots are achieving increasing levels of multifunctionality and miniaturization, enabling them to execute versatile tasks in a resource-constrained environment. Therefore, the computing systems that power these resource-constrained cyber-physical systems (RCCPSs) have to achieve high efficiency and scalability. First of all, given a fixed amount of onboard energy, these computing systems should not only be power-efficient but also exhibit sufficiently high performance to gracefully handle complex algorithms for learning-based perception and AI-driven decision-making. Meanwhile, scalability requires that the current computing system and its components can be extended both horizontally, with more resources, and vertically, with emerging advanced technology. To achieve efficient and scalable computing systems in RCCPSs, my research broadly investigates a set of techniques and solutions via a bottom-up layered approach. This layered approach leverages the characteristics of each system layer (e.g., the circuit, architecture, and operating system layers) and their interactions to discover and explore the optimal system tradeoffs among performance, efficiency, and scalability. At the circuit layer, we investigate the benefits of novel power delivery and management schemes enabled by integrated voltage regulators (IVRs). Then, between the circuit and microarchitecture/architecture layers, we present a voltage-stacked power delivery system that offers best-in-class power delivery efficiency for many-core systems. After this, using Graphics Processing Units (GPUs) as a case study, we develop a real-time resource scheduling framework at the architecture and operating system layers for heterogeneous computing platforms with guaranteed task deadlines. Finally, fast dynamic voltage and frequency scaling (DVFS) based power management across the circuit, architecture, and operating system layers is studied through a learning-based hierarchical power management strategy for multi-/many-core systems

    Architecting Efficient Data Centers.

    Full text link
    Data center power consumption has become a key constraint in continuing to scale Internet services. As our society’s reliance on “the Cloud” continues to grow, companies require an ever-increasing amount of computational capacity to support their customers. Massive warehouse-scale data centers have emerged, requiring 30MW or more of total power capacity. Over the lifetime of a typical high-scale data center, power-related costs make up 50% of the total cost of ownership (TCO). Furthermore, the aggregate effect of data center power consumption across the country cannot be ignored. In total, data center energy usage has reached approximately 2% of aggregate consumption in the United States and continues to grow. This thesis addresses the need to increase computational efficiency to address this grow- ing problem. It proposes a new classes of power management techniques: coordinated full-system idle low-power modes to increase the energy proportionality of modern servers. First, we introduce the PowerNap server architecture, a coordinated full-system idle low- power mode which transitions in and out of an ultra-low power nap state to save power during brief idle periods. While effective for uniprocessor systems, PowerNap relies on full-system idleness and we show that such idleness disappears as the number of cores per processor continues to increase. We expose this problem in a case study of Google Web search in which we demonstrate that coordinated full-system active power modes are necessary to reach energy proportionality and that PowerNap is ineffective because of a lack of idleness. To recover full-system idleness, we introduce DreamWeaver, architectural support for deep sleep. DreamWeaver allows a server to exchange latency for full-system idleness, allowing PowerNap-enabled servers to be effective and provides a better latency- power savings tradeoff than existing approaches. Finally, this thesis investigates workloads which achieve efficiency through methodical cluster provisioning techniques. Using the popular memcached workload, this thesis provides examples of provisioning clusters for cost-efficiency given latency, throughput, and data set size targets.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/91499/1/meisner_1.pd

    CONTREX: Design of embedded mixed-criticality CONTRol systems under consideration of EXtra-functional properties

    Get PDF
    The increasing processing power of today’s HW/SW platforms leads to the integration of more and more functions in a single device. Additional design challenges arise when these functions share computing resources and belong to different criticality levels. CONTREX complements current activities in the area of predictable computing platforms and segregation mechanisms with techniques to consider the extra-functional properties, i.e., timing constraints, power, and temperature. CONTREX enables energy efficient and cost aware design through analysis and optimization of these properties with regard to application demands at different criticality levels. This article presents an overview of the CONTREX European project, its main innovative technology (extension of a model based design approach, functional and extra-functional analysis with executable models and run-time management) and the final results of three industrial use-cases from different domain (avionics, automotive and telecommunication).The work leading to these results has received funding from the European Community’s Seventh Framework Programme FP7/2007-2011 under grant agreement no. 611146

    Gestión de recursos energéticamente eficiente para aplicaciones paralelas basadas en tareas en entornos multi-aplicación

    Get PDF
    Tesis de la Universidad Complutense de Madrid, Facultad de Informática, leída el 28/01/2021The end of Dennard scaling, as well as the arrival of the post-Moore era, has meant a big change in the way performance and energy efficiency are achieved by modern processors. From a constant increase of the clock frequency as the main method to increase performance at the beginning of the 2000s, the increase in the number of cores inside processors running at relatively conservative frequencies has stabilised as the current trend to increase both performance and energy efficiency. The increase of the heterogeneity in the systems, both inside the processors comprising different types of cores (e.g., big LITTLE architectures) or adding specific compute units (like multimedia extensions), as well as in the platform by the addition of other specific compute units (like GPUs), offering different performance and energy-efficiency trade-offs. Together with the increase in the number of cores, the processor evolution has been accompanied by the addition of different techologies that allow processors to adapt dynamically to the changes in the environment and running aplications. Among others, techiniques like dynamic voltage and frequiency scaling, power capping or cache partitioning are widely used nowadays to increase the performance and/or energy-efficiency...El fin del escalado de Dennard, así como la llegada de la era post-Moore ha supuesto una gran revolución en la forma de obtener el rendimiento y eficiencia energética en los procesadores modernos. Desde un incremento constante en la frecuencia relativamente moderadas se ha impuesto como la tendencia actual para incrementar tanto el rendimiento como la eficiencia energética. El aumento del número de núcleos dentro del procesado ha venido acompañado en los últimos años por el aumento de la heterogeneidad en la plataforma, tanto dentro del procesador incorporando distintos tipos de núcleos en el mismo procesador (e.g., la arquitectura big.LITTLE) como añadiendo unidades de cómputo específicas (e.g., extensiones multimedia), como la incorporación de otros elementos de computo específicos, ofreciendo diferentes grados de rendimiento y eficiencia energética. La evolución de los procesadores no solo ha venido dictada por el aumento del número de núcleos, sino que ha venido acompañada por la incorporación de diferentes técnicas permitiendo la adaptación de las arquitecturas de forma dinámica al entorno así como a las aplicaciones en ejecución. Entre otras, técnicas como el escalado de frecuencia, la limitación de consumo o el particionado de la memoria caché son ampliamente utilizadas en la actualidad como métodos para incrementar el consumo y/o la eficiencia energética...Fac. de InformáticaTRUEunpu

    Energy models in data parallel CPU/GPU computations

    Get PDF
    La tesi affronta il problema della modellazione dei consumi energetici in computazioni data parallel (map) in architetture con GPU. I modelli sviluppati sono utilizzati per valutare il compromesso tra performance e consumi. La tesi include risultati sperimentali che validano sia i modelli sviluppati che la metodologia di ricerca di un compromesso tra prestazioni e consumi
    corecore