392 research outputs found

    QoS for high-performance SMT processors in embedded systems

    Get PDF
    Although simultaneous multithreading processors provide a good cost-performance tradeoff, they exhibit unpredictable performance in real-time applications. We present a resource management scheme that eliminates a major cause of performance unpredictability in SMTs, making them suitable for many types of embedded systems.Peer ReviewedPostprint (published version

    Enabling SMT for real-time embedded systems

    Get PDF
    In order to deal with real time constraints, current embedded processors are usually simple in-order processors with no speculation capabilities to ensure that execution times of applications are predictable. However, embedded systems require ever more compute power and the trend is that they will become as complex as current high performance systems. SMTs are viable candidates for future high performance embedded processors, because of their good cost/performance trade-off. However, current SMTs exhibit unpredictable performance. Hence, the SMT hardware needs to be adapted in order to meet real time constraints. This paper is a first step toward the use of high performance SMT processors in future real time systems. We present a novel collaboration between OS and SMT processors that entails that the OS exercises control over how resources are shared inside the processor. We illustrate this collaboration by a mechanism in which the OS cooperates with the SMT hardware to guarantee that a given thread runs at a specific speed, enabling SMT for real-time systems.Peer ReviewedPostprint (published version

    ADAPTIVE POWER MANAGEMENT FOR COMPUTERS AND MOBILE DEVICES

    Get PDF
    Power consumption has become a major concern in the design of computing systems today. High power consumption increases cooling cost, degrades the system reliability and also reduces the battery life in portable devices. Modern computing/communication devices support multiple power modes which enable power and performance tradeoff. Dynamic power management (DPM), dynamic voltage and frequency scaling (DVFS), and dynamic task migration for workload consolidation are system level power reduction techniques widely used during runtime. In the first part of the dissertation, we concentrate on the dynamic power management of the personal computer and server platform where the DPM, DVFS and task migrations techniques are proved to be highly effective. A hierarchical energy management framework is assumed, where task migration is applied at the upper level to improve server utilization and energy efficiency, and DPM/DVFS is applied at the lower level to manage the power mode of individual processor. This work focuses on estimating the performance impact of workload consolidation and searching for optimal DPM/DVFS that adapts to the changing workload. Machine learning based modeling and reinforcement learning based policy optimization techniques are investigated. Mobile computing has been weaved into everyday lives to a great extend in recent years. Compared to traditional personal computer and server environment, the mobile computing environment is obviously more context-rich and the usage of mobile computing device is clearly imprinted with user\u27s personal signature. The ability to learn such signature enables immense potential in workload prediction and energy or battery life management. In the second part of the dissertation, we present two mobile device power management techniques which take advantage of the context-rich characteristics of mobile platform and make adaptive energy management decisions based on different user behavior. We firstly investigate the user battery usage behavior modeling and apply the model directly for battery energy management. The first technique aims at maximizing the quality of service (QoS) while keeping the risk of battery depletion below a given threshold. The second technique is an user-aware streaming strategies for energy efficient smartphone video playback applications (e.g. YouTube) that minimizes the sleep and wake penalty of cellular module and at the same time avoid the energy waste from excessive downloading. Runtime power and thermal management has attracted substantial interests in multi-core distributed embedded systems. Fast performance evaluation is an essential step in the research of distributed power and thermal management. In last part of the dissertation, we present an FPGA based emulator of multi-core distributed embedded system designed to support the research in runtime power/thermal management. Hardware and software supports are provided to carry out basic power/thermal management actions including inter-core or inter-FPGA communications, runtime temperature monitoring and dynamic frequency scaling

    The next convergence: High-performance and mission-critical markets

    Get PDF
    The well-known convergence of the high-performance computing and the mobile markets has been a dominating factor in the computing market during the last two decades. In this paper we witness a new type of convergence between the mission-critical market (such as avionic or automotive) and the mainstream consumer electronics market. Such convergence is fuelled by the common needs of both markets for more reliability, support for mission-critical functionalities and the challenge of harnessing the unsustainable increases in safety margins to guarantee either correctness or timing. In this position paper, we present a description of this new convergence, as well as the main challenges and opportunities that it brings to computing industry.Peer ReviewedPostprint (published version

    Architecting Data Centers for High Efficiency and Low Latency

    Full text link
    Modern data centers, housing remarkably powerful computational capacity, are built in massive scales and consume a huge amount of energy. The energy consumption of data centers has mushroomed from virtually nothing to about three percent of the global electricity supply in the last decade, and will continuously grow. Unfortunately, a significant fraction of this energy consumption is wasted due to the inefficiency of current data center architectures, and one of the key reasons behind this inefficiency is the stringent response latency requirements of the user-facing services hosted in these data centers such as web search and social networks. To deliver such low response latency, data center operators often have to overprovision resources to handle high peaks in user load and unexpected load spikes, resulting in low efficiency. This dissertation investigates data center architecture designs that reconcile high system efficiency and low response latency. To increase the efficiency, we propose techniques that understand both microarchitectural-level resource sharing and system-level resource usage dynamics to enable highly efficient co-locations of latency-critical services and low-priority batch workloads. We investigate the resource sharing on real-system simultaneous multithreading (SMT) processors to enable SMT co-locations by precisely predicting the performance interference. We then leverage historical resource usage patterns to further optimize the task scheduling algorithm and data placement policy to improve the efficiency of workload co-locations. Moreover, we introduce methodologies to better manage the response latency by automatically attributing the source of tail latency to low-level architectural and system configurations in both offline load testing environment and online production environment. We design and develop a response latency evaluation framework at microsecond-level precision for data center applications, with which we construct statistical inference procedures to attribute the source of tail latency. Finally, we present an approach that proactively enacts carefully designed causal inference micro-experiments to diagnose the root causes of response latency anomalies, and automatically correct them to reduce the response latency.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/144144/1/yunqi_1.pd

    A Workload Generator for Evaluating SMT Real-Time Systems

    Full text link
    [EN] Real-time tasks have experience a significant complexity increase in the last years. We can find examples of real-time tasks in nowadays systems that control self-driving cars or multimedia systems, among others. To cope with the high performance requirements of such systems, real-time systems are moving from simple in-order processor to complex out-of-order multicore processors. Furthermore, we expect real-time systems to use simultaneous multithreading (SMT) processors in a near future since these architectures address two key design concerns of embedded systems, that is, they provide higher performance and power efficiency than single-threaded multicores. The main drawback that multicores and SMT architectures present from a real-time perspective is that they implement shared resources. Single-threaded multicores usually share the main memory and the LLC, and SMT processor share additionally most of the microarchitectural core resources. Processes running concurrently can interfere in the shared resources, which increases the performance variability and predictability of these systems. We expect an increasing effort in the next years to mitigate these drawbacks and implement real-time systems with multicore SMT processors. Workload generation is a tedious and time-consuming task in the real-time research field because the workloads dispose of many parameters that should be correctly adjusted to provide flexible and representative workloads. Typically used workload generators, however, fail when designing workloads for theses architectures because they are not aware of the architectural characteristics of SMT systems. In this paper we present the task class-based (TCB) workload generator aimed at providing workloads to evaluate real-time systems with SMT multicore processors in an ease and automatized way.Furió Novejarque, C.; Feliu-Pérez, J.; Petit Martí, SV.; Duro-Gómez, J.; Sahuquillo Borrás, J. (2018). A Workload Generator for Evaluating SMT Real-Time Systems. IEEE Computer Society. 367-374. doi:10.1109/HPCS.2018.00067S36737

    A Hardware Approach to Fairly Balance the Inter-Thread Interference in Shared Caches

    Full text link
    [EN] Shared caches have become the common design choice in the vast majority of modern multi-core and many-core processors, since cache sharing improves throughput for a given silicon area. Sharing the cache, however, has a downside: the requests from multiple applications compete among them for cache resources, so the execution time of each application increases over isolated execution. The degree in which the performance of each application is affected by the interference becomes unpredictable yielding the system to unfairness situations. This paper proposes Fair-Progress Cache Partitioning (FPCP), a low-overhead hardware-based cache partitioning approach that addresses system fairness. FPCP reduces the interference by allocating to each application a cache partition and adjusting the partition sizes at runtime. To adjust partitions, our approach estimates during multicore execution the time each application would have taken in isolation, which is challenging. The proposed approach has two main differences over existing approaches. First, FPCP distributes cache ways incrementally, which makes the proposal less prone to estimation errors. Second, the proposed algorithm is much less costly than the state-of-the-art ASM-Cache approach. Experimental results show that, compared to ASM-Cache, FPCP reduces unfairness by 48 percent in four-application workloads and by 28 percent in eight-application workloads, without harming the performance.This work was supported in part by the Spanish Ministerio de Economia y Competitividad (MINECO) and Plan E funds, under grants TIN2014-62246-EXP and TIN2015-66972-C5-1-R.Selfa-Oliver, V.; Sahuquillo Borrás, J.; Petit Martí, SV.; Gómez Requena, ME. (2017). A Hardware Approach to Fairly Balance the Inter-Thread Interference in Shared Caches. IEEE Transactions on Parallel and Distributed Systems. 28(11):3021-3032. https://doi.org/10.1109/TPDS.2017.2713778S30213032281

    Flexible processing architecture for maintaining QoS in embedded systems applications

    Get PDF
    Comunicación presentada en las V Jornadas de Computación Empotrada, Valladolid, 17-19 Septiembre 2014The growing available capacity on a single chip is leading to increasingly sophisticated applications in the field of embedded systems. In addition, the cloud computing paradigm, allows the extension of the capabilities of these systems using remote resources. Among the wide range of applications that can arise in this context, are those in which it is critical to meet certain quality of service (QoS) requirements, such as limited latency. In these cases, real-time operating systems (RTOS) provide a valid solution to guarantee predictability and response time using the resources of the embedded system. However, in applications where the elements to process can grow and decrease in a variable way, the load can exceed the capabilities of the embedded system, which is an important limitation. In this paper, a new architecture is proposed, aiming to take the most of remotely available resources only when the load temporarily exceeds the capabilities of the embedded system. The access to the remote resources is done by using cloud platforms maintaining an acceptable level of QoS for the application
    corecore