5 research outputs found

    Runtime Management of Multiprocessor Systems for Fault Tolerance, Energy Efficiency and Load Balancing

    Get PDF
    Efficiency of modern multiprocessor systems is hurt by unpredictable events: aging causes permanent faults that disable components; application spawnings and terminations taking place at arbitrary times, affect energy proportionality, causing energy waste; load imbalances reduce resource utilization, penalizing performance. This thesis demonstrates how runtime management can mitigate the negative effects of unpredictable events, making decisions guided by a combination of static information known in advance and parameters that only become known at runtime. We propose techniques for three different objectives: graceful degradation of aging-prone systems; energy efficiency of heterogeneous adaptive systems; and load balancing by means of work stealing. Managing aging-prone systems for graceful efficiency degradation, is based on a high-level system description that encapsulates hardware reconfigurability and workload flexibility and allows to quantify system efficiency and use it as an objective function. Different custom heuristics, as well as simulated annealing and a genetic algorithm are proposed to optimize this objective function as a response to component failures. Custom heuristics are one to two orders of magnitude faster, provide better efficiency for the first 20% of system lifetime and are less than 13% worse than a genetic algorithm at the end of this lifetime. Custom heuristics occasionally fail to satisfy reconfiguration cost constraints. As all algorithms\u27 execution time scales well with respect to system size, a genetic algorithm can be used as backup in these cases. Managing heterogeneous multiprocessors capable of Dynamic Voltage and Frequency Scaling is based on a model that accurately predicts performance and power: performance is predicted by combining static, application-specific profiling information and dynamic, runtime performance monitoring data; power is predicted using the aforementioned performance estimations and a set of platform-specific, static parameters, determined only once and used for every application mix. Three runtime heuristics are proposed, that make use of this model to perform partial search of the configuration space, evaluating a small set of configurations and selecting the best one. When best-effort performance is adequate, the proposed approach achieves 3% higher energy efficiency compared to the powersave governor and 2x better compared to the interactive and ondemand governors. When individual applications\u27 performance requirements are considered, the proposed approach is able to satisfy them, giving away 18% of system\u27s energy efficiency compared to the powersave, which however misses the performance targets by 23%; at the same time, the proposed approach maintains an efficiency advantage of about 55% compared to the other governors, which also satisfy the requirements. Lastly, to improve load balancing of multiprocessors, a partial and approximate view of the current load distribution among system cores is proposed, which consists of lightweight data structures and is maintained by each core through cheap operations. A runtime algorithm is developed, using this view whenever a core becomes idle, to perform victim core selection for work stealing, also considering system topology and memory hierarchy. Among 12 diverse imbalanced workloads, the proposed approach achieves better performance than random, hierarchical and local stealing for six workloads. Furthermore, it is at most 8% slower among the other six workloads, while competing strategies incur a penalty of at least 89% on some workload

    System Abstractions for Scalable Application Development at the Edge

    Get PDF
    Recent years have witnessed an explosive growth of Internet of Things (IoT) devices, which collect or generate huge amounts of data. Given diverse device capabilities and application requirements, data processing takes place across a range of settings, from on-device to a nearby edge server/cloud and remote cloud. Consequently, edge-cloud coordination has been studied extensively from the perspectives of job placement, scheduling and joint optimization. Typical approaches focus on performance optimization for individual applications. This often requires domain knowledge of the applications, but also leads to application-specific solutions. Application development and deployment over diverse scenarios thus incur repetitive manual efforts. There are two overarching challenges to provide system-level support for application development at the edge. First, there is inherent heterogeneity at the device hardware level. The execution settings may range from a small cluster as an edge cloud to on-device inference on embedded devices, differing in hardware capability and programming environments. Further, application performance requirements vary significantly, making it even more difficult to map different applications to already heterogeneous hardware. Second, there are trends towards incorporating edge and cloud and multi-modal data. Together, these add further dimensions to the design space and increase the complexity significantly. In this thesis, we propose a novel framework to simplify application development and deployment over a continuum of edge to cloud. Our framework provides key connections between different dimensions of design considerations, corresponding to the application abstraction, data abstraction and resource management abstraction respectively. First, our framework masks hardware heterogeneity with abstract resource types through containerization, and abstracts away the application processing pipelines into generic flow graphs. Further, our framework further supports a notion of degradable computing for application scenarios at the edge that are driven by multimodal sensory input. Next, as video analytics is the killer app of edge computing, we include a generic data management service between video query systems and a video store to organize video data at the edge. We propose a video data unit abstraction based on a notion of distance between objects in the video, quantifying the semantic similarity among video data. Last, considering concurrent application execution, our framework supports multi-application offloading with device-centric control, with a userspace scheduler service that wraps over the operating system scheduler

    Towards a Graceful Degradable Multicore-System by Hierarchical Handling of Hard Errors

    No full text
    corecore