5 research outputs found

    RePP-C: runtime estimation of performance-power with workload consolidation in CMPs

    Get PDF
    Configuration of hardware knobs in multicore environments for meeting performance-power demands constitutes a desirable feature in modern data centers. At the same time, high energy efficiency (performance per watt) requires optimal thread-to-core assignment. In this paper, we present the runtime estimator (RePP-C) for performance-power, characterized by processor frequency states (P-states), a wide range of sleep intervals (Cl-states) and workload consolidation. We also present a schema for frequency and contention-aware thread-to-core assignment (FACTS) which considers various thread demands. The proposed solution (RePP-C) selects a given hardware configuration for each active core to ensure that the performance-power demands are satisfied while using the scheduling schema (FACTS) for mapping threads-to-cores. Our results show that FACTS improves over other state-of-the-art schedulers like Distributed Intensity Online (DIO) and native Linux scheduler by 8.25% and 37.56% in performance, with simultaneous improvement in energy efficiency by 6.2% and 14.17%, respectively. Moreover, we prove the usability of RePP-C by predicting performance and power for 7 different types of workloads and 10 different QoS targets. The results show an average error of 7.55% and 8.96% (with 95% confidence interval) when predicting energy and performance respectively.This work has been partially supported by the European Union FP7 program through the Mont-Blanc-2 project (FP7-ICT-610402), by the Ministerio de Economia y Competitividad under contract Computacion de Altas Prestaciones VII (TIN2015-65316-P), and the Departament d’Innovacio, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programacio i Entorns d’Execucio Paral.lels (2014-SGR-1051).Peer ReviewedPostprint (author's final draft

    Enabling fair pricing on HPC systems with node sharing

    Get PDF
    Abstract not provide

    Handling Information and its Propagation to Engineer Complex Embedded Systems

    Get PDF
    Avec l’intérêt que la technologie d’aujourd’hui a sur les données, il est facile de supposer que l’information est au bout des doigts, prêt à être exploité. Les méthodologies et outils de recherche sont souvent construits sur cette hypothèse. Cependant, cette illusion d’abondance se brise souvent lorsqu’on tente de transférer des techniques existantes à des applications industrielles. Par exemple, la recherche a produit divers méthodologies permettant d’optimiser l’utilisation des ressources de grands systèmes complexes, tels que les avioniques de l’Airbus A380. Ces approches nécessitent la connaissance de certaines mesures telles que les temps d’exécution, la consommation de mémoire, critères de communication, etc. La conception de ces systèmes complexes a toutefois employé une combinaison de compétences de différents domaines (probablement avec des connaissances en génie logiciel) qui font que les données caractéristiques au système sont incomplètes ou manquantes. De plus, l’absence d’informations pertinentes rend difficile de décrire correctement le système, de prédire son comportement, et améliorer ses performances. Nous faisons recours au modèles probabilistes et des techniques d’apprentissage automatique pour remédier à ce manque d’informations pertinentes. La théorie des probabilités, en particulier, a un grand potentiel pour décrire les systèmes partiellement observables. Notre objectif est de fournir des approches et des solutions pour produire des informations pertinentes. Cela permet une description appropriée des systèmes complexes pour faciliter l’intégration, et permet l’utilisation des techniques d’optimisation existantes. Notre première étape consiste à résoudre l’une des difficultés rencontrées lors de l’intégration de système : assurer le bon comportement temporelle des composants critiques des systèmes. En raison de la mise à l’échelle de la technologie et de la dépendance croissante à l’égard des architectures à multi-coeurs, la surcharge de logiciels fonctionnant sur différents coeurs et le partage d’espace mémoire n’est plus négligeable. Pour tel, nous étendons la boîte à outils des système temps réel avec une analyse temporelle probabiliste statique qui estime avec précision l’exécution d’un logiciel avec des considerations pour les conflits de mémoire partagée. Le modèle est ensuite intégré dans un simulateur pour l’ordonnancement de systèmes temps réel multiprocesseurs. ----------ABSTRACT: In today’s data-driven technology, it is easy to assume that information is at the tip of our fingers, ready to be exploited. Research methodologies and tools are often built on top of this assumption. However, this illusion of abundance often breaks when attempting to transfer existing techniques to industrial applications. For instance, research produced various methodologies to optimize the resource usage of large complex systems, such as the avionics of the Airbus A380. These approaches require the knowledge of certain metrics such as the execution time, memory consumption, communication delays, etc. The design of these complex systems, however, employs a mix of expertise from different fields (likely with limited knowledge in software engineering) which might lead to incomplete or missing specifications. Moreover, the unavailability of relevant information makes it difficult to properly describe the system, predict its behavior, and improve its performance. We fall back on probabilistic models and machine learning techniques to address this lack of relevant information. Probability theory, especially, has great potential to describe partiallyobservable systems. Our objective is to provide approaches and solutions to produce relevant information. This enables a proper description of complex systems to ease integration, and allows the use of existing optimization techniques. Our first step is to tackle one of the difficulties encountered during system integration: ensuring the proper timing behavior of critical systems. Due to technology scaling, and with the growing reliance on multi-core architectures, the overhead of software running on different cores and sharing memory space is no longer negligible. For such, we extend the real-time system tool-kit with a static probabilistic timing analysis technique that accurately estimates the execution of software with an awareness of shared memory contention. The model is then incorporated into a simulator for scheduling multi-processor real-time systems

    Predicting Application Performance for Chip Multiprocessors

    Get PDF
    Today's computers have processors with multiple cores that allow several applications to execute simultaneously. The way resources are allocated to an application affects whether performance objectives, such as quality of service (QoS), are satisfied. To ensure objectives are met, resources must be carefully but quickly allocated in response to changing runtime conditions. Traditional approaches to resource allocation take place either purely online or offline. Online methods do not scale to large, multiple core systems because there are too many allocations to evaluate at runtime. Offline methods cannot handle unanticipated workloads or changes. A hybrid approach could combine the lower runtime overhead of offline approaches with the flexibility of online approaches. This thesis introduces AUTO, a hybrid solution to perform resource allocation. AUTO dynamically adjusts thread count, core count, and core type. It does so in accordance with a user-provided policy to meet performance objectives. AUTO's capabilities come from four prediction techniques. The first technique builds and uses models that consider CPU contention and application scalability in order to select co-running applications' thread counts. The second technique predicts applications' preferred thread-to-core mappings. The predictions are thread count independent and are translated into concrete thread-to-core mappings based on resource availability. The third technique predicts application performance under thread-to-core mappings. The final technique selects thread count and core count for applications on a system with cores of different capabilities. AUTO was tested in several scenarios. In each scenario, it was shown to be an effective, efficient solution to resource allocation. First, it was used to select the thread count of one or more co-running applications. Second, it was used to select application thread-to-core mappings. Third, it was used to make predictions about application performance under thread-to-core mappings. Finally, it was used to select both thread count and core type for applications on a computer with cores of different capabilities. AUTO's resource allocation and models allow for more effective and more efficient policies. By using hybrid online and offline techniques, AUTO solves the problem of allocating threads and cores to meet performance objectives
    corecore