4 research outputs found

    Studying the impact of multicore processor scaling on directory techniques via reuse distance analysis

    Full text link
    Abstract—Researchers have proposed numerous directory techniques to address multicore scalability whose behavior de-pends on the CPU’s particular configuration, e.g. core count and cache size. As CPUs continue to scale, it is essential to explore the directory’s architecture dependences. However, this is challenging using detailed simulation given the large number of CPU configurations that are possible. This paper proposes to use multicore reuse distance analysis to study coherence directories. We develop a framework to extract the directory access stream from parallel LRU stacks, enabling rapid analysis of the directory’s accesses and contents across both core count and cache size scaling. We also implement our framework in a profiler, and apply it to gain insights into multicore scaling’s impact on the directory. Our profiling results show that directory accesses reduce by 3.5x across data cache size scaling, suggesting techniques that tradeoff access latency for reduced capacity or conflicts become increasingly effective as cache size scales. We also show the portion of on-chip memory devoted to the directory cache can be reduced by 53.3 % across data cache size scaling, thus lowering the over-provisioning needed at large cache sizes. Finally, we validate our RD-based directory analyses, and find they are within 13% of cache simulations in terms of access count, on average. I

    Modeling Data Center Co-Tenancy Performance Interference

    Get PDF
    A multi-core machine allows executing several applications simultaneously. Those jobs are scheduled on different cores and compete for shared resources such as the last level cache and memory bandwidth. Such competitions might cause performance degradation. Data centers often utilize virtualization to provide a certain level of performance isolation. However, some of the shared resources cannot be divided, even in a virtualized system, to ensure complete isolation. If the performance degradation of co-tenancy is not known to the cloud administrator, a data center often has to dedicate a whole machine for a latency-sensitive application to guarantee its quality of service. Co-run scheduling attempts to make good utilization of resources by scheduling compatible jobs into one machine while maintaining their service level agreements. An ideal co-run scheduling scheme requires accurate contention modeling. Recent studies for co-run modeling and scheduling have made steady progress to predict performance for two co-run applications sharing a specific system. This thesis advances co-tenancy modeling in three aspects. First, with an accurate co-run modeling for one system at hand, we propose a regression model to transfer the knowledge and create a model for a new system with different hardware configuration. Second, by examining those programs that yield high prediction errors, we further leverage clustering techniques to create a model for each group of applications that show similar behavior. Clustering helps improve the prediction accuracy of those pathological cases. Third, existing research is typically focused on modeling two application co-run cases. We extend a two-core model to a three- and four-core model by introducing a light-weight micro-kernel that emulates a complicated benchmark through program instrumentation. Our experimental evaluation shows that our cross-architecture model achieves an average prediction error less than 2% for pairwise co-runs across the SPECCPU2006 benchmark suite. For more than two application co-tenancy modeling, we show that our model is more scalable and can achieve an average prediction error of 2-3%

    Handling Information and its Propagation to Engineer Complex Embedded Systems

    Get PDF
    Avec l’intérêt que la technologie d’aujourd’hui a sur les données, il est facile de supposer que l’information est au bout des doigts, prêt à être exploité. Les méthodologies et outils de recherche sont souvent construits sur cette hypothèse. Cependant, cette illusion d’abondance se brise souvent lorsqu’on tente de transférer des techniques existantes à des applications industrielles. Par exemple, la recherche a produit divers méthodologies permettant d’optimiser l’utilisation des ressources de grands systèmes complexes, tels que les avioniques de l’Airbus A380. Ces approches nécessitent la connaissance de certaines mesures telles que les temps d’exécution, la consommation de mémoire, critères de communication, etc. La conception de ces systèmes complexes a toutefois employé une combinaison de compétences de différents domaines (probablement avec des connaissances en génie logiciel) qui font que les données caractéristiques au système sont incomplètes ou manquantes. De plus, l’absence d’informations pertinentes rend difficile de décrire correctement le système, de prédire son comportement, et améliorer ses performances. Nous faisons recours au modèles probabilistes et des techniques d’apprentissage automatique pour remédier à ce manque d’informations pertinentes. La théorie des probabilités, en particulier, a un grand potentiel pour décrire les systèmes partiellement observables. Notre objectif est de fournir des approches et des solutions pour produire des informations pertinentes. Cela permet une description appropriée des systèmes complexes pour faciliter l’intégration, et permet l’utilisation des techniques d’optimisation existantes. Notre première étape consiste à résoudre l’une des difficultés rencontrées lors de l’intégration de système : assurer le bon comportement temporelle des composants critiques des systèmes. En raison de la mise à l’échelle de la technologie et de la dépendance croissante à l’égard des architectures à multi-coeurs, la surcharge de logiciels fonctionnant sur différents coeurs et le partage d’espace mémoire n’est plus négligeable. Pour tel, nous étendons la boîte à outils des système temps réel avec une analyse temporelle probabiliste statique qui estime avec précision l’exécution d’un logiciel avec des considerations pour les conflits de mémoire partagée. Le modèle est ensuite intégré dans un simulateur pour l’ordonnancement de systèmes temps réel multiprocesseurs. ----------ABSTRACT: In today’s data-driven technology, it is easy to assume that information is at the tip of our fingers, ready to be exploited. Research methodologies and tools are often built on top of this assumption. However, this illusion of abundance often breaks when attempting to transfer existing techniques to industrial applications. For instance, research produced various methodologies to optimize the resource usage of large complex systems, such as the avionics of the Airbus A380. These approaches require the knowledge of certain metrics such as the execution time, memory consumption, communication delays, etc. The design of these complex systems, however, employs a mix of expertise from different fields (likely with limited knowledge in software engineering) which might lead to incomplete or missing specifications. Moreover, the unavailability of relevant information makes it difficult to properly describe the system, predict its behavior, and improve its performance. We fall back on probabilistic models and machine learning techniques to address this lack of relevant information. Probability theory, especially, has great potential to describe partiallyobservable systems. Our objective is to provide approaches and solutions to produce relevant information. This enables a proper description of complex systems to ease integration, and allows the use of existing optimization techniques. Our first step is to tackle one of the difficulties encountered during system integration: ensuring the proper timing behavior of critical systems. Due to technology scaling, and with the growing reliance on multi-core architectures, the overhead of software running on different cores and sharing memory space is no longer negligible. For such, we extend the real-time system tool-kit with a static probabilistic timing analysis technique that accurately estimates the execution of software with an awareness of shared memory contention. The model is then incorporated into a simulator for scheduling multi-processor real-time systems
    corecore