3,709 research outputs found

    Run-time Spatial Mapping of Streaming Applications to Heterogeneous Multi-Processor Systems

    Get PDF
    In this paper, we define the problem of spatial mapping. We present reasons why performing spatial mappings at run-time is both necessary and desirable. We propose what is—to our knowledge—the first attempt at a formal description of spatial mappings for the embedded real-time streaming application domain. Thereby, we introduce criteria for a qualitative comparison of these spatial mappings. As an illustration of how our formalization relates to practice, we relate our own spatial mapping algorithm to the formal model

    Software Challenges For HL-LHC Data Analysis

    Full text link
    The high energy physics community is discussing where investment is needed to prepare software for the HL-LHC and its unprecedented challenges. The ROOT project is one of the central software players in high energy physics since decades. From its experience and expectations, the ROOT team has distilled a comprehensive set of areas that should see research and development in the context of data analysis software, for making best use of HL-LHC's physics potential. This work shows what these areas could be, why the ROOT team believes investing in them is needed, which gains are expected, and where related work is ongoing. It can serve as an indication for future research proposals and cooperations

    Energy-efficient and thermal-aware resource management for heterogeneous datacenters

    Get PDF
    International audienceWe propose in this paper to study the energy-, thermal- and performance-aware resource management in heterogeneous datacenters. Witnessing the continuous development of heterogeneity in datacenters, we are confronted with their different behaviors in terms of performance, power consumption and thermal dissipation: indeed, heterogeneity at server level lies both in the computing infrastructure (computing power, electrical power consumption) and in the heat removal systems (different enclosure, fans, thermal sinks). Also the physical locations of the servers become important with heterogeneity since some servers can (over)heat others. While many studies address independently these parameters (most of the time performance and power or energy), we show in this paper the necessity to tackle all these aspects for an optimal resource management of the computing resources. This leads to improved energy usage in a heterogeneous datacenter including the cooling of the computer rooms. We build our approach on the concept of heat distribution matrix to handle the mutual influence of the servers, in heterogeneous environments, which is novel in this context. We propose a heuristic to solve the server placement problem and we design a generic greedy framework for the online scheduling problem. We derive several single-objective heuristics (for performance, energy, cooling) and a novel fuzzy-based priority mechanism to handle their tradeoffs. Finally, we show results using extensive simulations fed with actual measurements on heterogeneous servers

    Exploiting asymmetric multi-core systems with flexible system software

    Get PDF
    Asymmetric multi-cores (AMCs) are a successful architectural solution for both mobile devices and supercomputers. These architectures combine different types of processing cores designed at different performance and power optimization points, thus exposing a performance-power trade-off. By maintaining two types of cores, AMCs are able to provide high performance under the facility power budget. However, there are significant challenges when using AMCs such as scheduling and load balancing. This thesis initially explores the potential of AMCs when executing current HPC applications and searches for the most appropriate execution model. Specifically we evaluate several execution models on an Arm big.LITTLE AMC using the PARSEC benchmark suite that includes representative HPC applications. We compare schedulers at the user, OS and runtime system levels, using both static and dynamic options and multiple configurations, and assess the impact of these options on the well-known problem of balancing the load across AMCs. Our results demonstrate that scheduling is more effective when it takes place in the runtime system as it improves the user-level scheduling by 23%, while the heterogeneous-aware OS scheduling solution improves the user-level scheduling by 10%. Following this outcome, this thesis focuses on increasing performance of AMC systems by improving scheduling in the runtime system level. Scheduling in the runtime system level is provided by the use of task-based parallel programming models. These programming models offer programming flexibility as they consist of an interface and a runtime system to manage the underlying resources and threads. In this thesis we improve scheduling with task-based programming models by providing three novel task schedulers for AMCs. These dynamic scheduling policies reduce total execution time either by detecting the longest or the critical path of the dynamic task dependency graph of the application. They use dynamic scheduling and information discoverable during execution, fact that makes them implementable and functional without the need of off-line profiling. In our evaluation we compare these scheduling approaches with an existing state-of the art heterogeneous scheduler and we track their improvement over a FIFO baseline scheduler. We show that the heterogeneous schedulers improve the baseline by up to 1.45x on a real 8-core AMC and up to 2.1x on a simulated 32-core AMC. Another enhancement we provide in task-based programming models is the adaptability to fine grained parallelism. The increasing number of cores on modern CMPs is pushing research towards the use of fine grained workloads, which is an important challenge for task-based programming models. Our study makes the observation that task creation becomes a bottleneck when executing fine grained workloads with task-based programming models. As the number of cores increases, the time spent generating tasks is becoming more critical to the entire execution. To overcome this issue, we propose TaskGenX. TaskGenX minimizes task creation overheads and relies both on the runtime system and a dedicated hardware. On the runtime system side, TaskGenX decouples the task creation from the other runtime activities. It then transfers this part of the runtime to a specialized hardware. From our evaluation using 11 HPC workloads on both symmetric and AMC systems, we obtain performance improvements up to 15x, averaging to 3.1x over the baseline. Finally, this thesis presents a showcase for a real-time CPU scheduler with the goal to increase the frames per second (FPS) of the game-play on mobile devices with AMC systems. We design and implement the RTS scheduler in the Android framework. RTS provides an efficient scheduling policy that takes into account the current temperature of the system to perform task migration. RTS solution increases the median FPS of the baseline mechanisms by up to 7.5% and at the same time it maintains temperature stable.Los procesadores multinúcleos asimétricos (AMC) son una solución arquitectónica exitosa para dispositivos móviles y supercomputadores. Estas arquitecturas combinan diferentes tipos de núcleos de procesamiento diseñados con diferentes propiedades de rendimiento y potencia. Al mantener dos o más tipos de núcleos, los AMCs pueden proporcionar un alto rendimiento con un consumo bajo de energía de las infraestructuras. Sin embargo, existen importantes desafíos al usar los AMC, como la programación y el equilibrio de carga. Esta tesis explora inicialmente el potencial de los AMC al ejecutar aplicaciones actuales de Computacion de Alto Rendimiento (HPC) y busca el modelo de ejecución más apropiado para ellas. Específicamente evaluamos varios modelos de ejecución en un procesador asimétrico Arm big.LITTLE utilizando las aplicaciones PARSEC que son aplicaciones representativas de HPC. En este trabajo se compara la programación en los niveles de usuario, sistema operativo y librería y evaluamos el impacto de estas opciones en el conocido problema de equilibrar la carga entre los AMCs. Nuestros resultados demuestran que la programación es más efectiva cuando se lleva a cabo en el nivel del runtime, ya que mejora la programación del nivel de usuario en un 23%, mientras que la solución de programación del sistema operativo heterogéneo mejora la programación del nivel de usuario en un 10%. Siguiendo este resultado, esta tesis se centra en aumentar el rendimiento de los sistemas AMC mejorando la programación al nivel de librería. La programación en este nivel se proporciona mediante el uso de Modelos de Programación Paralelos Basados en Tareas (MPBT). Estos modelos de programación ofrecen flexibilidad de programación, ya que consisten en una interfaz y un runtime para administrar los recursos e hilos subyacentes. En esta tesis, mejoramos la programación con MPBT al proporcionar tres nuevos planificadores de tareas para AMCs. Estos planificadores dinámicos reducen el tiempo total de ejecución ya sea detectando la camino más largo o el camino crítico del grafo de dependencia de tareas de la aplicación, que es generado dinámicamente. En nuestra evaluación, comparamos estos planificadores con un planificador heterogéneo existente y demonstramos su mejora sobre un planificador FIFO. Mostramos que los planificadores heterogéneos mejoran el planificador FIFO en hasta 1.45x en un AMC real de 8 núcleos y hasta 2.1x en un AMC simulado de 32 núcleos. Otra contribución en los MPBT es la adaptabilidad al paralelismo de grano fino. El creciente número de núcleos en los chip multinúcleos modernos está empujando la investigación hacia el uso de cargas de trabajo de grano fino, que es un desafío importante para los MPBT. Nuestro estudio observa que la creación de tareas bloquea la ejecución con cargas de trabajo de grano fino con MPBT. Cuando el número de núcleos aumenta, el tiempo empleado en generar tareas pasa a ser más crítico para toda la ejecución. Nuestra solución es TaskGenX, que minimiza los costes de creación de tareas y se basa en una extensión del runtime y en un hardware dedicado. En el runtime, TaskGenX desacopla la creación de tareas de las otras actividades del runtime, ejecutando esta actividad en un hardware especializado. Evaluamos 11 aplicaciones de HPC con TaskGenX en sistemas simétricos y AMC y obtenemos mejoras de rendimiento de hasta 15x, con un promedio de 3.1x sobre la implementación de referencia. Finalmente, esta tesis presenta un planificador de CPU con el objetivo de aumentar los fotogramas por segundo (FPS) para juegos en dispositivos móviles con sistemas AMC. Diseñamos e implementamos el planificador de Real-Time Scheduler (RTS) en Android. El RTS proporciona una política de programación eficiente que tiene en cuenta la temperatura actual del sistema para realizar la migración de tareas. La solución RTS aumenta la FPS mediana de los mecanismos de referenci

    Exploiting asymmetric multi-core systems with flexible system software

    Get PDF
    Asymmetric multi-cores (AMCs) are a successful architectural solution for both mobile devices and supercomputers. These architectures combine different types of processing cores designed at different performance and power optimization points, thus exposing a performance-power trade-off. By maintaining two types of cores, AMCs are able to provide high performance under the facility power budget. However, there are significant challenges when using AMCs such as scheduling and load balancing. This thesis initially explores the potential of AMCs when executing current HPC applications and searches for the most appropriate execution model. Specifically we evaluate several execution models on an Arm big.LITTLE AMC using the PARSEC benchmark suite that includes representative HPC applications. We compare schedulers at the user, OS and runtime system levels, using both static and dynamic options and multiple configurations, and assess the impact of these options on the well-known problem of balancing the load across AMCs. Our results demonstrate that scheduling is more effective when it takes place in the runtime system as it improves the user-level scheduling by 23%, while the heterogeneous-aware OS scheduling solution improves the user-level scheduling by 10%. Following this outcome, this thesis focuses on increasing performance of AMC systems by improving scheduling in the runtime system level. Scheduling in the runtime system level is provided by the use of task-based parallel programming models. These programming models offer programming flexibility as they consist of an interface and a runtime system to manage the underlying resources and threads. In this thesis we improve scheduling with task-based programming models by providing three novel task schedulers for AMCs. These dynamic scheduling policies reduce total execution time either by detecting the longest or the critical path of the dynamic task dependency graph of the application. They use dynamic scheduling and information discoverable during execution, fact that makes them implementable and functional without the need of off-line profiling. In our evaluation we compare these scheduling approaches with an existing state-of the art heterogeneous scheduler and we track their improvement over a FIFO baseline scheduler. We show that the heterogeneous schedulers improve the baseline by up to 1.45x on a real 8-core AMC and up to 2.1x on a simulated 32-core AMC. Another enhancement we provide in task-based programming models is the adaptability to fine grained parallelism. The increasing number of cores on modern CMPs is pushing research towards the use of fine grained workloads, which is an important challenge for task-based programming models. Our study makes the observation that task creation becomes a bottleneck when executing fine grained workloads with task-based programming models. As the number of cores increases, the time spent generating tasks is becoming more critical to the entire execution. To overcome this issue, we propose TaskGenX. TaskGenX minimizes task creation overheads and relies both on the runtime system and a dedicated hardware. On the runtime system side, TaskGenX decouples the task creation from the other runtime activities. It then transfers this part of the runtime to a specialized hardware. From our evaluation using 11 HPC workloads on both symmetric and AMC systems, we obtain performance improvements up to 15x, averaging to 3.1x over the baseline. Finally, this thesis presents a showcase for a real-time CPU scheduler with the goal to increase the frames per second (FPS) of the game-play on mobile devices with AMC systems. We design and implement the RTS scheduler in the Android framework. RTS provides an efficient scheduling policy that takes into account the current temperature of the system to perform task migration. RTS solution increases the median FPS of the baseline mechanisms by up to 7.5% and at the same time it maintains temperature stable.Los procesadores multinúcleos asimétricos (AMC) son una solución arquitectónica exitosa para dispositivos móviles y supercomputadores. Estas arquitecturas combinan diferentes tipos de núcleos de procesamiento diseñados con diferentes propiedades de rendimiento y potencia. Al mantener dos o más tipos de núcleos, los AMCs pueden proporcionar un alto rendimiento con un consumo bajo de energía de las infraestructuras. Sin embargo, existen importantes desafíos al usar los AMC, como la programación y el equilibrio de carga. Esta tesis explora inicialmente el potencial de los AMC al ejecutar aplicaciones actuales de Computacion de Alto Rendimiento (HPC) y busca el modelo de ejecución más apropiado para ellas. Específicamente evaluamos varios modelos de ejecución en un procesador asimétrico Arm big.LITTLE utilizando las aplicaciones PARSEC que son aplicaciones representativas de HPC. En este trabajo se compara la programación en los niveles de usuario, sistema operativo y librería y evaluamos el impacto de estas opciones en el conocido problema de equilibrar la carga entre los AMCs. Nuestros resultados demuestran que la programación es más efectiva cuando se lleva a cabo en el nivel del runtime, ya que mejora la programación del nivel de usuario en un 23%, mientras que la solución de programación del sistema operativo heterogéneo mejora la programación del nivel de usuario en un 10%. Siguiendo este resultado, esta tesis se centra en aumentar el rendimiento de los sistemas AMC mejorando la programación al nivel de librería. La programación en este nivel se proporciona mediante el uso de Modelos de Programación Paralelos Basados en Tareas (MPBT). Estos modelos de programación ofrecen flexibilidad de programación, ya que consisten en una interfaz y un runtime para administrar los recursos e hilos subyacentes. En esta tesis, mejoramos la programación con MPBT al proporcionar tres nuevos planificadores de tareas para AMCs. Estos planificadores dinámicos reducen el tiempo total de ejecución ya sea detectando la camino más largo o el camino crítico del grafo de dependencia de tareas de la aplicación, que es generado dinámicamente. En nuestra evaluación, comparamos estos planificadores con un planificador heterogéneo existente y demonstramos su mejora sobre un planificador FIFO. Mostramos que los planificadores heterogéneos mejoran el planificador FIFO en hasta 1.45x en un AMC real de 8 núcleos y hasta 2.1x en un AMC simulado de 32 núcleos. Otra contribución en los MPBT es la adaptabilidad al paralelismo de grano fino. El creciente número de núcleos en los chip multinúcleos modernos está empujando la investigación hacia el uso de cargas de trabajo de grano fino, que es un desafío importante para los MPBT. Nuestro estudio observa que la creación de tareas bloquea la ejecución con cargas de trabajo de grano fino con MPBT. Cuando el número de núcleos aumenta, el tiempo empleado en generar tareas pasa a ser más crítico para toda la ejecución. Nuestra solución es TaskGenX, que minimiza los costes de creación de tareas y se basa en una extensión del runtime y en un hardware dedicado. En el runtime, TaskGenX desacopla la creación de tareas de las otras actividades del runtime, ejecutando esta actividad en un hardware especializado. Evaluamos 11 aplicaciones de HPC con TaskGenX en sistemas simétricos y AMC y obtenemos mejoras de rendimiento de hasta 15x, con un promedio de 3.1x sobre la implementación de referencia. Finalmente, esta tesis presenta un planificador de CPU con el objetivo de aumentar los fotogramas por segundo (FPS) para juegos en dispositivos móviles con sistemas AMC. Diseñamos e implementamos el planificador de Real-Time Scheduler (RTS) en Android. El RTS proporciona una política de programación eficiente que tiene en cuenta la temperatura actual del sistema para realizar la migración de tareas. La solución RTS aumenta la FPS mediana de los mecanismos de referenciaPostprint (published version

    A Linux Kernel Scheduler Implementation for Asymmetric CPUs

    Get PDF
    Την τελευταία δεκαετία έχουν χρησιμοποιηθεί ευρέως ασύμμετροι επεξεργαστές που αποτελούνται από πυρήνες διαφορετικών επεξεργαστικών δυνατοτήτων στις κινητές συσκευές. Η σχεδίαση τους επιτρέπει στους κατασκευαστές επεξεργαστών να προσφέρουν ελαττώσουν την κατανάλωση ενέργειας διατηρώντας καλή ταχύτητα. Ταυτοχρόνως, τέτοιοι επεξεργαστές απαιτούν ειδική μεταχείριση από τους προγραμματιστές λειτουργικών συστημάτων, οι οποίοι πρέπει να σχεδιάσουν ειδικούς χρονοπρογραμματιστές που να εκμεταλλεύονται την ασυμμετρία. Προσφάτως έχουν κυκλοφορήσει ασύμμετροι επεξεργαστές για προσωπικούς υπολογιστές, ένας χώρος στον οποίο παραδοσιακά κυριαρχούν οι συμμετρικοί επεξεργαστές. Αυτή η εξέλιξη είναι αποτέλεσμα της βελτιωμένου ενεργειακού τους προφίλ, καθώς και της χωρικής αποδοτικότητας αυτών των αρχιτεκτονικών. Η αντικατάσταση ενός μεγάλου πυρήνα με πολλούς μικρούς προσφέρει αυξημένη διεκπεραιωτική ικανότητα πολυνηματικών εργασιών χωρίς να αυξάνει το κόστος παραγωγής. Οι ασύμμετροι επεξεργαστές για υπολογιστές δημιουργούν μία σειρά από προβλήματα που οι χρονοπρογραμματιστές καλούνται να αντιμετωπίσουν. Η ανομοιογένεια στους υπολογιστές αποσκοπεί στον συνδυασμό της ελαχιστοποίησης της κατανάλωση ενέργειας σε καταστάσεις χαμηλού φόρτου και στη μακροχρόνια και ταχύ πολυνηματική απόδοση σε καταστάσεις αυξημένου φόρτου. Αντιθέτως, στις κινητές συσκευές, οι ανομοιογενής αρχιτεκτονικές αποσκοπούν στη μεγιστοποίηση της διάρκειας μπαταρίας, και την γοργή απόδοση σε σύντομα διαστήματα υψηλού φόρτου. Επιπλέον, η ταυτόχρονη πολυνημάτωση (SMT) είναι μία τεχνική που ενώ δεν χρησιμοποιείται σε κινητές συσκευές, χρησιμοποιείται από σχεδόν όλους τους επεξεργαστές υπολογιστών. Η κακή διαχείριση της ταυτόχρονης πολυνημάτωσης μπορεί να οδηγήσει σε σημαντική ελάττωση της αποδοτικότητας του συστήματος. Στην παρούσα πτυχιακή εργασία παρουσιάζουμε τον HCS, έναν γενικό χρονοπρογραμματιστή για ανομοιογενής επεξεργαστές που υποστηρίζουν ταυτόχρονη πολυνημάτωση. Συνδυάζει υπάρχουσες τεχνικές χρονοπρογραμματισμού ασύμμετρων επεξεργαστών για να προσφέρει ενεργειακή αποδοτικότητα και ταχύτητα. Επιπλέον, ο HCS είναι εύκολο να τροποποιηθεί από τους χρήστες ή τους προγραμματιστές εφαρμογών. Ο HCS εχει υλοποιηθεί, ως επέκταση του ULE, στον πυρήνα των Linux ως αντικαταστάτης του χρόνοπρογραμματιστή CFS.Asymmetric CPUs consisting of cores with different processing capacities have been widely used in mobile devices in the last decade. Their design allows chip-makers to offer improved energy efficiency while maintaining solid performance. The other side of the coin is that CPUs of this type require extra care from operating system developers, who have to design schedulers that take advantage of this asymmetry. Recently, heterogeneous (asymmetric) architectures have started to appear in the PC space, which has traditionally been dominated by homogeneous CPUs. In addition to the energy-efficiency benefits, this development is being propelled by the improved space-efficiency of small cores. Replacing a single big core with several small cores leads to increased multi-threaded performance without increasing manufacturing costs. Asymmetric Multi-Processing in computers poses some unique challenges that schedulers have to handle. Heterogeneity in computers aims for strong, sustained multithreaded performance in high-load situations and low power consumption in low-load situations. On the contrary, in mobile devices, heterogeneous architectures aim to maximize battery life while offering decent performance to a limited number of tasks with short bursts of CPU-intensive activity. Apart from that, Simultaneous Multi-threading hasn’t seen much use in mobile platforms leading to the design of heterogeneity-aware SMT-unaware schedulers. Mishandling SMT can lead to significant performance degradation and is unacceptable for a modern computer scheduler. In this thesis, we present HCS, a heterogeneity-aware, SMT-aware, general-purpose CPU scheduler for servers, desktops and laptops. It combines utilization-based, bias-based, and fairness-based scheduling schemes to provide efficiency and performance. To add to that, HCS allows both users and application developers to configure its behavior to better suit their needs. HCS is built on top of ULE and has been implemented in the Linux Kernel as a replacement for the energy-aware variant of the CFS scheduler

    Energy and Route Optimization of Moving Devices

    Get PDF
    This thesis highlights our efforts in energy and route optimization of moving devices. We have focused on three categories of such devices; industrial robots in a multi-robot environment, generic vehicles in a vehicle routing problem (VRP) context, automatedguided vehicles (AGVs) in a large-scale flexible manufacturing system (FMS). In the first category, the aim is to develop a non-intrusive energy optimization technique, based on a given set of paths and sequences of operations, such that the original cycle time is not exceeded. We develop an optimization procedure based on a mathematical programming model that aims to minimize the energy consumption and peak power. Our technique has several advantages. It is non-intrusive, i.e. it requires limited changes in the robot program and can be implemented easily. Moreover,it is model-free, in the sense that no particular, and perhaps secret, parameter or dynamic model is required. Furthermore, the optimization can be done offline, within seconds using a generic solver. Through careful experiments, we have shown that it is possible to reduce energy and peak-power up to about 30% and 50% respectively. The second category of moving devices comprises of generic vehicles in a VRP context. We have developed a hybrid optimization approach that integrates a distributed algorithm based on a gossip protocol with a column generation (CG) algorithm, which manages to solve the tested problems faster than the CG algorithm alone. The algorithm is developed for a VRP variation including time windows (VRPTW), which is meant to model the task of scheduling and routing of caregivers in the context of home healthcare routing and scheduling problems (HHRSPs). Moreover,the developed algorithm can easily be parallelized to further increase its efficiency. The last category deals with AGVs. The choice of AGVs was not arbitrary; by design, we decided to transfer our knowledge of energy optimization and routing algorithms to a class of moving devices in which both techniques are of interest. Initially, we improve an existing method of conflict-free AGV scheduling and routing, such that the new algorithm can manage larger problems. A heuristic version of the algorithm manages to solve the problem instances in a reasonable amount of time. Later, we develop strategies to reduce the energy consumption. The study is carried out using an AGV system installed at Volvo Cars. The results are promising; (1)the algorithm reduces performance measures such as makespan up to 50%, while reducing the total travelled distance of the vehicles about 14%, leading to an energy saving of roughly 14%, compared to the results obtained from the original traffic controller. (2) It is possible to reduce the cruise velocities such that more energy is saved, up to 20%, while the new makespan remains better than the original one

    PMCTrack: Delivering performance monitoring counter support to the OS scheduler

    Get PDF
    Hardware performance monitoring counters (PMCs) have proven effective in characterizing application performance. Because PMCs can only be accessed directly at the OS privilege level, kernellevel tools must be developed to enable the end-user and userspace programs to access PMCs. A large body of work has demonstrated that the OS can perform effective runtime optimizations in multicore systems by leveraging performance-counter data. Special attention has been paid to optimizations in the OS scheduler. While existing performance monitoring tools greatly simplify the collection of PMC application data from userspace, they do not provide an architecture-agnostic kernel-level mechanism that is capable of exposing high-level PMC metrics to OS components, such as the scheduler. As a result, the implementation of PMC-based OS scheduling schemes is typically tied to specific processor models. To address this shortcoming we present PMCTrack, a novel tool for the Linux kernel that provides a simple architecture-independent mechanism that makes it possible for the OS scheduler to access per-thread PMC data. Despite being an OSoriented tool, PMCTrack still allows the gathering of monitoring data from userspace, enabling kernel developers to carry out the necessary offline analysis and debugging to assist them during the scheduler design process. In addition, the tool provides both the OS and the user-space PMCTrack components with other insightful metrics available in modern processors and which are not directly exposed as PMCs, such as cache occupancy or energy consumption. This information is also of great value when it comes to analyzing the potential benefits of novel scheduling policies on real systems. In this paper, we analyze different case studies that demonstrate the flexibility, simplicity and powerful features of PMCTrack.Facultad de InformáticaInstituto de Investigación en Informátic

    PMCTrack: Delivering performance monitoring counter support to the OS scheduler

    Get PDF
    Hardware performance monitoring counters (PMCs) have proven effective in characterizing application performance. Because PMCs can only be accessed directly at the OS privilege level, kernellevel tools must be developed to enable the end-user and userspace programs to access PMCs. A large body of work has demonstrated that the OS can perform effective runtime optimizations in multicore systems by leveraging performance-counter data. Special attention has been paid to optimizations in the OS scheduler. While existing performance monitoring tools greatly simplify the collection of PMC application data from userspace, they do not provide an architecture-agnostic kernel-level mechanism that is capable of exposing high-level PMC metrics to OS components, such as the scheduler. As a result, the implementation of PMC-based OS scheduling schemes is typically tied to specific processor models. To address this shortcoming we present PMCTrack, a novel tool for the Linux kernel that provides a simple architecture-independent mechanism that makes it possible for the OS scheduler to access per-thread PMC data. Despite being an OSoriented tool, PMCTrack still allows the gathering of monitoring data from userspace, enabling kernel developers to carry out the necessary offline analysis and debugging to assist them during the scheduler design process. In addition, the tool provides both the OS and the user-space PMCTrack components with other insightful metrics available in modern processors and which are not directly exposed as PMCs, such as cache occupancy or energy consumption. This information is also of great value when it comes to analyzing the potential benefits of novel scheduling policies on real systems. In this paper, we analyze different case studies that demonstrate the flexibility, simplicity and powerful features of PMCTrack.Facultad de InformáticaInstituto de Investigación en Informátic
    corecore