27 research outputs found

    High-Integrity Performance Monitoring Units in Automotive Chips for Reliable Timing V&V

    As software continues to control more system-critical functions in cars, its timing is becoming an integral element of functional safety. Timing validation and verification (V&V) assesses software's end-to-end timing measurements against given budgets. The advent of multicore processors with massive resource sharing reduces the significance of end-to-end execution times for timing V&V and requires reasoning on (worst-case) access delays on contention-prone hardware resources. While Performance Monitoring Units (PMUs) support this finer-grained reasoning, their design has never been a prime consideration in high-performance processors (from which automotive-chip PMU implementations descend), since the PMU does not directly affect performance or reliability. To meet the PMU's instrumental importance for timing V&V, we advocate for PMUs in automotive chips that explicitly track activities related to worst-case (rather than average) software behavior, are recognized as an ISO 26262 mandatory high-integrity hardware service, and are accompanied by detailed documentation that enables their effective use to derive reliable timing estimates. This work has also been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Jaume Abella has been partially supported by the MINECO under Ramón y Cajal postdoctoral fellowship number RYC-2013-14717. Enrico Mezzetti has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva-Incorporación postdoctoral fellowship number IJCI-2016-27396. Peer reviewed. Postprint (author's final draft).

    Power Modeling on SBC Boards: Integrating Different Raspberry Pi Generations (Modelado de potencia en placas SBC: integración de diferentes generaciones Raspberry Pi)

    Monitoring processor power is an important task for defining strategies that reduce energy costs in computer systems. Today, processors provide a large number of counters that allow monitoring system events such as CPU usage, memory, and cache, among others. Previous work has shown that the consumption of parallel applications can be predicted through these events, but only for one particular SBC board architecture. This work analyzes the portability of a statistical power prediction model to a new generation of Raspberry boards. The experiments highlight the optimizations carried out with the goal of systematically reducing the final estimation error on the analyzed architectures. The final model yields an average error of 4.76% across both boards. XX Workshop de Procesamiento Distribuido y Paralelo. Red de Universidades con Carreras en Informática.

    Unified Power Modeling Design for Various Raspberry Pi Generations Analyzing Different Statistical Methods

    Monitoring processor power is important to define strategies that allow reducing energy costs in computer systems. Today, processors have a large number of counters that allow monitoring system events such as CPU usage, memory, cache, and so forth. Previous work has shown that parallel application consumption can be predicted through these events, but only for a given SBC board architecture. In this article, we analyze the portability of a power prediction statistical model on a new generation of Raspberry boards. Our experiments focus on optimizations using different statistical methods so as to systematically reduce the final estimation error in the architectures analyzed. The final models yield an average error between 2.24% and 4.45%, with computational cost increasing as the prediction error decreases. Published in Pesado, P., Arroyo, M. (eds.). Computer Science – CACIC 2019. Communications in Computer and Information Science (CCIS), vol. 1184. Springer, Cham. Instituto de Investigación en Informática. Comisión de Investigaciones Científicas de la provincia de Buenos Aires. Consejo Nacional de Investigaciones Científicas y Técnicas.
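    As a rough illustration of the kind of counter-based statistical model this abstract describes, the sketch below fits a linear power model to performance-counter readings with NumPy least squares. The event columns, sample values, and model form are assumptions for illustration, not the authors' actual model or data.

```python
import numpy as np

# Hypothetical per-interval counter readings (columns: CPU cycles,
# cache misses, memory accesses) and measured power for one board.
# All numbers are illustrative, not real measurements.
events = np.array([
    [1.0e9, 2.0e6, 5.0e7],
    [1.5e9, 3.5e6, 8.0e7],
    [0.8e9, 1.0e6, 3.0e7],
    [2.0e9, 5.0e6, 1.2e8],
    [1.2e9, 2.5e6, 6.0e7],
    [1.7e9, 4.0e6, 9.5e7],
])
power_watts = np.array([2.1, 2.8, 1.7, 3.5, 2.3, 3.0])

# Fit a linear model P = w . x + b by least squares -- one common form
# of counter-based statistical power model.
X = np.hstack([events, np.ones((len(events), 1))])  # intercept column
coef, *_ = np.linalg.lstsq(X, power_watts, rcond=None)

pred = X @ coef
mape = float(np.mean(np.abs(pred - power_watts) / power_watts) * 100)
print(f"mean absolute percentage error on training data: {mape:.2f}%")
```

    Porting such a model to another board generation amounts to refitting (or partially refitting) `coef` on samples from the new hardware, which is where the per-architecture error figures come from.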

    Modelado de potencia en placas SBC: integración de diferentes generaciones Raspberry Pi

    Get PDF
    Monitorear la potencia de los procesadores es una tarea importante para definir estrategias que permitan disminuir los gastos de energía en los sistemas informáticos. Hoy en día, los procesadores disponen de un elevado número de contadores que permiten monitorear eventos del sistema tales como uso de CPU, memoria, cache, entre otros. Anteriormente se ha demostrado que es posible predecir el consumo de aplicaciones paralelas a través de estos eventos, pero únicamente para una determinada arquitectura de placas SBC. El presente trabajo analiza la portabilidad de un modelo estadístico de predicción de potencia sobre una nueva generación de placas Raspberry. La experimentación destaca las optimizaciones llevadas a cabo con el objetivo de reducir sistemáticamente el error final de estimación en las arquitecturas analizadas. El modelo final arroja un error promedio de 4.76% sobre ambas placas.XX Workshop de Procesamiento Distribuido y Paralelo.Red de Universidades con Carreras en Informátic

    Establishing a base of trust with performance counters for enterprise workloads

    Understanding the performance of large, complex enterprise-class applications is an important, yet nontrivial task. Methods using hardware performance counters, such as profiling through event-based sampling, are often favored over instrumentation for analyzing such large codes, but rarely provide good accuracy at the instruction level. This work evaluates the accuracy of multiple event-based sampling techniques and quantifies the impact of a range of improvements suggested in recent years. The evaluation is performed on instances of three modern CPU architectures, using designated kernels and full applications. We conclude that precisely distributed events considerably improve accuracy, with further improvements possible when using Last Branch Records. We also present practical recommendations for hardware architects, tool developers and performance engineers, aimed at improving the quality of results.
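    The "skid" effect that makes event-based samples imprecise at the instruction level can be pictured with a small simulation: each sample is attributed some instructions after the one that actually caused the event. The per-instruction counts, skid distribution, and error metric below are all hypothetical, chosen only to show why precisely distributed events improve accuracy.

```python
import random

random.seed(0)

# Hypothetical "true" per-instruction event counts for a small code region.
true_counts = [120, 5, 900, 40, 300, 10, 60, 200]

def sample_with_skid(counts, n_samples, max_skid):
    """Simulate event-based sampling where each sample lands up to
    max_skid instructions after the instruction that caused the event."""
    total = sum(counts)
    observed = [0] * len(counts)
    for _ in range(n_samples):
        # pick the true instruction, weighted by its event count
        r = random.uniform(0, total)
        acc = 0
        for i, c in enumerate(counts):
            acc += c
            if r <= acc:
                break
        # skid: the PMU attributes the sample a few instructions later
        observed[min(i + random.randint(0, max_skid), len(counts) - 1)] += 1
    return observed

def attribution_error(true_counts, observed):
    """Total variation distance between the normalized distributions."""
    t, o = sum(true_counts), sum(observed)
    return sum(abs(tc / t - oc / o)
               for tc, oc in zip(true_counts, observed)) / 2

skidded = sample_with_skid(true_counts, 20000, max_skid=3)
precise = sample_with_skid(true_counts, 20000, max_skid=0)  # "precisely distributed"
print(f"skid error {attribution_error(true_counts, skidded):.3f} "
      f"vs precise {attribution_error(true_counts, precise):.3f}")
```

    With zero skid the sampled histogram converges to the true one, which is the behavior precise-event facilities (and LBR-based reconstruction) try to approximate on real hardware.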

    Low-Overhead Dynamic Instruction Mix Generation using Hybrid Basic Block Profiling

    Dynamic instruction mixes form an important part of the toolkits of performance tuners, compiler writers, and CPU architects. Instruction mixes are traditionally generated using software instrumentation, an accurate yet slow method that is normally limited to user-mode code. We present a new method for generating instruction mixes using the Performance Monitoring Unit (PMU) of the CPU. It has very low overhead, extends coverage to kernel-mode execution, and causes only a very modest decrease in accuracy compared to software instrumentation. In order to achieve this level of accuracy, we develop a new PMU-based data collection method, Hybrid Basic Block Profiling (HBBP). HBBP uses simple machine learning techniques to choose, on a per-basic-block basis, between data from two conventional sampling methods, Event Based Sampling (EBS) and Last Branch Records (LBR). We implement a profiling tool based on HBBP, and we report on experiments with the industry-standard SPEC CPU2006 suite, as well as with two large-scale scientific codes. We observe an improvement in runtime of up to 76x over software instrumentation on the tested benchmarks, reducing wait times from hours to minutes. Instruction attribution errors average 2.1%. The results indicate that HBBP provides a favorable tradeoff between accuracy and speed, making it a suitable candidate for use in production environments.
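    A much-simplified sketch of the per-basic-block choice HBBP makes between EBS and LBR data: here a single block-size threshold is "learned" from hypothetical labeled errors, whereas the actual method uses richer machine-learning features. All names and numbers below are illustrative assumptions.

```python
# Hypothetical per-basic-block training data:
# (block_size_in_instructions, ebs_error, lbr_error) -- how far each
# sampling method was from ground truth for that block.
training = [
    (2, 0.30, 0.05), (3, 0.25, 0.04), (4, 0.20, 0.06),
    (12, 0.05, 0.15), (16, 0.04, 0.20), (20, 0.03, 0.25),
]

def learn_threshold(samples):
    """Learn a single size threshold: blocks below it use LBR counts,
    blocks at or above it use EBS counts. Pick the split minimizing
    total attribution error on the training set."""
    sizes = sorted({s for s, _, _ in samples})
    best_t, best_err = None, float("inf")
    for t in sizes:
        err = sum(lbr if s < t else ebs for s, ebs, lbr in samples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def choose_count(block_size, ebs_count, lbr_count, threshold):
    """Per-block hybrid choice between the two conventional samplers."""
    return lbr_count if block_size < threshold else ebs_count

t = learn_threshold(training)
print(f"learned size threshold: {t}")
```

    The intuition encoded here is that LBR-derived counts tend to be more reliable for short blocks, while EBS samples accumulate enough hits in long blocks; the real classifier would be trained on measured errors rather than invented ones.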

    Taming hardware event samples for FDO compilation

    Feedback-directed optimization (FDO) is effective in improving application runtime performance, but has not been widely adopted due to the tedious dual-compilation model, the difficulties in generating representative training data sets, and the high runtime overhead of profile collection. The use of hardware-event sampling to generate estimated edge profiles overcomes these drawbacks. Yet, hardware event samples are typically not precise at the instruction or basic-block granularity. These inaccuracies lead to missed performance when compared to instrumentation-based FDO. In this paper, we use multiple hardware event profiles and supervised learning techniques to generate heuristics for improved precision of basic-block-level sample profiles, and to further improve the smoothing algorithms used to construct edge profiles. We demonstrate that sampling-based FDO can achieve an average of 78% of the performance gains obtained using instrumentation-based exact edge profiles for SPEC2000 benchmarks, matching or beating instrumentation-based FDO in many cases. The overhead of collection is only 0.74% on average, while compiler-based instrumentation incurs 6.8%-53.5% overhead (and 10x overhead on an industrial web search application), and dynamic instrumentation incurs 28.6%-1639.2% overhead. © 2010 ACM.
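    One way to picture the smoothing step that turns noisy basic-block samples into a consistent edge profile is to enforce flow conservation on a tiny diamond CFG. The graph, counts, and averaging rule below are assumptions for illustration; production FDO smoothing solves this over the whole control-flow graph, typically as a minimum-cost flow problem.

```python
# Hypothetical diamond CFG (A -> B, A -> C, B -> D, C -> D) with noisy
# basic-block sample counts from the PMU.
samples = {"A": 980, "B": 240, "C": 790, "D": 1010}

def smooth_diamond(s):
    # Flow conservation requires count(A) == count(B) + count(C) == count(D),
    # but noisy samples violate it; reconcile by averaging the three estimates.
    total = (s["A"] + (s["B"] + s["C"]) + s["D"]) / 3
    # Split the branch edges in proportion to the sampled successor counts.
    frac_b = s["B"] / (s["B"] + s["C"])
    return {
        ("A", "B"): total * frac_b,
        ("A", "C"): total * (1 - frac_b),
        ("B", "D"): total * frac_b,
        ("C", "D"): total * (1 - frac_b),
    }

edges = smooth_diamond(samples)
out_a = edges[("A", "B")] + edges[("A", "C")]
in_d = edges[("B", "D")] + edges[("C", "D")]
print(f"flow out of A matches flow into D: {abs(out_a - in_d) < 1e-9}")
```

    The compiler can then annotate each CFG edge with these reconciled frequencies and apply the usual profile-guided decisions (block layout, inlining, unrolling) as if the profile had come from exact instrumentation.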