Search CORE

387 research outputs found

Understanding the Performance of Managed Runtime Environments

Author: Hartley Timothy
Publication venue
Publication date: 31/12/2022
Field of study

The University of Manchester - Institutional Repository

Scalability testing automation using multivariate characterization and detection of software performance antipatterns

Author: Avritzer Alberto
Britto Ricardo
Camilli Matteo
Chalawadi Ram Kishan
Heinrich Robert
Henß Jörg
Hoorn André van
Janes Andrea
Rapp Martina
Russo Barbara
Trubiani Catia
Publication venue: Elsevier
Publication date: 25/07/2022
Field of study

KITopen

Scalability testing automation using multivariate characterization and detection of software performance antipatterns

Author: Avritzer A.
Britto R.
Camilli M.
Chalawadi R. K.
Heinrich R.
Henss J.
Janes A.
Rapp M.
Russo B.
Trubiani C.
van Hoorn A.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022
Field of study

Archivio istituzionale della ricerca - Politecnico di Milano

Special Issue: Performance Modeling and Evaluation of Computer and Telecommunication Systems

Author: Obaidat Mohammad S.
Sevillano Ramos José Luis
Publication venue
Publication date: 01/01/2012
Field of study

idUS. Depósito de Investigación Universidad de Sevilla

First observation of the very rare decay neutral kaon(long) going to electron positron

Author: Martin Robert David, III
Publication venue: W&M ScholarWorks
Publication date: 01/01/1998
Field of study

Brookhaven National Laboratory AGS Experiment 871 (E871) has carried out a search for the very rare, GIM and helicity suppressed decay of the long-lived neutral kaon into an electron-positron pair. This decay is predicted within the Standard Model to occur with a branching ratio of {dollar}(9.0 \pm 0.4)\times 10\sp{lcub}-12{rcub}.{dollar} Data were taken during the 1995 and 1996 AGS HEP running periods on the B5 beamline. A signal of (4 {dollar}\pm{dollar} 2) {dollar}K\sbsp{lcub}L{rcub}{lcub}0{rcub} \to e\sp+e\sp-{dollar} events is seen with a physics background expectation of 0.2 events. The ratio of the partial decay widths {dollar}\Gamma(K\sbsp{lcub}L{rcub}{lcub}0{rcub} \to e\sp+e\sp-)/\Gamma(K\sbsp{lcub}L{rcub}{lcub}0{rcub}\to\mu\sp+\mu\sp-){dollar} is determined to be {dollar}(1.3\sbsp{lcub}-0.6{rcub}{lcub}+0.8{rcub} \pm 0.2)\times 10\sp{lcub}-3{rcub}.{dollar} Using the Particle Data Group value for the branching fraction of {dollar}K\sbsp{lcub}L{rcub}{lcub}0{rcub}\to\mu\sp+\mu\sp-{dollar} this corresponds to a branching fraction of {dollar}B(K\sbsp{lcub}L{rcub}{lcub}0{rcub}\to e\sp+e\sp-) = (9.4\sbsp{lcub}-4.6{rcub}{lcub}+5.9{rcub}) \times 10\sp{lcub}-12{rcub}.{dollar} This result represents the lowest branching fraction ever measured in particle physics

College of William & Mary: W&M Publish

Virtual Timing Isolation Safety-Net for Multicore Processors

Author: Freitag Johannes
Publication venue
Publication date: 06/08/2020
Field of study

Multicore processors promise to offer the performance as well as the reduced space, weight and power needed by future aircrafts. However, commercial off-the-shelf multicore processors suffer from timing interferences between cores which complicates applying them in hard real-time systems like avionic applications. In this thesis, a safety-net system is proposed which enables a virtual timing isolation of applications running on one core from all other cores. The technique is based on hardware external to the multicore processor and completely transparent to the applications, i.e. no modification of the observed software is necessary. The basic idea is to apply a single-core execution based worst-case execution time analysis and to accept a predefined slowdown during multicore execution. If the slowdown exceeds the acceptable bounds, interferences will be reduced by controlling the behavior of low-critical cores to keep the main application’s progress inside the given bounds. Measuring the progress of the applications running on the main core is performed by tracking the application’s fingerprint. A fingerprint is created by extraction of the performance counters of the critical core in very small timesteps which results in a characteristic curve for every execution of a periodic program. In standalone mode, without any running applications on the other cores, a model of an application is created by clustering and combining the extracted curves. During runtime, the extracted performance counter values are compared to the model to determine the progress of the critical application. In case the progress of an application is unacceptably delayed, the cores creating the interferences are throttled. The interference creating cores are determined by the accesses of the respective cores to the shared resources. A controller that takes the progress of a critical application as well as the time until the final deadline into account throttles the low priority cores. Throttling is either performed by frequency scaling of the interfering cores or by halt and continue with a pulse width modulation scheme. The complete safety-net system was evaluated on a TACLeBench benchmark running on an NXP P4080 multicore processor observed by a Xilinx FPGA implementing a MicroBlaze soft-core microcontroller. The results show that the progress can be measured by the fingerprinting with a final deviation of less than 1% for a TACLeBench execution with running opponent cores and indicate the non-intrusiveness of the approach. Several experiments are conducted to demonstrate the effectiveness of the different throttling mechanisms. Evaluations using a real-world avionic application show that the approach can be applied to integrated modular avionic applications. The safety-net does not ensure robust partitioning in the conventional meaning. The applications on the different cores can influence each other in the timing domain, but the external safety-net ensures that the interference on the high critical application is low enough to keep the timing. This allows for an efficient utilization of the multicore processor. Every critical application is treated individually, and by relying on individual models recorded in standalone mode, the critical as well as the non-critical applications running on the other cores can be exchanged without recreating a fingerprint model. This eases the porting of legacy applications to the multicore processor and allows the exchange of applications without recertification.Der Einsatz von Multicore Prozessoren in Avioniksystemen verspricht sowohl die Performancesteigerung als auch den reduzierten Platz-, Gewichts- und Energieverbrauch, der zur Realisierung von zukünftigen Flugzeugen benötigt wird. Die Verwendung von seriengefertigten (COTS) Multicore Prozessoren in sicherheitskritischen Echtzeitsystemen ist jedoch sehr komplex, da eine gegenseitige zeitliche Beeinflussung der Anwendungen auf den unterschiedlichen Kernen nicht ausgeschlossen werden kann. In dieser Arbeit wird ein Konzept vorgestellt, das eine virtuelle zeitliche Trennung der Anwendungen, die auf einem Prozessorkern ausgeführt werden, von denen der übrigen Kerne ermöglicht. Die Grundidee besteht darin, eine auf einer Single-Core-Ausführung basierende Laufzeitanalyse (WCET) durchzuführen und eine vordefinierte Verlangsamung während der Multicore-Ausführung zu akzeptieren. Wenn die Verlangsamung die zulässige Grenze überschreitet, wird das Verhalten niedrigkritischer Kerne so gesteuert, dass der Fortschritt der Hauptanwendung innerhalb der Deadlines bleibt. Die Bestimmung des Fortschritts der kritischen Anwendungen erfolgt durch das Verfolgen eines sogenannten Fingerprints. Ein Fingerprint wird durch Auslesen der Performance Counter des kritischen Kerns in sehr kleinen Zeitschritten erzeugt, was zu einer charakteristischen Kurve für jede Ausführung eines periodischen Programms führt. Ein Modell einer Anwendung wird erstellt, indem die extrahierten Kurven gruppiert und kombiniert werden. Während der Laufzeit werden die ausgelesenen Werte mit dem Modell verglichen, um den Fortschritt zu bestimmen. Falls die zeitliche Ausführung einer ktitischen Anwendung zu stark verzögert wird, werden die Kerne gedrosselt, welche die Störungen verursachen. Das Konzept wurde mit einem TACLeBench-Benchmark evaluiert, der auf einem NXP P4080 Multicore Prozessor ausgefüht, und von einem Xilinx-FPGA beobachtet wurde. Es konnte gezeigt werden, dass der Fortschritt durch den Fingerprint mit einer endgültigen Abweichung von weniger als 1% für eine TACLeBench-Ausführung mit laufenden konkurrierenden Kernen gemessen werden kann. Die Evaluation mit einer realen Avionik-Anwendung zeigte, dass das Konzept für integrierte modulare Avionik-Anwendungen (IMA) genutzt werden kann. Der Ansatz gewährleistet keine robuste Partitionierung im herkömmlichen Sinne. Die Anwendungen auf den verschiedenen Kernen können sich zeitlich gegenseitig beeinflussen, aber ein externes Sicherheitsnetz stellt sicher, dass die Verlangsamung der hochkritischen Anwendung niedrig genug ist, um die Deadlines zu halten. Dies ermöglicht eine effiziente Auslastung des Multicore Prozessors. Außerdem wird jede kritische Anwendung einzeln behandelt und verfügt über ein individuelles Modell. Somit können die kritischen und nicht kritischen Anwendungen, die auf den anderen Kernen ausgeführt werden, ausgetauscht werden, ohne ein Modell neu zu erstellen. Dies vereinfacht die Portierung von bestehenden Anwendungen auf Multicore Prozessoren und ermöglicht den Austausch von Anwendungen ohne eine erneute Zertifizierung

OPUS Augsburg

Behaviour analysis in binary SoC data

Author: Mcewan Dave
Publication venue
Publication date: 21/06/2022
Field of study

Explore Bristol Research

E-WarP: a system-wide framework for memory bandwidth profiling and management

Author: Drepper Ulrich
Mancuso Renato
Sohal Parul
Tabish Rohan
Publication venue
Publication date: 02/12/2020
Field of study

The proliferation of multi-core, accelerator-enabled embedded systems has introduced new opportunities to consolidate real-time systems of increasing complexity. But the road to build confidence on the temporal behavior of co-running applications has presented formidable challenges. Most prominently, the main memory subsystem represents a performance bottleneck for both CPUs and accelerators. And industry-viable frameworks for full-system main memory management and performance analysis are past due. In this paper, we propose our Envelope-aWare Predictive model, or E-WarP for short. E-WarP is a methodology and technological framework to: (1) analyze the memory demand of applications following a profile-driven approach; (2) make realistic predictions on the temporal behavior of workload deployed on CPUs and accelerators; and (3) perform saturation-aware system consolidation. This work aims at providing the technological foundations as well as the theoretical grassroots for truly workload-aware analysis of real-time systems. We provide a full implementation of our techniques on a commercial platform (NXP S32V234) and make two key observations. First, we achieve, on average, a 6% overprediction on the runtime of bandwidth-regulated applications. Second, we experimentally validate that the calculated bounds hold if the main memory subsystem operates below saturation.https://cs-people.bu.edu/rmancuso/files/papers/EWarP_RTSS20_final.pdfAccepted manuscrip

Boston University Institutional Repository (OpenBU)

Operating System Noise in the Linux Kernel

Author: Casini D.
Cucinotta T.
de Oliveira D. B.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022
Field of study

As modern network infrastructure moves from hardware-based to software-based using Network Function Virtualization, a new set of requirements is raised for operating system developers. By using the real-time kernel options and advanced CPU isolation features common to the HPC use-cases, Linux is becoming a central building block for this new architecture that aims to enable a new set of low latency networked services. Tuning Linux for these applications is not an easy task, as it requires a deep understanding of the Linux execution model and the mix of user-space tooling and tracing features. This paper discusses the internal aspects of Linux that influence the Operating System Noise from a timing perspective. It also presents Linux’s osnoise tracer, an in-kernel tracer that enables the measurement of the Operating System Noise as observed by a workload, and the tracing of the sources of the noise, in an integrated manner, facilitating the analysis and debugging of the system. Finally, this paper presents a series of experiments demonstrating both Linux’s ability to deliver low OS noise (in the single-digit μs order), and the ability of the proposed tool to provide precise information about root-cause of timing-related OS noise problems

Archivio della ricerca della Scuola Superiore Sant'Anna

Automatic Generation of Models of Microarchitectures

Author: Abel Andreas
Publication venue: Saarländische Universitäts- und Landesbibliothek
Publication date: 01/01/2020
Field of study

Detailed microarchitectural models are necessary to predict, explain, or optimize the performance of software running on modern microprocessors. Building such models often requires a significant manual effort, as the documentation provided by hardware manufacturers is typically not precise enough. The goal of this thesis is to develop techniques for generating microarchitectural models automatically. In the first part, we focus on recent x86 microarchitectures. We implement a tool to accurately evaluate small microbenchmarks using hardware performance counters. We then describe techniques to automatically generate microbenchmarks for measuring the performance of individual instructions and for characterizing cache architectures. We apply our implementations to more than a dozen different microarchitectures. In the second part of the thesis, we study more general techniques to obtain models of hardware components. In particular, we propose the concept of gray-box learning, and we develop a learning algorithm for Mealy machines that exploits prior knowledge about the system to be learned. Finally, we show how this algorithm can be adapted to minimize incompletely specified Mealy machines—a well-known NP-complete problem. Our implementation outperforms existing exact minimization techniques by several orders of magnitude on a number of hard benchmarks; it is even competitive with state-of-the-art heuristic approaches.Zur Vorhersage, Erklärung oder Optimierung der Leistung von Software auf modernen Mikroprozessoren werden detaillierte Modelle der verwendeten Mikroarchitekturen benötigt. Das Erstellen derartiger Modelle ist oft mit einem hohen Aufwand verbunden, da die erforderlichen Informationen von den Prozessorherstellern typischerweise nicht zur Verfügung gestellt werden. Das Ziel der vorliegenden Arbeit ist es, Techniken zu entwickeln, um derartige Modelle automatisch zu erzeugen. Im ersten Teil beschäftigen wir uns mit aktuellen x86-Mikroarchitekturen. Wir entwickeln zuerst ein Tool, das kleine Microbenchmarks mithilfe von Performance Countern auswerten kann. Danach beschreiben wir Techniken, um automatisch Microbenchmarks zu erzeugen, mit denen die Leistung einzelner Instruktionen gemessen sowie die Cache-Architektur charakterisiert werden kann. Im zweiten Teil der Arbeit betrachten wir allgemeinere Techniken, um Hardwaremodelle zu erzeugen. Wir schlagen das Konzept des “Gray-Box Learning” vor, und wir entwickeln einen Lernalgorithmus für Mealy-Maschinen, der bekannte Informationen über das zu lernende System berücksichtigt. Zum Abschluss zeigen wir, wie dieser Algorithmus auf das Problem der Minimierung unvollständig spezifizierter Mealy-Maschinen übertragen werden kann. Hierbei handelt es sich um ein bekanntes NP-vollständiges Problem. Unsere Implementierung ist in mehreren Benchmarks um Größenordnungen schneller als vorherige Ansätze

Universaar

Acronym