14 research outputs found

Analyse de systèmes embarqués par structuration de traces d'exécution

Author: Correnoz Jérôme
Marangozova-Martin Vania
Martin Alexis
Pagano Generoso
Publication venue: HAL CCSD
Publication date: 22/04/2014

Tracing an application is a classic technique used during optimization and debugging. In the embedded domain, however, execution traces are voluminous and difficult to exploit. In this paper, we propose a structuring of an event-based trace model that preserves the genericity of the data representation while improving analysis efficiency. We show that this model allows faster processing with a low memory footprint. The approach is validated on real-world industrial scenarios in collaboration with STMicroelectronics.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

La visualisation de traces, support à l'analyse, déverminage et optimisation d'applications de calcul haute performance

Author: Dosimont Damien
Huard Guillaume
Vincent Jean-Marc
Publication venue: HAL CCSD
Publication date: 01/01/2013

Analyzing the behavior of software applications is an increasingly difficult task because of the growing complexity of the systems on which they run. While the analysis of embedded systems must cope with a complex software stack, the analysis of parallel systems must adapt to the scale of their hardware architecture and to their nondeterminism. Visualization of traces collected while applications run on these platforms has become widespread in analysis tools as a way to address these problems. A wide range of techniques now exists, distinguished by the amount of information, the scale of the systems, or the behaviors they are able to represent. We survey the state of the art, discussing statistical, behavioral, and structural visualization methods for applications, as well as techniques that allow the analysis to scale.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Fine-grained Energy / Power Instrumentation for Software-level Efficiency Optimization

Author: Greaves David J
Hopkins Andrew
McDonald-Maier Klaus
Puzovic Milos
Zaidi Ali Mustafa
Publication date: 01/11/2015

In pursuit of both greater energy efficiency and high performance, architects are constructing increasingly complex Systems-on-Chip (SoCs) with a variety of processor cores and DMA controllers. This complexity makes software implementation and optimization difficult, particularly when multiple independent applications may be running concurrently on such a heterogeneous platform. To take full advantage of the underlying system, increased visibility into the interaction between software and hardware is needed. This paper proposes on-line and off-line fine-grained instrumentation of SoC components in hardware (e.g. as part of the debug & trace infrastructure) so that energy-efficiency improvements and optimization can be undertaken at higher levels of abstraction, i.e. by the programmer and the runtime scheduler. Each component incorporates energy counters that keep track of its energy use. These counters are indexed by customer-number tags, which distinguish between the transactions executed on any given component by client applications running in a multitasking SoC environment. The contents of the counters for each augmented component, correlated with the appropriate customer numbers, are extracted from a running SoC under test via existing debug & trace interfaces such as GDBserver, JTAG and various proprietary trace probes. In addition, auxiliary on-chip processing computes local and global energy figures and offers them through a 4-layer abstraction stack so that programmer-level fine-grained energy measurement is made available. Both the O/S scheduler and programmers can then adapt their policies and coding styles to their desired energy/performance tradeoff.
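The idea of per-component energy counters indexed by a client tag can be illustrated in software. The following is only a minimal sketch, not the hardware design from the paper: the class name `EnergyCounters`, its methods, and the component and tag names are all hypothetical, and energy values are assumed to arrive as joules per transaction.

```python
from collections import defaultdict
from typing import Dict, Tuple


class EnergyCounters:
    """Software model of energy counters keyed by (component, tag).

    Hypothetical illustration: real counters would live in hardware
    and be read out over a debug/trace interface.
    """

    def __init__(self) -> None:
        # (component name, customer tag) -> accumulated energy in joules
        self._joules: Dict[Tuple[str, int], float] = defaultdict(float)

    def record(self, component: str, tag: int, joules: float) -> None:
        """Charge one transaction's energy to the client identified by tag."""
        if joules < 0:
            raise ValueError("energy must be non-negative")
        self._joules[(component, tag)] += joules

    def per_tag(self) -> Dict[int, float]:
        """Global figure: total energy per client across all components."""
        totals: Dict[int, float] = defaultdict(float)
        for (_, tag), joules in self._joules.items():
            totals[tag] += joules
        return dict(totals)

    def per_component(self) -> Dict[str, float]:
        """Local figure: total energy per component across all clients."""
        totals: Dict[str, float] = defaultdict(float)
        for (component, _), joules in self._joules.items():
            totals[component] += joules
        return dict(totals)


counters = EnergyCounters()
counters.record("dma0", tag=1, joules=0.5)
counters.record("dma0", tag=2, joules=0.25)
counters.record("cpu0", tag=1, joules=1.0)
print(counters.per_tag())
print(counters.per_component())
```

Keying on the pair rather than on the component alone is what lets a scheduler attribute energy to a particular application instead of only to a piece of hardware.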

University of Essex Research Repository

Fine-grained preemption analysis for latency investigation across virtual machines

Author: Francis Giraldeau
Michel R Dagenais
Mohamad Gebai
Publication venue: Springer Nature
Publication date: 01/01/2014

This paper studies preemption between programs running in different virtual machines on the same computer. One of the current monitoring methods consists of updating the average steal time through collaboration with the hypervisor. However, the average is insufficient to diagnose abnormal latencies in time-sensitive applications. Moreover, the added latency is not directly visible from the virtual machine's point of view. The main challenge is to recover the cause of preemption of a task running in a virtual machine, whether it is a task on the host computer or in another virtual machine. We propose a new method to study thread preemption across virtual machine boundaries using kernel tracing. The host computer and each monitored virtual machine are traced simultaneously. We developed an efficient and portable trace synchronization method, which is required to account for the time offset and drift that occur within each virtual machine. We then devised an algorithm to recover the root cause of preemption between threads at every level. The algorithm successfully detected interactions between multiple competing threads in distinct virtual machines on a multi-core machine.
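Correcting for time offset and drift between two trace clocks is commonly modeled as a linear mapping. The sketch below is a generic least-squares fit, not the synchronization method developed in the paper, whose details the abstract does not give. The function names and the assumption that matched (guest, host) timestamp pairs are already available are hypothetical.

```python
from typing import List, Tuple


def fit_clock_model(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit host_ts ~= drift * guest_ts + offset by ordinary least squares.

    pairs holds (guest_ts, host_ts) for events assumed to be matched
    between the guest trace and the host trace.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two matched events")
    mean_g = sum(g for g, _ in pairs) / n
    mean_h = sum(h for _, h in pairs) / n
    var_g = sum((g - mean_g) ** 2 for g, _ in pairs)
    if var_g == 0:
        raise ValueError("guest timestamps must not all be equal")
    cov = sum((g - mean_g) * (h - mean_h) for g, h in pairs)
    drift = cov / var_g
    offset = mean_h - drift * mean_g
    return drift, offset


def to_host_time(guest_ts: float, drift: float, offset: float) -> float:
    """Translate a guest timestamp into the host time base."""
    return drift * guest_ts + offset


# Synthetic example: guest clock starts 100 s behind and runs slightly slow.
matched = [(0.0, 100.0), (10.0, 110.001), (20.0, 120.002)]
drift, offset = fit_clock_model(matched)
print(drift, offset, to_host_time(30.0, drift, offset))
```

Once every guest event is expressed in host time, the host and guest traces can be merged on a single timeline, which is the precondition for attributing a guest-side delay to a host-side thread.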

Crossref

Springer - Publisher Connector

PolyPublie

5G mikropalveluiden valvonta Linux kernelin avulla

Author: Oksanen Ilkka
Publication date: 17/06/2019

The software industry is adopting a scalable microservice architecture at an increasing pace. With the advent of 5G, this introduces major changes to the architectures of telecommunication systems as well. Telecommunications software is moving towards virtualized solutions in the form of virtual machines and, more recently, containers. New monitoring solutions have emerged to monitor microservices efficiently. These tools, however, cannot provide as detailed a view of the internal functions of the software as is possible with tools provided by an operating system. Unfortunately, operating-system-level tracing tools are less and less available to developers and system administrators. This is because the virtualized cloud environment that serves as the base for microservices abstracts away access to the runtime environment of the services. This thesis researches the viability of using Linux kernel tooling in microservice monitoring. The viability is explored with a proof-of-concept container providing access to some of the Linux kernel's network monitoring features. The main focus is evaluating the performance overhead caused by the monitor. It was found that kernel tracing tools have great potential for providing low-overhead tracing data from microservices. However, the low overheads achieved in the networking context could not be reproduced reliably. In the benchmarks, the overhead of tracing increased rapidly as a function of the number of processors used. While the results cannot be generalized beyond the networking context, the inconsistency in overhead makes Linux kernel monitoring tools less than ideal for a containerized microservice.

The software industry is increasingly moving to scalable microservices. With the arrival of 5G, major changes are also being seen in the architectures of telecommunication systems. Along with the rest of the software industry, telecommunication systems are moving to virtualized solutions such as virtual machines and, most recently, containers. With the new architecture, monitoring tools specialized for microservices have emerged. These tools, however, cannot compete with the tools provided by the operating system in the level of monitoring detail. Unfortunately, because of the architectural change, operating-system-level monitoring tools are less often within reach of software developers and administrators. A major reason is that, with the microservice architecture, services have been virtualized into the cloud, so access to a service's runtime environment is often unavailable. This thesis investigates whether using Linux kernel monitoring tools for microservice monitoring is worthwhile. Viability is studied with a monitor prototype running in a container, which provides access to some of the Linux kernel's network monitoring features. The main focus of the study is to determine the monitor's impact on the performance of the running system. The study found that, in the optimal case, Linux kernel monitoring tools can collect monitoring data about the state of microservices without a large impact on performance. In unfavorable conditions, however, the impact of monitoring rose significantly. The relative impact of network monitoring was observed to grow as a function of the number of processors used for the computational load. The network monitoring results cannot be directly generalized outside the network monitoring context. Nevertheless, the strong dependence of the growth in monitoring impact on the characteristics of the host machine makes Linux kernel monitoring tools a less-than-ideal solution for microservice monitoring.

Aaltodoc Publication Archive

A thread synchronization model for the PREEMPT_RT Linux kernel

Author: Cucinotta Tommaso
de Oliveira Daniel B.
de Oliveira Rômulo S.
Publication venue: Elsevier BV
Publication date: 01/01/2020

This article proposes an automata-based model for describing and validating sequences of kernel events in Linux PREEMPT_RT and how they influence the timeline of threads' execution, comprising preemption control, interrupt handling and control, scheduling and locking. This article also presents an extension of the Linux tracing framework that enables the tracing of kernel events to verify the consistency of the kernel execution against the event sequences that are legal according to the formal model. This enables cross-checking of the kernel's behavior against the formalized one and, in case of inconsistency, pinpoints possible areas of improvement in the kernel, which is useful for regression testing. Indeed, we describe in detail three problems in the kernel revealed by the proposed technique, along with a short summary of how we reported them and proposed fixes to the Linux kernel community. As an example of the usage of the model, the analysis of the events involved in the activation of the highest-priority thread is presented, describing the delays that occur in this operation at the same granularity used by kernel developers. This illustrates how the model can be used to analyze the preemption model of Linux.

Archivio della ricerca della Scuola Superiore Sant'Anna

Low-level trace correlation on heterogeneous embedded systems

Author: Bertauld Thomas
Dagenais Michel R.
Publication venue: Springer Science and Business Media LLC
Publication date: 23/01/2017

Tracing is a common method used to debug, analyze, and monitor various systems. Even though standard tools and tracing methodologies exist for standard and distributed environments, this is not the case for heterogeneous embedded systems. This paper proposes to fill this gap and discusses how efficient tracing can be achieved without having common system tools, such as the Linux Trace Toolkit Next Generation (LTTng), at hand on every core. We propose a generic solution to trace embedded heterogeneous systems and overcome the challenges brought by their peculiar architectures (little available memory, bare-metal CPUs, or exotic components, for instance). The solution described in this paper focuses on a generic way of correlating traces among different kinds of processors through trace synchronization, in order to analyze the global state of the system as a whole. The proposed solution was first tested on the Adapteva Parallella board. It was then improved and thoroughly validated on TI's Keystone 2 System-on-Chip (SoC).

Crossref

Springer - Publisher Connector

PolyPublie

Predictive model creation approach using layered subsystems quantified data collection from LTE L2 software system

Author: Puerto Valencia J. (Jose)
Publication venue: University of Oulu
Publication date: 12/07/2019

The road map to a continuous and efficient improvement process for a complex software system has multiple stages and many interrelated ongoing transformations, which are direct responses to its constantly evolving environment. The system's scalability under these ongoing transformations depends, to a great extent, on the prediction of resource consumption and of systematic emergent properties; as systems grow in size and complexity, their predictability decreases in accuracy. A predictive model is used to address this inherent complexity growth and to increase the predictability of a complex system's performance. The model creation process is driven by the collection of quantified data from different layers of the Long-Term Evolution (LTE) Data-layer (L2) software system. Creating such a model is possible because of the multiple system analysis tools Nokia has already implemented, which allow a multi-layer data gathering flow. The process consists of, first, stating the differences between the system layers; second, using a layered benchmark approach to collect data at different levels; third, designing a process flow that organizes the data transformations from collection through filtering and pre-processing to visualization; and fourth, as a proof of concept, comparing different Performance Measurement (PM) predictive models trained on the collected, pre-processed data. In parallel with the model creation process, the thesis explores and compares various data visualization techniques that address the non-trivial graphical representation of the data relations between subsystems. Finally, the current results of the model creation process are presented and discussed. The models were able to explain 54% and 67% of the variance in the two test configurations used in the instantiation of the model creation process proposed in this thesis.

University of Oulu Repository - Jultika

Points de trace statiques et dynamiques en mode noyau

Author: Fahem Rafik
Publication date: 01/04/2012

RÉSUMÉ Using TRACE_EVENT and UST, it is now possible to insert static instrumentation points on Linux in kernel mode and user mode respectively, while minimizing the impact on the performance of the instrumented system. However, these instrumentation points can sometimes prove insufficient to diagnose the origin of a problem. Dynamic instrumentation meets this need by making it possible to insert additional tracepoints at run time. Recently, dynamic tracepoints were implemented in GDB and GDBServer in user mode. With this technique, a set of actions can be associated with any address in the program code. These actions can collect register values at the moment the tracepoint is hit and can also evaluate complex expressions that use the variables accessible from that location in the program. Because GDB can read debug information and locate each variable, a variable can be referred to in these expressions directly by name, without regard to its location. GDB static and dynamic tracepoints can be conditional, in which case an expression is used as the condition. The expressions used in conditions and actions are converted to bytecode, which GDBServer interprets when the tracepoint is hit. The purpose of the bytecode is to guarantee portability across architectures. However, GDBServer can translate it into native code to improve performance, since executing native code is often faster than decoding and executing instructions one at a time. More recently, the KGTP module was proposed as a contribution to the Linux kernel. It relies on Kprobes to implement GDB dynamic tracepoints in kernel mode and communicates with GDB using the RSP (Remote Serial Protocol). It can only interpret the bytecode produced by GDB and cannot convert it to native code. The objective of this work is to implement a kernel-mode converter for KGTP that translates bytecode to native code, for both conditions and actions, in order to improve the performance of dynamic tracepoints. We will also integrate TRACE_EVENT and GDB in kernel mode through KGTP so that, as in user mode, the kernel's static tracepoints can be listed, enabled, and disabled. The same bytecode-to-native-code converter is used with static tracepoints so that conditions and additional expressions can be attached to them. These expressions must likewise be able to use all the variables accessible at the static tracepoint.

ABSTRACT With kernel static tracepoints defined using TRACE_EVENT and user-space tracepoints through the UST library, it is now possible to add instrumentation and obtain a low-overhead trace of the whole system. However, these static tracepoints may be insufficient to diagnose the source of a problem. Dynamic instrumentation fills the gap by making it possible to insert additional tracepoints in other locations at run time. Recently, GDB was enhanced to support dynamic tracepoints in user space. Using this feature, tracepoints can be defined in almost every location in a program. A set of actions can be associated with each tracepoint. These actions may be used to collect the values of the registers at the time the tracepoint was hit or to evaluate user-defined expressions. These expressions may be complex and can employ all the program variables accessible from the tracepoint location. Because GDB is able to read the program's debug information and to locate variables, we can refer to variables in these expressions by name without having to care about their locations. GDB static and dynamic tracepoints may be conditional. In this case, expressions can be used as conditions. In order to simplify evaluation, GDB converts expressions used in conditions and actions to bytecode, which is interpreted each time the corresponding tracepoint is hit. Moreover, in some situations, GDB converts the conditions' bytecode into native code in order to improve performance. More recently, the KGTP kernel module was submitted as a contribution to the Linux kernel. It uses Kprobes to insert GDB dynamic tracepoints into the kernel, implements the RSP (Remote Serial Protocol) to communicate with GDB, and can interpret the bytecode used by GDB to define conditions and actions, but is unable to convert this bytecode to native code. The goal of this work is to extend the KGTP module by implementing a bytecode-to-native-code converter in kernel space for both conditions and actions. GDB will also be integrated with TRACE_EVENT through KGTP in order to be able to list, enable and disable the kernel static tracepoints. Expressions may be used in conditions and additional actions. These expressions will be converted to native code and may employ all the variables accessible from the static tracepoint location.
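To make the notion of interpreting a tracepoint condition from bytecode concrete, here is a toy stack-machine evaluator. It is a sketch under stated assumptions only: the opcode names (`PUSH_CONST`, `LOAD_VAR`, `ADD`, `GT`, `AND`) and the `evaluate` function are hypothetical and do not reproduce GDB's actual agent-expression encoding or KGTP's implementation.

```python
from typing import Dict, List, Optional, Tuple, Union

# One instruction is (opcode, optional argument). Opcodes are hypothetical.
Instr = Tuple[str, Optional[Union[int, str]]]


def evaluate(bytecode: List[Instr], variables: Dict[str, int]) -> int:
    """Interpret a condition one instruction at a time on an operand stack."""
    stack: List[int] = []
    for op, arg in bytecode:
        if op == "PUSH_CONST":
            stack.append(int(arg))
        elif op == "LOAD_VAR":
            # Stands in for reading a program variable at the tracepoint.
            stack.append(variables[str(arg)])
        elif op in ("ADD", "GT", "AND"):
            right = stack.pop()
            left = stack.pop()
            if op == "ADD":
                stack.append(left + right)
            elif op == "GT":
                stack.append(int(left > right))
            else:
                stack.append(int(bool(left) and bool(right)))
        else:
            raise ValueError(f"unknown opcode: {op}")
    if len(stack) != 1:
        raise ValueError("malformed bytecode: stack should hold one result")
    return stack[0]


# Condition "count + 1 > limit" compiled by hand into the toy bytecode.
condition: List[Instr] = [
    ("LOAD_VAR", "count"),
    ("PUSH_CONST", 1),
    ("ADD", None),
    ("LOAD_VAR", "limit"),
    ("GT", None),
]
print(evaluate(condition, {"count": 4, "limit": 4}))
```

The per-instruction dispatch in the loop is the cost that a bytecode-to-native-code converter avoids: translated code performs the same arithmetic directly, without decoding each opcode on every tracepoint hit.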

PolyPublie

Traçage de logiciels bénéficiant d'accélération graphique

Author: Couturier David
Publication date: 01/05/2015

RÉSUMÉ In programming, recent architectural changes such as multi-core processors have made the synchronization of the tasks that run on them more complex to analyze. To address this, tracing tools such as LTTng were implemented to provide process analysis tools while keeping in mind the challenges that multi-core systems entail. A second revolution in computing, graphics accelerators, then created another need for tracing. Graphics accelerator manufacturers provided analysis tools for their devices, which make it possible to analyze the execution of commands on the accelerators. This thesis offers a solution to the lack of a unified tracing tool spanning the host system (the central processor, CPU) and the execution of OpenCL compute kernels on the device (the graphics accelerator, GPU). By unified, we mean the ability of a tracing tool to collect a kernel trace of the host on which a graphics acceleration device is present, in addition to the execution trace of that device. The main initial objective of this thesis was defined as follows: to provide a tracing tool and analysis methods that allow traces of the graphics accelerator and of the central processor to be acquired simultaneously. In addition to the main objective, secondary objectives added criteria for performance and for visualization of the traces recorded by the solution this thesis presents. The research explored made it possible to establish starting hypotheses: that the Common Trace Format (CTF) appeared to allow low-overhead trace recording, and that previous work would make it possible to synchronize the distinct time domains of the CPU and the GPU. The solution presented, OpenCL User Space Tracepoint (CLUST), is a library that replaces the symbols of the OpenCL GPGPU computing library. To use it, it must be dynamically loaded before launching the program to be traced. It then instruments all OpenCL functions with LTTng-UST tracepoints, making it possible to record calls and to handle the asynchronous events common to GPUs. Since the library's performance was one of the initial objectives, a performance analysis of its different use cases demonstrates its low overhead: for reasonably sized workloads, an overhead between 0.5% and 2% was measured. This result opens the door to several use cases. Given its low overhead, CLUST is not only a tool for acquiring traces to aid program development but can also serve as a permanent recorder in critical systems. LTTng's "flight recorder" feature writes a trace to disk only when required; adding data about the state of the GPU can prove a valuable advantage for diagnosing a problem on a production server, all without slowing the system significantly.

ABSTRACT In the world of computing, programmers now have to face the complex challenges that multi-core processors have brought. To address this problem, tracing frameworks such as LTTng were implemented to provide tools to analyze multi-core systems without adding major overhead to the system. Recently, Graphics Processing Units (GPUs) started a new revolution: General-Purpose Graphics Processing Unit (GPGPU) computing. This allows programs to offload their parallel computation sections to the ultra-parallel architecture that GPUs offer. Unfortunately, the tracing tools provided by the GPU manufacturers did not interoperate with CPU tracing. We propose a solution, OpenCL User Space Tracepoint (CLUST), that enables OpenCL GPGPU computing tracing as an extension to the LTTng kernel tracer. This unifies the CPU trace and the GPU trace in one efficient format that enables advanced trace viewing and analysis, so that both models are included in the analysis and more information is provided to the programmer. The objectives of this thesis are to provide a low-overhead unified CPU-GPU tracing extension of LTTng, to provide the algorithms required to perform trace domain synchronization between the CPU and GPU time source domains, and to provide a visualization model for the unified traces. As foundation work, we determined that existing GPU tracing techniques could be integrated well with LTTng, and that previously presented trace synchronization algorithms could be used to synchronize the CPU trace with the GPU trace. We demonstrate the low-overhead characteristics of the CLUST tracing library for typical applications under different use cases. The unified CPU-GPU tracing overhead is also measured to be insignificant (less than 2%) for a typical GPGPU application. Moreover, we use synchronization methods to determine the trace domain synchronization value between both traces. This solution is a more complete and robust implementation that provides the programmer with the required tools, never before implemented, in the hope of helping programmers develop more efficient OpenCL applications.

PolyPublie