26 research outputs found

    Modellierung von On-Chip-Trace-Architekturen für eingebettete Systeme

    Trace, the non-invasive recording of system states while an embedded system runs in real time under real operating conditions and interacts with its environment, is an important part of software testing. The need for on-chip trace arises from the declining applicability of established off-chip trace tools. An essential element of on-chip trace architectures is reducing the volume of trace data directly on the chip, at the rate at which the data is produced. The focus is on tracing the instruction flow of processors. The current state of research shows two kinds of solutions. Simple solutions achieve too small a compression factor. More elaborate solutions deliver an incomplete instruction trace when sequential instructions are also executed conditionally. So far, no solution provides a complete instruction trace with high compression. This thesis closes that gap. The systematic design of the new on-chip trace architecture begins with a comprehensive analysis of typical benchmark programs, from whose results the fundamental design decisions are derived. The bit sequences of execution bits that arise from conditional instruction execution and the target addresses of executed indirect jumps are processed in independent compressors. A downstream compressor for the messages of the other two compressors is optional and can increase compression further. This partitioning is an architectural novelty. The compression of bit sequences has so far been a largely unexplored field; a sliding dictionary with single-bit granularity has been implemented for this purpose. Comparisons with the existing architectures examined show that the new architecture compresses better. A complete instruction trace has been realized for processors with and without conditionally executable sequential instructions.
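
    The abstract above mentions a sliding dictionary that works at single-bit granularity. The sketch below is only a generic LZ77-style illustration of that idea, not the thesis's hardware design: the function names, the token format, and the `window`, `min_match` and `max_match` parameters are all assumptions made for the example.

```python
def compress_bits(bits, window=64, min_match=4, max_match=32):
    """Encode a string of '0'/'1' characters as LZ77-style tokens.

    Hypothetical illustration: tokens are ("literal", bit) or
    ("match", distance, length), where distance counts bits back
    from the current position inside a sliding window.
    """
    tokens = []
    i = 0
    while i < len(bits):
        best_len, best_dist = 0, 0
        # Search every start position inside the sliding window.
        for j in range(max(0, i - window), i):
            k = 0
            # The match may overlap the current position, as in classic LZ77.
            while k < max_match and i + k < len(bits) and bits[j + k] == bits[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_match:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("literal", bits[i]))
            i += 1
    return tokens


def decompress_bits(tokens):
    """Invert compress_bits by replaying literals and back-references."""
    out = []
    for tok in tokens:
        if tok[0] == "literal":
            out.append(tok[1])
        else:
            _, dist, length = tok
            # Copy bit by bit so that overlapping matches work.
            for _ in range(length):
                out.append(out[-dist])
    return "".join(out)
```

    A periodic execution-bit pattern such as `"1011" * 16` collapses into a few literals followed by long back-references, which is the effect a bit-level dictionary is meant to exploit.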

    Traçage logiciel assisté par matériel

    Software is becoming increasingly complex. With the advent of embedded computing, resource limitations force it to run in a way that saves time, memory and energy. In this context, developers need tools to debug and optimize the programs they write. Among these tools, tracing is a particularly well-suited solution that records the occurrence of events while interacting minimally with the execution. It makes it possible to identify the causes of bugs or the bottlenecks that slow down the program.
LTTng is a tracer focused on performance: through per-core data structures and non-blocking locks, recording an event takes less than one microsecond on a recent machine. However, this delay is not negligible, and tracing an arbitrary number of points is not possible without affecting performance. In addition, the code and data related to tracing are stored in the memory space of the process being studied, which affects its execution. The use of dedicated debug hardware blocks overcomes these limitations. A multitude of these circuits exist, present on most processors on the market, for debugging and profiling purposes. By reusing their capabilities for tracing, we propose to relieve the software part of tracing tools such as LTTng, and thereby increase their performance. To do this, we use the STM, ETM and ETB hardware modules from the CoreSight suite on ARM processors, as well as BTS on Intel x86 processors. Some offer an execution tracing feature, i.e. recording the list of executed instructions; others provide specialized resources for timestamping, transferring messages on dedicated channels, and storing traces. In this thesis, we propose implementations of software tracing that take advantage of hardware to be less intrusive than pure-software tools. We aim to reduce the time overhead induced by tracing, i.e. the number of cycles added to a normal execution, while keeping the same level of detail that a trace provides. We show that the combined use of the STM and ETB modules to send traces through dedicated hardware circuits saves process memory and that the duration of each tracepoint is divided by ten compared with LTTng. Using ETM and ETB, the tracing overhead is also reduced, by 30% to 50% compared with our reference tracer. However, the capabilities of the ETM execution tracer limit our system to only a few recordable tracepoints throughout the program.
Finally, the use of BTS on Intel processors is also more efficient: tracepoints are almost twice as fast as those of LTTng. However, this system does not allow choosing which events to trace: all branches taken by the processor are recorded. This limitation makes BTS unusable for event tracing; for execution tracing, however, the re-implementation we propose is 65% faster than that of Perf, the default tool on Linux.

    Real-time, Unobtrusive, and Efficient Program Execution Tracing with Stream Caches and Last Stream Predictors

    This paper introduces a new hardware mechanism for capturing and compressing program execution traces unobtrusively in real time. The proposed mechanism is based on two structures called stream cache and last stream predictor. We explore the effectiveness of a trace module based on these structures and analyze the design space. We show that our trace module, with less than 600 bytes of state, achieves a trace-port bandwidth of 0.15 bits/instruction/processor, which is over six times better than state-of-the-art commercial designs.

    Report from Dagstuhl Seminar 23031: Frontiers of Information Access Experimentation for Research and Education

    This report documents the program and the outcomes of Dagstuhl Seminar 23031 "Frontiers of Information Access Experimentation for Research and Education", which brought together 37 participants from 12 countries. The seminar addressed technology-enhanced information access (information retrieval, recommender systems, natural language processing) and specifically focused on developing more responsible experimental practices leading to more valid results, both for research and for scientific education. The seminar brought together experts from various sub-fields of information access, namely IR, RS, NLP, information science, and human-computer interaction, to create a joint understanding of the problems and challenges presented by next-generation information access systems, from both the research and the experimentation points of view, to discuss existing solutions and impediments, and to propose next steps to be pursued in the area in order to improve not only our research methods and findings but also the education of the new generation of researchers and developers. The seminar featured a series of long and short talks delivered by participants, which helped to set a common ground and to let topics of interest emerge for exploration as the main output of the seminar. This led to the definition of five groups which investigated challenges, opportunities, and next steps in the following areas: reality check, i.e. conducting real-world studies; human-machine-collaborative relevance judgment frameworks; overcoming methodological challenges in information retrieval and recommender systems through awareness and education; results-blind reviewing; and guidance for authors. Comment: Dagstuhl Seminar 23031, report

    Situation-aware Edge Computing

    Future wireless networks must cope with an increasing amount of data that needs to be transmitted to or from mobile devices. Furthermore, novel applications, e.g., augmented reality games or autonomous driving, require low latency and high bandwidth at the same time. To address these challenges, the paradigm of edge computing has been proposed. It brings computing closer to the users and takes advantage of the capabilities of telecommunication infrastructures, e.g., cellular base stations or wireless access points, but also of end user devices such as smartphones, wearables, and embedded systems. However, edge computing introduces its own challenges, e.g., economic and business-related questions or device mobility. Being aware of the current situation, i.e., the domain-specific interpretation of environmental information, makes it possible to develop approaches targeting these challenges. In this thesis, the novel concept of situation-aware edge computing is presented. It is divided into three areas: situation-aware infrastructure edge computing, situation-aware device edge computing, and situation-aware embedded edge computing. To this end, the concepts of situation and situation-awareness are introduced. Furthermore, challenges are identified for each area, and corresponding solutions are presented. In the area of situation-aware infrastructure edge computing, economic and business-related challenges are addressed, since companies offering services and infrastructure edge computing facilities have to agree on the prices for allowing others to use them. In the area of situation-aware device edge computing, the main challenge is to find suitable nodes that can execute a service and to predict a node's connection in the near future. Finally, to enable situation-aware embedded edge computing, two novel programming and data analysis approaches are presented that allow programmers to develop situation-aware applications.
To show the feasibility, applicability, and importance of situation-aware edge computing, two case studies are presented. The first case study shows how situation-aware edge computing can provide services for emergency response applications, while the second case study presents an approach where network transitions can be implemented in a situation-aware manner.

    Towards instantaneous performance analysis using coarse-grain sampled and instrumented data

    Nowadays, supercomputers deliver an enormous amount of computing power; however, it is well known that applications only reach a fraction of it. One limiting factor is single-processor performance, because it ultimately dictates the overall performance achieved. Performance analysis tools help locate performance inefficiencies and determine their nature in order to improve application performance. Performance tools rely on two collection techniques to invoke their performance monitors: instrumentation and sampling. Instrumentation injects performance monitors into specific application locations, whereas sampling invokes the installed monitors in response to external events. Each technique has its advantages. The measurements obtained through instrumentation are directly associated with the application structure, while sampling offers a simple way to control the volume of measurements captured. However, the granularity of the measurements that provides valuable insight cannot be determined a priori. Analysts studying the performance of an application for the first time may consider using a performance tool and instrumenting every routine, or using high-frequency sampling rates, to obtain the most detailed results. These approaches frequently lead to large overheads that impact the application performance, alter the measurements gathered and, therefore, mislead the analyst. This thesis introduces the folding mechanism, which takes advantage of the repetitiveness found in many applications. The mechanism combines metrics captured through coarse-grain sampling and instrumentation mechanisms to provide instantaneous metric reports within instrumented regions without perturbing the application execution. To produce these reports, the folding processes metrics from different types of sources: performance and energy counters, source code and memory references. The process depends on their nature.
While performance and energy counters represent continuous metrics, the source code and memory references refer to discrete values that point to locations within the application code or address space. This thesis evaluates and validates two fitting algorithms used in different areas to report continuous metrics: a Gaussian interpolation process known as Kriging and piece-wise linear regressions. The folding also benefits from analytical performance models to focus on a small set of performance metrics instead of exploring a myriad of performance counters. The folding also correlates the metrics with the source code using two alternatives: the outcome of the piece-wise linear regressions and a mechanism inspired by Multi-Sequence Alignment techniques. Finally, this thesis explores the applicability of the folding mechanism to captured memory references to detail which data objects are accessed and how. This thesis proposes an analysis methodology for parallel applications that focuses on describing the most time-consuming computing regions. It is implemented on top of a framework that relies on a previously existing clustering tool and the folding mechanism. To show the usefulness of the methodology and the framework, this thesis includes the discussion of multiple in-production applications analyzed for the first time. The discussions include a high level of detail regarding the application performance bottlenecks and the code responsible for them. Although many of the analyzed applications were compiled using aggressive compiler optimization flags, the insight obtained from the folding mechanism has led to small code transformations based on widely known optimization techniques that have improved performance in some cases.
Additionally, this work describes the power monitoring capabilities of recent processors and discusses the combined performance and energy behavior of a selection of benchmarks and in-production applications.
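
    The folding idea described in this abstract, pooling sparse samples from many repetitions of the same instrumented region into one synthetic instance, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the `fold` function, the dictionary keys, and the choice to normalize both time and counter values to [0, 1] are hypothetical, and the fitting step (Kriging or piece-wise linear regression) is left out.

```python
def fold(instances):
    """Pool coarse-grain samples from repeated region instances.

    Hypothetical input format: each instance is a dict with
      'start', 'end' : region entry/exit times (from instrumentation)
      'total'        : counter value accumulated over the whole instance
      'samples'      : list of (timestamp, counter_since_entry) pairs
    Returns (relative_time, relative_count) points sorted by time,
    i.e. one synthetic instance that is sampled far more densely
    than any single real instance.
    """
    points = []
    for inst in instances:
        duration = inst["end"] - inst["start"]
        for timestamp, count in inst["samples"]:
            rel_time = (timestamp - inst["start"]) / duration
            rel_count = count / inst["total"]
            points.append((rel_time, rel_count))
    points.sort()
    return points


# Two repetitions of a region, each hit by a single sample.
instances = [
    {"start": 0.0, "end": 10.0, "total": 100, "samples": [(7.5, 75)]},
    {"start": 100.0, "end": 120.0, "total": 200, "samples": [(105.0, 50)]},
]
folded = fold(instances)
```

    With only one sample per repetition, neither instance alone says much about how the counter evolves inside the region; after folding, the pooled points cover different parts of the normalized interval and can be handed to a curve-fitting step.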

    The design and application of an extensible operating system

    Tanenbaum, A.S. [Promotor]

    Mediating disruption in human-computer interaction from implicit metrics of attention

    Thesis (Ph. D.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. Includes bibliographical references (p. 143-150). Multitasking environments cause people to be interrupted constantly, often disrupting their ongoing activities and keeping them from reaching their goals. This thesis presents a disruption-reducing approach, designed to support the user's goals and optimize productivity, that is based on a model of the user's receptivity to an interruption. The model uses knowledge of the interruption content, context and priority of the task(s) in progress, user actions and goal-related concepts to mediate interruptions. The disruption management model is distinct from previous work in its addition of implicit sensors that deduce the interruption content and user context to help determine when an interruption will disrupt an ongoing activity. Domain-independent implicit sensors include mouse and keyboard behaviors, and goal-related concepts extracted from the user's documents. The model also identifies the contextual relationship between interruptions and user goals as an important factor in how interruptions are controlled. The degree to which interruptions are related to the user's goal determines how those interruptions will be received. We tested and evolved the model in various cases and showed significant improvement in both productivity and satisfaction. A disruption manager application controls interruptions in common desktop computing activities, such as web browsing and instant messaging. The disruption manager demonstrates that mediating interruptions by supporting the user's goals can improve performance and overall productivity. Our evaluation shows an improvement in success of over 25% across prioritization conditions for real-life computing environments.
Goal priority and interruption relevance play an important role in the interruption decision process, and several experiments examine the effect of these factors on people's reactions and availability for interruptions, and on overall performance. These experiments demonstrate that people recognize the potential benefits of being interrupted and adjust their susceptibility to interruptions during highly prioritized tasks. The outcome of this research includes a usable model that can be extended to tasks as diverse as driving an automobile and performing computer tasks. This thesis supports mediating technologies that will recognize the value of communication and control interruptions so that people are able to maintain concentration amidst their increasingly busy lifestyles. By Ernesto Arroyo Acosta. Ph.D.

    Combining SOA and BPM Technologies for Cross-System Process Automation

    This paper summarizes the results of an industry case study that introduced a cross-system business process automation solution based on a combination of SOA and BPM standard technologies (i.e., BPMN, BPEL, WSDL). Besides discussing major weaknesses of the existing custom-built solution and comparing them against experiences with the developed prototype, the paper presents a course of action for transforming the current solution into the proposed solution. This includes a general approach, consisting of four distinct steps, as well as specific action items that are to be performed for every step. The discussion also covers language and tool support and challenges arising from the transformation.