22 research outputs found

    Trace-based Performance Analysis for Hardware Accelerators

    This thesis presents how performance data from hardware accelerators can be included in event logs. It extends the capabilities of trace-based performance analysis to also monitor and record data from this novel parallelization layer. Increasing awareness of the power consumption of computing devices has also led to interest in hybrid computing architectures. High-end computers, workstations, and mobile devices are starting to employ hardware accelerators to offload computationally intense and parallel tasks, while retaining a highly efficient scalar compute unit for non-parallel tasks. This execution pattern is typically asynchronous, so that the scalar unit can resume other work while the hardware accelerator is busy. Performance analysis tools provided by the hardware accelerator vendors cover the situation of one host using one device very well, yet they do not address the needs of the high performance computing community. This thesis investigates ways to extend existing methods for recording events from highly parallel applications to also cover scenarios in which hardware accelerators aid these applications. After introducing a generic approach that is suitable for any API-based acceleration paradigm, the thesis derives a suggestion for a generic performance API for hardware accelerators and its implementation with NVIDIA CUPTI. Next, the visualization of event logs containing data from execution streams on different levels of parallelism is discussed. To overcome the limitations of classic performance profiles and timeline displays, a graph-based visualization using Parallel Performance Flow Graphs (PPFGs) is introduced. This novel approach uses program states to display similarities and differences between the potentially very large number of event streams and thus enables load imbalances to be spotted quickly. The thesis concludes with an in-depth case study of PIConGPU, a highly parallel, multi-hybrid plasma physics simulation, which benefited greatly from the developed performance analysis methods.

    This dissertation shows how the execution of application parts that are offloaded to hardware accelerators can be recorded as part of a program trace, extending the established technique of trace-based performance analysis to also capture this new level of parallelism. The constraints on the electrical power consumption of computer systems have led to a growing number of hybrid computer architectures. High-performance computers as well as workstations and mobile devices today use hardware accelerators to offload compute-intensive, parallel program parts, relieving the scalar main processor so that it is used only for non-parallel program parts. This execution scheme is typically asynchronous: the scalar processor can continue its own work while the hardware accelerator computes. The performance analysis tools provided by hardware accelerator vendors cover the standard case (one host system with one hardware accelerator) very well, but fail to support highly parallel computer systems. This dissertation investigates to what extent multi-hybrid applications can also record the activity of hardware accelerators; to this end, the existing method for generating program traces of highly parallel applications is extended accordingly. The investigation first develops a general methodology with which a program trace can be produced for any API-based hardware acceleration. Building on this, a dedicated programming interface is developed that allows further performance-relevant data to be recorded; its implementation is demonstrated using NVIDIA CUPTI as an example. A further part of the work deals with the visualization of program traces that contain recordings from the different levels of parallelism. To overcome the limitations of classic performance profiles and timeline displays, Parallel Performance Flow Graphs (PPFGs) are introduced as a new graph-based form of presentation. This novel approach shows a program trace as a sequence of program states with common and differing execution paths, so that divergent program behavior and load imbalances can be located much more easily. The thesis concludes with a detailed analysis of PIConGPU, a multi-hybrid simulation from plasma physics, which benefited greatly from the analysis capabilities developed in this work.
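
    To make the idea of an API-based event recording layer concrete, the following C++ sketch shows what a minimal, host-side record of offloaded activities could look like. It is an illustration only: the type and member names (AcceleratorEvent, AcceleratorTrace, and the kernel names) are hypothetical and are neither the generic performance API proposed in the thesis nor the CUPTI interface; in a real measurement the records would be filled from a vendor interface such as CUPTI after the device has reported completion.

        #include <cstdint>
        #include <iostream>
        #include <string>
        #include <vector>

        // Hypothetical record for one offloaded activity (names are illustrative only).
        struct AcceleratorEvent {
            uint32_t    device_id;   // which accelerator executed the activity
            uint32_t    stream_id;   // asynchronous execution stream on that device
            uint64_t    start_ns;    // timestamp when the activity started
            uint64_t    end_ns;      // timestamp when the activity finished
            std::string name;        // e.g. kernel or memory-copy name
        };

        // One event log per measurement, mirroring the host-side trace streams.
        class AcceleratorTrace {
        public:
            void record(const AcceleratorEvent& e) { events_.push_back(e); }

            void dump() const {
                for (const auto& e : events_)
                    std::cout << "dev " << e.device_id << " stream " << e.stream_id
                              << " [" << e.start_ns << ", " << e.end_ns << "] "
                              << e.name << '\n';
            }

        private:
            std::vector<AcceleratorEvent> events_;
        };

        int main() {
            AcceleratorTrace trace;
            // In a real measurement these records would come from the vendor's
            // profiling interface after kernel completion, because completion is
            // reported asynchronously to the host.
            trace.record({0, 1, 1'000, 5'000, "initDensityKernel"});
            trace.record({0, 2, 1'200, 7'500, "pushParticlesKernel"});
            trace.dump();
            return 0;
        }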

    Concepts for In-memory Event Tracing: Runtime Event Reduction with Hierarchical Memory Buffers

    This thesis contributes to the field of performance analysis in High Performance Computing with new concepts for in-memory event tracing. Event tracing records runtime events of an application and stores each with a precise time stamp and further relevant metrics. The high resolution and detailed information allows an in-depth analysis of the dynamic program behavior, interactions in parallel applications, and potential performance issues. For long-running and large-scale parallel applications, event-based tracing faces three as yet unsolved challenges: the number of resulting trace files limits scalability, the huge amounts of collected data overwhelm file systems and analysis capabilities, and measurement bias, in particular due to intermediate memory buffer flushes, prevents a correct analysis. This thesis proposes concepts for an in-memory event tracing workflow. These concepts include new enhanced encoding techniques to increase memory efficiency and novel strategies for runtime event reduction to dynamically adapt the trace size at runtime. An in-memory event tracing workflow based on these concepts meets all three challenges: First, it not only overcomes the scalability limitations due to the number of resulting trace files but eliminates the overhead of file system interaction altogether. Second, the enhanced encoding techniques and event reduction lead to remarkably smaller trace sizes. Finally, an in-memory event tracing workflow completely avoids intermediate memory buffer flushes, which minimizes measurement bias and allows a meaningful performance analysis. The concepts further include the Hierarchical Memory Buffer data structure, which incorporates a multi-dimensional, hierarchical ordering of events by common metrics, such as time stamp, calling context, event class, and function call duration. This hierarchical ordering allows low-overhead event encoding, event reduction, and event filtering, as well as new hierarchy-aided analysis requests. An experimental evaluation based on real-life applications and a detailed case study underline the capabilities of the concepts presented in this thesis. The new enhanced encoding techniques reduce memory allocation during runtime by a factor of 3.3 to 7.2 while introducing no additional overhead. Furthermore, the combined concepts, including the enhanced encoding techniques, event reduction, and a new filter based on function duration within the Hierarchical Memory Buffer, reduce the resulting trace size by up to three orders of magnitude and keep an entire measurement within a single fixed-size memory buffer, while still providing a coarse but meaningful analysis of the application. This thesis includes a discussion of the state of the art and related work, a detailed presentation of the enhanced encoding techniques, the event reduction strategies, and the Hierarchical Memory Buffer data structure, and an extensive experimental evaluation of all concepts.
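
    The following C++ sketch illustrates, under simplifying assumptions, how a hierarchical ordering of events by calling context and duration can support runtime event reduction: events shorter than a threshold are dropped and only counted, while the remaining events are grouped so that later filtering or analysis can address one group at a time. All names and the toy two-level hierarchy are hypothetical and much simpler than the Hierarchical Memory Buffer described in the thesis.

        #include <cstdint>
        #include <iostream>
        #include <map>
        #include <vector>

        // Illustrative event record; field names are not taken from the thesis.
        struct Event {
            uint64_t timestamp;
            uint32_t calling_context;  // e.g. a call-path identifier
            uint64_t duration;         // function call duration
        };

        // Toy hierarchy: events are grouped by calling context first and by a
        // coarse duration class second, so short calls can be dropped or
        // aggregated per group without scanning the whole buffer.
        class HierarchicalBuffer {
        public:
            explicit HierarchicalBuffer(uint64_t min_duration)
                : min_duration_(min_duration) {}

            void insert(const Event& e) {
                if (e.duration < min_duration_) {   // duration-based runtime filter
                    ++filtered_;                    // keep only an aggregate count
                    return;
                }
                int duration_class = e.duration < 1000 ? 0 : (e.duration < 100000 ? 1 : 2);
                buckets_[e.calling_context][duration_class].push_back(e);
            }

            void summary() const {
                std::cout << filtered_ << " short events filtered at runtime\n";
                for (const auto& [ctx, classes] : buckets_)
                    for (const auto& [cls, evs] : classes)
                        std::cout << "context " << ctx << " / duration class " << cls
                                  << ": " << evs.size() << " events kept\n";
            }

        private:
            uint64_t min_duration_;
            uint64_t filtered_ = 0;
            std::map<uint32_t, std::map<int, std::vector<Event>>> buckets_;
        };

        int main() {
            HierarchicalBuffer buf(/*min_duration=*/500);
            buf.insert({10, 1, 120});     // filtered: below the duration threshold
            buf.insert({20, 1, 2500});    // kept in context 1, middle duration class
            buf.insert({30, 2, 400000});  // kept in context 2, long duration class
            buf.summary();
            return 0;
        }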

    Advanced Memory Data Structures for Scalable Event Trace Analysis

    The thesis presents a contribution to the analysis and visualization of computational performance based on event traces, with a particular focus on parallel programs and High Performance Computing (HPC). Event traces contain detailed information about specified incidents (events) during the run-time of programs and allow minute investigation of dynamic program behavior, various performance metrics, and possible causes of performance flaws. Due to long-running and highly parallel programs and very fine detail resolutions, event traces can accumulate huge amounts of data, which become a challenge for interactive as well as automatic analysis and visualization tools. The thesis proposes a method of exploiting redundancy in the event traces in order to reduce the memory requirements and the computational complexity of event trace analysis. The sources of redundancy are repeated segments of the original program, either through iterative or recursive algorithms or through SPMD-style parallel programs, which produce equal or similar repeated event sequences. The data reduction technique is based on the novel Complete Call Graph (CCG) data structure, which allows domain-specific data compression for event traces in a combination of lossless and lossy methods. All deviations due to lossy data compression can be controlled by constant bounds. The compression of the CCG data structure is incorporated in the construction process, such that at no point substantial uncompressed parts have to be stored. Experiments with real-world example traces reveal the potential for very high data compression. The results range from factors of 3 to 15 for small-scale compression with minimum deviation of the data to factors > 100 for large-scale compression with moderate deviation. Based on the CCG data structure, new algorithms for the most common evaluation and analysis methods for event traces are presented, which require no explicit decompression. By avoiding repeated evaluation of formerly redundant event sequences, the computational effort of the new algorithms can be reduced to the same extent as the memory consumption. The thesis includes a comprehensive discussion of the state of the art and related work, a detailed presentation of the design of the CCG data structure, an elaborate description of algorithms for construction, compression, and analysis of CCGs, and an extensive experimental validation of all components.

    This dissertation presents a novel approach to the analysis and visualization of computational performance that is based on event tracing and is tailored in particular to parallel programs and high performance computing (HPC). Event traces contain detailed information about specified events during the runtime of a program and allow a very precise investigation of the dynamic behavior, various performance metrics, and potential performance problems. Due to long-running and highly parallel applications and the high level of detail, event tracing can produce very large volumes of data, which in turn pose a challenge for interactive and automatic analysis and visualization tools. The present work presents a method that exploits redundancies in the event traces in order to reduce both the memory requirements and the runtime complexity of trace analysis. The sources of redundancy are repeatedly executed program sections, caused either by iterative or recursive algorithms or by SPMD-style parallelization, which produce identical or similar event sequences. The data reduction is based on the novel Complete Call Graph (CCG) data structure and allows a combination of lossless and lossy data compression, where constant bounds can be prescribed for all deviations introduced by lossy compression. The compression is integrated into the construction of the data structure, so that no substantial uncompressed parts have to be held in main memory before compression. The enormous compression capability of the new approach is demonstrated on a number of examples from real application scenarios. The results achieved range from compression factors of 3 to 5 with only minimal deviations due to lossy compression up to factors greater than 100 for high-degree compression. Based on the CCG data structure, new evaluation and analysis methods for event traces are also presented that do not require explicit decompression. The runtime complexity of the analysis can thereby be reduced to the same extent as the main memory requirements, since compressed event sequences are not analyzed repeatedly. This dissertation contains a comprehensive presentation of the state of the art and related work in this area, a detailed derivation of the newly introduced data structures and of the construction, compression, and analysis algorithms, as well as an extensive experimental evaluation and validation of all components.
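
    The following C++ fragment sketches the core idea behind lossy, bounded compression of call trees: two subtrees may share one compressed representation if they have the same call structure and all durations differ by no more than a constant bound. It is a deliberately reduced illustration, not the actual CCG construction algorithm, and all identifiers are invented for the example.

        #include <cstdint>
        #include <iostream>
        #include <memory>
        #include <vector>

        // Simplified call-tree node; the real CCG stores richer event data.
        struct CallNode {
            uint32_t function_id;
            uint64_t duration;                      // inclusive duration of this call
            std::vector<std::shared_ptr<CallNode>> children;
        };

        // Two subtrees may be merged if they call the same functions in the same
        // order and all durations differ by at most `bound`, which plays the role
        // of the constant deviation bound of the lossy compression.
        bool mergeable(const CallNode& a, const CallNode& b, uint64_t bound) {
            if (a.function_id != b.function_id) return false;
            uint64_t diff = a.duration > b.duration ? a.duration - b.duration
                                                    : b.duration - a.duration;
            if (diff > bound) return false;
            if (a.children.size() != b.children.size()) return false;
            for (size_t i = 0; i < a.children.size(); ++i)
                if (!mergeable(*a.children[i], *b.children[i], bound)) return false;
            return true;
        }

        int main() {
            // Two iterations of the same loop body with almost identical timing.
            auto iter1 = std::make_shared<CallNode>(CallNode{7, 1000});
            auto iter2 = std::make_shared<CallNode>(CallNode{7, 1004});
            std::cout << "iterations share a node: " << std::boolalpha
                      << mergeable(*iter1, *iter2, /*bound=*/10) << '\n';
            return 0;
        }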

    GTI: A Generic Tools Infrastructure for Event-Based Tools in Parallel Systems

    Abstract not provided.

    Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework

    The efficient parallel execution of scientific applications is a key challenge in high-performance computing (HPC). With growing parallelism and heterogeneity of compute resources as well as increasingly complex software, performance analysis has become an indispensable tool in the development and optimization of parallel programs. This thesis presents a framework for systematic performance analysis of scalable, heterogeneous applications. Based on event traces, it automatically detects the critical path and inefficiencies that result in waiting or idle time, e.g., due to load imbalances between parallel execution streams. As a prerequisite for the analysis of heterogeneous programs, this thesis specifies inefficiency patterns for computation offloading. Furthermore, an essential contribution was made to the development of tool interfaces for OpenACC and OpenMP, which enable portable data acquisition and subsequent analysis for programs with offload directives. These interfaces are already part of the latest OpenACC and OpenMP API specifications. The aforementioned work, existing preliminary work, and established analysis methods are combined into a generic analysis process, which can be applied across programming models. Based on the detection of wait or idle states, which can propagate over several levels of parallelism, the analysis identifies wasted computing resources and their root cause as well as the critical-path share for each program region. Thus, it determines the influence of program regions on the load balancing between execution streams and on the program runtime. The analysis results include a summary of the detected inefficiency patterns and a program trace, enhanced with information about wait states, their cause, and the critical path. In addition, a ranking based on the amount of waiting time a program region caused on the critical path highlights program regions that are relevant for program optimization. The scalability of the proposed performance analysis and its implementation is demonstrated using High-Performance Linpack (HPL), while the analysis results are validated with synthetic programs. A scientific application that uses MPI, OpenMP, and CUDA simultaneously is investigated in order to show the applicability of the analysis.
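
    As a rough illustration of the kind of offloading inefficiency such an analysis looks for, the following C++ sketch computes the host idle time that arises when the host blocks in a synchronization call before the offloaded kernel it waits for has finished. The record layout, the kernel names, and the timestamp values are hypothetical; a real analysis would derive these times from the recorded event trace rather than from hard-coded values.

        #include <cstdint>
        #include <iostream>
        #include <string>
        #include <vector>

        // Illustrative records: when the host starts blocking in a synchronization
        // call and when the offloaded kernel it waits for actually finishes.
        struct OffloadSync {
            std::string kernel;
            uint64_t    host_wait_start_ns;  // host enters e.g. a stream synchronize
            uint64_t    kernel_end_ns;       // device reports kernel completion
        };

        // A simple "wait at offload synchronization" pattern: whenever the kernel
        // finishes after the host started waiting, the difference is idle host time
        // that can be attributed to the kernel (a candidate for optimization).
        int main() {
            std::vector<OffloadSync> syncs = {
                {"fieldSolverKernel",   1'000'000, 1'800'000},  // host idles 0.8 ms
                {"currentDepositKern",  3'000'000, 2'900'000},  // kernel ended first: no wait
            };
            for (const auto& s : syncs) {
                uint64_t wait = s.kernel_end_ns > s.host_wait_start_ns
                                    ? s.kernel_end_ns - s.host_wait_start_ns : 0;
                std::cout << s.kernel << ": host wait " << wait << " ns\n";
            }
            return 0;
        }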

    Profilage et débogage par prise de traces efficaces d'applications hybrides multi-threadées HPC

    The evolution of supercomputers is the source of both hardware and software challenges. In the quest for ever higher computing power, the interdependence between simulation components is becoming more and more significant, requiring new approaches. This thesis focuses on the software development aspect and particularly on the observation of parallel software running on several thousand cores. This observation aims at providing developers with the necessary feedback when running a program on an execution substrate which has not been modeled yet because of its complexity. To this end, we first introduce the development process from a global point of view before describing developer tools and related work. We then present our contribution, which consists of a trace-based profiling and debugging tool and its evolution towards an on-line coupling method which, as we show, is more scalable because it overcomes I/O limitations. Our contribution also covers our time-stamp synchronisation algorithm for tracing purposes, which relies on a probabilistic approach with quantified error. We also present a tool for machine characterisation from the MPI perspective and demonstrate the presence of machine noise for both point-to-point and collective operations, justifying the use of an empirical approach. In summary, this work proposes and motivates an alternative approach to trace-based event collection while preserving event granularity and a reduced overhead.

    The evolution of supercomputers is the source of both software and architectural challenges. In the quest for computing power, the interdependence of the elements of the simulation process is becoming more and more impactful and requires new approaches. This thesis focuses on software development and particularly on the observation of parallel programs executing on thousands of cores. To this end, we first describe the development process as a whole before presenting existing tools and related work. We then detail our contribution, which consists on the one hand of trace-based debugging and profiling tools, and on the other hand of their evolution towards an on-line coupling method that overcomes I/O limitations. Our contribution also covers clock synchronisation for tracing, with the presentation of a probabilistic synchronisation algorithm whose error we have quantified. In addition, we describe a machine characterisation tool that covers the MPI aspect; such a tool reveals the presence of machine noise on both point-to-point and collective communications. Finally, we propose and motivate an alternative to trace-based event collection that preserves event granularity with a reduced impact on performance, both in terms of CPU usage and I/O.
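
    Clock synchronisation for tracing is commonly based on round-trip message exchanges between nodes. The following C++ sketch shows only the classical ping-pong offset estimate and its worst-case error bound, using hypothetical timestamp values; the probabilistic algorithm with quantified error developed in the thesis refines this idea and is not reproduced here.

        #include <cstdint>
        #include <iostream>

        // Classical ping-pong offset estimation between two nodes A and B.
        int main() {
            // t1: send time on node A, t2: receive time on node B,
            // t3: reply time on node B, t4: reply receive time on node A (all ns).
            int64_t t1 = 1'000, t2 = 1'950, t3 = 2'000, t4 = 2'900;

            int64_t round_trip = (t4 - t1) - (t3 - t2);        // network time only
            int64_t offset     = ((t2 - t1) + (t3 - t4)) / 2;  // estimated B - A
            int64_t error      = round_trip / 2;               // worst-case bound

            std::cout << "estimated offset: " << offset
                      << " ns (+/- " << error << " ns)\n";
            return 0;
        }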

    Techniques To Facilitate the Understanding of Inter-process Communication Traces

    High Performance Computing (HPC) systems play an important role in today's heavily digitized world, which constantly demands higher calculation speed and performance. HPC applications are used in multiple domains such as telecommunication, health, scientific research, and more. With the emergence of multi-core and cloud computing platforms, the HPC paradigm is quickly becoming the design of choice of many service providers. HPC systems are also known to be complex to debug and analyze due to the large number of processes they involve and the way these processes communicate with each other to perform specific tasks. As a result, software engineers must spend extensive amounts of time understanding the complex interactions among a system's processes. This is usually done through the analysis of execution traces generated from running the system at hand. Traces, however, are very difficult to work with due to their overwhelming size. The objective of this research is to present a set of techniques that facilitates the understanding of the behaviour of HPC applications through the analysis of system traces. The first technique consists of building an exchange format called MTF (MPI Trace Format) for representing and exchanging traces generated from HPC applications based on the MPI (Message Passing Interface) standard, which is a de facto standard for inter-process communication for high performance computing systems. The design of MTF is validated against well-known requirements for a standard exchange format. The second technique aims to facilitate the understanding of large traces of inter-process communication by automatically extracting communication patterns that characterize their main behaviour. Two algorithms are presented. The first permits the recognition of repeating patterns in traces of MPI applications, whereas the second determines whether a given communication pattern occurs in a trace. Both algorithms are based on the n-gram extraction technique used in natural language processing. Finally, we developed a technique to abstract MPI traces by detecting the different execution phases in a program based on concepts from information theory. Using this approach, software engineers can examine the trace as a sequence of high-level computational phases instead of a mere flow of low-level events. The techniques presented in this thesis have been tested on traces generated from real HPC programs. The results from several case studies demonstrate the usefulness and effectiveness of our techniques.
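
    The following self-contained C++ example illustrates the n-gram idea underlying the pattern-recognition algorithm: the trace is flattened into a sequence of MPI call names, all n-grams of a fixed length are counted, and those occurring more than once are reported as candidate repeating communication patterns. The trace contents are invented for the example, and the actual algorithms operate on much richer MTF trace data.

        #include <iostream>
        #include <map>
        #include <string>
        #include <vector>

        // Toy n-gram extraction over a flattened trace of MPI call names.
        int main() {
            std::vector<std::string> trace = {
                "MPI_Send", "MPI_Recv", "MPI_Allreduce",
                "MPI_Send", "MPI_Recv", "MPI_Allreduce",
                "MPI_Barrier"};
            const size_t n = 3;  // length of the n-grams to extract

            // Count every n-gram occurring in the event sequence.
            std::map<std::string, int> counts;
            for (size_t i = 0; i + n <= trace.size(); ++i) {
                std::string gram;
                for (size_t j = 0; j < n; ++j) gram += trace[i + j] + " ";
                ++counts[gram];
            }

            // n-grams seen more than once are candidates for recurring patterns.
            for (const auto& [gram, count] : counts)
                if (count > 1)
                    std::cout << "repeating pattern: " << gram << "(x" << count << ")\n";
            return 0;
        }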

    Structures de données hautement extensibles pour le stockage sur disque de séries temporelles hétérogÚnes

    Computer systems are becoming more and more complex, and developers need more than ever to understand how the many components of their systems interact. Numerous tools exist to instrument, measure, and analyze the behavior and performance of software. Tracing is a technique that records many points associated with system events, together with the timestamp at which they occurred. Manual analysis of the resulting traces makes it possible to understand various problems, but it becomes tedious when these histories contain very large numbers of points. Software exists to automate these analyses and provide visualizations, but it too can reach its limits when scaling to the largest systems. In previous work, Montplaisir et al. presented an on-disk data structure optimized for storing the results of trace analyses as state intervals. The structure, called the State History Tree (SHT), is a tree in which each node is mapped to a disk block; each node therefore has a fixed capacity for storing intervals and is defined by a time range that must be included in the range of its parent node, while the ranges of two children must not overlap. This structure was more efficient than other, generic solutions, but it could degenerate in cases with a very large number of keys: for a trace with many threads, for example, the depth of the tree became proportional to the number of threads and a great many "empty" nodes were written to disk, wasting space. Moreover, the queries used to extract information from the structure were often the bottleneck for displaying the data. In this work, we analyze the limitations of the current database that cause it to degenerate, and we study the use cases of the queries. We propose structural modifications that eliminate the degenerate cases when the trace contains many attributes, while reducing the on-disk size of the structure for all types of traces. We also add metadata to the tree nodes to reduce the number of nodes read during queries, which shortens query times by 50% in most cases. Next, we optimize the process of inserting intervals into the tree nodes, so that intervals that will be requested by the same query are grouped together, limiting the number of disk blocks that must be read to answer it. The number of intervals taken into account by the optimization can grow with the number of keys, for example, which maintains a balance between the extra time required for the optimization and the gains observed on queries, gains that become more pronounced when the analysis produces many keys. We also introduce a new type of query that takes advantage of these optimizations and returns in a single query a set of intervals that previously required several queries; moreover, this query guarantees that each node is read at most once, whereas the use of several queries meant that some nodes were read several times. We show that using this query in one of the main views of the visualization software considerably increases its responsiveness. We then build on these lessons to improve the scalability of a second data structure of the trace analysis software, which stores objects called "segments". These objects were previously stored in memory, which limited how many we could store. We use a tree structure strongly inspired by the SHT. We show that the on-disk structure is, at worst, an order of magnitude slower than the in-memory structures for reads. Moreover, this structure is particularly efficient for a use case that requires returning sorted segments: we use an algorithm performing lazy evaluation and a partial sort between nodes, which uses less memory than sorting all the segments.

    Computer systems are becoming more and more complex, and developers need more than ever to be able to understand how different components interact. Many tools have been developed for instrumenting, measuring, and analysing the behavior and performance of software. One of these techniques is tracing, which records data points and a timestamp associated with system events. Trace analysis can help solve a number of problems, but manual analysis becomes a daunting task when these traces contain a large number of points. Automated trace analysis software has been developed for this use case, but it too can face difficulties scaling up to the largest systems. In previous work, Montplaisir et al. presented a disk-based data structure optimized for storing the results of trace analysis as state intervals. This State History Tree (SHT) is a tree in which each node is mapped to a block on disk, such that each node has a fixed capacity to store intervals and is defined by a time range that must be included in its parent's range and must not overlap with its siblings' ranges. This structure was demonstrated to be more efficient than other, generic solutions, but could still degenerate for corner cases with many keys, from traces with many threads for example. The tree's depth would then be proportional to the number of threads, and many empty nodes would be written to disk, wasting space. Moreover, queries to extract data from the data structure were often the bottleneck for displaying data. In this work, we analyse the limitations of the current database which cause it to degenerate and study the different use cases for queries. We suggest structural changes to the data structure which eliminate the corner-case behavior for traces with many attributes, while reducing the disk usage for all types of traces. We also add metadata to the nodes to reduce the number of nodes searched during queries, speeding them up by 50%. Then, we look into optimizing the nodes into which intervals are inserted, so that those which will be queried together will be grouped. This helps to reduce the number of disk blocks that must be read to answer the query. The number of intervals and nodes taken into account by the optimization process can increase along with the number of attributes, as they are the main cause of query slowdown. This helps to balance the extra time required for the optimized insertion and the gains provided on the queries. We also introduce a new type of query to benefit from these optimizations and return all desired intervals in a single query instead of the many queries previously required. This single query reads each node at most once, while the previous version with many queries would read some nodes several times. We show that using this query for one of the main views in the trace visualization software makes it considerably more responsive. We apply these lessons to increase the scalability of another internal backend, the segment store. Segments were previously stored in memory, which would strongly limit the maximum capacity. We propose a new tree structure similar to the SHT instead. We show that the disk-based structure is, in the worst case, only an order of magnitude slower for reads than the in-memory structures. Moreover, this structure is especially efficient for a typical segment store use case, which is a sorted segment query. Indeed, by using partial sorts between nodes, memory usage is dramatically reduced compared to sorting all segments in memory.
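
    The following C++ sketch gives a highly simplified picture of the State History Tree discussed above: each node covers a time range nested inside its parent's range and holds a bounded number of state intervals, so a point query only needs to visit the nodes whose range contains the queried time. Node metadata, the on-disk layout, and the structural improvements proposed in this work are omitted, and all identifiers are illustrative.

        #include <cstdint>
        #include <iostream>
        #include <memory>
        #include <vector>

        struct StateInterval {
            uint64_t start, end;
            uint32_t attribute;   // key, e.g. a per-thread state attribute
            int32_t  value;
        };

        // One SHT-like node: covers a time range nested inside its parent's range
        // and holds a bounded number of intervals (one disk block in the real SHT).
        struct SHTNode {
            uint64_t range_start, range_end;
            std::vector<StateInterval> intervals;
            std::vector<std::unique_ptr<SHTNode>> children;
        };

        // Collect every interval that intersects the query time t. Only subtrees
        // whose time range contains t need to be visited, which is what keeps
        // point queries cheap even for long traces.
        void query(const SHTNode& node, uint64_t t, std::vector<StateInterval>& out) {
            if (t < node.range_start || t > node.range_end) return;
            for (const auto& iv : node.intervals)
                if (iv.start <= t && t <= iv.end) out.push_back(iv);
            for (const auto& child : node.children) query(*child, t, out);
        }

        int main() {
            SHTNode root{0, 100};
            root.intervals.push_back({0, 40, /*attribute=*/1, /*value=*/7});
            auto child = std::make_unique<SHTNode>(SHTNode{40, 100});
            child->intervals.push_back({50, 90, /*attribute=*/2, /*value=*/3});
            root.children.push_back(std::move(child));

            std::vector<StateInterval> result;
            query(root, 60, result);
            std::cout << result.size() << " interval(s) cover t=60\n";
            return 0;
        }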

    Performance Optimization Strategies for Transactional Memory Applications

    This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (software, hardware, and hybrid TM) and use information from all layers of the TM software stack. To this end, the thesis addresses a number of challenges in extracting static information, information about the runtime behavior, and expert-level knowledge, and uses them to develop new methods and strategies for the optimization of TM applications.
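
    As a rough illustration of the kind of runtime information a TM performance tool typically collects, the following C++ sketch counts how often an atomic block commits and how often it must retry because of a conflict. The retry loop below is only a stand-in for a real transactional region; it is not taken from the thesis or from any specific TM system, and the counters are hypothetical.

        #include <atomic>
        #include <cstdint>
        #include <iostream>
        #include <thread>
        #include <vector>

        // Toy stand-in for a transactional region: an optimistic retry loop around
        // a compare-and-swap. A TM performance tool would record, per atomic block,
        // how often it retried (aborts) versus how often it committed directly.
        std::atomic<uint64_t> shared_counter{0};
        std::atomic<uint64_t> commits{0}, retries{0};

        void worker(int iterations) {
            for (int i = 0; i < iterations; ++i) {
                uint64_t expected = shared_counter.load();
                while (!shared_counter.compare_exchange_weak(expected, expected + 1))
                    ++retries;              // conflict: the "transaction" re-executes
                ++commits;
            }
        }

        int main() {
            std::vector<std::thread> threads;
            for (int t = 0; t < 4; ++t) threads.emplace_back(worker, 10000);
            for (auto& th : threads) th.join();
            std::cout << "commits: " << commits << ", retries: " << retries << '\n';
            return 0;
        }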

    Verification Witnesses

    Over the last years, witness-based validation of verification results has become an established practice in software verification: an independent validator re-establishes verification results of a software verifier using verification witnesses, which are stored in a standardized exchange format. In addition to validation, such exchangeable information about proofs and alarms found by a verifier can be shared across verification tools, and users can apply independent third-party tools to visualize and explore witnesses to help them comprehend the causes of bugs or the reasons why a given program is correct. To achieve the goal of making verification results more accessible to engineers, it is necessary to consider witnesses as first-class exchangeable objects, stored independently from the source code and checked independently from the verifier that produced them, respecting the important principle of separation of concerns. We present the conceptual principles of verification witnesses, give a description of how to use them, provide a technical specification of the exchange format for witnesses, and perform an extensive experimental study on the application of witness-based result validation, using the validators CPAchecker, UAutomizer, CPA-witness2test, and FShell-witness2test.
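
    The following C++ sketch models, in a purely schematic way, how a validator could replay a reported error path against a violation witness: the witness is viewed as an automaton whose transitions are guarded by program locations and carry assumptions, and the path is accepted if it drives the automaton into the error state. The standardized exchange format mentioned in the abstract is considerably richer; the structures, line numbers, and assumptions below are invented for illustration.

        #include <iostream>
        #include <map>
        #include <string>
        #include <vector>

        // Schematic witness transition: guarded by a program location and carrying
        // an assumption that constrains the replayed run (shown here as text only).
        struct Transition {
            int         source_line;   // which program line must be executed
            std::string assumption;    // e.g. "input == -1"
            std::string target_state;
        };

        int main() {
            // Witness: from the initial state, reaching line 12 under the assumption
            // "input == -1" leads to the error state, i.e. the verifier claims a bug
            // is reachable along this path.
            std::map<std::string, std::vector<Transition>> witness = {
                {"init", {{12, "input == -1", "error"}}}};

            // A candidate error path (sequence of executed lines) produced while
            // re-checking the program; the validator tests whether the witness
            // accepts it.
            std::vector<int> observed_path = {3, 7, 12};

            std::string state = "init";
            for (int line : observed_path)
                for (const auto& t : witness[state])
                    if (t.source_line == line) { state = t.target_state; break; }

            std::cout << (state == "error" ? "witness confirms the alarm\n"
                                           : "witness could not be validated\n");
            return 0;
        }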