Performance Improvements Using Dynamic Performance Stubs
This thesis proposes a new methodology to extend the software performance engineering process.
Common performance measurement and tuning principles mainly aim to improve the software
function itself: the application source code is studied and improved independently of the
overall system performance behavior. Moreover, the optimization of a software function has to be
done without an estimate of the expected optimization gain. This often leads to under- or
over-optimization and hence fails to utilize the system sufficiently.
The proposed performance improvement methodology and framework, called dynamic performance
stubs, addresses these shortcomings by evaluating the overall system performance
improvement. This is achieved by simulating the performance behavior of the original software
functionality at an adjustable optimization level prior to the real optimization. This
enables the software performance analyst to determine the system's overall performance behavior
under the possible outcomes of different improvement approaches. Moreover, the
dynamic performance stubs methodology supports a cost-benefit analysis of different
optimizations with respect to performance behavior.
The approach of dynamic performance stubs is to replace the software bottleneck by a stub. This
stub combines a simulation of the software functionality with the ability to adjust the
performance behavior along one or more performance aspects of the replaced
software function. A general methodology for using dynamic performance stubs, as well as several
methodologies for simulating different performance aspects, is discussed. Finally, several case
studies showing the application and usability of the dynamic performance stubs approach are
presented.
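As an illustration (not taken from the thesis), the core idea of a dynamic performance stub can be sketched as a wrapper that returns the simulated functional result while consuming an adjustable fraction of the bottleneck's measured runtime. All names here are hypothetical, and a real stub would calibrate `baseline_seconds` from measurements:

```python
import time

def original_bottleneck(n):
    """Stand-in for the real, unoptimized function (hypothetical)."""
    return sum(i * i for i in range(n))

def make_performance_stub(simulated_fn, baseline_seconds, optimization_level):
    """Return a stub yielding the functional result while consuming an
    adjustable fraction of the original runtime.  optimization_level = 0.0
    simulates no improvement, 1.0 simulates removing the bottleneck cost."""
    def stub(*args, **kwargs):
        result = simulated_fn(*args, **kwargs)        # simulated functionality
        time.sleep(baseline_seconds * (1.0 - optimization_level))  # simulated cost
        return result
    return stub

# Run the system as if the bottleneck were 75% faster than today:
stub = make_performance_stub(original_bottleneck, baseline_seconds=0.01,
                             optimization_level=0.75)
```

Sweeping `optimization_level` and measuring end-to-end system performance is what lets the analyst estimate the gain before investing in the real optimization.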
A practical guide to computer simulations
Here practical aspects of conducting research via computer simulations are
discussed. The following issues are addressed: software engineering,
object-oriented software development, programming style, macros, make files,
scripts, libraries, random numbers, testing, debugging, data plotting, curve
fitting, finite-size scaling, information retrieval, and preparing
presentations.
Because of the limited space, usually only short introductions to the
specific areas are given and references to more extensive literature are cited.
All examples of code are in C/C++.
Comment: 69 pages, with permission of Wiley-VCH, see http://www.wiley-vch.de
(some screenshots with poor quality due to arXiv size restrictions). A
comprehensively extended version will appear in spring 2009 as a book at
World Scientific, see http://www.worldscibooks.com/physics/6988.htm
CROCHET: Checkpoint and Rollback via Lightweight Heap Traversal on Stock JVMs
Checkpoint/rollback (CR) mechanisms create snapshots of the state of a running application, allowing it to later be restored to that checkpointed snapshot. Support for checkpoint/rollback enables many program analyses and software engineering techniques, including test generation, fault tolerance, and speculative execution.
Fully automatic CR support is built into some modern operating systems. However, such systems perform checkpoints at the coarse granularity of whole pages of virtual memory, which imposes relatively high overhead to incrementally capture the changing state of a process, and makes it difficult for applications to checkpoint only some logical portions of their state. CR systems implemented at the application level and with a finer granularity typically require complex developer support to identify: (1) where checkpoints can take place, and (2) which program state needs to be copied. A popular compromise is to implement CR support in managed runtime environments, e.g. the Java Virtual Machine (JVM), but this typically requires specialized, non-standard runtime environments, limiting portability and adoption of this approach.
In this paper, we present a novel approach for Checkpoint ROllbaCk via lightweight HEap Traversal (Crochet), which enables fully automatic fine-grained lightweight checkpoints within unmodified commodity JVMs (specifically Oracle's HotSpot and OpenJDK). Leveraging key insights about the internal design common to modern JVMs, Crochet works entirely through bytecode rewriting and standard debug APIs, utilizing special proxy objects to perform a lazy heap traversal that starts at the root references and traverses the heap as objects are accessed, copying or restoring state as needed and removing each proxy immediately after it is used. We evaluated Crochet on the DaCapo benchmark suite, finding it to have very low runtime overhead in steady state (ranging from no overhead to 1.29x slowdown), and that it often outperforms a state-of-the-art system-level checkpoint tool when creating large checkpoints.
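The lazy save-on-first-touch idea behind Crochet can be illustrated with a rough, language-shifted sketch (Python, not Java; all names hypothetical). Where Crochet's bytecode rewriting intercepts accesses automatically via proxies, this sketch requires an explicit `touch()` call before the first mutation of each object after a checkpoint:

```python
import copy

class Checkpoint:
    """Toy analogue of lazy checkpointing: instead of eagerly deep-copying
    the whole heap, save each object's state only the first time it is
    touched after the checkpoint was taken."""
    def __init__(self):
        self._saved = {}  # id(obj) -> (obj, shallow copy of its __dict__)

    def touch(self, obj):
        """Call before mutating obj; records its pre-checkpoint state once.
        In Crochet, a proxy object intercepts the access instead."""
        if id(obj) not in self._saved:
            self._saved[id(obj)] = (obj, copy.copy(obj.__dict__))

    def rollback(self):
        for obj, state in self._saved.values():
            obj.__dict__.clear()
            obj.__dict__.update(state)
        self._saved.clear()

class Account:
    def __init__(self, balance):
        self.balance = balance

cp = Checkpoint()
acct = Account(100)
cp.touch(acct)        # proxy interception point in the real system
acct.balance -= 40
cp.rollback()         # balance restored to its checkpointed value
```

Only objects actually touched between checkpoint and rollback are ever copied, which is why the approach stays cheap for large heaps with localized mutation.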
Cache performance of chronological garbage collection
This thesis presents a cache performance analysis of the Chronological Garbage Collection
algorithm used in the LVM system. LVM is a new Logic Virtual Machine for Prolog. It
adopts a single-stack policy for all dynamic memory requirements and cooperates with an
efficient garbage collection algorithm, Chronological Garbage Collection, which recycles
space not as a deliberate garbage collection operation but as a natural activity of the
LVM engine while gathering useful objects. This algorithm combines the advantages of the
traditional copying, mark-compact, generational, and incremental garbage collection
schemes.
In order to determine the improvement of cache performance under our garbage-collection
algorithm, we developed a trace-driven cache simulator. Direct-mapped and set-associative
caches with different cache sizes, write policies, block sizes, and set associativities
were simulated and measured. A comparison of LVM and SICStus 3.1 on the same benchmarks
was performed.
From the simulation results, we identified the important factors influencing the
performance of the CGC algorithm. The results from the cache simulator fully
support the experimental results gathered from the LVM system: the cost of CGC is
almost entirely paid for by the improved cache performance. Further, we found that the memory
reference patterns of our benchmarks share the same properties: most writes are for
allocation, and most reads are to recently written objects. In addition, the results also
showed that the write-miss policy can have a dramatic effect on the cache performance of
the benchmarks, with a write-validate policy giving the best performance. The comparison
shows that when the benchmark input size is small, SICStus is about 3-8 times faster
than LVM, an acceptable performance ratio when comparing a binary-code engine against a
byte-code emulator. As the input sizes increase, some benchmarks maintain this ratio,
whereas others greatly narrow the performance gap and beyond certain crossover points
even outperform their counterparts under SICStus.
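The trace-driven simulation method used in the thesis can be sketched in miniature (illustrative only; the actual simulator also models set associativity and the different write policies discussed above). The addresses and sizes below are made up:

```python
class DirectMappedCache:
    """Minimal trace-driven simulator for a direct-mapped cache:
    each address maps to exactly one cache line via its block index."""
    def __init__(self, cache_size, block_size):
        self.block_size = block_size
        self.num_lines = cache_size // block_size
        self.tags = [None] * self.num_lines   # one tag per cache line
        self.hits = self.misses = 0

    def access(self, address):
        block = address // self.block_size
        index = block % self.num_lines        # which line the block maps to
        tag = block // self.num_lines         # identifies the block in that line
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tag            # fill the line on a miss

cache = DirectMappedCache(cache_size=1024, block_size=16)
for addr in [0, 4, 8, 1024, 0]:              # toy address trace
    cache.access(addr)
# addresses 4 and 8 hit (same block as 0); 1024 evicts block 0, so the
# final access to 0 misses again -> 2 hits, 3 misses
```

Feeding the simulator the memory reference trace of a real run is what makes it possible to compare write policies and associativities without re-running the program on different hardware.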
Information Flow Control with System Dependence Graphs - Improving Modularity, Scalability and Precision for Object Oriented Languages
This thesis is concerned with the field of static program analysis; in
particular, we consider analyses whose goal is to guarantee certain security
properties, such as integrity and confidentiality, for programs. To this end
we use so-called dependence graphs, which capture the potential behavior of
the program as well as the information flow between individual program
points. With this technique we can, for example, ensure that a program leaks
no information about a secret password.
This thesis focuses especially on techniques that improve the construction
of the dependence graph, since it forms the basis for many subsequent
security analyses. The presented algorithms and improvements have been
integrated into our analysis tool Joana and made publicly available as open
source. Numerous collaborations and publications demonstrate that the
improvements to Joana are also relevant in research practice.
This thesis essentially consists of three parts. Part 1 deals with
improvements to the computation of the dependence graph, Part 2 presents a
new approach to the analysis of incomplete programs, and Part 3 demonstrates
current applications of Joana on concrete examples.
In the first part we discuss in detail the algorithms for constructing a
dependence graph, paying particular attention to the problems and challenges
of analyzing object-oriented languages such as Java. For example, we present
an analysis that precisely handles the control flow induced by exceptions.
Our main concern is the modeling of side effects that can arise from
communication across method boundaries. In dependence graphs, side effects,
i.e., memory locations read or modified by a method, are represented as
additional nodes. We show that the form of this representation, the
so-called parameter model, has an enormous influence on both the precision
and the runtime of the entire analysis. We explain the weaknesses of the old
parameter model, which is based on object trees, and present our
improvements in the form of a new model based on object graphs. By
deliberately merging redundant information, we can significantly reduce the
number of computed parameter nodes and additionally speed up the
computation, without degrading the precision of the resulting dependence
graph. Even for smaller programs in the range of a few thousand lines of
code we achieve, on average, an 8-fold speedup, while the precision of the
result is usually improved. For larger programs the difference is even more
pronounced: some of our test cases, and all programs we tested above 20000
lines of code, can only be computed with object graphs at all. Thanks to
these improvements, Joana can be applied with increased precision and to
substantially larger programs.
In the second part we address the problem that previous security analyses
based on dependence graphs could only analyze complete programs. It was, for
example, impossible to examine or preprocess library code without knowledge
of all its usage sites. We discovered a monotonicity property in the
existing analysis that allows us to transfer analysis results of program
parts to arbitrary usage sites. This makes it possible, on the one hand, to
preprocess program parts and, on the other hand, to make general statements
about the security properties of program parts without knowing their
concrete usage sites. We define the monotonicity property in detail and
sketch a proof of its correctness. Building on this, we develop a method for
preprocessing program parts that allows us to construct modular dependence
graphs. These graphs can later be adapted to their respective usage sites.
Since the precise construction of a modular dependence graph can become very
expensive, we develop an algorithm based on so-called access paths that
improves scalability. Finally, we sketch a proof showing that this algorithm
always computes a conservative approximation of the modular graph, so that
the results of security analyses built on top of it remain valid.
In the third part we present several successful applications of Joana that
arose from a collaboration with Ralf Küsters of the Universität Trier. We
explain, on the one hand, how our security tool Joana can be used in
general. On the other hand, we show how, in combination with further tools
and techniques, cryptographic security can be guaranteed for a program, a
task that was previously impossible for information-flow-based analyses.
These applications in particular make clear how the simplified usage
developed in this thesis eases the application of Joana, and how our
precision improvements make successful analysis possible in the first place.
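The core question a dependence-graph-based security analysis answers is reachability: can information flow from a secret source to a public sink? A heavily simplified sketch (not Joana's actual algorithm, which is context-sensitive and works on a full system dependence graph; node names are invented):

```python
from collections import deque

def reaches(dep_edges, source, sink):
    """Breadth-first reachability over a dependence graph: returns True if
    information can flow from `source` to `sink`.  Simplified: no
    declassification rules and no context-sensitivity."""
    succ = {}
    for a, b in dep_edges:
        succ.setdefault(a, []).append(b)
    seen, work = {source}, deque([source])
    while work:
        node = work.popleft()
        if node == sink:
            return True
        for nxt in succ.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return False

# If the password value influences a publicly observable result,
# the analysis reports a potential leak.
edges = [("password", "hash"), ("hash", "check"), ("check", "result")]
```

The precision work described above matters because every spurious edge in the graph turns into a false alarm for this reachability check.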
Compiling Prolog to Logic-inference Virtual Machine
The Logic-inference Virtual Machine (LVM) is a new Prolog execution model
consisting of a set of high-level instructions and memory architecture for handling control
and unification. Unlike the well-known Warren's Abstract Machine [1], which uses
the Structure Copying method, the LVM adopts a hybrid of Program Sharing [2] and
Structure Copying to represent first-order terms. In addition, the LVM employs a single
stack paradigm for dynamic memory allocation and embeds a very efficient garbage
collection algorithm to reclaim useless memory cells. In order to construct a complete
Prolog system based on the LVM, a corresponding compiler must be written.
In this thesis, the design of such an LVM compiler is presented and all important
components of the compiler are described. The LVM compiler translates
Prolog programs into LVM bytecode instructions, so that a Prolog program is compiled
once and can run anywhere.
The first version of the LVM compiler (about 8000 lines of C code) has been
developed. The compilation time is approximately proportional to the size of the source
code, with about 80 percent of the time spent on global analysis. Several compiled
programs have been tested under an LVM emulator. Benchmarks show that the LVM
system is very promising in memory utilization and performance.
Building CPU stubs to optimize CPU bound systems: An application of dynamic performance stubs.
Dynamic performance stubs provide a framework
for simulating the performance behavior of software
modules and functions. Hence, they can be used as an extension
to software performance engineering methodologies. The
methodology of dynamic performance stubs supports
gain-oriented performance improvement and makes it possible to
identify "hidden" bottlenecks and to prioritize optimization
possibilities. Nowadays, the processing power of CPUs is
mainly increased by adding more cores to the architecture.
To benefit from this, new software is mostly designed
for parallel processing, especially in large software projects.
As software performance optimizations can be difficult in
these environments, new methodologies have to be defined.
This paper evaluates the possibility of simulating the functional
behavior of software algorithms by means of simulated
software functionality. These simulations can be used by the dynamic
performance stub framework, e.g., to build a CPU stub that
replaces the algorithm. The paper thus describes a methodology as well
as an implementation and evaluates both in an industrial case
study. Moreover, it presents an extension to CPU stubs that
applies them to simulate multi-threaded applications.
This extension is evaluated in a case study as well. We
show that the functionality of software algorithms can
be replaced by software simulation functions, and that this stubbing
approach can be used to create dynamic performance stubs
such as CPU stubs. Additionally, we show that the concept of
CPU stubs can be applied to multi-threaded applications.
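A CPU stub differs from a simple delay in that it actually occupies the processor. A rough sketch of the idea, including the multi-threaded extension (hypothetical names; a real CPU stub would calibrate the burn time against the measured cost of the replaced algorithm, and in CPython the GIL prevents true parallelism, which a C implementation would not suffer from):

```python
import threading
import time

def cpu_stub(busy_seconds):
    """CPU stub body: burn wall-clock time in a busy loop instead of
    running the real algorithm, so the core stays occupied."""
    end = time.perf_counter() + busy_seconds
    x = 0
    while time.perf_counter() < end:
        x += 1          # busy work keeps the CPU loaded
    return x

def run_multithreaded_stub(num_threads, busy_seconds):
    """Multi-threaded extension: one CPU stub per simulated worker thread."""
    threads = [threading.Thread(target=cpu_stub, args=(busy_seconds,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Varying `busy_seconds` per thread lets the analyst explore how a faster (or slower) algorithm would interact with the rest of a concurrent system before any real optimization is attempted.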
Lazy Sequentialization for TSO and PSO via Shared Memory Abstractions
Lazy sequentialization is one of the most effective approaches for the bounded verification of concurrent programs. Existing tools assume sequential consistency (SC), thus the feasibility of lazy sequentializations for weak memory models (WMMs) remains untested. Here, we describe the first lazy sequentialization approach for the total store order (TSO) and partial store order (PSO) memory models. We replace all shared memory accesses with operations on a shared memory abstraction (SMA), an abstract data type that encapsulates the semantics of the underlying WMM and implements it under the simpler SC model. We give efficient SMA implementations for TSO and PSO that are based on temporal circular doubly-linked lists, a new data structure that allows an efficient simulation of the store buffers. We show experimentally, both on the SV-COMP concurrency benchmarks and a real-world instance, that this approach works well in combination with lazy sequentialization on top of bounded model checking.