287 research outputs found
Performance Optimization Strategies for Transactional Memory Applications
This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (Software, Hardware, and hybrid TM) and use information of all different layers of the TM software stack. Therefore, this thesis addresses a number of challenges to extract static information, information about the run time behavior, and expert-level knowledge to develop these new methods and strategies for the optimization of TM applications
OSCAR. A Noise Injection Framework for Testing Concurrent Software
“Moore’s Law” is a well-known observable phenomenon in computer science that describes a
visible yearly pattern in processor’s die increase. Even though it has held true for the last 57
years, thermal limitations on how much a processor’s core frequencies can be increased, have
led to physical limitations to their performance scaling. The industry has since then shifted
towards multicore architectures, which offer much better and scalable performance, while in
turn forcing programmers to adopt the concurrent programming paradigm when designing new
software, if they wish to make use of this added performance. The use of this paradigm comes
with the unfortunate downside of the sudden appearance of a plethora of additional errors in
their programs, stemming directly from their (poor) use of concurrency techniques.
Furthermore, these concurrent programs themselves are notoriously hard to design and to
verify their correctness, with researchers continuously developing new, more effective and effi-
cient methods of doing so. Noise injection, the theme of this dissertation, is one such method. It
relies on the “probe effect” — the observable shift in the behaviour of concurrent programs upon
the introduction of noise into their routines. The abandonment of ConTest, a popular proprietary
and closed-source noise injection framework, for testing concurrent software written using the
Java programming language, has left a void in the availability of noise injection frameworks for
this programming language.
To mitigate this void, this dissertation proposes OSCAR — a novel open-source noise injection
framework for the Java programming language, relying on static bytecode instrumentation for
injecting noise. OSCAR will provide a free and well-documented noise injection tool for research,
pedagogical and industry usage. Additionally, we propose a novel taxonomy for categorizing new
and existing noise injection heuristics, together with a new method for generating and analysing
concurrent software traces, based on string comparison metrics.
After noising programs from the IBM Concurrent Benchmark with different heuristics, we
observed that OSCAR is highly effective in increasing the coverage of the interleaving space, and
that the different heuristics provide diverse trade-offs on the cost and benefit (time/coverage) of
the noise injection process.Resumo
A “Lei de Moore” é um fenómeno, bem conhecido na área das ciências da computação, que
descreve um padrĂŁo evidente no aumento anual da densidade de transĂstores num processador.
Mesmo mantendo-se válido nos últimos 57 anos, o aumento do desempenho dos processadores
continua garrotado pelas limitações térmicas inerentes `a subida da sua frequência de funciona-
mento. Desde entĂŁo, a industria transitou para arquiteturas multi nĂşcleo, com significativamente
melhor e mais escalável desempenho, mas obrigando os programadores a adotar o paradigma
de programação concorrente ao desenhar os seus novos programas, para poderem aproveitar o
desempenho adicional que advém do seu uso. O uso deste paradigma, no entanto, traz consigo,
por consequência, a introdução de uma panóplia de novos erros nos programas, decorrentes
diretamente da utilização (inadequada) de técnicas de programação concorrente.
Adicionalmente, estes programas concorrentes sĂŁo conhecidos por serem consideravelmente
mais difĂceis de desenhar e de validar, quanto ao seu correto funcionamento, incentivando investi-
gadores ao desenvolvimento de novos métodos mais eficientes e eficazes de o fazerem. A injeção
de ruĂdo, o tema principal desta dissertação, Ă© um destes mĂ©todos. Esta baseia-se no “efeito sonda”
(do inglês “probe effect”) — caracterizado por uma mudança de comportamento observável em
programas concorrentes, ao terem ruĂdo introduzido nas suas rotinas. Com o abandono do Con-
Test, uma framework popular, proprietária e de código fechado, de análise dinâmica de programas
concorrentes atravĂ©s de injecção de ruĂdo, escritos com recurso `a linguagem de programação Java,
viu-se surgir um vazio na oferta de framework de injeção de ruĂdo, para esta mesma linguagem.
Para mitigar este vazio, esta dissertação propõe o OSCAR — uma nova framework de injeção de
ruĂdo, de cĂłdigo-aberto, para a linguagem de programação Java, que utiliza manipulação estática
de bytecode para realizar a introdução de ruĂdo. O OSCAR pretende oferecer uma ferramenta
livre e bem documentada de injeção de ruĂdo para fins de investigação, pedagĂłgicos ou atĂ© para
a indústria. Adicionalmente, a dissertação propõe uma nova taxonomia para categorizar os dife-
rentes tipos de heurĂsticas de injecção de ruĂdos novos e existentes, juntamente com um mĂ©todo
para gerar e analisar traces de programas concorrentes, com base em métricas de comparação de
strings.
ApĂłs inserir ruĂdo em programas do IBM Concurrent Benchmark, com diversas heurĂsticas, ob-
servámos que o OSCAR consegue aumentar significativamente a dimensĂŁo da cobertura do espaço de estados de programas concorrentes. Adicionalmente, verificou-se que diferentes heurĂsticas
produzem um leque variado de prós e contras, especialmente em termos de eficácia versus
eficiĂŞncia
Analyse des performances de stockage, en mémoire et sur les périphériques d'entrée/sortie, à partir d'une trace d'exécution
Le stockage des données est vital pour l’industrie informatique. Les supports de stockage doivent être rapides et fiables pour répondre aux demandes croissantes des entreprises. Les technologies de stockage peuvent être classifiées en deux catégories principales : stockage
de masse et stockage en mémoire. Le stockage de masse permet de sauvegarder une grande quantité de données à long terme. Les données sont enregistrées localement sur des périphériques d’entrée/sortie, comme les disques durs (HDD) et les Solid-State Drive (SSD), ou en ligne sur des systèmes de stockage distribué. Le stockage en mémoire permet de garder temporairement les données nécessaires pour les programmes en cours d’exécution. La mémoire vive est caractérisée par sa rapidité d’accès, indispensable pour fournir rapidement les données à l’unité de calcul du processeur. Les systèmes d’exploitation utilisent plusieurs mécanismes pour gérer les périphériques de stockage, par exemple les ordonnanceurs de disque et les allocateurs de mémoire. Le temps de traitement d’une requête de stockage est affecté par l’interaction entre plusieurs soussystèmes,
ce qui complique la tâche de débogage. Les outils existants, comme les outils d’étalonnage, permettent de donner une vague idée sur la performance globale du système, mais
ne permettent pas d’identifier précisément les causes d’une mauvaise performance. L’analyse dynamique par trace d’exécution est très utile pour l’étude de performance des systèmes. Le traçage permet de collecter des données précises sur le fonctionnement du système, ce qui
permet de détecter des problèmes de performance difficilement identifiables. L’objectif de cette thèse est de fournir un outil permettant d’analyser les performances de stockage, en mémoire et sur les périphériques d’entrée/sortie, en se basant sur les traces d’exécution. Les défis relevés par cet outil sont : collecter les données nécessaires à l’analyse depuis le noyau et les programmes en mode utilisateur, limiter le surcoût du traçage et la
taille des traces générées, synchroniser les différentes traces, fournir des analyses multiniveau couvrant plusieurs aspects de la performance et enfin proposer des abstractions permettant aux utilisateurs de facilement comprendre les traces.----------ABSTRACT: Data storage is an essential resource for the computer industry. Storage devices must be fast and reliable to meet the growing demands of the data-driven economy. Storage technologies can be classified into two main categories: mass storage and main memory storage. Mass storage can store large amounts of data persistently. Data is saved locally on input/output devices, such as Hard Disk Drives (HDD) and Solid-State Drives (SSD), or remotely on distributed storage systems. Main memory storage temporarily holds the necessary data for running programs. Main memory is characterized by its high access speed, essential to quickly provide data to the Central Processing Unit (CPU). Operating systems use several mechanisms to manage storage devices, such as disk schedulers and memory allocators. The processing time of a storage request is affected by the interaction between several subsystems, which complicates the debugging task. Existing tools, such as benchmarking tools, provide a general idea of the overall system performance, but do not accurately identify the causes of poor performance. Dynamic analysis through execution tracing is a solution for the detailed runtime analysis of storage systems. Tracing collects precise data about the internal behavior of the system, which helps in detecting performance problems that are difficult to identify. The goal of this thesis is to provide a tool to analyze storage performance based on lowlevel trace events. The main challenges addressed by this tool are: collecting the required data using kernel and userspace tracing, limiting the overhead of tracing and the size of the generated traces, synchronizing the traces collected from different sources, providing multi-level analyses covering several aspects of storage performance, and lastly proposing
abstractions allowing users to easily understand the traces.
We carefully designed and inserted the instrumentation needed for the analyses. The tracepoints provide full visibility into the system and track the lifecycle of storage requests, from creation to processing. The Linux Trace Toolkit Next Generation (LTTng), a free and
low-overhead tracer, is used for data collection. This tracer is characterized by its stability, and efficiency with highly parallel applications, thanks to the lock-free synchronization mechanisms used to update the content of the trace buffers. We also contributed to the creation
of a patch that allows LTTng to capture the call stacks of userspace events
Towards Meta-Level Engineering and Tooling for Complex Concurrent Systems
With the widespread use of multicore processors, software becomes more and more diverse in its use of parallel computing resources. To address all application requirements, each with the appropriate abstraction, developers mix and match various concurrency abstractions made available to them via libraries and frameworks. Unfortunately, today's tools such as debuggers and profilers do not support the diversity of these abstractions. Instead of enabling developers to reason about the high-level programming concepts, they used to express their programs, the tools work only on the library's implementation level. While this is a common problem also for other libraries and frameworks, the complexity of concurrency exacerbates the issue further, and reasoning on the higher levels of the concurrency abstractions is essential to manage the associated complexity. In this position paper, we identify open research issues and propose to build tools based on a common meta-level interface to enable developers to reasons about their programs based on the high-level concepts they used to implement them
System and Application Performance Analysis Patterns Using Software Tracing
Software systems have become increasingly complex, which makes it difficult to detect the root causes of performance degradation. Software tracing has been used extensively to analyze the system at run-time to detect performance issues and uncover the causes. There exist several studies that use tracing and other dynamic analysis techniques for performance analysis. These studies focus on specific system characteristics such as latency, performance bugs, etc. In this thesis, we review the literature to build a catalogue of performance analysis patterns that can be detected using trace data. The goal is to help developers debug run-time and performance issues more efficiently. The patterns are formalized and implemented so that they can be readily referred to by developers while analyzing large execution traces. The thesis focuses on the traces of system calls generated by the Linux kernel. This is because no application is an island and that we cannot ignore the complex interactions that an application has with the operating system kernel if we are to detect potential performance issues
An automated refactoring approach to improve IoT software quality
Internet of Things (IoT) software should provide good support for IoT devices as IoT devices are growing in quantity and complexity. Communication between IoT devices is largely realized in a concurrent way. How to ensure the correctness of concurrent access becomes a big challenge to IoT software development. This paper proposes a general refactoring framework for fine-grained read-write locking and implements an automatic refactoring tool to help developers convert built-in monitors into fine-grained ReentrantReadWriteLocks. Several program analysis techniques, such as visitor pattern analysis, alias analysis, and side-effect analysis, are used to assist with refactoring. Our tool is tested by several real-world applications including HSQLDB, Cassandra, JGroups, Freedomotic, and MINA. A total of 1072 built-in monitors are refactored into ReentrantReadWriteLocks. The experiments revealed that our tool can help developers with refactoring for ReentrantReadWriteLocks and save their time and energy.This research is supported by the Guangdong Province Key Research and Development Plan (2019B010137004), the National Key research and Development Plan (2018YEB1004003), the National Natural Science Foundation of China (U1636215,61871140,61872100), in part by the Scientific Research Foundation of Hebei Educational Department under Grant ZD2019093, in part by the Fundamental Research Foundation of Hebei Province under Grant 18960106D, and Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2019)
C++ Design Patterns for Low-latency Applications Including High-frequency Trading
This work aims to bridge the existing knowledge gap in the optimisation of
latency-critical code, specifically focusing on high-frequency trading (HFT)
systems. The research culminates in three main contributions: the creation of a
Low-Latency Programming Repository, the optimisation of a market-neutral
statistical arbitrage pairs trading strategy, and the implementation of the
Disruptor pattern in C++. The repository serves as a practical guide and is
enriched with rigorous statistical benchmarking, while the trading strategy
optimisation led to substantial improvements in speed and profitability. The
Disruptor pattern showcased significant performance enhancement over
traditional queuing methods. Evaluation metrics include speed, cache
utilisation, and statistical significance, among others. Techniques like Cache
Warming and Constexpr showed the most significant gains in latency reduction.
Future directions involve expanding the repository, testing the optimised
trading algorithm in a live trading environment, and integrating the Disruptor
pattern with the trading algorithm for comprehensive system benchmarking. The
work is oriented towards academics and industry practitioners seeking to
improve performance in latency-sensitive applications
- …