    Report from GI-Dagstuhl Seminar 16394: Software Performance Engineering in the DevOps World

    This report documents the program and the outcomes of GI-Dagstuhl Seminar 16394 "Software Performance Engineering in the DevOps World". The seminar addressed the problem of performance-aware DevOps. Both DevOps and performance engineering have been growing trends over the past one to two years, in no small part due to the rising importance of identifying performance anomalies in the operations (Ops) of cloud and big data systems and feeding these back to development (Dev). However, so far, the research community has treated software engineering, performance engineering, and cloud computing mostly as individual research areas. We aimed to identify cross-community collaboration, and to set the path for long-lasting collaborations towards performance-aware DevOps. The main goal of the seminar was to bring together young researchers (PhD students in a later stage of their PhD, as well as PostDocs or Junior Professors) in the areas of (i) software engineering, (ii) performance engineering, and (iii) cloud computing and big data to present their current research projects, to exchange experience and expertise, to discuss research challenges, and to develop ideas for future collaborations.

    Performance Problem Diagnostics by Systematic Experimentation

    Diagnosing performance problems requires deep expertise in performance engineering and entails high manual effort. As a consequence, performance evaluations are postponed to the last minute of the development process. In this thesis, we introduce an automatic, experiment-based approach for performance problem diagnostics in enterprise software systems. With this approach, performance engineers can concentrate on their core competences instead of conducting repetitive tasks.

    In this book, we introduce an automatic, experiment-based approach for performance problem diagnostics in enterprise software systems. The proposed approach systematically searches for root causes of detected performance problems by executing series of systematic performance tests. Various case studies show that the approach is applicable to a wide range of contexts.

    Software Batch Testing to Reduce Build Test Executions

    Testing is expensive, and batching tests has the potential to reduce test costs. The continuous integration strategy of testing each commit or change individually helps to quickly identify faults but leads to a maximum number of test executions. Large companies that have a large number of commits, e.g. Google and Facebook, or have expensive test infrastructure, e.g. Ericsson, must batch changes together to reduce the total number of test runs. For example, if eight builds are batched together and there is no failure, then we have tested eight builds with one execution, saving seven executions. However, when a failure occurs, it is not immediately clear which build is the cause of the failure. A bisection is run to isolate the failing build, i.e. the culprit build. In our eight-build example, a failure will require an additional six executions, for a total of seven, resulting in a saving of only one execution. The goal of this work is to improve the efficiency of batch testing. We evaluate six approaches. The first is the baseline approach that tests each build individually. The second is the existing bisection approach. The third uses a batch size of four, which we show mathematically reduces the number of executions without requiring bisection. The fourth combines the two prior techniques, introducing a stopping condition to the bisection. The final two approaches use models of build change risk to isolate risky changes and test them in smaller batches. We evaluate the approaches on nine open source projects that use Travis CI. Compared to the TestAll baseline, on average, the approaches reduce the number of build test executions across projects by 46%, 48%, 50%, 44%, and 49% for BatchBisect, Batch4, BatchStop4, RiskTopN, and RiskBatch, respectively. The greatest reduction is BatchStop4 at 50%. However, the simple approach of Batch4 does not require bisection and achieves a reduction of 48%. We recommend that all CI pipelines use a batch size of at least four. We release our scripts and data for replication. Regardless of the approach, on average, we save around half the build test executions compared to testing each change individually. We release the BatchBuilder tool, which automatically batches submitted changes on GitHub for testing on Travis CI. Since the tool reports individual results for each pull request or pushed commit, the batching happens in the background and the development process is unchanged.
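The execution counts in the eight-build example can be reproduced with a small sketch. It is not the authors' BatchBuilder tool. It assumes a bisection variant that tests both halves of every failing batch and recurses only into the halves that fail; the abstract does not spell out this detail, but the assumption yields the six additional executions it quotes for a single culprit. The function name `batch_bisect` and the representation of culprits as a set of build ids are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def batch_bisect(builds: Sequence[int], culprits: Set[int]) -> Tuple[List[int], int]:
    """Return (culprit builds found, number of test executions used).

    Assumption: a batch fails if and only if it contains at least one culprit build.
    """
    executions = 0
    found: List[int] = []

    def run(batch: Sequence[int]) -> bool:
        # One simulated test execution; True means the batch passes.
        nonlocal executions
        executions += 1
        return not any(b in culprits for b in batch)

    def isolate(batch: Sequence[int]) -> None:
        # Precondition: `batch` has already been executed and failed.
        if len(batch) == 1:
            found.append(batch[0])
            return
        mid = len(batch) // 2
        # Test both halves, since more than one culprit may be present.
        for half in (batch[:mid], batch[mid:]):
            if not run(half):
                isolate(half)

    if not run(builds):
        isolate(builds)
    return found, executions


if __name__ == "__main__":
    eight = list(range(1, 9))
    print(batch_bisect(eight, set()))  # no failure: one execution for eight builds
    print(batch_bisect(eight, {5}))    # one culprit: 1 batch run + 6 bisection runs
```

Under these assumptions a clean batch of eight costs one execution and a batch with a single culprit costs seven, against eight executions when every build is tested individually.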

    An investigation into hazard-centric analysis of complex autonomous systems

    This thesis proposes the hypothesis that a conventional, and essentially manual, HAZOP process can be improved with information obtained from model-based dynamic simulation, using a Monte Carlo approach, to update a Bayesian Belief model representing the expected relations between causes and effects, and thereby produce an enhanced HAZOP. The work considers how the expertise of a hazard and operability study team might be augmented with access to behavioural models, simulations and belief inference models. This incorporates models of dynamically complex system behaviour, considering where these might contribute to the expertise of a hazard and operability study team, and how these might bolster trust in the portrayal of system behaviour. With a questionnaire containing behavioural outputs from a representative systems model, responses were collected from a group with relevant domain expertise. From this it is argued that the quality of analysis is dependent upon the experience and expertise of the participants, but that this might be artificially augmented using probabilistic data derived from a system dynamics model. Consequently, Monte Carlo simulations of an improved exemplar system dynamics model are used to condition a behavioural inference model and also to generate measures of emergence associated with the deviation parameter used in the study. A Bayesian approach towards probability is adopted where particular events and combinations of circumstances are effectively unique or hypothetical, and perhaps irreproducible in practice. Therefore, it is shown that a Bayesian model, representing beliefs expressed in a hazard and operability study, conditioned by the likely occurrence of flaw events causing specific deviant behaviour from evidence observed in the system dynamical behaviour, may combine intuitive estimates based upon experience and expertise with quantitative statistical information representing plausible evidence of safety constraint violation. A further behavioural measure identifies potential emergent behaviour by way of a Lyapunov exponent. Together these improvements enhance the awareness of potential hazard cases.


    A systemic mock circulatory loop plays a pivotal role as the in vitro assessment tool for left heart medical devices. The standard design employed by many research groups dates to the early 1970s and lacks the acuity needed for the advanced device designs currently being explored. The necessity to update the architecture of this in vitro tool has become apparent, as the historical design fails to deliver the performance needed to simulate conditions and events that have been clinically identified as challenges for future device designs. In order to appropriately deliver the testing solution needed, the functionality demanded must be comprehensively evaluated and understood. The resulting system is a fully automated systemic mock circulatory loop, inclusive of anatomical geometries at critical flow sections, and accompanying software tools to execute precise investigations of cardiac device performance. Delivering this complete testing solution will be achieved through three research aims: (1) Utilization of advanced physical modeling tools to develop a high fidelity computational model of the in vitro system. This model will enable control design of the logic that will govern the in vitro actuators, allow experimental settings to be evaluated prior to execution in the mock circulatory loop, and support determination of system settings that replicate clinical patient data. (2) Deployment of a fully automated mock circulatory loop that allows for runtime control of all the settings needed to appropriately construct the conditions of interest. It is essential that the system is able to change set points on the fly; simulation of cardiovascular dynamics and event sequences requires this functionality. The robustness of an automated system with incorporated closed loop control logic yields a mock circulatory loop with excellent reproducibility, which is essential for effective device evaluation. (3) Incorporation of anatomical geometry at the critical device interfaces: the ascending aorta and left atrium. These anatomies represent complex shapes; the flows present in these sections are complex and greatly affect device performance. Increasing the fidelity of the local flow fields at these interfaces delivers a more accurate representation of the device performance in vivo.

    Survey on Machine Learning Algorithms Enhancing the Functional Verification Process

    The continuing increase in functional requirements of modern hardware designs means the traditional functional verification process becomes inefficient in meeting the time-to-market goal with a sufficient level of confidence in the design. Therefore, the need for enhancing the process is evident. Machine learning (ML) models have proved to be valuable for automating major parts of the process, which have typically occupied the bandwidth of engineers, diverting them from adding new coverage metrics to make the designs more robust. Current research on deploying different ML models proves to be promising in areas such as stimulus constraining, test generation, coverage collection, and bug detection and localization. An example of deploying an artificial neural network (ANN) in test generation shows a 24.5× speed-up in functionally verifying a dual-core RISC processor specification. Another study demonstrates how k-means clustering can reduce redundancy in the simulation trace dump of an AHB-to-WISHBONE bridge by 21%, thus reducing the debugging effort by not having to inspect unnecessary waveforms. The surveyed work provides a comprehensive overview of current ML models enhancing the functional verification process, from which insight into promising future research areas is inferred.

    Formal Verification of a MESI-based Cache Implementation

    Cache coherency is crucial to multi-core systems with a shared memory programming model. Coherency protocols have been formally verified at the architectural level with relative ease. However, several subtle issues creep into the hardware realization of a cache in a multi-processor environment. The assumption, made in the abstract model, that state transitions are atomic is invalid for the HDL implementation. Each transition is composed of many concurrent multi-core operations. As a result, even with a blocking bus, several transient states come into existence. Most modern processors optimize communication with a split-transaction bus; this results in further transient states and race conditions. Therefore, the design and verification of cache coherency is increasingly complex and challenging. Simulation techniques are insufficient to ensure memory consistency and the absence of deadlock, livelock, and starvation. At best, it is tediously complex and time consuming to reach confidence in functionality with simulation. Formal methods are ideally suited to identify the numerous race conditions and subtle failures. In this study, we perform formal property verification on the RTL of a multi-core level-1 cache design based on a snooping MESI protocol. We demonstrate full-proof verification of the coherence module in JasperGold using complexity reduction techniques through parameterization. We verify that the assumptions needed to constrain inputs of the stand-alone cache coherence module are satisfied as valid assertions in the instantiation environment. We compare results obtained from formal property verification against a state-of-the-art UVM environment. We highlight the benefits of a synergistic collaboration between simulation and formal techniques. We present formal analysis as a generic toolkit with numerous usage models in the digital design process.

    Automated root cause isolation in performance regression testing

    Testing is an important aspect of software development. There exist multiple kinds of tests, such as unit tests and integration tests. This thesis focuses on load tests, which are used to observe a system's behavior under load. The presented approach uses these load tests to observe and analyze the performance of a system, for example the response times of individual methods. These observations are then compared with those made on other versions of the system in order to detect performance regressions, i.e. deteriorations in performance between versions. Another goal of the approach is to identify the root cause of a regression, which is the source code change responsible for introducing it. This makes fixing the regression easier for the software engineer assigned to it, since they are given a concrete entry point for the problem.
