62 research outputs found

    Predicting unstable software benchmarks using static source code features

    Full text link
    Software benchmarks are only as good as the performance measurements they yield. Unstable benchmarks show high variability among repeated measurements, which causes uncertainty about the actual performance and complicates reliable change assessment. However, whether a benchmark is stable or unstable only becomes evident after it has been executed and its results are available. In this paper, we introduce a machine-learning-based approach to predict a benchmark’s stability without having to execute it. Our approach relies on 58 statically-computed source code features, extracted for benchmark code and code called by a benchmark, related to (1) meta information, e.g., lines of code (LOC), (2) programming language elements, e.g., conditionals or loops, and (3) potentially performance-impacting standard library calls, e.g., file and network input/output (I/O). To assess our approach’s effectiveness, we perform a large-scale experiment on 4,461 Go benchmarks coming from 230 open-source software (OSS) projects. First, we assess the prediction performance of our machine learning models using 11 binary classification algorithms. We find that Random Forest performs best, with good prediction performance ranging from 0.79 to 0.90 in terms of AUC and from 0.43 to 0.68 in terms of MCC. Second, we perform feature importance analyses for individual features and feature categories. We find that 7 features related to meta-information, slice usage, nested loops, and synchronization application programming interfaces (APIs) are individually important for good predictions; and that the combination of all features of the called source code is paramount for our model, while the combination of features of the benchmark itself is less important. Our results show that although benchmark stability is affected by more than just the source code, we can effectively utilize machine learning models to predict whether a benchmark will be stable or not ahead of execution. This enables spending precious testing time on reliable benchmarks, supporting developers in identifying unstable benchmarks during development, allowing unstable benchmarks to be repeated more often, estimating stability in scenarios where repeated benchmark execution is infeasible or impossible, and warning developers if new benchmarks or existing benchmarks executed in new environments will be unstable.
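
    A minimal sketch of how such a prediction model could be assembled, assuming scikit-learn and a prepared feature table; the file name benchmark_features.csv, the column layout, and the hyperparameters are hypothetical stand-ins for the 58 static features and the stability label described above, not the authors' actual pipeline:

        # Sketch: Random Forest classifier predicting benchmark (in)stability
        # from statically computed source code features (hypothetical data).
        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.metrics import make_scorer, matthews_corrcoef

        # One row per benchmark: static features plus a binary "unstable" label
        # derived from the variability of previously measured results.
        df = pd.read_csv("benchmark_features.csv")
        X = df.drop(columns=["unstable"])
        y = df["unstable"]

        clf = RandomForestClassifier(n_estimators=500, random_state=42)

        # Evaluate with the two metrics reported above: AUC and MCC.
        auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
        mcc = cross_val_score(clf, X, y, cv=10,
                              scoring=make_scorer(matthews_corrcoef))
        print(f"AUC: {auc.mean():.2f}  MCC: {mcc.mean():.2f}")

        # Per-feature importance, mirroring the feature importance analysis.
        clf.fit(X, y)
        for name, score in sorted(zip(X.columns, clf.feature_importances_),
                                  key=lambda p: p[1], reverse=True)[:7]:
            print(name, round(score, 3))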

    Multi-Objective Search-Based Software Microbenchmark Prioritization

    Full text link
    Ensuring that software performance does not degrade after a code change is paramount. A potential solution, particularly for libraries and frameworks, is regularly executing software microbenchmarks, a performance testing technique similar to (functional) unit tests. This often becomes infeasible due to the extensive runtimes of microbenchmark suites, however. To address that challenge, research has investigated regression testing techniques, such as test case prioritization (TCP), which reorder the execution within a microbenchmark suite to detect larger performance changes sooner. Such techniques are either designed for unit tests and perform sub-par on microbenchmarks or require complex performance models, drastically reducing their potential application. In this paper, we propose a search-based technique based on multi-objective evolutionary algorithms (MOEAs) to improve the current state of microbenchmark prioritization. The technique utilizes three objectives, i.e., coverage to maximize, coverage overlap to minimize, and historical performance change detection to maximize. We find that our technique improves over the best coverage-based, greedy baselines in terms of average percentage of fault-detection on performance (APFD-P) and Top-3 effectiveness by 26 percentage points (pp) and 43 pp (for Additional) and 17 pp and 32 pp (for Total), to 0.77 and 0.24, respectively. Employing the Indicator-Based Evolutionary Algorithm (IBEA) as MOEA leads to the best effectiveness among six MOEAs. Finally, the technique's runtime overhead is acceptable at 19% of the overall benchmark suite runtime, considering that these runtimes often span multiple hours. The added overhead compared to the greedy baselines is minuscule at 1%. These results mark a step forward for universally applicable performance regression testing techniques.
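
    A self-contained sketch of the three objectives evaluated on a candidate benchmark ordering, assuming toy coverage and historical-change data; a plain random search stands in for the MOEA here (the paper evaluates six MOEAs, with IBEA performing best), so this is illustrative only:

        # Sketch: three prioritization objectives over a benchmark ordering,
        # with hypothetical coverage sets and historical performance changes.
        import random

        coverage = {                      # benchmark -> covered code units
            "BenchParse":  {"parse", "lex"},
            "BenchEncode": {"encode", "buffer"},
            "BenchLex":    {"lex"},
        }
        hist_change = {                   # benchmark -> largest past change (%)
            "BenchParse": 12.0, "BenchEncode": 0.5, "BenchLex": 3.0,
        }

        def objectives(order):
            """(coverage, overlap, change detection) for one ordering; earlier
            positions get higher weight so that reordering matters."""
            n = len(order)
            weights = [(n - i) / n for i in range(n)]
            cov = sum(w * len(coverage[b]) for w, b in zip(weights, order))
            seen, overlap = set(), 0.0
            for w, b in zip(weights, order):
                overlap += w * len(coverage[b] & seen)
                seen |= coverage[b]
            change = sum(w * hist_change[b] for w, b in zip(weights, order))
            return cov, overlap, change   # maximize cov/change, minimize overlap

        def dominates(a, b):
            """Pareto dominance with objective directions (max, min, max)."""
            no_worse = (a[0] >= b[0], a[1] <= b[1], a[2] >= b[2])
            better   = (a[0] >  b[0], a[1] <  b[1], a[2] >  b[2])
            return all(no_worse) and any(better)

        # Random search standing in for the MOEA: keep non-dominated orderings.
        front = []
        for _ in range(1000):
            cand = random.sample(list(coverage), k=len(coverage))
            objs = objectives(cand)
            if not any(dominates(o, objs) for _, o in front):
                front = [(c, o) for c, o in front if not dominates(objs, o)]
                front.append((cand, objs))
        for order, objs in front:
            print(order, objs)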

    Software Microbenchmarking in the Cloud. How Bad is it Really?

    Get PDF
    Rigorous performance engineering traditionally assumes measuring on bare-metal environments to control for as many confounding factors as possible. Unfortunately, some researchers and practitioners might not have access, knowledge, or funds to operate dedicated performance-testing hardware, making public clouds an attractive alternative. However, shared public cloud environments are inherently unpredictable in terms of the system performance they provide. In this study, we explore the effects of cloud environments on the variability of performance test results and to what extent slowdowns can still be reliably detected even in a public cloud. We focus on software microbenchmarks as an example of performance tests and execute extensive experiments on three different well-known public cloud services (AWS, GCE, and Azure) using three different cloud instance types per service. We also compare the results to a hosted bare-metal offering from IBM Bluemix. In total, we gathered more than 4.5 million unique microbenchmarking data points from benchmarks written in Java and Go. We find that the variability of results differs substantially between benchmarks and instance types (by a coefficient of variation from 0.03% to > 100%). However, executing test and control experiments on the same instances (in randomized order) allows us to detect slowdowns of 10% or less with high confidence, using state-of-the-art statistical tests (i.e., Wilcoxon rank-sum and overlapping bootstrapped confidence intervals). Finally, our results indicate that Wilcoxon rank-sum manages to detect smaller slowdowns in cloud environments.
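
    A minimal sketch of the two statistical techniques named above, the Wilcoxon rank-sum test (Mann-Whitney U) and overlapping bootstrapped confidence intervals, applied to synthetic test/control measurements rather than the study's data:

        # Sketch: detecting a slowdown between control (old version) and test
        # (new version) measurements taken on the same instance. Data is synthetic.
        import numpy as np
        from scipy.stats import mannwhitneyu  # Wilcoxon rank-sum test

        rng = np.random.default_rng(0)
        control = rng.normal(loc=100.0, scale=5.0, size=50)   # old version (ms)
        test    = rng.normal(loc=110.0, scale=5.0, size=50)   # ~10% slower

        # 1) Wilcoxon rank-sum: is "test" stochastically greater than "control"?
        _, p = mannwhitneyu(test, control, alternative="greater")
        print("rank-sum p-value:", p)

        def bootstrap_ci(samples, reps=10_000, conf=0.95):
            """Percentile bootstrap confidence interval of the mean."""
            means = [rng.choice(samples, size=len(samples), replace=True).mean()
                     for _ in range(reps)]
            lo, hi = np.percentile(means, [(1 - conf) / 2 * 100,
                                           (1 + conf) / 2 * 100])
            return lo, hi

        # 2) Overlapping bootstrapped confidence intervals: report a slowdown
        # only if the two intervals do not overlap.
        c_lo, c_hi = bootstrap_ci(control)
        t_lo, t_hi = bootstrap_ci(test)
        print("slowdown detected:", t_lo > c_hi)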

    Spanische Staatspleiten, genuesische Finanziers, holländische Konkurrenz

    Get PDF
    The aim of this work is to analyze the connections between the Genoese business world and the Spanish market. As the most important creditor of the Spanish crown and distributor of Spanish silver, Genoa held a powerful position. At the beginning of the 17th century, the United Provinces of the Netherlands were rising to become an economic great power. These economic and political upheavals had a strong impact on the Genoese system, which was characterized by fragile politics and high economic strength. Exogenous and endogenous factors led to Genoa being displaced from the Spanish credit market. Climatic factors, which were partly home-made, caused a food crisis, which in turn enabled the United Provinces to push into the Mediterranean market. The outsourcing of shipping, and also of Genoa's military capabilities, acted as a catalyst for Genoa's economic decline, a development that unites the Genoese with the Spanish fate. Domestically pursued reforms could only be implemented to a limited extent. The result was an economic and political crisis in both Genoa and Spain.

    Applying test case prioritization to software microbenchmarks

    Full text link
    Regression testing comprises techniques which are applied during software evolution to uncover faults effectively and efficiently. While regression testing is widely studied for functional tests, performance regression testing, e.g., with software microbenchmarks, is hardly investigated. Applying test case prioritization (TCP), a regression testing technique, to software microbenchmarks may help capture large performance regressions sooner when new versions are released. This may be especially beneficial for microbenchmark suites, because they take considerably longer to execute than unit test suites. However, it is unclear whether traditional unit testing TCP techniques work equally well for software microbenchmarks. In this paper, we empirically study coverage-based TCP techniques, employing total and additional greedy strategies, applied to software microbenchmarks along multiple parameterization dimensions, leading to 54 unique technique instantiations. We find that TCP techniques have a mean APFD-P (average percentage of fault-detection on performance) effectiveness between 0.54 and 0.71 and are able to capture the three largest performance changes after executing 29% to 66% of the whole microbenchmark suite. Our efficiency analysis reveals that the runtime overhead of TCP varies considerably depending on the exact parameterization. The most effective technique has an overhead of 11% of the total microbenchmark suite execution time, making TCP a viable option for performance regression testing. The results demonstrate that the total strategy is superior to the additional strategy. Finally, dynamic-coverage techniques should be favored over static-coverage techniques due to their acceptable analysis overhead; however, in settings where the time for prioritization is limited, static-coverage techniques provide an attractive alternative.
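
    A minimal sketch of the total and additional greedy strategies on a hypothetical coverage map (per-benchmark sets of covered code units); the dynamic- vs. static-coverage distinction studied above only changes how this map is obtained:

        # Sketch: coverage-based prioritization of microbenchmarks.
        # "total" ranks by overall coverage; "additional" repeatedly picks the
        # benchmark that covers the most not-yet-covered code units.
        coverage = {                       # hypothetical example data
            "BenchJSONEncode": {"encode", "buffer", "pool"},
            "BenchJSONDecode": {"decode", "buffer"},
            "BenchPoolGet":    {"pool"},
        }

        def total_prioritization(cov):
            return sorted(cov, key=lambda b: len(cov[b]), reverse=True)

        def additional_prioritization(cov):
            remaining, covered, order = dict(cov), set(), []
            while remaining:
                # pick the benchmark adding the most uncovered units
                best = max(remaining, key=lambda b: len(remaining[b] - covered))
                order.append(best)
                covered |= remaining.pop(best)
                if all(len(units - covered) == 0 for units in remaining.values()):
                    # nothing new left: append the rest by total coverage
                    order += total_prioritization(remaining)
                    break
            return order

        print(total_prioritization(coverage))
        print(additional_prioritization(coverage))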

    Data-Driven Decisions and Actions in Today’s Software Development

    Full text link
    Today’s software development is all about data: data about the software product itself, about the process and its different stages, about the customers and markets, about the development, the testing, the integration, the deployment, or the runtime aspects in the cloud. We use static and dynamic data of various kinds and quantities to analyze market feedback, feature impact, code quality, architectural design alternatives, or effects of performance optimizations. Development environments are no longer limited to IDEs in a desktop application or the like but span the Internet using live programming environments such as Cloud9 or large-volume repositories such as BitBucket, GitHub, GitLab, or StackOverflow. Software development has become “live” in the cloud, be it the coding, the testing, or the experimentation with different product options on the Internet. The inherent complexity puts a further burden on developers, since they need to stay alert when constantly switching between tasks in different phases. Research has been analyzing the development process, its data and stakeholders, for decades and is working on various tools that can help developers in their daily tasks to improve the quality of their work and their productivity. In this chapter, we critically reflect on the challenges faced by developers in a typical release cycle, identify inherent problems of the individual phases, and present the current state of the research that can help overcome these issues.

    What is White Arabic? New labels in a changing Arab world

    Get PDF
    Arabic has traditionally been described as one of the canonical examples of languages affected by the phenomenon of diglossia (Ferguson, 1959), with Standard Arabic acting as the high variety and the spoken vernacular varieties as the low ones. However, more recent research has shown that the current linguistic situation of Arabic-speaking countries does not reflect this dichotomy but rather a stratified continuum in which different varieties, and sometimes languages, interact, fulfilling different communicative functions and carrying multiple symbolic values. In this sea of varieties, the metalinguistic label «White Arabic» (WA hereafter) has gained prominence over the last decade, coinciding with the growing interconnectedness of the Arab world. Although the notion of WA has been touched on tangentially in previous research (Al-Rojaie, 2020; Dufour, 2008; Germanos, 2009; O'Neill, 2017), none of it treats the question as its main object of study, and there seems to be no clear consensus on the definition of the term. In fact, the data point to different ways of understanding the concept in Lebanon, Jordan, the United Arab Emirates, Saudi Arabia, Yemen, Egypt, Tunisia, Algeria, and Morocco. The aim of this study is therefore to explore how speakers understand and perceive this notion. To this end, a metalinguistic analysis was carried out of interviews, contributions, and comments made by native speakers in traditional media (newspapers and magazines), in online media (podcasts, blogs, videos, etc.), and on social media platforms (Facebook, YouTube, Twitter, etc.). These data were complemented with the results obtained from qualitative questionnaires distributed online among speakers from five of the Arab countries mentioned above.

    Usability of polymer film heat exchangers in the chemical industry

    Get PDF
    The main goal of this work was to study the applicability of a polymer film heat exchanger concept for applications in the chemical industry, such as the condensation of organic solvents. The polymer film heat exchanger investigated is a plate heat exchanger with very thin (0.025 – 0.1 mm) plates or films, which separate the fluids and enable the heat transfer. After a successful application of this concept to seawater desalination in a previous work, the next step is chemical engineering, where the good chemical resistance of polymers against aggressive fluids can be an advantage but still has to be demonstrated. Two approaches were pursued in this work. The first one was experimental and included the study of the chemical and mechanical resistance of preselected films made of polymer materials such as polyimide (PI), polyethylene terephthalate (PET), and polytetrafluoroethylene (PTFE). To simulate realistic operating conditions in a heat exchanger, the films were exposed to combined thermal (up to 90°C) and mechanical pressure loads (4-6 bar) while in permanent contact with the relevant organic solvents toluene, hexane, heptane, and tetrahydrofuran (THF). Furthermore, a lab-scale apparatus and a full-scale demonstrator were manufactured in cooperation with two industrial partners. These were used to investigate the heat transfer performance for operating modes with and without phase change. In addition to the experimental work, a coupled finite element – computational fluid dynamics (FEM-CFD) model based on fluid-structure interaction (FSI) was developed. Two major tasks had to be solved here. The first was the modelling of the condensation process, based on available mathematical models and energy balances. The second was the consideration of the partially reversible deformation of the films during operation: since this deformation changes the geometry of the fluid channels and thus also influences the overall performance of the apparatus, the coupled FEM-CFD model was required. In the experimental study of chemical resistance, the PTFE film showed the best performance and can be used with all four tested solvents. The polyimide film performed similarly but failed when exposed to THF, and the PET film can only be used with water and hexane. With the lab-scale heat exchanger and the full-scale demonstrator, overall heat transfer coefficients between 270 W/m²K and 700 W/m²K, competitive with conventional metallic heat exchangers, were reached for the liquid-liquid (water-water, water-hexane) operating mode without phase change. For the condensation process, overall heat transfer coefficients of up to 1700 W/m²K were obtained. The numerical approach led to a well-functioning coupled model at a very small scale (1 cm²). Upscaling, however, failed due to the enormous hardware resources required to simulate the entire full-scale demonstrator. The main reason for this is the very low thickness of the films, which requires tiny mesh element sizes (<0.05 mm) to model the deformation of the film. The modelling of the liquid-liquid heat transfer achieved acceptable accuracy (approx. 10%) at higher flow rates, but at very low flow rates the deviations were higher (over 30%). The results of the condensation modelling were ambivalent: on the one hand, a physically plausible model was developed that could map the entire condensation process; on the other hand, the corresponding energy balance revealed major inaccuracies and hence could not be used to determine the overall heat transfer, even though the flow conditions themselves were reproduced very well, which shows the current limits of the FEM-CFD approach.
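
    To illustrate why films this thin keep the wall resistance small despite the low thermal conductivity of polymers, a back-of-the-envelope series-resistance estimate with assumed, typical property values (not measured data from this work):

        # Sketch: overall heat transfer coefficient of a thin polymer film wall
        # from the series-resistance model 1/U = 1/h_hot + t/k + 1/h_cold.
        # All property values below are assumptions for illustration only.
        t = 50e-6        # film thickness in m (within the 0.025-0.1 mm range)
        k = 0.25         # assumed conductivity of a PTFE/PET-like film, W/(m K)
        h_hot = 1500.0   # assumed liquid-side film coefficient, W/(m^2 K)
        h_cold = 1500.0  # assumed liquid-side film coefficient, W/(m^2 K)

        U = 1.0 / (1.0 / h_hot + t / k + 1.0 / h_cold)
        print(f"wall resistance share: {(t / k) * U:.1%}")   # ~13%
        print(f"overall U: {U:.0f} W/m^2K")                  # ~650 W/m^2K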