
    An Annotation-based Approach for Finding Bugs in Neural Network Programs

    As neural networks are increasingly included as core components of safety-critical systems, developing effective testing techniques specialized for them becomes crucial. The bulk of the research has focused on testing neural-network models; but these models are defined by writing programs, and there is growing evidence that these neural-network programs often have bugs too. This paper presents aNNoTest: an approach to generating test inputs for neural-network programs. A fundamental challenge is that the dynamically-typed languages (e.g., Python) commonly used to program neural networks cannot express detailed constraints about valid function inputs (e.g., matrices with certain dimensions). Without knowing these constraints, automated test-case generation is prone to producing invalid inputs, which trigger spurious failures and are useless for identifying real bugs. To address this problem, we introduce a simple annotation language tailored for concisely expressing valid function inputs in neural-network programs. aNNoTest takes as input an annotated program, and uses property-based testing to generate random inputs that satisfy the validity constraints. In the paper, we also outline guidelines that simplify writing aNNoTest annotations. We evaluated aNNoTest on 19 neural-network programs from Islam et al.'s survey, which we manually annotated following our guidelines -- producing 6 annotations per tested function on average. aNNoTest automatically generated test inputs that revealed 94 bugs, including 63 bugs that the survey reported for these projects. These results suggest that aNNoTest can be a valuable approach to finding widespread bugs in real-world neural-network programs.
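    The paper's own annotation syntax is not reproduced in the abstract, but the mechanism it drives is standard property-based testing. The following sketch assumes the Hypothesis library and an invented normalize function whose validity constraint is "float32 matrix with 1 to 64 rows and exactly 8 columns"; it shows how such a constraint, once made explicit, becomes a generation strategy that only produces valid inputs:

        # A minimal sketch of constraint-driven input generation with Hypothesis.
        # The constraint encoding below is illustrative; aNNoTest's annotation
        # language and back-end differ in the details.
        import numpy as np
        from hypothesis import given
        from hypothesis import strategies as st
        from hypothesis.extra.numpy import arrays

        def normalize(batch: np.ndarray) -> np.ndarray:
            """Hypothetical function under test: expects a 2-D float batch."""
            return (batch - batch.mean(axis=0)) / (batch.std(axis=0) + 1e-8)

        # "float32 matrix, 1..64 rows, exactly 8 columns" as a strategy --
        # the kind of validity constraint an input annotation would express.
        valid_batches = arrays(
            dtype=np.float32,
            shape=st.tuples(st.integers(1, 64), st.just(8)),
            elements=st.floats(-1e3, 1e3, width=32),
        )

        @given(valid_batches)
        def test_normalize_preserves_shape(batch):
            assert normalize(batch).shape == batch.shape

    Without the shape constraint, a naive generator would mostly produce inputs that fail validation, burying any real bug under spurious failures.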

    Integrated modelling of control and adaptive building envelope: development of a modelling solution using a co-simulation approach

    Adaptive building envelopes can dynamically adapt to environmental changes, often supported by a control system. Although adaptive building envelopes can play a significant role in improving thermal building performance, uncertainties and risks have led to a slow uptake in the built environment. One reason for this is the reluctance of practitioners to consider integrating adaptive building envelopes in building design, which may stem from the Building Performance Simulation (BPS) tools employed for performance prediction of design proposals with adaptive building envelopes. A shortcoming of existing tools is their limited adaptability, which hinders proper modelling of the influence of control decisions on the dynamic behaviour of these building envelopes. This thesis investigates an approach for the integrated modelling of control and adaptive building envelopes. To this end, an interview-based industry study with experts in adaptive building envelope simulation was conducted. The interview study aimed to advance the understanding of the limitations of adaptive building envelope simulation in current design practice and to identify implications for future tool developments. The feedback from the interviewees was then used to inform the development of an integrated modelling approach using co-simulation, the accuracy and functionality of which were subsequently tested through a validation study and a multiple case study. The findings of the interview study outline the need for more flexible modelling approaches that enable designers to fully exploit adaptive building envelopes in building design. The proposed modelling approach for predicting the thermal performance of adaptive building envelopes has shown that its co-simulation setup offers more flexibility in integrating the dynamic behaviour of adaptive building envelopes. What is now needed is to observe the execution of the modelling approach in design practice, to obtain realistic feedback from its users and to verify that it works as intended.
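    At its core, a co-simulation setup of this kind couples a building model and a controller that advance in lock-step, exchanging sensor values and actuation signals at every timestep. The toy loop below illustrates only that exchange; EnvelopeModel and Controller are invented stand-ins, whereas a real setup couples separate tools (e.g. a BPS engine and a control model) through co-simulation middleware:

        # A minimal sketch, under invented models, of the per-timestep data
        # exchange that a co-simulation master coordinates.
        class EnvelopeModel:
            """Toy thermal model of a zone behind an adaptive facade."""
            def __init__(self, zone_temp=20.0):
                self.zone_temp = zone_temp

            def step(self, shading, outdoor_temp, dt):
                # More shading attenuates the outdoor influence on the zone.
                coupling = (1.0 - shading) * 0.1
                self.zone_temp += coupling * (outdoor_temp - self.zone_temp) * dt
                return self.zone_temp

        class Controller:
            """Rule-based shading control reacting to the zone temperature."""
            def act(self, zone_temp):
                return 1.0 if zone_temp > 24.0 else 0.0  # fully shade above 24 C

        envelope, controller, shading = EnvelopeModel(), Controller(), 0.0
        for hour in range(24):  # lock-step exchange with one-hour timesteps
            zone_temp = envelope.step(shading, outdoor_temp=30.0, dt=1.0)
            shading = controller.act(zone_temp)  # decision fed back next step

    The point of the loop is that the control decision computed at one timestep alters the envelope state the building model sees at the next, which is exactly the interaction that rigid monolithic tools struggle to capture.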

    Mimicking Production Behavior with Generated Mocks

    Mocking in the context of automated software tests allows testing program units in isolation. Designing realistic interactions between a unit and its environment, and understanding the expected impact of these interactions on the behavior of the unit, are two key challenges that software testers face when developing tests with mocks. In this paper, we propose to monitor an application in production to generate tests that mimic realistic execution scenarios through mocks. Our approach operates in three phases. First, we instrument a set of target methods for which we want to generate tests, as well as the methods that they invoke, which we refer to as mockable method calls. Second, in production, we collect data about the context in which target methods are invoked, as well as the parameters and the returned value for each mockable method call. Third, offline, we analyze the production data to generate test cases with realistic inputs and mock interactions. The approach is automated and implemented in an open-source tool called RICK. We evaluate our approach with three real-world, open-source Java applications. RICK monitors the invocation of 128 methods in production across the three applications and captures their behavior. Next, RICK analyzes the production observations in order to generate test cases that include rich initial states and test inputs, mocks and stubs that recreate actual interactions between the method and its environment, as well as mock-based oracles. All the test cases are executable, and 52.4% of them successfully mimic the complete execution context of the target methods observed in production. We interview five industry developers, who confirm the relevance of using production observations to design mocks and stubs.
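    RICK itself targets Java, so the following Python analogue is only meant to convey the shape of a generated test: a value captured in production stubs the mockable method call, and both the returned value and the interaction itself serve as oracles. All names and observed values below are invented:

        # Illustrative analogue of a RICK-style generated test, using
        # unittest.mock. OBSERVED stands in for data captured in production.
        from unittest.mock import Mock

        OBSERVED = {"arg": "user-42", "collab_return": 99, "expected": "balance: 99"}

        def target(user_id, repo):
            """Unit under test: makes one mockable call on its collaborator."""
            return f"balance: {repo.balance_of(user_id)}"

        def test_target_mimics_production():
            repo = Mock()
            # Stub the mockable call with the value observed in production.
            repo.balance_of.return_value = OBSERVED["collab_return"]
            assert target(OBSERVED["arg"], repo) == OBSERVED["expected"]
            # Mock-based oracle: the collaborator was invoked as in production.
            repo.balance_of.assert_called_once_with(OBSERVED["arg"])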

    Tools for detection and analysis of flaky software tests

    Abstract. Software testing is an essential part of developing a high-quality product. Test cases that pass or fail in a nondeterministic manner can cause severe problems and be difficult to fix. Unstable test cases increase resource usage on many different levels and delay the project they appear in; the increased resource usage can also cause delays beyond the project's boundaries. There are multiple reasons why tests might become unstable. A common one is order dependency between test cases, which can also make adding new test cases more difficult. At the start of this work, the test cases of an in-development version of an eCPRI module were rather unstable. To improve their stability, this work introduces three tools, designed for test cases based on the CppUTest framework, that help developers tackle unstable test cases caused by order dependencies. The tools investigate the different types of order dependencies that can occur between test cases; their main methods are repeating tests and shuffling the test order. In some cases, the tools were able to increase the reproducibility of the occurring failures, which is very important when tracking down their root cause. The tools also provide an easy way to perform extensive, automated test runs, letting the user be absent while logs are gathered from multiple runs, which is helpful when investigating random failures. Combined with the principle of binary search for reducing the set of executed test cases, the tools were useful for pinpointing the failing tests and the reasons for the failures. The stability of the eCPRI module's test cases was improved with the methods and tools presented in this thesis: for one of the compilers used, the stability of shuffled test runs rose from 0 % to 70 %, and the stability of the normal test order rose from 82.5 % to 100 %.
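    The thesis's tools operate on CppUTest suites, but the two core mechanisms, shuffled reruns to expose order dependencies and binary-search reduction to isolate the culprit, can be sketched independently of the framework. In the sketch below, run_suite is a hypothetical callable that executes the given tests in order and returns True if they all pass:

        # A framework-independent sketch of repeat-and-shuffle plus
        # binary-search reduction; the actual tools drive CppUTest.
        import random

        def find_failing_orders(tests, run_suite, repeats=100, seed=0):
            """Rerun the suite in random orders, collecting orders that fail."""
            rng = random.Random(seed)
            failing = []
            for _ in range(repeats):
                order = tests[:]
                rng.shuffle(order)
                if not run_suite(order):
                    failing.append(order)
            return failing

        def shrink(failing_order, run_suite):
            """Binary-search the prefix a failing test needs in order to fail.
            Assumes the last test in failing_order is the one that failed."""
            victim, prefix = failing_order[-1], failing_order[:-1]
            while len(prefix) > 1:
                half = len(prefix) // 2
                for part in (prefix[:half], prefix[half:]):
                    if not run_suite(part + [victim]):
                        prefix = part    # this half still triggers the failure
                        break
                else:
                    break                # failure needs tests from both halves
            return prefix + [victim]

    Fixing the seed makes a discovered failing order reproducible, which is what makes root-cause analysis feasible.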

    Understanding and Mitigating Flaky Software Test Cases

    A flaky test is a test case that can pass or fail without changes to the test case code or the code under test. They are a widespread problem with serious consequences for developers and researchers alike. For developers, flaky tests lead to time wasted debugging spurious failures, tempting them to ignore future failures. While unreliable, flaky tests can still indicate genuine issues in the code under test, so ignoring them can lead to bugs being missed. The non-deterministic behaviour of flaky tests is also a major snag to continuous integration, where a single flaky test can fail an entire build. For researchers, flaky tests challenge the assumption that a test failure implies a bug, an assumption that many fundamental techniques in software engineering research rely upon, including test acceleration, mutation testing, and fault localisation. Despite increasing research interest in the topic, open problems remain. In particular, there has been relatively little attention paid to the views and experiences of developers, despite a considerable body of empirical work. This is essential to guide the focus of research into areas that are most likely to be beneficial to the software engineering industry. Furthermore, previous automated techniques for detecting flaky tests are typically either based on exhaustively rerunning test cases or on machine learning classifiers. The prohibitive runtime of the rerunning approach and the demonstrably poor inter-project generalisability of classifiers leaves practitioners with a stark choice when it comes to automatically detecting flaky tests. In response to these challenges, I set two high-level goals for this thesis: (1) to enhance the understanding of the manifestation, causes, and impacts of flaky tests; and (2) to develop and empirically evaluate efficient automated techniques for mitigating flaky tests. In pursuit of these goals, this thesis makes five contributions: (1) a comprehensive systematic literature review of 76 published papers; (2) a literature-guided survey of 170 professional software developers; (3) a new feature set for encoding test cases in machine learning-based flaky test detection; (4) a novel approach for reducing the time cost of rerunning-based techniques for detecting flaky tests by combining them with machine learning classifiers; and (5) an automated technique that detects and classifies existing flaky tests in a project and produces reusable project-specific machine learning classifiers able to provide fast and accurate predictions for future test cases in that project.
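    Contribution (4), pairing classifiers with reruns, can be sketched briefly: instead of rerunning every test exhaustively, spend the rerun budget on the tests a model scores as most likely flaky. The two features and all data below are invented for illustration; the thesis defines its own feature set (contribution 3):

        # A hedged sketch of combining a classifier with rerunning: rank tests
        # by predicted flakiness and rerun only the top of the list.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        X_train = rng.random((200, 2))  # e.g. [uses_sleep, touches_network]
        y_train = (X_train.sum(axis=1) > 1).astype(int)  # toy flakiness labels
        model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

        tests = ["test_login", "test_upload", "test_search", "test_cache"]
        X_new = rng.random((len(tests), 2))
        p_flaky = model.predict_proba(X_new)[:, 1]  # P(flaky) for each test

        budget = 2  # how many tests we can afford to rerun
        for p, name in sorted(zip(p_flaky, tests), reverse=True)[:budget]:
            print(f"rerun {name} (predicted flakiness {p:.2f})")

    The classifier keeps the rerun cost bounded, while the reruns supply the ground truth that a purely predictive approach lacks.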

    Test Quality Assurance for E2E Web Test Suites: Parallelization of Dependent Test Suites and Test Flakiness Prevention

    Web applications support a wide range of activities today, from e-commerce to health management, and ensuring their quality is a fundamental task. Nevertheless, testing these systems is hard because of their dynamic and asynchronous nature and their heterogeneity. Quality assurance of Web applications is usually performed through testing at different levels of abstraction. At the End-to-end (E2E) level, test scripts interact with the application through the web browser, as a human user would do. This kind of testing is usually time consuming, and its execution time can be reduced by running the test suite in parallel. However, the presence of dependencies in the test suite can make test parallelization difficult. Best practices prescribe that test scripts in a test suite should be independent (i.e. they should not assume that the system under test is already in an expected state), but this is not always done in practice: dependent tests are a serious problem that affects end-to-end web test suites. Test dependencies are also a problem because they enforce an execution order for the test suite, preventing the use of techniques like test selection, test prioritization, and test parallelization. Another issue that affects E2E Web test suites is test flakiness: a test script is called flaky when it may non-deterministically pass or fail on the same version of the Application Under Test. Test flakiness is usually caused by multiple factors that can be very hard to determine: the most common causes are improper waiting for asynchronous operations, violated test-order dependencies, and concurrency problems (e.g. race conditions, deadlocks, atomicity violations). Test flakiness affects E2E test execution in general, but it can have a greater impact in the presence of dependencies, since 1) if a test script fails due to flakiness, other test scripts that depend on it will probably fail as well, and 2) most dependency-detection approaches and tools rely on multiple executions of test schedules in different orders, which requires deterministic execution results: if test scripts can pass or fail non-deterministically, those dependency-detection tools cannot work. This thesis proposes to improve quality assurance for E2E Web test suites in two directions: 1. enabling the parallel execution of dependent E2E Web test suites in an optimized, efficient way; 2. preventing flakiness by automated refactoring of E2E Web test suites, so that they adopt proper waiting strategies for page elements. For the first research direction we propose STILE (teST suIte paralLElizer), a tool-based approach that allows parallel execution of E2E Web test suites. Our approach generates a set of test schedules that respect two important constraints: 1) every schedule respects existing test dependencies, and 2) across all generated schedules, every test script in the test suite is executed at least once. For the second research direction we propose SleepReplacer, a tool-based approach to automatically refactor E2E Web test suites in order to prevent flakiness. Both tool-based approaches have been fully implemented in functioning, publicly available tools and empirically validated on different test suites.
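    The scheduling constraint behind STILE can be sketched with a toy dependency graph: build one schedule per "terminal" test (one that no other test depends on) from its transitive dependencies in a valid order, so each schedule respects the dependencies and, across schedules, every test runs at least once. The graph and test names below are invented, and STILE's actual schedule generation is more sophisticated:

        # A minimal sketch of dependency-respecting schedule generation.
        from graphlib import TopologicalSorter

        deps = {                      # test -> tests it depends on (invented)
            "test_login": set(),
            "test_add_to_cart": {"test_login"},
            "test_checkout": {"test_add_to_cart"},
            "test_profile": {"test_login"},
        }

        def closure(test):
            """A test together with its transitive dependencies."""
            seen, stack = {test}, [test]
            while stack:
                for dep in deps[stack.pop()]:
                    if dep not in seen:
                        seen.add(dep)
                        stack.append(dep)
            return seen

        def make_schedules():
            """One dependency-respecting schedule per terminal test."""
            depended_on = {d for ds in deps.values() for d in ds}
            terminals = [t for t in deps if t not in depended_on]
            return [
                list(TopologicalSorter({u: deps[u] for u in closure(t)}).static_order())
                for t in terminals
            ]

        print(make_schedules())
        # [['test_login', 'test_add_to_cart', 'test_checkout'],
        #  ['test_login', 'test_profile']]

    Each schedule can then run on its own parallel worker. SleepReplacer's refactoring direction, replacing fixed sleeps with explicit waits, looks as follows with Selenium's Python bindings (the URL and element id are invented):

        # Before: time.sleep(3) then click -- flaky whenever the page is slow.
        # After: an explicit wait that proceeds once the element is clickable.
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        from selenium.webdriver.support.ui import WebDriverWait
        from selenium.webdriver.support import expected_conditions as EC

        driver = webdriver.Chrome()             # assumes a local Chrome setup
        driver.get("https://example.com/shop")  # hypothetical application URL
        WebDriverWait(driver, timeout=10).until(
            EC.element_to_be_clickable((By.ID, "cart"))
        ).click()
        driver.quit()

    An explicit wait removes both the flakiness of a too-short sleep and the wasted time of a too-long one.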

    Unit and integration testing in Java: behaviour-driven testing tools on the JVM versus JUnit

    This master's thesis studied how Behavior-Driven Development (BDD) testing frameworks change the testing of Java code compared to JUnit. The research was done as a case study, conducted in an industry context at Vincit Plc, where two projects switched their new unit- and integration-test classes to a BDD-testing framework instead of JUnit. Before the study methods were designed, related research and its findings were reviewed to direct the study towards problematic areas previously found in unit testing. The data collection methods included surveys, interviews, and test-code analysis. The case study provided promising results for the problematic areas highlighted by earlier research. In terms of changes to developer practice, the collected data displayed an increase in unit-test case granularity. The results also displayed unanimously that BDD-testing frameworks guide developers to write more self-documenting tests than JUnit, and the structure of BDD tests highlighted the different parts of a test better. The study also revealed that the majority of participants had an easier time understanding tests and removing repetition from test code. As for changes in developer perception, the majority of study participants enjoyed writing tests more than with JUnit, and the same majority perceived that BDD-testing frameworks promote writing higher-quality test code than JUnit. In general, the new test code was perceived as more understandable and maintainable than the JUnit tests, although not unanimously. The learning curve to become effective varied between the studied frameworks. The tool support of BDD-testing frameworks for testing the Java Spring Framework was found to range from adequate to good. In conclusion, the results of this thesis provide small-scale evidence that BDD-testing frameworks could potentially ease the maintainability and readability of unit and integration tests while at the same time increasing the enjoyment of testing.
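    The structural contrast the thesis reports is independent of the JVM frameworks it studied and can be conveyed with a plain given/when/then layout; ShoppingCart and the test below are invented for illustration:

        # A language-independent sketch of the test structure BDD frameworks
        # encourage, written as a plain pytest-style test.
        class ShoppingCart:
            def __init__(self):
                self.items = {}
            def add(self, name, price):
                self.items[name] = price
            def total(self):
                return sum(self.items.values())

        def test_cart_totals_the_prices_of_added_items():
            # given: a cart with two items
            cart = ShoppingCart()
            cart.add("book", price=10)
            cart.add("pen", price=2)
            # when: the total is computed
            total = cart.total()
            # then: it is the sum of the item prices
            assert total == 12

    BDD frameworks make this sectioning explicit and enforced rather than a comment convention, which matches the self-documenting structure the participants credited.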