
    Empirical evaluation of Pareto efficient multi-objective regression test case prioritisation

    The aim of test case prioritisation is to determine an ordering of test cases that maximises the likelihood of early fault revelation. Previous prioritisation techniques have tended to be single-objective, for which the additional greedy algorithm is the current state of the art. Unlike test suite minimisation, multi-objective test case prioritisation has not been thoroughly evaluated. This paper presents an extensive empirical study of the effectiveness of multi-objective test case prioritisation, evaluating it on multiple versions of five widely used benchmark programs and a much larger real-world system of over 1 million lines of code. The paper also presents a lossless coverage compaction algorithm that dramatically scales the performance of all algorithms studied, by between two and four orders of magnitude, making prioritisation practical for even very demanding problems.
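    The additional greedy algorithm named above is simple to sketch. The following Python fragment is a minimal illustration over an invented binary coverage matrix, paired with a naive lossless compaction step that merges tests with identical coverage rows; the paper's compaction algorithm is more sophisticated, so treat this only as a flavour of the idea.

```python
# Sketch: "additional greedy" prioritisation over a binary coverage matrix,
# plus a simple lossless compaction step. Data and compaction heuristic are
# invented for illustration; this is not the paper's exact algorithm.

def compact(coverage):
    """Merge tests whose coverage rows are identical (a lossless reduction)."""
    groups = {}
    for test, row in coverage.items():
        groups.setdefault(tuple(row), []).append(test)
    return {tests[0]: list(row) for row, tests in groups.items()}

def additional_greedy(coverage):
    """Repeatedly pick the test covering the most not-yet-covered elements."""
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        gains = {t: sum(1 for i, hit in enumerate(row) if hit and i not in covered)
                 for t, row in remaining.items()}
        best = max(gains, key=gains.get)
        if gains[best] == 0 and covered:
            covered = set()  # classic reset once full coverage is reached
            continue
        order.append(best)
        covered |= {i for i, hit in enumerate(remaining.pop(best)) if hit}
    return order

if __name__ == "__main__":
    cov = {"t1": [1, 1, 0, 0], "t2": [0, 0, 1, 1],
           "t3": [1, 0, 0, 0], "t4": [1, 1, 0, 0]}  # t4 duplicates t1's row
    print(additional_greedy(compact(cov)))  # ['t1', 't2', 't3']
```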

    Empirical Evaluation of Mutation-based Test Prioritization Techniques

    We propose a new test case prioritization technique that combines mutation-based and diversity-based approaches. Our diversity-aware mutation-based technique relies on the notion of mutant distinguishment, which aims to distinguish one mutant's behavior from another, rather than from the original program. We empirically investigate the relative cost and effectiveness of the mutation-based prioritization techniques (i.e., using both the traditional mutant kill and the proposed mutant distinguishment) with 352 real faults and 553,477 developer-written test cases. The empirical evaluation considers both the traditional and the diversity-aware mutation criteria in various settings: single-objective greedy, hybrid, and multi-objective optimization. The results show that there is no single dominant technique across all the studied faults. To this end, we show when and why each of the mutation-based prioritization criteria performs poorly, using a graphical model called the Mutant Distinguishment Graph (MDG) that demonstrates the distribution of the fault-detecting test cases with respect to mutant kills and distinguishment.
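    To make the kill/distinguishment contrast concrete, here is a small hedged sketch over invented test outcomes (the outcome values and mutant names are hypothetical, not drawn from the study's data): a test kills a mutant when the mutant's behaviour differs from the original program, and distinguishes two mutants when their behaviours differ from each other.

```python
# Illustrative contrast between mutant kill and mutant distinguishment.
# Outcome values below are hypothetical stand-ins for observed behaviour.
from itertools import combinations

outcomes = {
    "original": {"t1": "pass", "t2": "pass"},
    "m1":       {"t1": "fail", "t2": "pass"},
    "m2":       {"t1": "fail", "t2": "fail"},
}

def kills(test, mutant):
    return outcomes[mutant][test] != outcomes["original"][test]

def distinguishes(test, m_a, m_b):
    return outcomes[m_a][test] != outcomes[m_b][test]

for t in ("t1", "t2"):
    killed = [m for m in ("m1", "m2") if kills(t, m)]
    pairs = [(a, b) for a, b in combinations(("m1", "m2"), 2)
             if distinguishes(t, a, b)]
    print(t, "kills:", killed, "distinguishes:", pairs)
# t1 kills both mutants yet cannot tell them apart;
# t2 kills only m2 but distinguishes m1 from m2.
```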

    Improvements to Test Case Prioritisation considering Efficiency and Effectiveness on Real Faults

    Despite the best efforts of programmers and component manufacturers, software does not always work perfectly. To guard against this, developers write test suites that execute parts of the code and compare the expected result with the actual result. Over time, test suites become expensive to run for every change, which has led to optimisation techniques such as test case prioritisation. Test case prioritisation reorders test cases within the test suite with the goal of revealing faults as soon as possible. It has received substantial research attention, indicating that prioritised test suites can reveal faults faster, but due to a lack of real fault repositories available for research, prior evaluations have often been conducted on artificial faults. This thesis investigates whether the use of artificial faults represents a threat to the validity of previous studies, and proposes new strategies for test case prioritisation that increase its effectiveness on real faults. The thesis conducts an empirical evaluation of existing test case prioritisation strategies on real and artificial faults, which establishes that artificial faults provide unreliable results for real faults. The study found four occasions on which a strategy for test case prioritisation would be considered no better than the baseline when using one fault type, but a significant improvement over the baseline when using the other. Moreover, this evaluation reveals that existing test case prioritisation strategies perform poorly on real faults, with no strategy significantly outperforming the baseline. Given the need to improve test case prioritisation for real faults, the thesis then considers other techniques that have been shown to be effective on real faults. One such technique is defect prediction, which estimates the likelihood that a class contains a fault. This thesis proposes a test case prioritisation strategy, called G-Clef, that leverages defect prediction estimates to reorder test suites. While the evaluation of G-Clef indicates that it outperforms existing test case prioritisation strategies, the average predicted location of a faulty class is 13% of all classes in a system, which shows potential for improvement. Finally, the thesis conducts an investigative study into whether sentiments expressed in commit messages could be used to improve the defect prediction element of G-Clef. Throughout the course of this PhD, I created Kanonizo, an open-source tool for performing test case prioritisation on Java programs. All of the experiments and strategies used in this thesis were implemented in Kanonizo.
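    As a rough illustration of the G-Clef idea, which the abstract describes only at a high level, the sketch below orders tests by the highest defect-prediction score among the classes they cover. The scores, class names and coverage mapping are invented, and Kanonizo's actual strategy is richer than this.

```python
# Hedged sketch of defect-prediction-guided prioritisation in the spirit of
# G-Clef: tests are ordered by the highest fault-proneness estimate among the
# classes they cover. All data below are invented examples.

fault_proneness = {"Parser": 0.91, "Cache": 0.40, "Logger": 0.05}  # predicted
covers = {
    "testParseExpr": ["Parser", "Logger"],
    "testEviction":  ["Cache"],
    "testFormat":    ["Logger"],
}

def priority(test):
    return max(fault_proneness.get(cls, 0.0) for cls in covers[test])

ordered = sorted(covers, key=priority, reverse=True)
print(ordered)  # ['testParseExpr', 'testEviction', 'testFormat']
```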

    Search based software engineering: Trends, techniques and applications

    In the past five years there has been a dramatic increase in work on Search-Based Software Engineering (SBSE), an approach to Software Engineering (SE) in which Search-Based Optimization (SBO) algorithms are used to address problems in SE. SBSE has been applied to problems throughout the SE lifecycle, from requirements and project planning to maintenance and reengineering. The approach is attractive because it offers a suite of adaptive automated and semi-automated solutions in situations typified by large, complex problem spaces with multiple competing and conflicting objectives. This article provides a review and classification of literature on SBSE. The work identifies research trends and relationships between the techniques applied and the applications to which they have been applied, and highlights gaps in the literature and avenues for further research.

    A Bridge Rehabilitation Strategy Based on the Analysis of a Dataset of Bridge Inspections in Co. Cork

    Ageing highway structures present a challenge throughout the developed world. The introduction of bridge management systems (BMS) allows bridge owners to assess the condition of their bridge stock and formulate bridge rehabilitation strategies under the constraints of limited budgets and resources. This research presents a decision-support system for bridge owners in the selection of the best strategy for bridge rehabilitation on a highway network. The basis of the research is an available dataset of 1,367 bridge inspection records for County Cork that has been prepared to the Eirspan BMS inspection standard and which includes bridge structure condition ratings and rehabilitation costs. There has been no previous research on a regional Irish bridge stock of this magnitude. The research objectives are the consolidation of the dataset into a usable format, the review of previous research, and the formulation of a methodology for the development of a network-wide bridge rehabilitation strategy model. The methodology builds upon a procedure proposed by previous research on the prioritisation of theoretical bridge rehabilitation projects on the Chilean road network. Statistical analysis of both recent rehabilitation projects in County Cork and a survey of experts has led to the formulation of rehabilitation project prioritisation indices. The application of these derived indices allows the forecasting and calculation of funding requirements for network-wide improvements. A review of the functional life expectancies of bridges has been undertaken. A deterioration rate which predicts the annual decline in the condition rating of each bridge has been calculated using statistical regression analysis and provides a basis for the estimation of investment requirements for an overarching rehabilitation strategy. An economic assessment of four rehabilitation intervention strategies has been undertaken using the Net Present Worth method. A system performance method developed in this research, which uses efficiency and effectiveness indicators taken from UK, New Zealand and French practice, has determined that annual investment of between 0.27% and 1% of the bridge stock replacement cost is required to achieve full bridge network rehabilitation and provide a minimum 85-year service life for all structures. A benchmarking comparison with reported international practice has confirmed the applicability of the developed methodology.
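    The Net Present Worth comparison mentioned above follows the standard discounting formula NPW = Σ_t C_t / (1 + r)^t. A minimal sketch, using invented cost streams and an invented discount rate rather than the thesis's data:

```python
# Minimal Net Present Worth (NPW) sketch for comparing rehabilitation
# strategies: each strategy is a series of annual costs discounted to
# present value. All figures below are invented for illustration.

def net_present_worth(annual_costs, rate):
    return sum(c / (1 + rate) ** t for t, c in enumerate(annual_costs))

strategies = {
    "do_minimum":  [10, 10, 10, 80],  # deferred major intervention
    "early_rehab": [60, 5, 5, 5],     # up-front rehabilitation
}
for name, costs in strategies.items():
    print(name, round(net_present_worth(costs, rate=0.05), 1))
```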

    Sentinel: A Hyper-Heuristic for the Generation of Mutant Reduction Strategies

    Mutation testing is an effective approach to evaluate and strengthen software test suites, but its adoption is currently limited by the computational cost of executing mutants. Several strategies have been proposed to reduce this cost (a.k.a. mutation cost reduction strategies); however, none of them has proven effective for all scenarios, since they often need ad-hoc manual selection and configuration depending on the software under test (SUT). In this paper, we propose a novel multi-objective evolutionary hyper-heuristic approach, dubbed Sentinel, to automate the generation of optimal cost reduction strategies for every new SUT. We evaluate Sentinel by carrying out a thorough empirical study involving 40 releases of 10 open-source real-world software systems, with both baseline and state-of-the-art strategies as a benchmark. We execute a total of 4,800 experiments and evaluate their results with both quality indicators and statistical significance tests, following the most recent best practice in the literature. The results show that strategies generated by Sentinel outperform the baseline strategies in 95% of the cases, always with large effect sizes. They also obtain statistically significantly better results than state-of-the-art strategies in 88% of the cases, with large effect sizes for 95% of them. Our study also reveals that the mutation strategies generated by Sentinel for a given software version can be used without any loss in quality for subsequently developed versions in 95% of the cases. These results show that Sentinel is able to automatically generate mutation strategies that reduce mutation testing cost without affecting its testing effectiveness (i.e. mutation score), thus relieving the tester of the burden of manually selecting and configuring strategies for each SUT.
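    For context on what a "cost reduction strategy" is, the sketch below implements one classic baseline that hyper-heuristics like Sentinel compose and tune: uniform random mutant sampling, evaluated on an invented mutant pool and kill set. None of this is Sentinel's own algorithm.

```python
# Sketch of a classic mutation cost reduction strategy: uniform random
# mutant sampling. The mutant pool and kill data are invented.
import random

mutants = [f"m{i}" for i in range(100)]
killed = set(random.sample(mutants, 60))  # pretend the suite kills 60%

def sample_strategy(pool, fraction, seed=0):
    """Select a fixed fraction of mutants to execute, chosen at random."""
    rng = random.Random(seed)
    return rng.sample(pool, int(len(pool) * fraction))

selected = sample_strategy(mutants, fraction=0.3)  # run only 30% of mutants
score = sum(m in killed for m in selected) / len(selected)
print(f"selected {len(selected)} mutants, mutation score {score:.2f}")
```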

    Automated Realistic Test Input Generation and Cost Reduction in Service-centric System Testing

    Service-centric System Testing (ScST) is more challenging than testing traditional software due to the complexity of service technologies and the limitations imposed by the SOA environment. One of the most important problems in ScST is realistic test data generation. Realistic test data is often generated manually or drawn from an existing source, making it hard to automate and laborious to produce. Another limitation that makes ScST challenging is the cost associated with invoking services during the testing process. This thesis aims to provide solutions to both of these problems: automated realistic input generation and cost reduction in ScST. To address automation in realistic test data generation, the concept of Service-centric Test Data Generation (ScTDG) is presented, in which existing services are used as realistic data sources. ScTDG minimises the need for tester input and dependence on existing data sources by automatically generating service compositions that can generate the required test data. In experimental analysis, our approach achieved between 93% and 100% success rates in generating realistic data, while state-of-the-art automated test data generation achieved only between 2% and 34%. The thesis addresses cost concerns at the test data generation level by enabling data source selection in ScTDG. Source selection in ScTDG has many dimensions, such as cost, reliability and availability. This thesis formulates it as an optimisation problem and presents a multi-objective characterisation of service selection in ScTDG, aiming to reduce the cost of test data generation. A cost-aware Pareto-optimal test suite minimisation approach addressing testing cost concerns during test execution is also presented. The approach adapts traditional multi-objective minimisation approaches to the ScST domain by formulating ScST concerns such as invocation cost and test case reliability. In experimental analysis, the approach achieved reductions of between 69% and 98.6% in the monetary cost of service invocations during testing.
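    Cost-aware Pareto-optimal minimisation can be illustrated on a toy instance: keep every sub-suite that no other sub-suite beats on both coverage and invocation cost. The data below are invented, and exhaustive enumeration stands in for the search algorithms a realistic instance would need.

```python
# Toy cost-aware Pareto-optimal test suite minimisation: enumerate sub-suites
# and keep the trade-off front between coverage (maximise) and monetary
# invocation cost (minimise). All data are invented.
from itertools import combinations

tests = {"t1": ({"a", "b"}, 3.0), "t2": ({"b", "c"}, 1.0), "t3": ({"c"}, 0.5)}

def evaluate(suite):
    cov = set().union(*(tests[t][0] for t in suite)) if suite else set()
    cost = sum(tests[t][1] for t in suite)
    return len(cov), cost

candidates = [frozenset(c) for r in range(len(tests) + 1)
              for c in combinations(tests, r)]
scored = {s: evaluate(s) for s in candidates}

def dominated(s):
    cov, cost = scored[s]
    return any(c2 >= cov and k2 <= cost and (c2, k2) != (cov, cost)
               for c2, k2 in scored.values())

front = [(sorted(s), scored[s]) for s in candidates if not dominated(s)]
print(front)  # e.g. includes ({'t2'}, (2, 1.0)) and (['t1','t3'], (3, 3.5))
```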

    Network-level maintenance decisions for flexible pavement using a soft computing-based framework

    An effective pavement management system (PMS) is one guided by a software program that ensures all pavement sections are maintained at adequately high serviceability levels and structural conditions with a low budget and resource usage, without causing any significant negative effect on the environment, safe traffic operations or social activities. A PMS comprises section classification, performance prediction, and optimisation for decision-making. For section classification, this research presents a fuzzy inference system (FIS), with appropriate membership functions for section classification and for calculating the pavement condition index (PCI). The severity and extent of seven distress types (alligator cracking, block cracking, longitudinal and transverse cracking, patching, potholes, bleeding and ravelling) were used as fuzzy inputs. The results showed a good correlation for the fuzzy model. A sensitivity analysis showed that cracking has the greatest influence on section classification compared to the other distress types. A novel network-level deterministic deterioration model was developed for flexible pavement on arterial and collector roads in four climatic zones, considering the impact of maintenance, age, area and length of cracks, and traffic loading. The prediction models showed good accuracy with a high coefficient of determination (R²). The cross-validation study showed that the models for arterial roads yield better accuracy than the models for collector roads. A sensitivity analysis showed that the area and length of cracks have the most significant impact on model performance. A novel discrete bare-bones multi-objective particle swarm algorithm was applied to a discrete multi-objective problem. Conventional particle swarm optimisation (PSO) techniques require manual selection of various control parameters for the velocity term. In contrast, bare-bones PSO has the advantage of being velocity-free and hence does not involve any parameter selection. The discrete bare-bones multi-objective PSO algorithm was applied to find optimal rehabilitation scheduling considering two objectives: minimisation of the total pavement rehabilitation cost and minimisation of the sum of all residual PCI values. The results showed that the optimal maintenance plan found by the novel algorithm is better than that found by the conventional algorithm. Although the performance metrics showed that both algorithms perform on a par, the novel algorithm is clearly advantageous as it does not need parameter selection.
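    The bare-bones PSO update referred to above is well defined: each particle's new position is sampled from a Gaussian centred midway between its personal best and the global best, with standard deviation equal to their distance, so no velocity term or control parameters are needed. A sketch on a toy continuous single-objective problem follows; the thesis adapts the idea to a discrete multi-objective setting.

```python
# Bare-bones PSO skeleton on a toy continuous minimisation problem.
# New positions are drawn from N((pbest + gbest) / 2, |pbest - gbest|),
# so there is no velocity term and no tunable coefficients.
import random

def sphere(x):  # toy objective to minimise
    return sum(v * v for v in x)

dim, swarm_size, iters = 3, 20, 100
pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm_size)]
pbest = [p[:] for p in pos]
gbest = min(pbest, key=sphere)

for _ in range(iters):
    for i in range(swarm_size):
        pos[i] = [random.gauss((pbest[i][d] + gbest[d]) / 2,
                               abs(pbest[i][d] - gbest[d]))
                  for d in range(dim)]
        if sphere(pos[i]) < sphere(pbest[i]):
            pbest[i] = pos[i][:]
    gbest = min(pbest, key=sphere)

print([round(v, 3) for v in gbest], round(sphere(gbest), 6))
```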

    Prioritisation of requests, bugs and enhancements pertaining to apps for remedial actions. Towards solving the problem of which app concerns to address initially for app developers

    Useful app reviews contain information related to the bugs reported by the app's end-users, along with the requests or enhancements (i.e., suggestions for improvement) pertaining to the app. App developers expend exhaustive manual effort to identify the numerous useful reviews within a vast pool of reviews and to convert such useful reviews into actionable knowledge by means of prioritisation. By doing so, app developers can resolve the critical bugs and simultaneously address the prominent requests or enhancements within the short intervals of apps' maintenance and evolution cycles. That said, manual identification and prioritisation of useful reviews have limitations, most commonly: the high cognitive load required to perform manual analysis, the lack of scalability associated with limited human resources to process voluminous reviews, extensive time requirements, and the error-proneness of manual effort. While prior work in the app domain has proposed prioritisation approaches to convert reviews pertaining to an app into actionable knowledge, these studies have limitations and lack benchmarking of prioritisation performance. Thus, the problem of prioritising numerous useful reviews persists. In this study, we initially conducted a systematic mapping study of the requirements prioritisation domain to explore the existing knowledge on prioritisation and to seek inspiration from eminent empirical studies for solving the problem of prioritising numerous useful reviews. The findings of the systematic mapping study inspired us to develop automated approaches for filtering useful reviews and then facilitating their subsequent prioritisation. To filter useful reviews, this work developed six variants of the Multinomial Naïve Bayes method. Next, to prioritise the order in which useful reviews should be addressed, we proposed a group-based prioritisation method which initially classified the useful reviews into specific groups using an automatically generated taxonomy, and later prioritised these reviews using a multi-criteria heuristic function. Subsequently, we developed an individual prioritisation method that directly prioritised the useful reviews after filtering, using the same multi-criteria heuristic function. The findings of the systematic mapping study not only provided the inspiration for the automated filtering and prioritisation approaches but also revealed crucial dimensions, such as accuracy and time, that could be used to benchmark the performance of a prioritisation method. With regard to the proposed automated filtering approach, we observed that the performance of the Multinomial Naïve Bayes variants varied based on their algorithmic structure and the nature of the labelled reviews (i.e., balanced or imbalanced) made available for training. The automatically generated taxonomy for classifying useful reviews into specific groups showed a substantial match with a manual taxonomy generated from domain knowledge. Finally, we validated the performance of the group-based and individual prioritisation methods, finding that the individual prioritisation method was superior to the group-based method when outcomes were assessed on the accuracy and time dimensions.
In addition, we performed a full-scale evaluation of the individual prioritisation method, which showed promising results. Given these outcomes, it is anticipated that our individual prioritisation method could assist app developers in filtering and prioritising numerous useful reviews to support app maintenance and evolution cycles. Beyond app reviews, the utility of our proposed prioritisation solution can be evaluated on software repositories tracking bugs and requests, such as Jira and GitHub.
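    As a hedged illustration of the filtering step, the sketch below trains an off-the-shelf Multinomial Naive Bayes pipeline on a tiny invented corpus; the thesis develops six MNB variants on a much larger labelled dataset, so this shows only the shape of the approach.

```python
# Sketch: Multinomial Naive Bayes classifier separating useful from
# non-useful app reviews. The labelled corpus is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = ["app crashes when I open settings",  # bug report -> useful
           "please add dark mode",              # enhancement -> useful
           "love it", "five stars!!!"]          # no actionable content
labels = ["useful", "useful", "not_useful", "not_useful"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(reviews, labels)
print(clf.predict(["keeps crashing on startup", "great app"]))
```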