    A Hyper-heuristic for Multi-objective Integration and Test Ordering in Google Guava

    Integration testing seeks to find communication problems between different units of a software system. Because the order in which units are considered can affect the overall effort required for integration testing, choosing an appropriate sequence in which to integrate and test units is vital. Here we apply a multi-objective hyper-heuristic set within an NSGA-II framework to the Integration and Test Order (ITO) problem for Google Guava, a set of open-source common libraries for Java. Our results show that an NSGA-II-based hyper-heuristic employing a simplified version of Choice Function heuristic selection outperforms standard NSGA-II for this problem.
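
The abstract does not list the ITO objectives. A formulation common in the ITO literature measures stubbing cost as the number of attributes and methods that stubs must emulate. The minimal Python sketch below assumes that formulation: the units `A`, `B`, `C` and their dependency weights are hypothetical. It evaluates a candidate test order and compares orders by Pareto dominance, the relation underlying NSGA-II's non-dominated sorting. The hyper-heuristic layer (Choice Function selection of low-level heuristics) is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical dependency record: (client, server, attributes, methods).
# If `server` is integrated after `client`, a stub must emulate it.
Dependency = Tuple[str, str, int, int]


def stub_complexity(order: Sequence[str], deps: List[Dependency]) -> Tuple[int, int]:
    """Return (emulated attributes, emulated methods) for a test order."""
    position: Dict[str, int] = {unit: i for i, unit in enumerate(order)}
    attrs = methods = 0
    for client, server, n_attrs, n_methods in deps:
        # A stub is needed when the server has not been integrated yet.
        if position[server] > position[client]:
            attrs += n_attrs
            methods += n_methods
    return attrs, methods


def dominates(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
    """Pareto dominance for minimisation: no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# A dependency cycle A -> B -> C -> A, so every order needs at least one stub.
deps = [("A", "B", 2, 1), ("B", "C", 0, 3), ("C", "A", 1, 1)]
print(stub_complexity(["A", "B", "C"], deps))  # (2, 4)
print(stub_complexity(["C", "B", "A"], deps))  # (1, 1)
```

Here the order `C, B, A` dominates `A, B, C` on both objectives. In a real search, NSGA-II would keep a whole front of mutually non-dominated orders.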

    Sentinel: A Hyper-Heuristic for the Generation of Mutant Reduction Strategies

    Mutation testing is an effective approach to evaluating and strengthening software test suites, but its adoption is currently limited by the computational cost of executing mutants. Several strategies have been proposed to reduce this cost (a.k.a. mutation cost reduction strategies); however, none of them has proven effective in all scenarios, since they often require ad-hoc manual selection and configuration depending on the software under test (SUT). In this paper, we propose a novel multi-objective evolutionary hyper-heuristic approach, dubbed Sentinel, to automate the generation of optimal cost reduction strategies for every new SUT. We evaluate Sentinel through a thorough empirical study involving 40 releases of 10 open-source real-world software systems, using both baseline and state-of-the-art strategies as a benchmark. We execute a total of 4,800 experiments and evaluate their results with both quality indicators and statistical significance tests, following the most recent best practices in the literature. The results show that strategies generated by Sentinel outperform the baseline strategies in 95% of the cases, always with large effect sizes. They also obtain statistically significantly better results than state-of-the-art strategies in 88% of the cases, with large effect sizes for 95% of them. Our study also reveals that the mutation strategies generated by Sentinel for a given software version can be used without any loss in quality for subsequently developed versions in 95% of the cases. These results show that Sentinel can automatically generate mutation strategies that reduce mutation testing cost without affecting its testing effectiveness (i.e., mutation score), thus relieving testers of the burden of manually selecting and configuring strategies for each SUT. Comment: in IEEE Transactions on Software Engineering.
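
The abstract frames the goal as reducing mutation testing cost without affecting the mutation score. The minimal sketch below makes that trade-off concrete for random mutant sampling, a well-known baseline cost reduction strategy. It is not a Sentinel-generated strategy. The two objectives it returns (fraction of mutants executed, and absolute error of the sampled score) are an assumption about how such strategies could be compared, and `evaluate_random_sampling` is a hypothetical helper.

```python
import random
from typing import List, Tuple


def mutation_score(killed: List[bool]) -> float:
    """Fraction of mutants killed by the test suite."""
    return sum(killed) / len(killed) if killed else 0.0


def evaluate_random_sampling(killed: List[bool], fraction: float,
                             seed: int = 0) -> Tuple[float, float]:
    """Evaluate random mutant sampling on two objectives (both minimised):
    cost, the fraction of mutants executed, and error, the absolute
    difference between the sampled and the full mutation score."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(killed)))
    sample = rng.sample(killed, k)  # execute only k randomly chosen mutants
    cost = k / len(killed)
    error = abs(mutation_score(sample) - mutation_score(killed))
    return cost, error


# Hypothetical kill results for 100 mutants (80 killed).
killed = [True] * 80 + [False] * 20
print(evaluate_random_sampling(killed, 0.1))
```

A multi-objective search such as Sentinel's would look for strategies that push both numbers down at once, rather than fixing a single sampling rate by hand.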

    A multi-armed bandit approach for enhancing test case prioritization in continuous integration environments

    Advisor: Silvia Regina Vergilio. Thesis (PhD), Universidade Federal do Paraná, Sector of Exact Sciences, Graduate Program in Informatics. Defense: Curitiba, 10/12/2021. Includes references. Area of concentration: Computer Science.
Abstract: Continuous Integration (CI) is a practice commonly and widely adopted in industry to allow frequent integration of software changes, making software evolution faster and more cost-effective. In CI environments, Regression Testing (RT) is fundamental to ensure that changes have not adversely affected existing features of the system. However, RT is an expensive task. To reduce RT costs, Test Case Prioritization (TCP) techniques play an important role. These techniques attempt to identify the test case order that maximizes specific goals, such as early fault detection. Recently, many studies on TCP in CI environments (TCPCI) have appeared, but few of them consider CI particularities, such as the time constraint and test case volatility; that is, they do not consider the dynamic environment of the software life cycle, in which new test cases can be added or removed (discontinued) over time. Test case volatility is a characteristic related to the Exploration versus Exploitation (EvE) dilemma. To solve this dilemma, an approach needs to balance: i) the diversity of the test suite; and ii) the quantity of new test cases and of test cases that are error-prone or have high fault-detection capability. To deal with this, most approaches use, besides the failure history, other measures that rely on code instrumentation or require additional information, such as test coverage.
However, keeping this information up to date can be difficult and time-consuming, and may not scale given the test budget of CI environments. In this context, and to properly deal with the TCPCI problem, this work presents an approach based on Multi-Armed Bandit (MAB) called COLEMAN (Combinatorial VOlatiLE Multi-Armed BANdiT). MAB problems are a class of sequential decision problems that are intensively studied for solving the EvE dilemma. The TCPCI problem falls into the category of volatile and combinatorial MAB, because multiple arms (test cases) need to be selected, and they are added or removed over the cycles. COLEMAN was evaluated under different real-world software systems, time budgets, reward functions, and MAB policies, against different approaches from the literature, and also in the Highly-Configurable Software context. Different quality indicators were used to encompass different perspectives, such as fault detection effectiveness (with and without cost consideration), early fault detection, test time reduction, prioritization time, and accuracy. The outcomes show that COLEMAN is promising and endorse its applicability to the TCPCI problem. COLEMAN outperforms RETECS, a state-of-the-art approach based on Reinforcement Learning, and stands out mainly regarding fault detection effectiveness (in ~82% of the cases) and early fault detection (in 100%). COLEMAN spends a negligible amount of time, less than one second, to execute, and is more stable than RETECS; that is, it adapts better to peaks of faults. When compared with a search-based approach, COLEMAN provides near-optimal solutions in ~90% of the cases, and in comparison with a deterministic approach, it provides reasonable solutions in 92% of the cases. Thus, the main contribution of this work is an efficient and effective MAB-based approach for the TCPCI problem.
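
To illustrate what a volatile, combinatorial bandit means for TCPCI, here is a minimal sketch. It is not COLEMAN's implementation. It uses an ε-greedy policy (one standard MAB policy), treats every test case as an arm, and ranks only the tests present in the current cycle. It gives never-executed tests top priority so that they are explored, and uses a hypothetical binary reward (1.0 when a test fails). The class, helper names and parameters are illustrative assumptions.

```python
import random
from typing import Dict, List, Set


class EpsilonGreedyPrioritizer:
    """Illustrative volatile, combinatorial bandit for test case prioritization.

    Each test case is an arm; arms may be added or removed between CI cycles.
    The reward (1.0 if a test failed, else 0.0) is a hypothetical choice.
    """

    def __init__(self, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.reward_sum: Dict[str, float] = {}
        self.pulls: Dict[str, int] = {}

    def _value(self, test: str) -> float:
        # Tests never executed before (e.g. newly added ones) rank first,
        # so the policy explores them.
        if self.pulls.get(test, 0) == 0:
            return float("inf")
        return self.reward_sum[test] / self.pulls[test]

    def prioritize(self, available: List[str]) -> List[str]:
        """Rank only the tests present in this cycle (volatility)."""
        if self.rng.random() < self.epsilon:
            order = list(available)
            self.rng.shuffle(order)  # explore: random order
            return order
        # exploit: highest average reward first
        return sorted(available, key=self._value, reverse=True)

    def update(self, executed: List[str], failed: Set[str]) -> None:
        """Record the reward of every test executed in this cycle."""
        for test in executed:
            reward = 1.0 if test in failed else 0.0
            self.reward_sum[test] = self.reward_sum.get(test, 0.0) + reward
            self.pulls[test] = self.pulls.get(test, 0) + 1


def select_within_budget(order: List[str], durations: Dict[str, float],
                         budget: float) -> List[str]:
    """Greedily take tests in priority order while they fit in the time budget."""
    chosen: List[str] = []
    used = 0.0
    for test in order:
        if used + durations[test] <= budget:
            chosen.append(test)
            used += durations[test]
    return chosen
```

In each CI cycle, one would call `prioritize`, trim the order with `select_within_budget`, run those tests, and feed the failures back through `update`. Setting `epsilon=0.0` gives a purely greedy policy, which is convenient for testing.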

    Parameter-less Late Acceptance Hill-climbing: Foundations & Applications

    PhD Theses. Stochastic Local Search (SLS) methods have been used to solve complex hard combinatorial problems in a number of fields. Their judicious use of randomization arguably simplifies the design needed to achieve robust algorithm behaviour in domains where little is known. This feature makes them a general-purpose approach for tackling complex problems. However, their performance usually depends on a number of parameters that must be specified by the user. Most of these parameters relate to the search algorithm and have little to do with the user's problem. This thesis presents search techniques for combinatorial problems that have fewer parameters while delivering good anytime performance. Their parameters are set automatically by the algorithm itself, while ensuring that the entire given time budget is used to explore the search space with a high probability of avoiding stagnation in a single basin of attraction. These algorithms are suitable for general practitioners in industry who do not have deep insight into search methodologies and their parameter tuning. Note that, to all intents and purposes, the aim in real-world search problems is to find a solution of good-enough quality within a predefined time. To achieve this, we use a technique that was originally introduced for automating population sizing in evolutionary algorithms. We adapted it to a particular single-point stochastic local search algorithm, namely Late Acceptance Hill-Climbing (LAHC), to eliminate the need to manually specify the value of this algorithm's sole parameter. We then develop a mathematically sound dynamic cutoff time strategy that can reliably detect the stagnation point of these search algorithms. We evaluated the suitability and scalability of the proposed methods on a range of classical combinatorial optimization problems as well as a real-world software engineering problem.
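
For readers unfamiliar with LAHC, the sketch below shows the basic algorithm with its single parameter, the history length. A candidate is accepted if it is no worse than the current solution or than the solution cost recorded a fixed number of iterations earlier. This sketch keeps the history length as an explicit argument. The thesis's parameter-less scheme and its dynamic cutoff strategy are not reproduced here. The bit-string toy problem and the `flip_one_bit` move are hypothetical illustrations.

```python
import random
from typing import Callable, List, TypeVar

S = TypeVar("S")


def lahc(initial: S, cost: Callable[[S], float],
         neighbour: Callable[[S, random.Random], S],
         history_length: int, iterations: int, seed: int = 0) -> S:
    """Late Acceptance Hill-Climbing for minimisation.

    A candidate is accepted if its cost is no worse than the current cost or
    than the cost recorded `history_length` iterations earlier.
    """
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    history: List[float] = [current_cost] * history_length
    for i in range(iterations):
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        slot = i % history_length  # circular list of past costs
        if candidate_cost <= history[slot] or candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        history[slot] = current_cost
    return best


def flip_one_bit(bits: List[int], rng: random.Random) -> List[int]:
    """Neighbourhood move: flip one randomly chosen bit."""
    i = rng.randrange(len(bits))
    neighbour = list(bits)
    neighbour[i] ^= 1
    return neighbour


# Toy problem (hypothetical): minimise the number of ones in a bit string.
start = [1] * 20
result = lahc(start, sum, flip_one_bit, history_length=10, iterations=2000)
print(sum(result))
```

A larger history length makes the search accept worse moves for longer, which helps it escape local optima but slows convergence. Choosing this value automatically is precisely the tuning burden the thesis aims to remove.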

    Q(√−3)-Integral Points on a Mordell Curve

    We use an extension of quadratic Chabauty to number fields, recently developed by the author with Balakrishnan, Besser and Müller, combined with a sieving technique, to determine the integral points over Q(√−3) on the Mordell curve y² = x³ − 4.
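
The integral points in question have coordinates in Z[ω], the ring of integers of Q(√−3), where ω = (−1 + √−3)/2 satisfies ω² = −1 − ω. The brute-force search below is a hedged illustration of what these points are. It only lists the points whose coordinates fall inside a finite box and cannot show that no others exist; proving completeness is what quadratic Chabauty combined with sieving achieves. The box bounds are arbitrary choices.

```python
from itertools import product
from typing import Dict, List, Tuple

# Elements of Z[w], the ring of integers of Q(sqrt(-3)), stored as (a, b)
# meaning a + b*w, where w = (-1 + sqrt(-3)) / 2 satisfies w^2 = -1 - w.
Eis = Tuple[int, int]


def mul(x: Eis, y: Eis) -> Eis:
    a, b = x
    c, d = y
    # (a + bw)(c + dw) = ac + (ad + bc)w + bd*w^2, with w^2 = -1 - w
    return (a * c - b * d, a * d + b * c - b * d)


def search_points(bound_x: int, bound_y: int) -> List[Tuple[Eis, Eis]]:
    """Points (x, y) on y^2 = x^3 - 4 with both coordinates in a finite box."""
    squares: Dict[Eis, List[Eis]] = {}
    for y in product(range(-bound_y, bound_y + 1), repeat=2):
        squares.setdefault(mul(y, y), []).append(y)
    points = []
    for x in product(range(-bound_x, bound_x + 1), repeat=2):
        x3 = mul(mul(x, x), x)
        rhs = (x3[0] - 4, x3[1])  # subtract the rational integer 4
        for y in squares.get(rhs, []):
            points.append((x, y))
    return points


for x, y in search_points(6, 15):
    print(x, y)
```

The rational integral points (2, ±2) and (5, ±11) appear among the output, encoded as pairs with zero ω-coefficient.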

    Empirically-Grounded Construction of Bug Prediction and Detection Tools

    There is an increasing demand for high-quality software, as software bugs have an economic impact not only on software projects but also on national economies in general. Software quality is achieved through the main quality assurance activities of testing and code review. However, these activities are expensive, so they need to be carried out efficiently. Auxiliary software quality tools, such as bug detection and bug prediction tools, help developers focus their testing and reviewing activities on the parts of the software that are more likely to contain bugs. However, these tools are far from being adopted as mainstream development tools. Previous research points to their inability to adapt to the peculiarities of projects and their high rate of false positives as the main obstacles to their adoption. We propose empirically-grounded analysis to improve the adaptability and efficiency of bug detection and prediction tools. For a bug detector to be efficient, it needs to detect bugs that are conspicuous, frequent, and specific to a software project. We empirically show that null-related bugs fulfill these criteria and are worth building detectors for. We analyze the null dereferencing problem and find that its root cause lies in methods that return null. We propose an empirical solution to this problem that relies on the wisdom of the crowd. For each API method, we extract a nullability measure that expresses how often the return value of this method is checked against null in the ecosystem of the API. We use nullability to annotate API methods with nullness annotations and to warn developers about missing and excessive null checks. For a bug predictor to be efficient, it needs to be optimized both as a machine learning model and as a software quality tool. We empirically show how feature selection and hyperparameter optimization improve prediction accuracy.
Then we optimize bug prediction to locate the maximum number of bugs in the minimum amount of code by finding the most cost-effective combination of bug prediction configurations, i.e., independent variables, machine learning model, and response variable. We show that the most cost-effective bug predictor results from using both source code and change metrics as independent variables, applying feature selection to them, and then using an optimized Random Forest to predict the number of bugs. Throughout this thesis, we show how empirically-grounded analysis helps us build efficient bug prediction and detection tools and adapt them to the characteristics of each software project.
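
The nullability measure described above is a simple ratio per API method: null-checked call sites divided by all call sites. The sketch below computes it from mined call-site observations and uses it to flag missing or excessive checks. The input format, the method names used in examples, and the 0.8/0.1 thresholds are hypothetical; the thesis's actual mining pipeline and decision rules may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each observation is (api_method, return_value_checked_against_null) for one
# call site mined from client code of the API (hypothetical input format).
CallSite = Tuple[str, bool]


def nullability(call_sites: Iterable[CallSite]) -> Dict[str, float]:
    """Fraction of call sites that null-check each method's return value."""
    checked: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for method, is_checked in call_sites:
        total[method] += 1
        checked[method] += int(is_checked)
    return {m: checked[m] / total[m] for m in total}


def check_call(method: str, is_checked: bool, scores: Dict[str, float],
               high: float = 0.8, low: float = 0.1) -> str:
    """Warn about missing or excessive null checks (thresholds are illustrative)."""
    score = scores.get(method)
    if score is None:
        return "unknown"
    if score >= high and not is_checked:
        return "missing null check"
    if score <= low and is_checked:
        return "excessive null check"
    return "ok"
```

For example, if 9 of 10 observed call sites check a method's result, an unchecked new call site is flagged as a missing check. If no call site ever checks a method's result, a new check is flagged as excessive.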

    Diverse Intrusion-tolerant Systems

    Over the past 20 years, there have been indisputable advances in the development of Byzantine Fault-Tolerant (BFT) replicated systems. These systems remain safe as long as at most f out of n replicas fail simultaneously. Therefore, to maintain correctness, it is assumed that replicas do not suffer from common-mode failures; in other words, that replicas fail independently. In an adversarial setting, this requires that replicas do not include similar vulnerabilities; otherwise, a single exploit could be used to compromise a significant part of the system. The thesis investigates how this assumption can be substantiated in practice by exploring diversity when managing the configurations of replicas. It begins with an analysis of a large dataset of vulnerability information to obtain evidence that diversity can contribute to failure independence. In particular, we used data from a vulnerability database to devise strategies for building groups of n replicas with different operating systems (OSes). Our results demonstrate that it is possible to create dependable configurations of OSes that do not share vulnerabilities over reasonable periods of time (i.e., a few years). Then, the thesis proposes a new design for a firewall-like service that protects and regulates access to critical systems and that could benefit from our diversity management approach. The solution provides fault and intrusion tolerance by implementing an architecture based on two filtering layers, enabling efficient removal of invalid messages at early stages in order to decrease the costs associated with BFT replication in the later stages. The thesis also presents a novel solution for managing diverse replicas. It collects and processes data from several data sources to continuously compute a risk metric.
Once the risk increases, the solution replaces a potentially vulnerable replica with another one, trying to maximize the failure independence of the replicated service. The replaced replica is then put in quarantine and updated with the available patches, so that it is ready for later reuse. We devised various experiments that show the dependability gains and performance impact of our prototype, including key benchmarks and three BFT applications (a key-value store, our firewall-like service, and a blockchain). Research unit LASIGE (UID/CEC/00408/2019) and project PTDC/EEI-SCR/1741/2041 (Abyss).
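
A minimal sketch of the diversity idea: assuming a map from each OS to the set of vulnerabilities affecting it, pick the group of n OSes that shares the fewest vulnerabilities. It uses the classic BFT sizing n = 3f + 1. This brute-force selection is a hypothetical illustration, not the strategies devised in the thesis. The OS names and vulnerability identifiers are placeholders, not real data.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def replicas_needed(f: int) -> int:
    """Classic BFT sizing: n = 3f + 1 replicas tolerate f Byzantine faults."""
    return 3 * f + 1


def shared_vulnerabilities(group: Tuple[str, ...],
                           vulns: Dict[str, Set[str]]) -> int:
    """Count vulnerabilities that affect at least two OSes in the group."""
    count: Dict[str, int] = {}
    for os_name in group:
        for v in vulns[os_name]:
            count[v] = count.get(v, 0) + 1
    return sum(1 for c in count.values() if c >= 2)


def most_diverse_group(vulns: Dict[str, Set[str]], f: int) -> Tuple[str, ...]:
    """Exhaustively pick the n-OS group with the fewest shared vulnerabilities."""
    n = replicas_needed(f)
    return min(combinations(sorted(vulns), n),
               key=lambda g: shared_vulnerabilities(g, vulns))


# Placeholder data: OS-A and OS-B share vulnerability "v2".
vulns = {"OS-A": {"v1", "v2"}, "OS-B": {"v2", "v3"},
         "OS-C": {"v4"}, "OS-D": {"v5"}, "OS-E": {"v6"}}
print(most_diverse_group(vulns, f=1))
```

Exhaustive enumeration is only practical for small candidate pools. It also ignores time, whereas the thesis evaluates whether configurations remain vulnerability-free over periods of years.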