17 research outputs found

    Higher Order Mutation Testing

    Mutation testing is a fault-based software testing technique that has been studied widely for over three decades. To date, work in this field has focused largely on first order mutants, because it is believed that higher order mutation testing is too computationally expensive to be practical. This thesis argues that some higher order mutants are potentially better able to simulate real-world faults, and to reveal insights into programming bugs, than the restricted class of first order mutants. It proposes a higher order mutation testing paradigm that combines valuable higher order mutants with non-trivial first order mutants. To overcome the exponential increase in the number of higher order mutants, a search process is proposed that seeks fit mutants (both first and higher order) in the space of all possible mutants. A fault-based higher order mutant classification scheme is introduced: based on the types of fault interaction involved, it classifies higher order mutants into four categories: expected, worsening, fault masking and fault shifting. A search-based approach is then proposed for locating subsuming and strongly subsuming higher order mutants. These mutants are a subset of the fault masking and fault shifting classes that are more difficult to kill than their constituent first order mutants. Finally, a hybrid test data generation approach is introduced, which combines dynamic symbolic execution and search-based software testing to generate strongly adequate test data that kills both first and higher order mutants.
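The fault-interaction categories above can be made concrete with a toy example. The sketch below is our own construction, not the thesis's code: it shows fault masking, where two first order mutants that are each easy to kill combine into a second order mutant that no test can kill.

```python
# Hypothetical illustration of fault masking in a higher order mutant.

def original(a, b):
    return (a + b) * (a - b)        # computes a^2 - b^2

def fom1(a, b):
    return (a - b) * (a - b)        # first order mutant: first '+' -> '-'

def fom2(a, b):
    return (a + b) * (a + b)        # first order mutant: second '-' -> '+'

def hom(a, b):
    return (a - b) * (a + b)        # higher order mutant: both changes at once

def killed(mutant, tests):
    """A mutant is killed if any test distinguishes it from the original."""
    return any(mutant(a, b) != original(a, b) for a, b in tests)

tests = [(2, 1), (3, 5), (0, 4)]
print(killed(fom1, tests))   # True  -- each constituent mutant is easy to kill
print(killed(fom2, tests))   # True
print(killed(hom, tests))    # False -- the two faults mask each other entirely
```

Here the higher order mutant is semantically equivalent to the original, an extreme case of the fault masking category, which is why searching for subsuming (but killable) higher order mutants is non-trivial.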

    Dependability of IT Systems in Emergency Situations – Theory and Practice

    As our dependence on IT systems increases, evaluating the dependability of critical IT systems becomes more important. One of the main challenges in software reliability engineering is the sensitivity of software systems to changing usage. This is especially important for systems that are critical in the aftermath of a crisis and for which reliability is the most important aspect of dependability. A crisis might change the usage of the system, which could have a negative effect on its reliability. Because crisis situations are typically rare events, both the reliability and the criticality of IT systems after a crisis are hard to predict. The first part of this thesis focuses on analysing the sensitivity of the reliability of IT systems to changes in their usage. With the help of statistical methods, the effects of changing usage profiles, modelled using Markov models, can be examined. After a theoretical derivation of the properties of different models for the usage of software systems, the results were validated by applying the models to data collected from the log files of a web server. Swedish municipalities also depend more and more on IT systems for their daily work. Because of their important role in relief coordination after a crisis, the dependability of their IT systems during these emergency situations is especially critical. Evaluating this dependability requires combining two kinds of information: how critically needed the IT systems are in the aftermath of a crisis, and how trustworthy the critical systems are. To prevent a failing IT system from disrupting the relief work, risk and vulnerability analyses need to take the dependability of critical IT systems into account. This way, municipalities can make sure that the relief work is not critically dependent on systems that are not sufficiently reliable.
The second part of this thesis describes a case study of how two Swedish municipalities deal with these issues. The study focuses especially on the division of responsibilities in the municipalities and on their current methods. It shows that there is currently much room for improvement, especially in the communication between IT personnel and emergency managers. The main goal of these case studies is to form a basis for the development of practical methods that can assist Swedish municipalities in evaluating the dependability of their IT systems and in integrating this information into their emergency planning in the near future.
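The sensitivity analysis can be illustrated with a small sketch. The transition probabilities, state names and failure rates below are toy numbers of our own choosing, not data from the thesis; the point is only that a crisis-shifted usage profile, modelled as a Markov chain, changes the predicted failure rate.

```python
# Toy Markov usage model: usage profile changes -> reliability changes.
import numpy as np

def stationary(P):
    """Stationary distribution of an ergodic transition matrix P."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return v / v.sum()

# States: browse, search, report-emergency; per-state failure probability.
fail = np.array([0.001, 0.002, 0.01])

normal = np.array([[0.7, 0.2, 0.1],     # usage profile in daily operation
                   [0.3, 0.6, 0.1],
                   [0.5, 0.3, 0.2]])
crisis = np.array([[0.1, 0.1, 0.8],     # usage shifts towards emergency reporting
                   [0.1, 0.2, 0.7],
                   [0.1, 0.1, 0.8]])

for name, P in [("normal", normal), ("crisis", crisis)]:
    pi = stationary(P)
    print(name, "failure rate per interaction:", pi @ fail)
```

Because the crisis profile concentrates usage on the least reliable state, the same software exhibits a noticeably higher failure rate, which is exactly the sensitivity the thesis studies.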

    It’s like flossing your teeth: On the Importance and Challenges of Reproducible Builds for Software Supply Chain Security

    The 2020 SolarWinds attack was a tipping point that caused a heightened awareness of the security of the software supply chain, and in particular of the large amount of trust placed in build systems. Reproducible Builds (R-Bs) provide a strong foundation for defending against arbitrary attacks on build systems by ensuring that, given the same source code, build environment, and build instructions, bitwise-identical artifacts are created. Unfortunately, much of the software industry believes R-Bs are too far out of reach for most projects. The goal of this paper is to help identify a path for R-Bs to become a commonplace property. To this end, we conducted a series of 24 semi-structured expert interviews with participants from the Reproducible-Builds.org project, and iterated on our questions with the reproducible builds community. We identified a range of motivations that can encourage open source developers to strive for R-Bs, including indicators of quality, security benefits, and more efficient caching of artifacts. We also identified experiences that help and hinder adoption, many of which involve communication with upstream projects. We conclude with recommendations on how to better integrate R-Bs with the efforts of the open source and free software community.
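The R-B property itself fits in a few lines. In this sketch (our illustration, not material from the paper), a toy "build" embeds a timestamp the way many real tools do, which is exactly the kind of build-environment detail that breaks bitwise reproducibility.

```python
# Reproducibility check: identical inputs should yield identical hashes.
import gzip
import hashlib

def build(source: bytes, mtime: float) -> bytes:
    """A toy 'build' whose output embeds a timestamp, like many real tools."""
    return gzip.compress(source, mtime=mtime)

def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()

src = b"int main(void) { return 0; }\n"

# Pinning the environment detail (mtime) makes the build reproducible:
print(digest(build(src, mtime=0)) == digest(build(src, mtime=0)))           # True
# Letting it float breaks bitwise identity:
print(digest(build(src, mtime=0)) == digest(build(src, mtime=1700000000)))  # False
```

Comparing independent rebuilds by cryptographic hash is how a verifier detects either non-determinism or a tampered build system.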

    Open source software GitHub ecosystem: a SEM approach

    Open source software (OSS) is a collaborative effort. Affordable, high-quality software with a low probability of errors or failures is within reach. Thousands of open source projects (termed repos) are alternatives to proprietary software development. More than two-thirds of companies contribute to open source. Open source technologies such as OpenStack, Docker and KVM are being used to build the next generation of digital infrastructure. An iconic example of OSS is 'GitHub', a successful social coding site. GitHub is a platform that hosts repositories (repos) based on the Git version control system. It is a knowledge-based workspace with several features that facilitate user communication and work integration. Through this thesis I employ data extracted from GitHub and seek to better understand the OSS ecosystem, and the extent to which each of its deployed elements affects the successful development of that ecosystem. In addition, I investigate a repo's growth over different time periods to test the changing behaviour of the repo. From our observations, developers do not follow one development methodology when developing and growing their projects; rather, they tend to cherry-pick from the available software methodologies. The GitHub API remains the main OSS source engaged to extract the metadata for this thesis's research. This extraction process is time-consuming, due to restrictive access limitations (even with authentication). I apply Structural Equation Modelling (termed SEM) to investigate the relative path relationships between the GitHub-deployed OSS elements, and I determine the path strength contributions of each element to the OSS repo's activity level. SEM is a multivariate statistical analysis technique used to analyse structural relationships; it combines factor analysis and multiple regression analysis.
It is used to analyse the structural relationships between measured variables and/or latent constructs. This thesis bridges the research gap around longitudinal OSS studies. It engages large sample-size OSS repo metadata sets, data-quality control, and multiple programming language comparisons. Querying GitHub is neither direct nor simple, yet querying for all valid repos remains important: sometimes illegal or unrepresentative outlier repos (which may even be quite popular) arise, and these then need to be removed from each initial language-specific metadata set. The eight top GitHub programming languages (selected as those with the most forked repos) are separately engaged in this thesis's research. This thesis observes these eight metadata sets of GitHub repos and, over time, measures the different contributions of the deployed elements of each metadata set. The number of stars given to a repo delivers a weaker contribution to its software development processes. Sometimes forks work against a repo's progress, by generating minor negative total effects on its commit (activity) level and by diluting the focus of its software development strategies. Here, a fork may generate new ideas, create a new repo, and then draw some of the original repo's developers off into this new software development direction, thus retarding the original repo's commit (activity) level progression. Multiple intermittent, minor version releases exert smaller GitHub JavaScript repo commit (activity) changes, because they often involve only slight OSS improvements and require minimal commit contributions. More commits also bring more changes to documentation, and again the GitHub OSS repo's commit (activity) level rises. There are both direct and indirect drivers of a repo's OSS activity; pulls and commits are the strongest drivers.
This suggests that raising the level of pull requests is likely a prime target for the repo creator's core team of developers. This study offers a big data direction for future work. It allows for the deployment of more sophisticated statistical comparison techniques, and offers further indications of the internal and broad relationships that likely exist within GitHub's OSS big data. Its data extraction ideas suggest a link through to business/consumer consumption, and possibly how these may be connected using improved repo search algorithms that release individual business value components.
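The direct/indirect/total-effect decomposition that SEM provides can be sketched with ordinary least squares on simulated data. This is a path-analysis toy of our own: the variable names (forks, pulls, commits) echo the thesis's constructs, but the data and coefficients are synthetic, not GitHub measurements.

```python
# Path-analysis sketch: direct, indirect and total effects via OLS.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
forks = rng.normal(size=n)
pulls = 0.6 * forks + rng.normal(size=n)                   # forks -> pulls
commits = 0.8 * pulls - 0.1 * forks + rng.normal(size=n)   # pulls -> commits,
                                                           # small negative direct effect of forks

def ols(y, *xs):
    """Least-squares coefficients of y on xs (intercept dropped)."""
    X = np.column_stack(xs + (np.ones_like(y),))
    return np.linalg.lstsq(X, y, rcond=None)[0][:-1]

(a,) = ols(pulls, forks)              # path forks -> pulls
b, c = ols(commits, pulls, forks)     # paths pulls -> commits, forks -> commits

print("indirect effect of forks via pulls:", a * b)   # ~ 0.48
print("direct effect of forks:           ", c)        # ~ -0.10
print("total effect of forks:            ", a * b + c)
```

The small negative direct path alongside a positive indirect path mirrors the thesis's observation that forks can simultaneously feed and dilute a repo's commit activity.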

    Coordinated atomic actions in Java EE platform

    Advisor: Cecilia Mary Fischer Rubira. Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação. Abstract (given in the original in both Portuguese and English; the English version follows): As software systems evolve, they must satisfy increasingly complex functional and quality requirements. In recent years, many approaches and tools have been proposed to guide the software development process towards higher quality levels. Component-Based Development (CBD) is one of the most widely accepted techniques in both academia and industry; it proposes to build software systems from pre-existing reusable components plus, if possible, a relatively small quantity of application-specific glue code. There are many CBD platforms, of which Java Enterprise Edition (Java EE) is one of the most popular. Fault tolerance, in turn, is one of the most widely adopted means of building systems that are capable of providing their intended service, even if only partially, when faults occur, so that the desired reliability levels are achieved. The Coordinated Atomic Action (CA Action) concept was proposed to provide fault tolerance in concurrent object-oriented software systems. It integrates two complementary concepts, conversations (cooperative concurrency) and transactions (competitive concurrency); it establishes a semantics for concurrent exception handling (exceptions raised simultaneously by concurrent threads) and also supports the combined use of forward and backward error recovery.
This work proposes to extend the component-based development platform Java Enterprise Edition (Java EE) with some of the fault tolerance means proposed by the CA Action concept, by incorporating a concurrent exception handling mechanism into the platform. The implementation of the proposed solution was based on Java, aspect-oriented programming, and the asynchronous communication concept implemented by Java EE message-driven bean components. The solution was assessed through two case studies: (i) a JBoss application based on message-driven beans and (ii) a real billing system for electric power companies. In this way we aim to demonstrate the feasibility of providing simple adaptation mechanisms that let Java EE applications obtain more fault tolerance benefits without large changes to their previously implemented and deployed source code. Mestrado. Engenharia de Software. Mestre em Ciência da Computação.
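The concurrent exception handling semantics at the heart of CA Actions can be sketched in a few lines. This is a minimal Python rendition of the general idea, assuming a simplistic "resolve to one exception for all participants" rule; it is not the thesis's Java EE/AOP implementation.

```python
# Sketch: exceptions raised concurrently by cooperating threads are collected
# and resolved into a single exception raised on behalf of the whole action.
import threading

class CAAction:
    def __init__(self):
        self.exceptions = []
        self.lock = threading.Lock()

    def run(self, roles):
        """Run all participant roles; resolve any concurrent exceptions."""
        threads = [threading.Thread(target=self._wrap, args=(r,)) for r in roles]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        if self.exceptions:
            # Toy resolution rule: pick one exception (here, by class name)
            # to be handled jointly by all participants.
            raise max(self.exceptions, key=lambda e: type(e).__name__)

    def _wrap(self, role):
        try:
            role()
        except Exception as e:
            with self.lock:
                self.exceptions.append(e)

def role_ok():
    pass

def role_fail():
    raise ValueError("inconsistent state")

try:
    CAAction().run([role_ok, role_fail, role_fail])
except ValueError as e:
    print("resolved exception:", e)   # resolved exception: inconsistent state
```

A real CA Action would additionally coordinate forward or backward error recovery among the participants once the resolved exception is known.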

    On the Use of Fault Injection to Discover Security Vulnerabilities in Applications

    The advent of the Internet has enabled developers to write and share software components with each other more easily. Developers have become increasingly reliant on code other than their own for application development; code that is often not well tested and lacks any kind of security review, thus exposing its consumers to security vulnerabilities. The goal of this thesis is to adapt existing techniques, and to discover new approaches, that can be used to find security vulnerabilities in applications. We use fault injection in each of our techniques and define a set of criteria to evaluate these approaches. The hierarchy of approaches, starting from a black-box and ending in a full white-box approach, allows a security reviewer to choose a technique depending on the amount of information available about the application under review, the time constraints, and the extent of security analysis and confidence desired in the program.
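As a black-box illustration of the idea (our own toy target and harness, not the thesis's tools), the sketch below injects random malformed inputs into an application entry point and records which ones expose missing input validation:

```python
# Black-box fault injection: feed malformed inputs, watch for unhandled errors.
import random
import string

def parse_record(data: str) -> dict:
    """Toy application code under review: assumes well-formed 'key=value' input."""
    key, value = data.split("=")      # crashes on malformed input
    return {key: int(value)}          # crashes on non-numeric values

def inject_faults(target, trials=200, seed=1):
    rng = random.Random(seed)
    findings = []
    for _ in range(trials):
        payload = "".join(rng.choice(string.printable) for _ in range(8))
        try:
            target(payload)
        except (KeyError, IndexError, ValueError):
            findings.append(payload)  # robustness failure: unvalidated input
    return findings

crashes = inject_faults(parse_record)
print(f"{len(crashes)} of 200 injected inputs crashed the parser")
```

Each recorded crash is a candidate robustness or security finding; a white-box variant of the same loop would use knowledge of the code to steer the payloads.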

    New Approaches to Software Security Metrics and Measurements

    Meaningful metrics and methods for measuring software security would greatly improve the security of software ecosystems. Such means would make security an observable attribute, helping users make informed choices and allowing vendors to ‘charge’ for it, thus providing strong incentives for more security investment. This dissertation presents three empirical measurement studies introducing new approaches to measuring aspects of software security, focusing on Free/Libre and Open Source Software (FLOSS). First, to revisit the fundamental question of whether software is maturing over time, we study the vulnerability rate of packages in stable releases of the Debian GNU/Linux software distribution. Measuring the vulnerability rate through the lens of Debian stable: (a) provides a natural time frame to test for maturing behavior, (b) reduces noise and bias in the data (only CVEs with a Debian Security Advisory), and (c) provides a best-case assessment of maturity (as the Debian release cycle is rather conservative). Overall, our results do not support the hypothesis that software in Debian is maturing over time, suggesting that vulnerability finding-and-fixing does not scale and that more effort should be invested in significantly reducing the introduction rate of vulnerabilities, e.g. via ‘security by design’ approaches such as memory-safe programming languages. Second, to gain insights beyond the number of reported vulnerabilities, we study how long vulnerabilities remain in the code of popular FLOSS projects (i.e. their lifetimes). We provide the first, to the best of our knowledge, method for automatically estimating the mean lifetime of a set of vulnerabilities based on information in vulnerability-fixing commits. Using this method, we study the lifetimes of ~6 000 CVEs in 11 popular FLOSS projects.
Among a number of findings, we identify two quantities of particular interest for software security metrics: (a) the spread between mean vulnerability lifetime and mean code age at the time of fix, and (b) the rate of change of that spread. Third, to gain insights into the important human aspect of the vulnerability finding process, we study the characteristics of vulnerability reporters for 4 popular FLOSS projects. We provide the first, to the best of our knowledge, method to create a large dataset of vulnerability reporters (>2 000 reporters for >4 500 CVEs) by combining information from a number of publicly available online sources. We proceed to analyze the dataset and identify a number of quantities that, suitably combined, can provide indications regarding the health of a project’s vulnerability finding ecosystem. Overall, we showed that measurement studies carefully designed to target crucial aspects of the software security ecosystem can provide valuable insights and indications regarding the ‘quality of security’ of software. However, the road to good security metrics is still long. New approaches covering other important aspects of the process are needed, while the approaches introduced in this dissertation should be further developed and improved.
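The core arithmetic of such a lifetime estimate can be sketched as follows. This is a simplification of our own with made-up dates; the dissertation's method works from real vulnerability-fixing commits, where the introduction dates of the deleted (vulnerable) lines come from version-control history.

```python
# Vulnerability lifetime: gap between introduction of the fixed lines and the fix.
from datetime import date
from statistics import mean

def lifetime_days(fix_date, blamed_line_dates):
    """Days between the earliest blamed (introduced) line and the fixing commit."""
    introduced = min(blamed_line_dates)   # conservative: earliest vulnerable line
    return (fix_date - introduced).days

# (fix date, dates at which the lines deleted by the fix were introduced)
fixes = [
    (date(2021, 6, 1), [date(2018, 3, 10), date(2019, 1, 5)]),
    (date(2022, 2, 15), [date(2021, 11, 30)]),
]
print("mean lifetime (days):", mean(lifetime_days(f, b) for f, b in fixes))
```

Aggregating this per-fix quantity over a project's CVEs gives the mean-lifetime estimate, which can then be compared against mean code age at fix time to obtain the spread discussed above.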

    From examples to knowledge in model-driven engineering : a holistic and pragmatic approach

    Abstract (given in the original in both French and English; the English version follows): Model-Driven Engineering (MDE) is a software development approach that proposes to raise the level of abstraction of languages in order to shift the design and understanding effort from the programmer's point of view to that of the decision makers. However, the manipulation of these abstract representations, or models, has become so complex that traditional techniques are no longer enough to automate its inherent tasks. For its part, Search-Based Software Engineering (SBSE) proposes to reformulate the automation of MDE tasks as optimization problems; once reformulated, a problem is solved by metaheuristic algorithms. With a plethora of studies on the subject, the automation power of SBSE is well established. Building on this observation, the Example-Based MDE (EBMDE) community started using application examples to feed the SBSE reformulation of the MDE task learning problem. In this context, the concordance of a solution's output with the examples becomes an effective barometer for evaluating the solution's ability to solve a task, and this measure has proved to be a semantic objective of choice for guiding the metaheuristic search for solutions. However, while it is commonly accepted that the representativeness of the examples has an impact on the generalizability of the solutions, the study of this impact suffers from a flagrant lack of consideration. In this thesis, we propose a thorough formulation of the learning process in an MDE context, including a complete methodology to characterize and evaluate the relation between two important properties of the examples, their size and their coverage, and the generalizability of the solutions.
We perform an empirical analysis of these two properties, and propose a detailed plan for further investigation of the concept of representativeness, or of other representativity measures.
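The example-concordance fitness at the heart of this setup can be illustrated with a toy search. This is entirely our construction: candidates are integer coefficients of a line, examples are input/output pairs, and the tiny space is simply enumerated for determinism, where a realistic MDE space would require metaheuristic search.

```python
# Example-based learning: fitness = concordance of a candidate's output
# with the provided examples; search keeps the fittest candidate.

examples = [(1, 3), (2, 5), (4, 9)]      # input/output pairs from y = 2x + 1

def fitness(candidate):
    """Concordance with the examples: 0 is perfect, more negative is worse."""
    a, b = candidate
    return -sum((a * x + b - y) ** 2 for x, y in examples)

# Tiny candidate space, enumerated here; real spaces need metaheuristics.
space = [(a, b) for a in range(-5, 6) for b in range(-5, 6)]
best = max(space, key=fitness)
print("best candidate:", best)           # → best candidate: (2, 1)
```

Whether the learned candidate generalizes beyond the given examples depends on how representative those examples are, which is precisely the size/coverage question the thesis investigates.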