    Quality definitions and defect classes used in experiments on software inspection

    In this thesis, 15 articles on software inspection are analyzed, with the aim of finding out how quality and defect classes are defined in these articles.

    Are the perspectives really different? Further experimentation on scenario-based reading of requirements

    Perspective-Based Reading (PBR) is a scenario-based inspection technique where several reviewers read a document from different perspectives (e.g. user, designer, tester). The reading is made according to a special scenario, specific for each perspective. The basic assumption behind PBR is that the perspectives find different defects and that a combination of several perspectives detects more defects compared to the same amount of reading with a single perspective. The paper presents a study which analyses the differences in perspectives. The study is a partial replication of previous studies. It is conducted in an academic environment using graduate students as subjects. Each perspective applies a specific modelling technique: use case modelling for the user perspective, equivalence partitioning for the tester perspective and structured analysis for the design perspective. A total of 30 subjects were divided into 3 groups, giving 10 subjects per perspective. The analysis results show that: (1) there is no significant difference among the three perspectives in terms of defect detection rate and number of defects found per hour, (2) there is no significant difference in the defect coverage of the three perspectives, and (3) a simulation study shows that 30 subjects is enough to detect relatively small perspective differences with the chosen statistical test. The results suggest that a combination of multiple perspectives may not give higher coverage of the defects compared to single-perspective reading, but further studies are needed to increase the understanding of perspective differences.
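
    To make the notion of defect coverage concrete, the following minimal sketch computes the coverage of each perspective and of their combination as a set union. The defect identifiers and the findings per perspective are invented for illustration; they are not data from the study, and the helper names are hypothetical.

```python
def coverage(found, all_defects):
    """Fraction of the known defects that appear in `found`."""
    return len(found & all_defects) / len(all_defects)


def combined_coverage(findings, all_defects):
    """Coverage achieved when the findings of all perspectives are pooled."""
    pooled = set().union(*findings.values())
    return coverage(pooled, all_defects)


# Hypothetical example: 10 seeded defects, three perspectives.
all_defects = set(range(1, 11))
findings = {
    "user": {1, 2, 3, 4},
    "designer": {2, 3, 4, 5},
    "tester": {1, 3, 4, 6},
}

per_perspective = {name: coverage(found, all_defects)
                   for name, found in findings.items()}
team = combined_coverage(findings, all_defects)
```

    In this invented data set the perspectives overlap heavily, so pooling them adds little over a single perspective, which is the kind of outcome the abstract reports.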

    A Design Theory for Secure Semantic E-Business Processes (SSEBP)

    This dissertation develops and evaluates a design theory. We follow the design science approach (Hevner et al., 2004) to answer the following research question: "How can we formulate a design theory to guide the analysis and design of Secure Semantic eBusiness processes (SSeBP)?" Goals of SSeBP design theory include (i) unambiguously represent information and knowledge resources involved in eBusiness processes to solve semantic conflicts and integrate heterogeneous information systems; (ii) analyze and model business processes that include access control mechanisms to prevent unauthorized access to resources; and (iii) facilitate the coordination of eBusiness process activities-resources by modeling their dependencies. Business process modeling techniques such as Business Process Modeling Notation (BPMN) (BPMI, 2004) and UML Activity Diagrams (OMG, 2003) lack theoretical foundations and are difficult to verify for correctness and completeness (Soffer and Wand, 2007). The current literature on secure information systems design methods is theoretically underdeveloped and considers security as a non-functional requirement and as an afterthought (Siponen et al., 2006; Mouratidis et al., 2005). SSeBP design theory is one of the first attempts at providing theoretically grounded guidance to design richer secure eBusiness processes for secure and coordinated seamless knowledge exchange among business partners in a value chain. SSeBP design theory allows for the inclusion of non-repudiation mechanisms into the analysis and design of eBusiness processes, which lays the foundations for auditing and compliance with regulations such as Sarbanes-Oxley. SSeBP design theory is evaluated through a rigorous multi-method evaluation approach including descriptive, observational, and experimental evaluation. First, SSeBP design theory is validated by modeling business processes of an industry standard named Collaborative Planning, Forecasting, and Replenishment (CPFR) approach.
Our model enhances CPFR by incorporating security requirements in the process model, which is critically lacking in the current CPFR technical guidelines. Secondly, we model the demand forecasting and capacity planning business processes for two large organizations to evaluate the efficacy and utility of SSeBP design theory to capture the realistic requirements and complex nuances of real inter-organizational business processes. Finally, we empirically evaluate SSeBP against enhanced Use Cases (Siponen et al., 2006) and UML activity diagrams, for informational equivalence (Larkin and Simon, 1987) and its utility in generating situational awareness (Endsley, 1995) of the security and coordination requirements of a business process. The specific contribution of this dissertation is a design theory (SSeBP) that presents a novel and holistic approach and contributes to the IS knowledge base by filling an existing research gap in the area of design of information systems to support secure and coordinated business processes. The proposed design theory provides practitioners with the meta-design and the design process, including the system components and principles to guide the analysis and design of secure and coordinated eBusiness processes.

    Building knowledge through families of experiments

    Quality of Design, Analysis and Reporting of Software Engineering Experiments: A Systematic Review

    Background: Like any research discipline, software engineering research must be of a certain quality to be valuable. High quality research in software engineering ensures that knowledge is accumulated and helpful advice is given to the industry. One way of assessing research quality is to conduct systematic reviews of the published research literature. Objective: The purpose of this work was to assess the quality of published experiments in software engineering with respect to the validity of inference and the quality of reporting. More specifically, the aim was to investigate the level of statistical power, the analysis of effect size, the handling of selection bias in quasi-experiments, and the completeness and consistency of the reporting of information regarding subjects, experimental settings, design, analysis, and validity. Furthermore, the work aimed at providing suggestions for improvements, using the potential deficiencies detected as a basis. Method: The quality was assessed by conducting a systematic review of the 113 experiments published in nine major software engineering journals and three conference proceedings in the decade 1993-2002. Results: The review revealed that software engineering experiments were generally designed with unacceptably low power and that inadequate attention was paid to issues of statistical power. Effect sizes were sparsely reported and not interpreted with respect to their practical importance for the particular context. There seemed to be little awareness of the importance of controlling for selection bias in quasi-experiments. Moreover, the review revealed a need for more complete and standardized reporting of information, which is crucial for understanding software engineering experiments and judging their results. Implications: The consequence of low power is that the actual effects of software engineering technologies will not be detected to an acceptable extent. 
The lack of reporting of effect sizes and the improper interpretation of effect sizes result in ignorance of the practical importance, and thereby the relevance to industry, of experimental results. The lack of control for selection bias in quasi-experiments may make these experiments less credible than randomized experiments. This is an unsatisfactory situation, because quasi-experiments serve an important role in investigating cause-effect relationships in software engineering, for example, in industrial settings. Finally, the incomplete and unstandardized reporting makes it difficult for the reader to understand an experiment and judge its results. Conclusions: Insufficient quality was revealed in the reviewed experiments. This has implications for inferences drawn from the experiments and might in turn lead to the accumulation of erroneous information and the offering of misleading advice to the industry. Ways to improve this situation are suggested.
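
    As an illustration of how statistical power depends on effect size and sample size, the sketch below approximates the power of a two-sided comparison of two group means. It uses a normal approximation with a standardized effect size (Cohen's d) and equal group sizes; these are simplifying assumptions chosen here for illustration, not the procedure used in the review, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    Assumes equal group sizes and a normal (z) approximation, which
    slightly overstates the power of a t-test for small samples.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)             # two-sided critical value
    shift = effect_size * sqrt(n_per_group / 2)   # standardized mean shift
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Power for a medium effect (d = 0.5) at a few group sizes.
powers = {n: two_sample_power(0.5, n) for n in (10, 30, 64)}
```

    Under these assumptions, a medium effect needs roughly 64 subjects per group to reach the conventional 0.80 power level, while groups of 10 leave most real effects of that size undetected.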

    Component-based software engineering: a quantitative approach

    Dissertation presented to obtain the degree of Doctor in Informatics at the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia. Background: Often, claims in Component-Based Development (CBD) are only supported by qualitative expert opinion, rather than by quantitative data. This contrasts with the normal practice in other sciences, where a sound experimental validation of claims is standard practice. Experimental Software Engineering (ESE) aims to bridge this gap. Unfortunately, it is common to find experimental validation efforts that are hard to replicate and compare, to build up the body of knowledge in CBD. Objectives: In this dissertation our goals are (i) to contribute to the evolution of ESE with regard to the replicability and comparability of experimental work, and (ii) to apply our proposals to CBD, thus contributing to its deeper and sounder understanding. Techniques: We propose a process model for ESE, aligned with current experimental best practices, and combine this model with a measurement technique called Ontology-Driven Measurement (ODM). ODM is aimed at improving the state of practice in metrics definition and collection, by making metrics definitions formal and executable, without sacrificing their usability. ODM uses standard technologies that can be well adapted to current integrated development environments. Results: Our contributions include the definition and preliminary validation of a process model for ESE and the proposal of ODM for supporting metrics definition and collection in the context of CBD. We use both the process model and ODM to perform a series of experimental works in CBD, including the cross-validation of a component metrics set for JavaBeans, a case study on the influence of practitioners' expertise in a sub-process of component development (component code inspections), and an observational study on reusability patterns of pluggable components (Eclipse plug-ins).
These experimental works implied proposing, adapting, or selecting adequate ontologies, as well as the formal definition of metrics upon each of those ontologies. Limitations: Although our experimental work covers a variety of component models and, orthogonally, both process and product, the plethora of opportunities for using our quantitative approach to CBD is far from exhausted. Conclusions: The main contribution of this dissertation is the illustration, through practical examples, of how we can combine our experimental process model with ODM to support the experimental validation of claims in the context of CBD, in a repeatable and comparable way. In addition, the techniques proposed in this dissertation are generic and can be applied to other software development paradigms. Departamento de Informática of the Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa (FCT/UNL); Centro de Informática e Tecnologias da Informação of the FCT/UNL; Fundação para a Ciência e Tecnologia through the STACOS project (POSI/CHS/48875/2002); The Experimental Software Engineering Network (ESERNET); Association Internationale pour les Technologies Objets (AITO); Association for Computing Machinery (ACM).

    Heuristics for use case descriptions.

    Use cases, as part of the Unified Modelling Language, have become an industry standard. The major focus has been on the use case diagram. It is only recently that any detailed attention has been paid to the use case description. The description should be written in such a way as to make it communicable to its reader. However, this does not always appear to be the case. This thesis presents the 7 C's of Communicability as quality features of use case descriptions that make them more comprehensible. The 7 C's are derived from software engineering best practice on use case descriptions and from theories of text comprehension. To help in writing descriptions, the CP Use Case Writing Rules are proposed, a small set of guidelines derived from the 7 C's. Going beyond requirements, software engineers often employ use case descriptions to help them build initial design models of the proposed system. Despite Jacobson's claim that "objects naturally fall out of use cases", finding design-oriented classes and objects in use case descriptions is shown not to be straightforward. This thesis proposes a Question Set which allows the engineer to interrogate the description for important elements of specification and design. Experimentation shows that the CP Writing Rules furnish descriptions that are as comprehensible as those written by other guidelines proposed in the literature. It is also suggested that descriptions be written from the perspective of their intended audience. The limitations of conducting requirements engineering experiments using students are considered and it is suggested that experimenters should not expect large effects from the results. An industrial case study shows that although the CP Rules could not be applied to all events in the use case descriptions, they were applied to most and at varying levels of abstraction. The case study showed that the 7 C's did identify problems with the written descriptions.
The Question Set was well received by the case study stakeholders, but it was considered time-consuming. One of the overriding findings from the case study was that project time constraints would not allow the company to use the techniques suggested, although they recognised the need to do so. Automation would make industrial application of the CP Rules and 7 C's more feasible.

    Leveraging Machine Learning to Improve Software Reliability

    Finding software faults is a critical task during the lifecycle of a software system. Traditional software quality control practices such as statistical defect prediction, static bug detection, regression test, and code review are often inefficient and time-consuming, and cannot keep up with the increasing complexity of modern software systems. We argue that machine learning with its capability in knowledge representation, learning, natural language processing, classification, etc., can be used to extract invaluable information from software artifacts that may be difficult to obtain with other research methodologies to improve existing software reliability practices such as statistical defect prediction, static bug detection, regression test, and code review. This thesis presents a suite of novel machine learning based techniques to improve existing software reliability practices and help developers find software bugs more effectively and efficiently. First, it introduces a deep learning based defect prediction technique to improve existing statistical defect prediction models. To build accurate prediction models, previous studies focused on manually designing features that encode the statistical characteristics of programs. However, these features often fail to capture the semantic difference of programs, and such a capability is needed for building accurate prediction models. To bridge the gap between programs' semantics and defect prediction features, this thesis leverages deep learning techniques to learn a semantic representation of programs automatically from source code and further build and train defect prediction models by using these semantic features. We examine the effectiveness of the deep learning based prediction models on both open-source and commercial projects. Results show that the learned semantic features can significantly outperform existing defect prediction models.
Second, it introduces an n-gram language model based static bug detection technique, i.e., Bugram, to detect new types of bugs with fewer false positives. Most existing static bug detection techniques are based on programming rules inferred from source code. It is known that if a pattern does not appear frequently enough, rules are not learned, thus missing many bugs. To solve this issue, this thesis proposes Bugram, which leverages n-gram language models instead of rules to detect bugs. Specifically, Bugram models program tokens sequentially, using the n-gram language model. Token sequences from the program are then assessed according to their probability in the learned model, and low probability sequences are marked as potential bugs. The assumption is that low probability token sequences in a program are unusual, which may indicate bugs, bad practices, or unusual/special uses of code of which developers may want to be aware. We examine the effectiveness of our approach on the latest versions of 16 open-source projects. Results show that Bugram detected 25 new bugs, 23 of which cannot be detected by existing rule-based bug detection approaches, which suggests that Bugram complements existing bug detection approaches, detecting more bugs and generating fewer false positives. Third, it introduces a machine learning based regression test prioritization technique, i.e., QTEP, to find and run test cases that could reveal bugs earlier. Existing test case prioritization techniques mainly focus on maximizing coverage information between source code and test cases to schedule test cases for finding bugs earlier, but they often do not consider the likely distribution of faults in the source code. However, software faults are often not equally distributed in source code, e.g., around 80% of faults are located in about 20% of the source code. Intuitively, test cases that cover the faulty source code should have higher priorities, since they are more likely to find faults.
To solve this issue, this thesis proposes QTEP, which leverages machine learning models to evaluate source code quality and then adapts existing test case prioritization algorithms by considering the weighted source code quality. Evaluation on seven open-source projects shows that QTEP can significantly outperform existing test case prioritization techniques in finding failed test cases early. Finally, it introduces a machine learning based approach to identifying risky code review requests. Code review has been widely adopted in the development process of both proprietary and open-source software, where it helps improve the maintenance and quality of software before code changes are merged into the source code repository. Our observation on code review requests from four large-scale projects reveals that around 20% of changes cannot pass the first round of code review and require non-trivial revision effort (i.e., risky changes). In addition, resolving these risky changes requires 3X more time and 1.6X more reviewers than the regular changes (i.e., changes that pass the first code review) on average. This thesis presents the first study to characterize these risky changes and automatically identify them with machine learning classifiers. Evaluation on one proprietary project and three large-scale open-source projects (i.e., Qt, Android, and OpenStack) shows that our approach is effective in identifying risky code review requests. Taken together, the results of the four studies provide evidence that machine learning can help improve traditional software reliability practices such as statistical defect prediction, static bug detection, regression test, and code review.
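
    The idea of flagging low-probability token sequences with an n-gram language model can be sketched as follows. This is a minimal illustration and not Bugram's actual implementation: the function names, the add-one smoothing, the start-of-sequence padding, and the toy token corpus are all assumptions made for this example.

```python
from collections import Counter
from math import log


def train_ngram_counts(sequences, n=3):
    """Count n-grams and their (n-1)-token contexts over token sequences."""
    ngrams, contexts = Counter(), Counter()
    for tokens in sequences:
        padded = ["<s>"] * (n - 1) + list(tokens)
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts


def log_probability(tokens, ngrams, contexts, vocab_size, n=3):
    """Log-probability of a token sequence, with add-one smoothing (assumed)."""
    padded = ["<s>"] * (n - 1) + list(tokens)
    total = 0.0
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        count = ngrams[context + (padded[i],)]
        total += log((count + 1) / (contexts[context] + vocab_size))
    return total


def rank_by_suspicion(candidates, ngrams, contexts, vocab_size, n=3):
    """Order candidate sequences from least to most probable."""
    return sorted(
        candidates,
        key=lambda t: log_probability(t, ngrams, contexts, vocab_size, n),
    )


# Toy corpus of API-call token sequences (invented for illustration).
corpus = [["open", "read", "close"]] * 5 + [["open", "write", "close"]] * 3
vocab = {token for seq in corpus for token in seq}
ngrams, contexts = train_ngram_counts(corpus)

usual = ["open", "read", "close"]
unusual = ["open", "close", "read"]
ranked = rank_by_suspicion([usual, unusual], ngrams, contexts, len(vocab))
```

    In this toy setting the sequence that reads after closing receives the lowest probability and is ranked first as the most suspicious candidate. Comparing sequences of equal length, as done here, avoids penalizing longer sequences simply for having more tokens.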

