211,689 research outputs found

    Empirical analyses of the factors affecting confirmation bias and the effects of confirmation bias on software developer/tester performance

    Get PDF
    Background: During all levels of software testing, the goal should be to fail the code. However, software developers and testers are more likely to choose positive tests rather than negative ones due to the phenomenon called confirmation bias. Confirmation bias is defined as the tendency of people to verify their hypotheses rather than refuting them. In the literature, there are theories about the possible effects of confirmation bias on software development and testing. Due to the tendency towards positive tests, most of the software defects remain undetected, which in turn leads to an increase in software defect density. Aims: In this study, we analyze factors affecting confirmation bias in order to discover methods to circumvent confirmation bias. The factors, we investigate are experience in software development/testing and reasoning skills that can be gained through education. In addition, we analyze the effect of confirmation bias on software developer and tester performance. Method: In order to measure and quantify confirmation bias levels of software developers/testers, we prepared pen-and-paper and interactive tests based on two tasks from cognitive psychology literature. These tests were conducted on the 36 employees of a large scale telecommunication company in Europe as well as 28 graduate computer engineering students of Bogazici University, resulting in a total of 64 subjects. We evaluated the outcomes of these tests using the metrics we proposed in addition to some basic methods which we inherited from the cognitive psychology literature. Results: Results showed that regardless of experience in software development/testing, abilities such as logical reasoning and strategic hypotheses testing are differentiating factors in low confirmation bias levels. Moreover, the results of the analysis to investigate the relationship between code defect density and confirmation bias levels of software developers and testers showed that there is a direct correlation between confirmation bias and defect proneness of the code. Conclusions: Our findings show that having strong logical reasoning and hypothesis testing skills are differentiating factors in the software developer/tester performance in terms of defect rates. We recommend that companies should focus on improving logical reasoning and hypothesis testing skills of their employees by designing training programs. As future work, we plan to replicate this study in other software development companies. Moreover, we will use confirmation bias metrics in addition to product and process metrics in for software defect prediction. We believe that confirmation bias metrics would improve the prediction performance of learning based defect prediction models which we have been building over a decade

    Predicting Fault-prone Software Module Using Data Mining Technique and Fuzzy Logic

    Get PDF
    This paper discusses a new model towards reliability and quality improvement of software systems by predicting fault-prone module before testing. Model utilizes the classification capability of data mining techniques and knowledge stored in software metrics to classify the software module as fault-prone or not fault-prone. A decision tree is constructed using ID3 algorithm for existing project data in order to gain information for the purpose of decision making whether a particular module id fault-prone or not. The gained information is converted into fuzzy rules and integrated with fuzzy inference system to predict fault-prone or not fault-prone software module for target data. The model is also able to predict fault-proneness degree of faulty module. The goal is to help software manager to concentrate their testing efforts to fault-prone modules in order to improve the reliability and quality of the software system. We used NASA projects data set from the PROMOSE repository to validate the predictive accuracy of the model

    Guiding the retraining of convolutional neural networks against adversarial inputs

    Full text link
    Background: When using deep learning models, there are many possible vulnerabilities and some of the most worrying are the adversarial inputs, which can cause wrong decisions with minor perturbations. Therefore, it becomes necessary to retrain these models against adversarial inputs, as part of the software testing process addressing the vulnerability to these inputs. Furthermore, for an energy efficient testing and retraining, data scientists need support on which are the best guidance metrics and optimal dataset configurations. Aims: We examined four guidance metrics for retraining convolutional neural networks and three retraining configurations. Our goal is to improve the models against adversarial inputs regarding accuracy, resource utilization and time from the point of view of a data scientist in the context of image classification. Method: We conducted an empirical study in two datasets for image classification. We explore: (a) the accuracy, resource utilization and time of retraining convolutional neural networks by ordering new training set by four different guidance metrics (neuron coverage, likelihood-based surprise adequacy, distance-based surprise adequacy and random), (b) the accuracy and resource utilization of retraining convolutional neural networks with three different configurations (from scratch and augmented dataset, using weights and augmented dataset, and using weights and only adversarial inputs). Results: We reveal that retraining with adversarial inputs from original weights and by ordering with surprise adequacy metrics gives the best model w.r.t. the used metrics. Conclusions: Although more studies are necessary, we recommend data scientists to use the above configuration and metrics to deal with the vulnerability to adversarial inputs of deep learning models, as they can improve their models against adversarial inputs without using many inputs

    Adaptive random testing based on distribution metrics

    Get PDF
    Random testing (RT) is a fundamental software testing technique. Adaptive random testing (ART), an enhancement of RT, generally uses fewer test cases than RT to detect the first failure. ART generates test cases in a random manner, together with additional test case selection criteria to enforce that the executed test cases are evenly spread over the input domain. Some studies have been conducted to measure how evenly an ART algorithm can spread its test cases with respect to some distribution metrics. These studies observed that there exists a correlation between the failure detection capability and the evenness of test case distribution. Inspired by this observation, we aim to study whether failure detection capability of ART can be enhanced by using distribution metrics as criteria for the test case selection process. Our simulations and empirical results show that the newly proposed algorithms not only improve the evenness of test case distribution, but also enhance the failure detection capability of ART

    Analysis of Slice-Based Metrics for Aspect-Oriented Programs

    Get PDF
    To improve separation of concerns in software design and implementation, the technique of Aspect-Oriented Programming (AOP) was introduced. But AOP has a lot of features like aspects, advices, point-cuts, join-points etc., and because of these the usage of the existing intermediate graph representations is rendered useless. In our work we have defined a new intermediate graph representation for AOP. The construction of SDG is automated by analysing the bytecode of aspect-oriented programs that incorporates the representation of aspect-oriented features. After constructing the SDG, we propose a slicing algorithm that uses the intermediate graph and computes slices for a given AOP. Program slicing has numerous applications in software engineering activities like debugging, testing, maintenance, model checking etc. To implement our proposed slicing technique, we have developed a prototype tool that takes an AOP as input and compute its slices using our proposed slicing algorithm. To evaluate our proposed technique, we have considered some case studies by taking open source projects. The comparative study of our proposed slicing algorithm with some existing algorithms show that our approach is an efficient and scalable approach of slicing for different applications with respect to aspect-oriented programs. Software metrics are used to measure certain aspects of software. Using the slicing approach we have computed eight software metrics which quantitatively and qualitatively analyse the whole aspect project. We have compiled a metrics suite for AOP and an automated prototype tool is developed for helping the process of SDLC

    Toward Data-Driven Discovery of Software Vulnerabilities

    Get PDF
    Over the years, Software Engineering, as a discipline, has recognized the potential for engineers to make mistakes and has incorporated processes to prevent such mistakes from becoming exploitable vulnerabilities. These processes span the spectrum from using unit/integration/fuzz testing, static/dynamic/hybrid analysis, and (automatic) patching to discover instances of vulnerabilities to leveraging data mining and machine learning to collect metrics that characterize attributes indicative of vulnerabilities. Among these processes, metrics have the potential to uncover systemic problems in the product, process, or people that could lead to vulnerabilities being introduced, rather than identifying specific instances of vulnerabilities. The insights from metrics can be used to support developers and managers in making decisions to improve the product, process, and/or people with the goal of engineering secure software. Despite empirical evidence of metrics\u27 association with historical software vulnerabilities, their adoption in the software development industry has been limited. The level of granularity at which the metrics are defined, the high false positive rate from models that use the metrics as explanatory variables, and, more importantly, the difficulty in deriving actionable intelligence from the metrics are often cited as factors that inhibit metrics\u27 adoption in practice. Our research vision is to assist software engineers in building secure software by providing a technique that generates scientific, interpretable, and actionable feedback on security as the software evolves. In this dissertation, we present our approach toward achieving this vision through (1) systematization of vulnerability discovery metrics literature, (2) unsupervised generation of metrics-informed security feedback, and (3) continuous developer-in-the-loop improvement of the feedback. We systematically reviewed the literature to enumerate metrics that have been proposed and/or evaluated to be indicative of vulnerabilities in software and to identify the validation criteria used to assess the decision-informing ability of these metrics. In addition to enumerating the metrics, we implemented a subset of these metrics as containerized microservices. We collected the metric values from six large open-source projects and assessed metrics\u27 generalizability across projects, application domains, and programming languages. We then used an unsupervised approach from literature to compute threshold values for each metric and assessed the thresholds\u27 ability to classify risk from historical vulnerabilities. We used the metrics\u27 values, thresholds, and interpretation to provide developers natural language feedback on security as they contributed changes and used a survey to assess their perception of the feedback. We initiated an open dialogue to gain an insight into their expectations from such feedback. In response to developer comments, we assessed the effectiveness of an existing vulnerability discovery approach—static analysis—and that of vulnerability discovery metrics in identifying risk from vulnerability contributing commits

    Software quality attribute measurement and analysis based on class diagram metrics

    Get PDF
    Software quality measurement lies at the heart of the quality engineering process. Quality measurement for object-oriented artifacts has become the key for ensuring high quality software. Both researchers and practitioners are interested in measuring software product quality for improvement. It has recently become more important to consider the quality of products at the early phases, especially at the design level to ensure that the coding and testing would be conducted more quickly and accurately. The research work on measuring quality at the design level progressed in a number of steps. The first step was to discover the correct set of metrics to measure design elements at the design level. Chidamber and Kemerer (C&K) formulated the first suite of OO metrics. Other researchers extended on this suite and provided additional metrics. The next step was to collect these metrics by using software tools. A number of tools were developed to measure the different suites of metrics; some represent their measurements in the form of ordinary numbers, others represent them in 3D visual form. In recent years, researchers developed software quality models which went a bit further by computing quality attributes from collected design metrics. In this research we extended on the software quality modelers’ work by adding a quality attribute prioritization scheme and a design metric analysis layer. Our work is all focused on the class diagram, the most fundamental constituent in any object oriented design. Using earlier researchers’ work, we extract a class diagram’s metrics and compute its quality attributes. We then analyze the results and inform the user. We present our figures and observations in the form of an analysis report. Our target user could be a project manager or a software quality engineer or a developer who needs to improve the class diagram’s quality. We closely examine the design metrics that affect quality attributes. We pinpoint the weaknesses in the class diagram, based on these metrics, inform the user about the problems that emerged from these classes, and advice him/her as to how he/she can go about improving the overall design quality. We consider the six basic quality attributes: “Reusability”, “Functionality”, “Understandability”, “Flexibility”, “Extendibility”, and “Effectiveness” of the whole class diagram. We allow the user to set priorities on these quality attributes in a sequential manner based on his/her requirements. Using a geometric series, we calculate a weighted average value for the arranged list of quality attributes. This weighted average value indicates the overall quality of the product, the class diagram. Our experimental work gave us much insight into the meanings and dependencies between design metrics and quality attributes. This helped us refine our analysis technique and give more concrete observations to the user

    Guiding the retraining of convolutional neural networks against adversarial inputs

    Get PDF
    Background: When using deep learning models, one of the most critical vulnerabilities is their exposure to adversarial inputs, which can cause wrong decisions (e.g., incorrect classification of an image) with minor perturbations. To address this vulnerability, it becomes necessary to retrain the affected model against adversarial inputs as part of the software testing process. In order to make this process energy efficient, data scientists need support on which are the best guidance metrics for reducing the adversarial inputs to create and use during testing, as well as optimal dataset configurations. Aim: We examined six guidance metrics for retraining deep learning models, specifically with convolutional neural network architecture, and three retraining configurations. Our goal is to improve the convolutional neural networks against the attack of adversarial inputs with regard to the accuracy, resource utilization and execution time from the point of view of a data scientist in the context of image classification. Method: We conducted an empirical study using five datasets for image classification. We explore: (a) the accuracy, resource utilization, and execution time of retraining convolutional neural networks with the guidance of six different guidance metrics (neuron coverage, likelihood-based surprise adequacy, distance-based surprise adequacy, DeepGini, softmax entropy and random), (b) the accuracy and resource utilization of retraining convolutional neural networks with three different configurations (one-step adversarial retraining, adversarial retraining and adversarial fine-tuning). Results: We reveal that adversarial retraining from original model weights, and by ordering with uncertainty metrics, gives the best model w.r.t. accuracy, resource utilization, and execution time. Conclusions: Although more studies are necessary, we recommend data scientists use the above configuration and metrics to deal with the vulnerability to adversarial inputs of deep learning models, as they can improve their models against adversarial inputs without using many inputs and without creating numerous adversarial inputs. We also show that dataset size has an important impact on the results.This work was supported by the GAISSA Spanish research project (ref. TED2021-130923B-I00; MCIN/AEI/10.13039/501100011033), the “UNAM-DGECI: Iniciación a la Investigación (verano otoño 2021)” scholarship provided by Universidad Nacional Autónoma de México (UNAM), the “Beatriz Galindo” Spanish Program BEAGAL18/00064, the Austrian Science Fund (FWF): I 4701-N and the project Continuous Testing in Production (ConTest) funded by the Austrian Research Promotion Agency (FFG): 888127.Peer ReviewedObjectius de Desenvolupament Sostenible::7 - Energia Assequible i No ContaminantObjectius de Desenvolupament Sostenible::13 - Acció per al ClimaPostprint (published version

    An investigation into telecommunications billing system testing processes

    Get PDF
    Testing is an important part of the software development process, since it ultimately determines the quality of the product or service that the end user is provided. As error correction costs increase exponentially with time, it is important to resolve software defects as early as possible. The same applies to telecommunications billing software, where the level of competitiveness demands that the testing process be both accurate and efficient. The investigation undertaken aimed to evaluate and improve the testing process of a company that develops telecommunications billing software, Nokia Siemens Networks (NSN). The methodology used to perform the study involved the use of the Goal Question Metric (GQM) approach, which has been used extensively for process measurement and improvement. A research model was developed which derived process goals from the key research questions, ensuring that the research questions could be answered from the goal results. Four goals were determined using this method. These goals were to improve defect detection accuracy, defect correction accuracy, defect detection efficiency and defect correction efficiency. This led to 14 questions and 95 metrics in total. Defect detection accuracy was found to be insufficient, defect correction accuracy was determined to be satisfactory, defect detection efficiency was a key goal, and it was found to be unsatisfactory, while defect correction efficiency was acceptable, however there were many cases where error resolution was slow. Several specific proposals for improvement are suggested, as well as general process improvement suggestions. The process can be improved overall by using the agile Scrum approach. Scrum's cross-functional teams coupled with development testing through Test-driven Development will ensure that detection accuracy and efficiency are improved. The study found that because the process is more traditional than agile and separates testing and development, it is not well suited to the size of the projects and their timelines. In order to meet the needs of the industry and release quality services competitively, a more agile approach needs to be taken. The research conducted provides a contribution to a field where research is scarce, and provides evidence of the insufficiency of traditional development processes in small telecommunications projects, while motivating the use of agile methodologies to meet organisational goals
    corecore