
    An Empirical Study on the Effectiveness of Testing Metrics to Test Deep Learning Models

    In recent years, Deep Learning (DL) models have been widely applied to develop safety- and security-critical systems. The recent evolution of Deep Neural Networks (DNNs) is the key reason behind the unprecedented achievements in image classification, object detection, medical image analysis, speech recognition, and autonomous driving. However, DL models often remain a black box for their practitioners due to the lack of interpretability and explainability. DL practitioners generally use standard metrics such as Precision, Recall, and F1 score to evaluate the performance of DL models on the test dataset. However, since high-quality test data is rarely available, the accuracy reported by these standard metrics on test datasets cannot justify the trustworthiness of the testing adequacy, generality, and robustness of DL models. The way we ensure the quality of DL models is still in its infancy; hence, a scalable DL model testing framework is highly demanded in the context of software testing. The existing techniques for testing traditional software systems are not directly applicable to DL models because of the fundamental differences in programming paradigm, systems development methodologies, and processes. However, several testing metrics (e.g., Neuron Coverage (NC), Confusion and Bias error metrics, and Multi-granularity metrics) have been proposed that leverage the concept of test coverage in traditional software testing to measure the robustness of DL models and the quality of the test datasets. Although test coverage is highly effective for testing traditional software systems, the effectiveness of DL coverage metrics must be evaluated in testing the robustness of DL models and measuring the quality of the test datasets. In addition, the selected testing metrics work on the activated neurons of a DL model. In our study, we count the neurons of a DL model differently than the existing studies; for example, according to our calculation the LeNet-5 model has 6508 neurons, whereas other studies consider the LeNet-5 model to contain only 268 neurons. Therefore, it is also important to investigate how the neuron concept (i.e., the idea of having neurons in the DL model and the way we calculate the number of neurons a DL model has) impacts the testing metrics. In this thesis, we thus conduct an exploratory study for evaluating the effectiveness of the testing metrics to test DL models, not only in measuring their robustness but also in assessing the quality of the test datasets. Furthermore, since the selected testing metrics work on the activated neurons of a DL model, we also investigate the impact of the neuron concepts on the testing metrics. To conduct our experiments, we select popular publicly available datasets (e.g., MNIST, Fashion-MNIST, CIFAR-10, ImageNet, and so on) and train DL models on them. We also select state-of-the-art DL models (e.g., VGG-16, VGG-19, ResNet-50, ResNet-101, and so on) trained on the ImageNet dataset. Our experimental results demonstrate that, regardless of the neuron concept used, NC and Multi-granularity testing metrics are ineffective in evaluating the robustness of DL models and in assessing the quality of the test datasets. In addition, the selection of threshold values has a negligible impact on the NC metric. Increasing the coverage values of the Multi-granularity testing metrics cannot separate regular test data from adversarial test data. Our exploratory study also shows that the DL models still make accurate predictions with higher coverage values of the Multi-granularity metrics than false predictions; therefore, it is not always true that increasing the coverage values of the Multi-granularity testing metrics reveals more defects in DL models. Finally, the Precision and Recall scores show that the Confusion and Bias error metrics are adequate to detect class-level violations of the DL models.
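    As a rough illustration of the Neuron Coverage (NC) metric discussed above, the sketch below computes NC as the fraction of neurons whose per-neuron scaled activation exceeds a threshold on at least one test input. The activation matrix is simulated with random numbers purely for illustration, and the neuron count of 6508 echoes the LeNet-5 figure used in the thesis; in practice the activations would be extracted from the hidden layers of a trained model.

```python
# Minimal sketch of the Neuron Coverage (NC) idea: the fraction of neurons
# whose activation exceeds a threshold t on at least one test input.
# Activations here are random stand-ins; a real study would extract them
# from the hidden layers of a trained DL model and scale them per neuron.
import numpy as np

def neuron_coverage(activations, threshold=0.25):
    """activations: (num_inputs, num_neurons) array of per-neuron activations scaled to [0, 1]."""
    covered = (activations > threshold).any(axis=0)  # neuron fired on at least one input
    return covered.sum() / covered.size

rng = np.random.default_rng(0)
acts = rng.random((100, 6508))  # hypothetical: 100 test inputs, 6508 neurons (the LeNet-5 count used in the thesis)
print(f"NC at t=0.25: {neuron_coverage(acts, 0.25):.3f}")
print(f"NC at t=0.75: {neuron_coverage(acts, 0.75):.3f}")
```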

    Quality Assurance of Software Models - A Structured Quality Assurance Process Supported by a Flexible Tool Environment in the Eclipse Modeling Project

    The paradigm of model-based software development (MBSD) has become more and more popular since it promises an increase in the efficiency and quality of software development. In this paradigm, software models play an increasingly important role, and software quality and quality assurance consequently lead back to the quality and quality assurance of the involved models. The fundamental aim of this thesis is the definition of a structured syntax-oriented process for quality assurance of software models that can be adapted to project-specific and domain-specific needs. It is structured into two sub-processes: a process for the specification of project-specific model quality assurance techniques, and a process for applying them to concrete software models within an MBSD project. The approach concentrates on quality aspects to be checked on the abstract model syntax and is based on the quality assurance techniques of model metrics, smells, and refactorings, which are well known from the literature. So far, these techniques have mostly been considered in isolation; the proposed process therefore integrates them in order to perform model quality assurance more systematically. Three example cases performing the process serve as proof-of-concept implementations and show its applicability, its flexibility, and hence its usefulness. Related to several issues concerning model quality assurance, minor contributions of this thesis are (1) the definition of a quality model for model quality that consists of high-level quality attributes and low-level characteristics, (2) overviews of metrics, smells, and refactorings for UML class models, including structured descriptions of each technique, and (3) an approach for composite model refactoring that concentrates on the specification of refactoring composition. Since manually reviewing models is time-consuming and error-prone, several tasks of the proposed process should consequently be automated. As a further main contribution, this thesis presents a flexible tool environment for model quality assurance which is based on the Eclipse Modeling Framework (EMF), a common open-source technology in model-based software development. The tool set is part of the Eclipse Modeling Project (EMP) and belongs to the Eclipse incubation project EMF Refactor, which is available under the Eclipse Public License (EPL). The EMF Refactor framework supports both the model designer and the model reviewer by obtaining metrics reports, by checking for potential model deficiencies (called model smells), and by systematically restructuring models using refactorings. The functionality of EMF Refactor is integrated into standard tree-based EMF instance editors, graphical GMF-based editors as used by Papyrus UML, and textual editors provided by Xtext. Several experiments and studies show the suitability of the tools for supporting the techniques of the structured syntax-oriented model quality assurance process.
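    To make the pairing of model metrics and model smells concrete, the following sketch (not EMF Refactor's actual API) computes two simple class-model metrics and reports a "Large Class" smell when a project-specific threshold is exceeded; the class structure, metric names, and thresholds are invented for illustration.

```python
# Illustrative sketch only: simple class-model metrics feeding a model-smell
# check, in the spirit of the metric/smell/refactoring techniques described
# above. This is not EMF Refactor's API; names and thresholds are hypothetical.
from dataclasses import dataclass, field

@dataclass
class UmlClass:
    name: str
    attributes: list = field(default_factory=list)
    operations: list = field(default_factory=list)

def nattr(cls):  # metric: number of attributes
    return len(cls.attributes)

def nops(cls):   # metric: number of operations
    return len(cls.operations)

def large_class_smell(cls, max_attr=10, max_ops=15):
    """Model smell: a class that concentrates too much state or behaviour."""
    return nattr(cls) > max_attr or nops(cls) > max_ops

model = [UmlClass("Order", attributes=[f"attr{i}" for i in range(14)], operations=["total", "ship"])]
for c in model:
    if large_class_smell(c):
        print(f"Smell 'Large Class' in {c.name}: {nattr(c)} attributes, {nops(c)} operations")
```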

    Tool Support for Inspecting the Code Quality of HPC Applications

    The nature of HPC application development encourages ad hoc design and implementation, rather than formal requirements analysis and design specification as is typical in software engineering. However, we cannot simply expect HPC developers to adopt formal software engineering processes wholesale, even while there is a need to improve software structure and quality to ensure future maintainability. Therefore, we propose tools that HPC developers can use at their discretion to obtain feedback on the structure and quality of their codes. This feedback would come in the form of code quality metrics and analyses, presented when necessary in intuitive and interactive visualizations. This paper summarizes our implementation of just such a tool, which we apply to a standard HPC benchmark as "proof-of-concept."
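    The sketch below illustrates the kind of discretionary, metric-based feedback described here: it walks a source tree and computes two simple structural metrics per file that could feed an interactive visualization. The file extensions, comment heuristic, and choice of metrics are illustrative assumptions, not the tool presented in the paper.

```python
# Illustrative sketch: per-file structural metrics (lines of code and comment
# density) that a code-quality tool could report or visualize. The extensions
# and the comment heuristic are assumptions for the example.
import os

def file_metrics(path):
    with open(path, errors="ignore") as f:
        lines = [ln.strip() for ln in f]
    loc = sum(1 for ln in lines if ln)                          # non-empty lines
    comments = sum(1 for ln in lines if ln.startswith(("!", "//", "/*", "*")))
    return {"loc": loc, "comment_density": comments / loc if loc else 0.0}

def scan(root, exts=(".f90", ".c", ".cc", ".cpp", ".h")):
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                yield path, file_metrics(path)

for path, metrics in scan("."):
    print(f"{path}: {metrics['loc']} LOC, {metrics['comment_density']:.0%} comment lines")
```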

    Software acquisition: evolution, Total Quality Management and applications to the Army Tactical Missile System

    Software acquisition has become the critical path in the procurement of Department of Defense (DOD) weapon systems. Software requirements and their complexity have increased at an exponential rate, and support requirements now constitute up to 70 percent of software life cycle costs. This thesis presents the concept of software Total Quality Management (TQM), which focuses on the entire process of software acquisition, as a partial solution to the software acquisition crisis. A software case study, analysis, and lessons learned with applications to the Army Tactical Missile System (TACMS) are presented. A software process control maturity model, a standard software language, and a set of software metrics are presented. A discussion of the program manager's responsibilities to implement a process control mechanism that produces quality software products is presented. The principal finding is that software acquisition is the major challenge to a program manager for weapon systems procurement. The major recommendation of this study is that software TQM can be applied to software acquisition (http://archive.org/details/softwareacquisit00barb). Captain, United States Army. Approved for public release; distribution is unlimited.

    Using quality models in software package selection

    The growing importance of commercial off-the-shelf software packages requires adapting some software engineering practices, such as requirements elicitation and testing, to this emergent framework. Also, some specific new activities arise, among which the selection of software packages plays a prominent role. All the methodologies that have been proposed recently for choosing software packages compare user requirements with the packages' capabilities. There are different types of requirements, such as managerial, political, and, of course, quality requirements. Quality requirements are often difficult to check. This is partly due to their nature, but there is another reason that can be mitigated, namely the lack of structured and widespread descriptions of package domains (that is, categories of software packages such as ERP systems, graphical or data structure libraries, and so on). This absence hampers the accurate description of software packages and the precise statement of quality requirements, and consequently overall package selection and confidence in the result of the process. Our methodology for building structured quality models helps solve this drawback.
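    A minimal sketch of the requirements-versus-capabilities comparison described above, assuming a hypothetical structured quality model in which each attribute of a package domain has a measured value per candidate package; the attribute names, packages, and thresholds are invented for illustration.

```python
# Illustrative sketch: checking quality requirements against packages described
# by a structured quality model (attribute -> measured value). All names and
# numbers are hypothetical.
requirements = {                                   # quality requirement: required value
    "reliability.availability": 0.99,              # higher is better
    "efficiency.max_response_ms": 200,             # lower is better
    "usability.learning_hours": 8,                 # lower is better
}
lower_is_better = {"efficiency.max_response_ms", "usability.learning_hours"}

packages = {
    "ERP-A": {"reliability.availability": 0.995, "efficiency.max_response_ms": 150, "usability.learning_hours": 12},
    "ERP-B": {"reliability.availability": 0.980, "efficiency.max_response_ms": 120, "usability.learning_hours": 6},
}

def satisfies(value, required, attr):
    return value <= required if attr in lower_is_better else value >= required

for name, attrs in packages.items():
    unmet = [a for a, req in requirements.items() if not satisfies(attrs[a], req, a)]
    print(f"{name}: {'meets all stated quality requirements' if not unmet else 'fails ' + ', '.join(unmet)}")
```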

    Systematic evaluation of software product line architectures

    The architecture of a software product line is one of its most important artifacts, as it represents an abstraction of the products that can be generated. It is crucial to evaluate the quality attributes of a product line architecture in order to: increase the productivity of the product line process and the quality of the products; provide a means to understand the potential behavior of the products and, consequently, decrease their time to market; and improve the handling of the product line variability. The evaluation of a product line architecture can serve as a basis to analyze the managerial and economic value of a product line for software managers and architects. Most of the current research on the evaluation of product line architectures does not take into account metrics directly obtained from UML models and their variabilities; the metrics used instead are difficult to apply in general and to use for quantitative analysis. This paper presents a Systematic Evaluation Method for UML-based Software Product Line Architecture, SystEM-PLA. SystEM-PLA differs from current research as it provides stakeholders with a means to: (i) estimate and analyze potential products; (ii) use predefined basic UML-based metrics to compose quality attribute metrics; (iii) perform feasibility and trade-off analysis of a product line architecture with respect to its quality attributes; and (iv) make the evaluation of product line architectures more flexible. An example using the SEI's Arcade Game Maker (AGM) product line is presented as a proof of concept, illustrating SystEM-PLA activities. Metrics for the complexity and extensibility quality attributes are defined and used to perform a trade-off analysis.
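    As a rough illustration of item (ii) above, the sketch below composes basic structural metrics into complexity and extensibility scores and compares two candidate product line architecture configurations in a simple trade-off; the metric names, weights, and values are assumptions for the example, not SystEM-PLA's actual definitions.

```python
# Illustrative sketch: composing basic UML-based metrics into quality-attribute
# metrics (complexity, extensibility) and comparing two hypothetical PLA
# configurations. Weights and values are invented for the example.
basic_metrics = {
    "ConfigA": {"classes": 40, "variation_points": 12, "interfaces": 18, "max_inheritance_depth": 3},
    "ConfigB": {"classes": 55, "variation_points": 20, "interfaces": 30, "max_inheritance_depth": 5},
}

def complexity(m):      # composed metric, lower is better
    return 0.5 * m["classes"] + 3.0 * m["max_inheritance_depth"] + 0.2 * m["variation_points"]

def extensibility(m):   # composed metric, higher is better
    return 0.6 * m["variation_points"] + 0.4 * m["interfaces"]

for cfg, m in basic_metrics.items():
    print(f"{cfg}: complexity={complexity(m):.1f} (lower is better), "
          f"extensibility={extensibility(m):.1f} (higher is better)")
```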

    CloudHealth: A Model-Driven Approach to Watch the Health of Cloud Services

    Cloud systems are complex and large systems in which services provided by different operators must coexist and eventually cooperate. In such a complex environment, controlling the health of both the whole environment and the individual services is extremely important in order to react to misbehaviours, unexpected events, and failures in a timely and effective way. Although there are solutions to monitor cloud systems at different granularity levels, how to relate the many KPIs that can be collected about the health of the system and how health information can be properly reported to operators are open questions. This paper reports the early results we achieved in the challenge of monitoring the health of cloud systems. In particular, we present CloudHealth, a model-based health monitoring approach that can be used by operators to watch specific quality attributes. The CloudHealth Monitoring Model describes how to operationalize high-level monitoring goals by dividing them into subgoals, deriving metrics for the subgoals, and using probes to collect the metrics. We use the CloudHealth Monitoring Model to control the probes that must be deployed on the target system, the KPIs that are dynamically collected, and the visualization of the data in dashboards. (Comment: 8 pages, 2 figures, 1 table)
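    A minimal sketch of the goal decomposition described for the CloudHealth Monitoring Model, assuming hypothetical subgoals, metrics, and probe functions that return canned values; in a real deployment the probes would query the cloud platform and the collected KPIs would be pushed to dashboards.

```python
# Illustrative sketch: a high-level monitoring goal divided into subgoals,
# each subgoal tied to metrics, each metric collected by a probe. Probes here
# return canned values; real probes would query the cloud platform.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Metric:
    name: str
    probe: Callable[[], float]  # collects one KPI sample

@dataclass
class SubGoal:
    name: str
    metrics: List[Metric]

goals = {
    "service responsiveness": [
        SubGoal("API latency acceptable", [Metric("p95_latency_ms", lambda: 180.0)]),
        SubGoal("errors under control",   [Metric("error_rate", lambda: 0.02)]),
    ]
}

for goal_name, subgoals in goals.items():
    print(f"Goal: {goal_name}")
    for sg in subgoals:
        samples = {m.name: m.probe() for m in sg.metrics}  # collect KPIs via probes
        print(f"  {sg.name}: {samples}")
```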