Testing web enabled simulation at scale using metamorphic testing
We report on Facebook's deployment of MIA (Metamorphic Interaction Automaton). MIA is used to test Facebook's Web Enabled Simulation, built on a web infrastructure of hundreds of millions of lines of code. MIA tackles the twin problems of test flakiness and the unknowable oracle problem. It uses metamorphic testing to automate continuous integration and regression test execution. MIA also plays the role of a test bot, automatically commenting on all relevant changes submitted for code review. It currently uses a suite of over 40 metamorphic test cases. Even at this extreme scale, a non-trivial subset of the metamorphic test suite yields outcomes within 20 minutes (sufficient for continuous integration and review processes). Furthermore, our offline mode simulation reduces test flakiness from approximately 50% (of all online tests) to 0% (offline). Metamorphic testing has been widely studied for 22 years. This paper is the first reported deployment into an industrial continuous integration system.
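To illustrate the idea of metamorphic testing mentioned in this abstract, here is a minimal sketch. It is not MIA or any part of Facebook's infrastructure: the function names and the `sin(x) = sin(pi - x)` relation are assumptions chosen only to show how a test can check a relation between two outputs when the exact expected output is unknown. The seeded random generator makes the run repeatable.

```python
import math
import random


def relation_holds(system_under_test, x, tol=1e-9):
    """Check the metamorphic relation f(x) == f(pi - x) for one input.

    No oracle for f(x) itself is needed: only the two outputs are compared.
    """
    source_output = system_under_test(x)
    follow_up_output = system_under_test(math.pi - x)
    return abs(source_output - follow_up_output) <= tol


def count_violations(system_under_test, trials=1000, seed=0):
    """Run the relation on seeded random inputs and count the violations."""
    rng = random.Random(seed)  # fixed seed: the same inputs on every run
    violations = 0
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)
        if not relation_holds(system_under_test, x):
            violations += 1
    return violations


def faulty_sin(x):
    """Hypothetical buggy implementation with a drift term."""
    return math.sin(x) + 0.01 * x


if __name__ == "__main__":
    print("violations (math.sin):", count_violations(math.sin))
    print("violations (faulty_sin):", count_violations(faulty_sin))
```

A correct implementation should produce no violations, while the deliberately faulty one breaks the relation for almost every input.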
Combining different validation techniques for continuous software improvement - Implications in the development of TRNSYS 16
Validation using published, high-quality test suites can serve to identify different problems in simulation software: modeling and coding errors, missing features, and frequent sources of user confusion. This paper discusses the application of different published validation procedures during the development of a new TRNSYS version: BESTEST/ASHRAE 140 (building envelope), HVAC BESTEST (mechanical systems) and IEA ECBCS Annex 21 / SHC Task 12 empirical validation (performance of a test cell with a very simple mechanical system). It is shown that each validation suite made it possible to identify different types of problems. These validation tools were also used to diagnose and fix some of the identified problems, and to assess the influence of code modifications. The paper also discusses some limitations of the selected validation tools.
Verification Test Suite for Physics Simulation Codes
The DOE/NNSA Advanced Simulation & Computing (ASC) Program directs the development, demonstration and deployment of physics simulation codes. The defensible utilization of these codes for high-consequence decisions requires rigorous verification and validation of the simulation software. The physics and engineering codes used at Los Alamos National Laboratory (LANL), Lawrence Livermore National Laboratory (LLNL), and Sandia National Laboratory (SNL) are arguably among the most complex utilized in computational science. Verification represents an important aspect of the development, assessment and application of simulation software for physics and engineering. The purpose of this note is to formally document the existing tri-laboratory suite of verification problems used by LANL, LLNL, and SNL, i.e., the Tri-Lab Verification Test Suite. Verification is often referred to as ensuring that "the [discrete] equations are solved [numerically] correctly". More precisely, verification develops evidence of mathematical consistency between continuum partial differential equations (PDEs) and their discrete analogues, and provides an approach by which to estimate discretization errors. There are two variants of verification: (1) code verification, which compares simulation results to known analytical solutions, and (2) calculation verification, which estimates convergence rates and discretization errors without knowledge of a known solution. Together, these verification analyses support defensible verification and validation (V&V) of physics and engineering codes that are used to simulate complex problems that do not possess analytical solutions. Discretization errors (e.g., spatial and temporal errors) are embedded in the numerical solutions of the PDEs that model the relevant governing equations. Quantifying discretization errors, which comprise only a portion of the total numerical simulation error, is possible through code and calculation verification.
Code verification computes the absolute value of discretization errors relative to an exact solution of the governing equations. In contrast, calculation verification, which does not utilize a reference solution, combines an assessment of stable self-convergence and exact solution prediction to quantitatively estimate discretization errors. In FY01, representatives of the V&V programs at LANL, LLNL, and SNL identified a set of verification test problems for the Accelerated Strategic Computing Initiative (ASCI) Program. Specifically, a set of code verification test problems that exercise relevant single- and multiple-physics packages was agreed upon. The verification test suite problems can be evaluated in multidimensional geometry and span both smooth and non-smooth behavior.
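The distinction between code verification and calculation verification can be sketched on a toy problem. The example below is an illustration only and is not taken from the Tri-Lab suite: it assumes a trapezoid-rule integration of sin(x) on [0, pi] (exact value 2) as a stand-in for a discretized simulation, a constant refinement ratio of 2, and hypothetical helper names. Code verification measures errors against the exact answer; calculation verification estimates the convergence order from three successively refined results and predicts the exact value by Richardson extrapolation.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal intervals (formally second order)."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


def order_with_exact(coarse, fine, exact, refinement):
    """Code verification: observed order from errors against a known solution."""
    return math.log(abs(coarse - exact) / abs(fine - exact)) / math.log(refinement)


def order_self_convergence(coarse, medium, fine, refinement):
    """Calculation verification: observed order from three grids, no exact solution."""
    return math.log(abs(medium - coarse) / abs(fine - medium)) / math.log(refinement)


def richardson_estimate(medium, fine, order, refinement):
    """Predict the grid-converged value from the two finest results."""
    return fine + (fine - medium) / (refinement ** order - 1)


if __name__ == "__main__":
    r = 2  # assumed constant refinement ratio
    coarse, medium, fine = (trapezoid(math.sin, 0.0, math.pi, n) for n in (16, 32, 64))

    p_code = order_with_exact(medium, fine, 2.0, r)
    p_calc = order_self_convergence(coarse, medium, fine, r)
    predicted = richardson_estimate(medium, fine, p_calc, r)

    print("observed order (exact solution known):", p_code)
    print("observed order (self-convergence):", p_calc)
    print("estimated discretization error on finest grid:", abs(predicted - fine))
```

Both observed orders should come out close to the formal order of 2 for this smooth integrand; non-smooth problems would typically show a lower observed order.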
The CMS Simulation Software
In this paper we present the features and the expected performance of the re-designed CMS simulation software, as well as the experience from the migration process. Today, the CMS simulation suite is based on two principal components: the Geant4 detector simulation toolkit and the new CMS offline Framework and Event Data Model. The simulation chain includes event generation, detector simulation, and digitization steps. With Geant4, we employ the full set of electromagnetic and hadronic physics processes and detailed particle tracking in the 4 Tesla magnetic field. The Framework provides "action on demand" mechanisms that allow users to load the desired modules dynamically and to configure and tune the final application at run time. The simulation suite is used to model the complete central CMS detector (over 1 million geometrical volumes) and the forward systems, such as the Castor calorimeter and Zero Degree Calorimeter, the Totem telescopes, Roman Pots, and the Luminosity Monitor. The design also foresees the use of parametrized electromagnetic and hadronic showers, instead of full modelling of the passage of high-energy particles through a complex hierarchy of volumes and materials, allowing a significant gain in speed while tuning the simulation to test beam and collider data. Physics simulation has been extensively validated by comparison with test beam data and previous simulation results. The redesigned and upgraded simulation software was exercised in performance and robustness tests. It went into production in July 2006, running on the US and EU grids, and has since delivered about 60 million events.
Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of System-level Testing of Autonomous Vehicles
AI-powered systems have gained widespread popularity in various domains,
including Autonomous Vehicles (AVs). However, ensuring their reliability and
safety is challenging due to their complex nature. Conventional test adequacy
metrics, designed to evaluate the effectiveness of traditional software
testing, are often insufficient or impractical for these systems. White-box
metrics, which are specifically designed for these systems, leverage neuron
coverage information. These coverage metrics necessitate access to the
underlying AI model and training data, which may not always be available.
Furthermore, the existing adequacy metrics exhibit weak correlations with the
ability to detect faults in the generated test suite, creating a gap that we
aim to bridge in this study.
In this paper, we introduce a set of black-box test adequacy metrics called
"Test suite Instance Space Adequacy" (TISA) metrics, which can be used to gauge
the effectiveness of a test suite. The TISA metrics offer a way to assess both
the diversity and coverage of the test suite and the range of bugs detected
during testing. Additionally, we introduce a framework that permits testers to
visualise the diversity and coverage of the test suite in a two-dimensional
space, facilitating the identification of areas that require improvement.
We evaluate the efficacy of the TISA metrics by examining their correlation
with the number of bugs detected in system-level simulation testing of AVs. A
strong correlation, coupled with the short computation time, indicates their
effectiveness and efficiency in estimating the adequacy of testing AVs.
Comment: 12 pages, 7 figures
The Effectiveness of Using a Modified “Beat Frequent Pick” Algorithm in the First International RoShamBo Tournament
In this study, a bot is developed to compete in the first International RoShamBo Tournament test suite. The basic "Beat Frequent Pick" (BFP) algorithm was taken from the supplied test suite and was improved by adding a random choice tailored against the opponent's distribution of picks. A training program was also developed that finds the best-performing bot variant by changing the bot's behavior in terms of the timing of the recomputation of the pick distribution. Simulation results demonstrate the significantly improved performance of the proposed variant over the original BFP. This indicates the potential of using the core technique of the proposed variant as an artificial intelligence bot in similar computer games.
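As a rough illustration of the strategy named in this abstract, the sketch below shows a basic "beat the most frequent pick" bot for rock-paper-scissors, plus one possible randomized variant. The abstract does not specify how the random choice is tailored to the opponent's distribution, so `beat_sampled_pick` is only an assumed interpretation (predict the opponent's move by sampling from their observed picks), and all names here are hypothetical rather than taken from the tournament test suite.

```python
import random
from collections import Counter

# Maps each move to the move that beats it.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def beat_frequent_pick(opponent_history, rng):
    """Play the move that beats the opponent's most frequent pick so far."""
    if not opponent_history:
        return rng.choice(list(BEATS))  # no data yet: play uniformly at random
    most_frequent = Counter(opponent_history).most_common(1)[0][0]
    return BEATS[most_frequent]


def beat_sampled_pick(opponent_history, rng):
    """Assumed variant: predict by sampling from the opponent's observed picks.

    Choosing uniformly from the history selects each move with probability
    proportional to how often the opponent has played it.
    """
    if not opponent_history:
        return rng.choice(list(BEATS))
    predicted = rng.choice(opponent_history)
    return BEATS[predicted]


if __name__ == "__main__":
    rng = random.Random(0)
    history = ["rock", "rock", "paper", "rock", "scissors"]
    print("BFP plays:", beat_frequent_pick(history, rng))
    print("sampled variant plays:", beat_sampled_pick(history, rng))
```

The deterministic version is easy for an adaptive opponent to exploit, which is one plausible motivation for mixing in randomness; the timing of when the pick distribution is recomputed, which the abstract tunes through training, is not modelled here.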