30,433 research outputs found

    Fairness Testing: Testing Software for Discrimination

    Full text link
    This paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination.Comment: Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou. 2017. Fairness Testing: Testing Software for Discrimination. In Proceedings of 2017 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), Paderborn, Germany, September 4-8, 2017 (ESEC/FSE'17). https://doi.org/10.1145/3106237.3106277, ESEC/FSE, 201

    Functional Testing of Feature Model Analysis Tools: a Test Suite

    Get PDF
    A Feature Model (FM) is a compact representation of all the products of a software product line. Automated analysis of FMs is rapidly gaining importance: new operations of analysis have been proposed, new tools have been developed to support those operations and different logical paradigms and algorithms have been proposed to perform them. Implementing operations is a complex task that easily leads to errors in analysis solutions. In this context, the lack of specific testing mechanisms is becoming a major obstacle hindering the development of tools and affecting their quality and reliability. In this article, we present FaMa Test Suite, a set of implementationā€“independent test cases to validate the functionality of FM analysis tools. This is an efficient and handy mechanism to assist in the development of tools, detecting faults and improving their quality. In order to show the effectiveness of our proposal, we evaluated the suite using mutation testing as well as real faults and tools. Our results are promising and directly applicable in the testing of analysis solutions. We intend this work to be a first step toward the development of a widely accepted test suite to support functional testing in the community of automated analysis of feature models.CICYT TIN2009-07366CICYT TIN2006-00472Junta de AndalucĆ­a TIC-253

    Diurnal ocean surface layer model validation

    Get PDF
    The diurnal ocean surface layer (DOSL) model at the Fleet Numerical Oceanography Center forecasts the 24-hour change in a global sea surface temperatures (SST). Validating the DOSL model is a difficult task due to the huge areas involved and the lack of in situ measurements. Therefore, this report details the use of satellite infrared multichannel SST imagery to provide day and night SSTs that can be directly compared to DOSL products. This water-vapor-corrected imagery has the advantages of high thermal sensitivity (0.12 C), large synoptic coverage (nearly 3000 km across), and high spatial resolution that enables diurnal heating events to be readily located and mapped. Several case studies in the subtropical North Atlantic readily show that DOSL results during extreme heating periods agree very well with satellite-imagery-derived values in terms of the pattern of diurnal warming. The low wind and cloud-free conditions necessary for these events to occur lend themselves well to observation via infrared imagery. Thus, the normally cloud-limited aspects of satellite imagery do not come into play for these particular environmental conditions. The fact that the DOSL model does well in extreme events is beneficial from the standpoint that these cases can be associated with the destruction of the surface acoustic duct. This so-called afternoon effect happens as the afternoon warming of the mixed layer disrupts the sound channel and the propagation of acoustic energy

    Sonic Booms in Atmospheric Turbulence (SonicBAT): The Influence of Turbulence on Shaped Sonic Booms

    Get PDF
    The objectives of the Sonic Booms in Atmospheric Turbulence (SonicBAT) Program were to develop and validate, via research flight experiments under a range of realistic atmospheric conditions, one numeric turbulence model research code and one classic turbulence model research code using traditional N-wave booms in the presence of atmospheric turbulence, and to apply these models to assess the effects of turbulence on the levels of shaped sonic booms predicted from low boom aircraft designs. The SonicBAT program has successfully investigated sonic boom turbulence effects through the execution of flight experiments at two NASA centers, Armstrong Flight Research Center (AFRC) and Kennedy Space Center (KSC), collecting a comprehensive set of acoustic and atmospheric turbulence data that were used to validate the numeric and classic turbulence models developed. The validated codes were incorporated into the PCBoom sonic boom prediction software and used to estimate the effect of turbulence on the levels of shaped sonic booms associated with several low boom aircraft designs. The SonicBAT program was a four year effort that consisted of turbulence model development and refinement throughout the entire period as well as extensive flight test planning that culminated with the two research flight tests being conducted in the second and third years of the program. The SonicBAT team, led by Wyle, includes partners from the Pennsylvania State University, Lockheed Martin, Gulfstream Aerospace, Boeing, Eagle Aeronautics, Technical & Business Systems, and the Laboratory of Fluid Mechanics and Acoustics (France). A number of collaborators, including the Japan Aerospace Exploration Agency, also participated by supporting the experiments with human and equipment resources at their own expense. Three NASA centers, AFRC, Langley Research Center (LaRC), and KSC were essential to the planning and conduct of the experiments. The experiments involved precision flight of either an F-18A or F-18B executing steady, level passes at supersonic airspeeds in a turbulent atmosphere to create sonic boom signatures that had been distorted by turbulence. The flights spanned a range of atmospheric turbulence conditions at NASA Armstrong and Kennedy in order to provide a variety of conditions for code validations. The SonicBAT experiments at both sites were designed to capture simultaneous F-18A or F-18B onboard flight instrumentation data, high fidelity ground based and airborne acoustic data, surface and upper air meteorological data, and additional meteorological data from ultrasonic anemometers and SODARs to determine the local atmospheric turbulence and boundary layer height

    Detecting adversarial manipulation using inductive Venn-ABERS predictors

    Get PDF
    Inductive Venn-ABERS predictors (IVAPs) are a type of probabilistic predictors with the theoretical guarantee that their predictions are perfectly calibrated. In this paper, we propose to exploit this calibration property for the detection of adversarial examples in binary classification tasks. By rejecting predictions if the uncertainty of the IVAP is too high, we obtain an algorithm that is both accurate on the original test set and resistant to adversarial examples. This robustness is observed on adversarials for the underlying model as well as adversarials that were generated by taking the IVAP into account. The method appears to offer competitive robustness compared to the state-of-the-art in adversarial defense yet it is computationally much more tractable
    • ā€¦
    corecore