41,279 research outputs found

    MLPerf Inference Benchmark

    Full text link
    Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability.Comment: ISCA 202

    Fuzzy Adaptive Tuning of a Particle Swarm Optimization Algorithm for Variable-Strength Combinatorial Test Suite Generation

    Full text link
    Combinatorial interaction testing is an important software testing technique that has seen lots of recent interest. It can reduce the number of test cases needed by considering interactions between combinations of input parameters. Empirical evidence shows that it effectively detects faults, in particular, for highly configurable software systems. In real-world software testing, the input variables may vary in how strongly they interact, variable strength combinatorial interaction testing (VS-CIT) can exploit this for higher effectiveness. The generation of variable strength test suites is a non-deterministic polynomial-time (NP) hard computational problem \cite{BestounKamalFuzzy2017}. Research has shown that stochastic population-based algorithms such as particle swarm optimization (PSO) can be efficient compared to alternatives for VS-CIT problems. Nevertheless, they require detailed control for the exploitation and exploration trade-off to avoid premature convergence (i.e. being trapped in local optima) as well as to enhance the solution diversity. Here, we present a new variant of PSO based on Mamdani fuzzy inference system \cite{Camastra2015,TSAKIRIDIS2017257,KHOSRAVANIAN2016280}, to permit adaptive selection of its global and local search operations. We detail the design of this combined algorithm and evaluate it through experiments on multiple synthetic and benchmark problems. We conclude that fuzzy adaptive selection of global and local search operations is, at least, feasible as it performs only second-best to a discrete variant of PSO, called DPSO. Concerning obtaining the best mean test suite size, the fuzzy adaptation even outperforms DPSO occasionally. We discuss the reasons behind this performance and outline relevant areas of future work.Comment: 21 page

    Input Prioritization for Testing Neural Networks

    Full text link
    Deep neural networks (DNNs) are increasingly being adopted for sensing and control functions in a variety of safety and mission-critical systems such as self-driving cars, autonomous air vehicles, medical diagnostics, and industrial robotics. Failures of such systems can lead to loss of life or property, which necessitates stringent verification and validation for providing high assurance. Though formal verification approaches are being investigated, testing remains the primary technique for assessing the dependability of such systems. Due to the nature of the tasks handled by DNNs, the cost of obtaining test oracle data---the expected output, a.k.a. label, for a given input---is high, which significantly impacts the amount and quality of testing that can be performed. Thus, prioritizing input data for testing DNNs in meaningful ways to reduce the cost of labeling can go a long way in increasing testing efficacy. This paper proposes using gauges of the DNN's sentiment derived from the computation performed by the model, as a means to identify inputs that are likely to reveal weaknesses. We empirically assessed the efficacy of three such sentiment measures for prioritization---confidence, uncertainty, and surprise---and compare their effectiveness in terms of their fault-revealing capability and retraining effectiveness. The results indicate that sentiment measures can effectively flag inputs that expose unacceptable DNN behavior. For MNIST models, the average percentage of inputs correctly flagged ranged from 88% to 94.8%

    Recovering Grammar Relationships for the Java Language Specification

    Get PDF
    Grammar convergence is a method that helps discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent

    A Critical Review of "Automatic Patch Generation Learned from Human-Written Patches": Essay on the Problem Statement and the Evaluation of Automatic Software Repair

    Get PDF
    At ICSE'2013, there was the first session ever dedicated to automatic program repair. In this session, Kim et al. presented PAR, a novel template-based approach for fixing Java bugs. We strongly disagree with key points of this paper. Our critical review has two goals. First, we aim at explaining why we disagree with Kim and colleagues and why the reasons behind this disagreement are important for research on automatic software repair in general. Second, we aim at contributing to the field with a clarification of the essential ideas behind automatic software repair. In particular we discuss the main evaluation criteria of automatic software repair: understandability, correctness and completeness. We show that depending on how one sets up the repair scenario, the evaluation goals may be contradictory. Eventually, we discuss the nature of fix acceptability and its relation to the notion of software correctness.Comment: ICSE 2014, India (2014

    A Testability Analysis Framework for Non-Functional Properties

    Full text link
    This paper presents background, the basic steps and an example for a testability analysis framework for non-functional properties

    A MOSAIC of methods: Improving ortholog detection through integration of algorithmic diversity

    Full text link
    Ortholog detection (OD) is a critical step for comparative genomic analysis of protein-coding sequences. In this paper, we begin with a comprehensive comparison of four popular, methodologically diverse OD methods: MultiParanoid, Blat, Multiz, and OMA. In head-to-head comparisons, these methods are shown to significantly outperform one another 12-30% of the time. This high complementarity motivates the presentation of the first tool for integrating methodologically diverse OD methods. We term this program MOSAIC, or Multiple Orthologous Sequence Analysis and Integration by Cluster optimization. Relative to component and competing methods, we demonstrate that MOSAIC more than quintuples the number of alignments for which all species are present, while simultaneously maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality. Further, we demonstrate that this improvement in alignment quality yields 40-280% more confidently aligned sites. Combined, these factors translate to higher estimated levels of overall conservation, while at the same time allowing for the detection of up to 180% more positively selected sites. MOSAIC is available as python package. MOSAIC alignments, source code, and full documentation are available at http://pythonhosted.org/bio-MOSAIC

    Approaches for advancing scientific understanding of macrosystems

    Get PDF
    The emergence of macrosystems ecology (MSE), which focuses on regional- to continental-scale ecological patterns and processes, builds upon a history of long-term and broad-scale studies in ecology. Scientists face the difficulty of integrating the many elements that make up macrosystems, which consist of hierarchical processes at interacting spatial and temporal scales. Researchers must also identify the most relevant scales and variables to be considered, the required data resources, and the appropriate study design to provide the proper inferences. The large volumes of multi-thematic data often associated with macrosystem studies typically require validation, standardization, and assimilation. Finally, analytical approaches need to describe how cross-scale and hierarchical dynamics and interactions relate to macroscale phenomena. Here, we elaborate on some key methodological challenges of MSE research and discuss existing and novel approaches to meet them
    corecore