MLPerf Inference Benchmark
Machine-learning (ML) hardware and software system demand is burgeoning.
Driven by ML applications, the number of different ML inference systems has
exploded. Over 100 organizations are building ML inference chips, and the
systems that incorporate existing models span at least three orders of
magnitude in power consumption and five orders of magnitude in performance;
they range from embedded devices to data-center solutions. Fueling the hardware
are a dozen or more software frameworks and libraries. The myriad combinations
of ML hardware and ML software make assessing ML-system performance in an
architecture-neutral, representative, and reproducible manner challenging.
There is a clear need for industry-wide standard ML benchmarking and evaluation
criteria. MLPerf Inference answers that call. In this paper, we present our
benchmarking method for evaluating ML inference systems. Driven by more than 30
organizations as well as more than 200 ML engineers and practitioners, MLPerf
prescribes a set of rules and best practices to ensure comparability across
systems with wildly differing architectures. The first call for submissions
garnered more than 600 reproducible inference-performance measurements from 14
organizations, representing over 30 systems that showcase a wide range of
capabilities. The submissions attest to the benchmark's flexibility and
adaptability. Comment: ISCA 202
Fuzzy Adaptive Tuning of a Particle Swarm Optimization Algorithm for Variable-Strength Combinatorial Test Suite Generation
Combinatorial interaction testing is an important software testing technique
that has seen lots of recent interest. It can reduce the number of test cases
needed by considering interactions between combinations of input parameters.
Empirical evidence shows that it effectively detects faults, in particular, for
highly configurable software systems. In real-world software testing, the input
variables may vary in how strongly they interact; variable-strength
combinatorial interaction testing (VS-CIT) can exploit this for higher
effectiveness. The generation of variable-strength test suites is a
non-deterministic polynomial-time hard (NP-hard) computational problem
\cite{BestounKamalFuzzy2017}. Research has shown that stochastic
population-based algorithms such as particle swarm optimization (PSO) can be
efficient compared to alternatives for VS-CIT problems. Nevertheless, they
require detailed control for the exploitation and exploration trade-off to
avoid premature convergence (i.e. being trapped in local optima) as well as to
enhance the solution diversity. Here, we present a new variant of PSO based on
a Mamdani fuzzy inference system
\cite{Camastra2015,TSAKIRIDIS2017257,KHOSRAVANIAN2016280}, to permit adaptive
selection of its global and local search operations. We detail the design of
this combined algorithm and evaluate it through experiments on multiple
synthetic and benchmark problems. We conclude that fuzzy adaptive selection of
global and local search operations is at least feasible, as it is second only
to a discrete variant of PSO called DPSO. For the best mean test suite size,
the fuzzy adaptation occasionally even outperforms DPSO. We discuss the
reasons behind this performance and outline
relevant areas of future work. Comment: 21 page
Input Prioritization for Testing Neural Networks
Deep neural networks (DNNs) are increasingly being adopted for sensing and
control functions in a variety of safety and mission-critical systems such as
self-driving cars, autonomous air vehicles, medical diagnostics, and industrial
robotics. Failures of such systems can lead to loss of life or property, which
necessitates stringent verification and validation for providing high
assurance. Though formal verification approaches are being investigated,
testing remains the primary technique for assessing the dependability of such
systems. Due to the nature of the tasks handled by DNNs, the cost of obtaining
test oracle data---the expected output, a.k.a. label, for a given input---is
high, which significantly impacts the amount and quality of testing that can be
performed. Thus, prioritizing input data for testing DNNs in meaningful ways to
reduce the cost of labeling can go a long way in increasing testing efficacy.
This paper proposes using gauges of the DNN's sentiment derived from the
computation performed by the model, as a means to identify inputs that are
likely to reveal weaknesses. We empirically assess the efficacy of three such
sentiment measures for prioritization---confidence, uncertainty, and
surprise---and compare their effectiveness in terms of their fault-revealing
capability and retraining effectiveness. The results indicate that sentiment
measures can effectively flag inputs that expose unacceptable DNN behavior. For
MNIST models, the average percentage of inputs correctly flagged ranged from
88% to 94.8%
Recovering Grammar Relationships for the Java Language Specification
Grammar convergence is a method that helps discover relationships between
different grammars of the same language or different language versions. The key
element of the method is the operational, transformation-based representation
of those relationships. Given input grammars for convergence, they are
transformed until they are structurally equal. The transformations are composed
from primitive operators; properties of these operators and the composed chains
provide quantitative and qualitative insight into the relationships between the
grammars at hand. We describe a refined method for grammar convergence, and we
use it in a major study, where we recover the relationships between all the
grammars that occur in the different versions of the Java Language
Specification (JLS). The relationships are represented as grammar
transformation chains that capture all accidental or intended differences
between the JLS grammars. This method is mechanized and driven by nominal and
structural differences between pairs of grammars that are subject to
asymmetric, binary convergence steps. We present the underlying operator suite
for grammar transformation in detail, and we illustrate the suite with many
examples of transformations on the JLS grammars. We also describe the
extraction effort, which was needed to make the JLS grammars amenable to
automated processing. We include substantial metadata about the convergence
process for the JLS so that the effort becomes reproducible and transparent
A Critical Review of "Automatic Patch Generation Learned from Human-Written Patches": Essay on the Problem Statement and the Evaluation of Automatic Software Repair
At ICSE'2013, there was the first session ever dedicated to automatic program
repair. In this session, Kim et al. presented PAR, a novel template-based
approach for fixing Java bugs. We strongly disagree with key points of this
paper. Our critical review has two goals. First, we aim at explaining why we
disagree with Kim and colleagues and why the reasons behind this disagreement
are important for research on automatic software repair in general. Second, we
aim at contributing to the field with a clarification of the essential ideas
behind automatic software repair. In particular we discuss the main evaluation
criteria of automatic software repair: understandability, correctness and
completeness. We show that depending on how one sets up the repair scenario,
the evaluation goals may be contradictory. Eventually, we discuss the nature of
fix acceptability and its relation to the notion of software correctness. Comment: ICSE 2014, India (2014
A Testability Analysis Framework for Non-Functional Properties
This paper presents background, the basic steps and an example for a
testability analysis framework for non-functional properties
A MOSAIC of methods: Improving ortholog detection through integration of algorithmic diversity
Ortholog detection (OD) is a critical step for comparative genomic analysis
of protein-coding sequences. In this paper, we begin with a comprehensive
comparison of four popular, methodologically diverse OD methods: MultiParanoid,
Blat, Multiz, and OMA. In head-to-head comparisons, these methods are shown to
significantly outperform one another 12-30% of the time. This high
complementarity motivates the presentation of the first tool for integrating
methodologically diverse OD methods. We term this program MOSAIC, or Multiple
Orthologous Sequence Analysis and Integration by Cluster optimization. Relative
to component and competing methods, we demonstrate that MOSAIC more than
quintuples the number of alignments for which all species are present, while
simultaneously maintaining or improving functional-, phylogenetic-, and
sequence identity-based measures of ortholog quality. Further, we demonstrate
that this improvement in alignment quality yields 40-280% more confidently
aligned sites. Combined, these factors translate to higher estimated levels of
overall conservation, while at the same time allowing for the detection of up
to 180% more positively selected sites. MOSAIC is available as a Python package.
MOSAIC alignments, source code, and full documentation are available at
http://pythonhosted.org/bio-MOSAIC
Approaches for advancing scientific understanding of macrosystems
The emergence of macrosystems ecology (MSE), which focuses on regional- to continental-scale ecological patterns and processes, builds upon a history of long-term and broad-scale studies in ecology. Scientists face the difficulty of integrating the many elements that make up macrosystems, which consist of hierarchical processes at interacting spatial and temporal scales. Researchers must also identify the most relevant scales and variables to be considered, the required data resources, and the appropriate study design to provide the proper inferences. The large volumes of multi-thematic data often associated with macrosystem studies typically require validation, standardization, and assimilation. Finally, analytical approaches need to describe how cross-scale and hierarchical dynamics and interactions relate to macroscale phenomena. Here, we elaborate on some key methodological challenges of MSE research and discuss existing and novel approaches to meet them