Methods and metrics for selective regression testing
In corrective software maintenance, selective regression testing includes test selection from previously-run test suites and test coverage identification. We propose three reduction-based regression test selection methods and two McCabe-based coverage identification metrics (T. McCabe, 1976). We empirically compare these methods with three other reduction- and precision-oriented methods, using 60 test problems. The comparison shows that our proposed methods yield favourable result
Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees
Deep Reinforcement Learning (DRL) has achieved impressive success in many
applications. A key component of many DRL models is a neural network
representing a Q function, to estimate the expected cumulative reward following
a state-action pair. The Q function neural network contains a lot of implicit
knowledge about the RL problems, but often remains unexamined and
uninterpreted. To our knowledge, this work develops the first mimic learning
framework for Q functions in DRL. We introduce Linear Model U-trees (LMUTs) to
approximate neural network predictions. An LMUT is learned using a novel
on-line algorithm that is well-suited for an active play setting, where the
mimic learner observes an ongoing interaction between the neural net and the
environment. Empirical evaluation shows that an LMUT mimics a Q function
substantially better than five baseline methods. The transparent tree structure
of an LMUT facilitates understanding the network's learned knowledge by
analyzing feature influence, extracting rules, and highlighting the
super-pixels in image inputs. Comment: This paper is accepted by ECML-PKDD 201
Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization
Protecting vast quantities of data poses a daunting challenge for the growing
number of organizations that collect, stockpile, and monetize it. The ability
to distinguish data that is actually needed from data collected "just in case"
would help these organizations to limit the latter's exposure to attack. A
natural approach might be to monitor data use and retain only the working-set
of in-use data in accessible storage; unused data can be evicted to a highly
protected store. However, many of today's big data applications rely on machine
learning (ML) workloads that are periodically retrained by accessing, and thus
exposing to attack, the entire data store. Training set minimization methods,
such as count featurization, are often used to limit the data needed to train
ML workloads to improve performance or scalability. We present Pyramid, a
limited-exposure data management system that builds upon count featurization to
enhance data protection. As such, Pyramid uniquely introduces both the idea and
proof-of-concept for leveraging training set minimization methods to instill
rigor and selectivity into big data management. We integrated Pyramid into
Spark Velox, a framework for ML-based targeting and personalization. We
evaluate it on three applications and show that Pyramid approaches
state-of-the-art models while training on less than 1% of the raw data
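As a rough illustration of the count featurization idea the abstract builds on, the sketch below replaces a raw categorical value with the per-label counts observed for it, so a model can train on a small count table instead of the raw records. This is a generic, simplified version of the technique and not Pyramid's implementation; the function names `build_count_table` and `featurize` and the toy data are hypothetical.

```python
from collections import defaultdict


def build_count_table(rows):
    """Aggregate (feature_value, label) pairs into per-value label counts.

    Labels are assumed to be 0 or 1 (an assumption for this sketch).
    Returns a mapping: feature_value -> [count_label_0, count_label_1].
    """
    table = defaultdict(lambda: [0, 0])
    for value, label in rows:
        table[value][label] += 1
    return table


def featurize(value, table):
    """Replace a raw feature value with its label counts.

    Values never seen in the count table map to [0, 0].
    """
    neg, pos = table.get(value, (0, 0))
    return [neg, pos]


# Hypothetical toy data: (ad identifier, clicked?) pairs.
rows = [("ad1", 1), ("ad1", 0), ("ad1", 1), ("ad2", 0)]
table = build_count_table(rows)
features_ad1 = featurize("ad1", table)
features_unseen = featurize("ad3", table)
```

Once the table is built, only the counts need to stay in accessible storage for training, which is the property a limited-exposure design can exploit.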
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because issues of poor run-time
performance are not such a problem these days with the computational power that
is available. This paper presents an overview of techniques for Nearest
Neighbour classification focusing on: mechanisms for assessing similarity
(distance), computational issues in identifying nearest neighbours and
mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods. Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
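The nearest-neighbour procedure described in this abstract (find the closest stored examples to a query, then let them decide its class) can be sketched in a few lines. This is a minimal brute-force version using Euclidean distance and majority voting, not the code from the paper's appendix; the function name `knn_classify` and the toy training set are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_classify(query, examples, k=3):
    """Classify `query` by majority vote among its k nearest examples.

    `examples` is a list of (feature_vector, label) pairs.
    Distance is Euclidean; other similarity measures could be swapped in.
    """
    # Brute-force search: sort all stored examples by distance to the query.
    by_distance = sorted(examples, key=lambda ex: math.dist(query, ex[0]))
    # Count the labels of the k closest examples and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-class toy data.
training = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
    ((4.8, 5.1), "b"),
]

label_near_a = knn_classify((1.1, 1.0), training, k=3)
label_near_b = knn_classify((5.1, 5.0), training, k=3)
```

Sorting every example for each query costs O(n log n) per lookup, which is the run-time concern the abstract mentions; retrieval speed-up techniques aim to avoid this full scan.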