Classification with Costly Features using Deep Reinforcement Learning
We study a classification problem where each feature can be acquired for a
cost and the goal is to optimize a trade-off between the expected
classification error and the feature cost. We revisit a former approach that
has framed the problem as a sequential decision-making problem and solved it by
Q-learning with a linear approximation, where each action either requests a
feature value or terminates the episode by providing a classification
decision. On a set of eight problems, we demonstrate that by
replacing the linear approximation with neural networks the approach becomes
comparable to the state-of-the-art algorithms developed specifically for this
problem. The approach is flexible, as it can be improved with any new
reinforcement learning enhancement, it allows the inclusion of a pre-trained
high-performance classifier, and unlike prior art, its performance is robust
across all evaluated datasets.
Comment: AAAI 201
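The sequential decision-making formulation described above can be sketched as a tiny environment: the agent's state is the vector of acquired feature values plus a mask of which features are known, and each action either buys one more feature (paying its cost) or terminates the episode with a class label. The function names, reward shaping, and state encoding below are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def step(x, mask, costs, action, n_features, true_label):
    """One illustrative environment transition for a single sample x.

    Actions 0..n_features-1 acquire the corresponding feature;
    actions n_features.. classify and end the episode.
    """
    if action < n_features:                      # "acquire feature" action
        new_mask = mask.copy()
        new_mask[action] = 1.0
        reward = -costs[action]                  # pay the feature's cost
        return (x * new_mask, new_mask), reward, False
    label = action - n_features                  # "classify" action
    reward = 0.0 if label == true_label else -1.0  # assumed error penalty
    return (x * mask, mask), reward, True        # episode terminates

# usage on a toy 3-feature sample: acquire feature 1, pay its cost
x = np.array([0.5, -1.2, 2.0])
mask = np.zeros(3)
costs = np.array([0.1, 0.2, 0.3])
(state, mask), r, done = step(x, mask, costs, action=1,
                              n_features=3, true_label=0)
```

A Q-function (linear in the original approach, a neural network in this paper) would then map the `(state, mask)` pair to values over all `n_features + n_classes` actions.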
Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods
Training neural networks is a challenging non-convex optimization problem,
and backpropagation or gradient descent can get stuck in spurious local optima.
We propose a novel algorithm based on tensor decomposition for guaranteed
training of two-layer neural networks. We provide risk bounds for our proposed
method, with a polynomial sample complexity in the relevant parameters, such as
input dimension and number of neurons. While learning arbitrary target
functions is NP-hard, we provide transparent conditions on the function and the
input for learnability. Our training method is based on tensor decomposition,
which provably converges to the global optimum, under a set of mild
non-degeneracy conditions. It consists of simple embarrassingly parallel linear
and multi-linear operations, and is competitive with standard stochastic
gradient descent (SGD), in terms of computational complexity. Thus, we propose
a computationally efficient method with guaranteed risk bounds for training
neural networks with one hidden layer.
Comment: The tensor decomposition analysis is expanded, and the analysis of
ridge regression is added for recovering the parameters of the last layer of
the neural network
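The key computational primitive behind such guarantees is tensor power iteration, which recovers the components of an orthogonally decomposable third-order tensor T = Σᵢ wᵢ aᵢ⊗aᵢ⊗aᵢ; in this line of work, such tensors arise from moments of the data, and their components relate to the hidden-layer weights. The sketch below illustrates only that primitive on a synthetic tensor, not the paper's full algorithm.

```python
import numpy as np

def tensor_power_iteration(T, v0, n_iter=50):
    """Recover one component of a symmetric, orthogonally decomposable
    d x d x d tensor T via repeated two-mode contraction."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(n_iter):
        # contract T along two modes: u_j = sum_{k,l} T[j,k,l] v[k] v[l]
        u = np.einsum('jkl,k,l->j', T, v, v)
        v = u / np.linalg.norm(u)
    lam = np.einsum('jkl,j,k,l->', T, v, v, v)   # recovered weight
    return lam, v

# build a rank-2 orthogonally decomposable tensor from basis vectors
d = 4
a1, a2 = np.eye(d)[0], np.eye(d)[1]
T = 3.0 * np.einsum('i,j,k->ijk', a1, a1, a1) \
  + 1.0 * np.einsum('i,j,k->ijk', a2, a2, a2)

# deterministic initialization closer to a1: converges to (w=3, a1)
lam, v = tensor_power_iteration(T, v0=np.array([1.0, 0.5, 0.2, 0.1]))
```

The iteration converges quadratically under the non-degeneracy conditions alluded to in the abstract, and each contraction is an embarrassingly parallel multi-linear operation.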
Distribution-Based Categorization of Classifier Transfer Learning
Transfer Learning (TL) aims to transfer knowledge acquired in one problem,
the source problem, onto another problem, the target problem, dispensing with
the bottom-up construction of the target model. Due to its relevance, TL has
gained significant interest in the Machine Learning community since it paves
the way to devise intelligent learning models that can easily be tailored to
many different applications. As it is natural in a fast evolving area, a wide
variety of TL methods, settings and nomenclature have been proposed so far.
However, many works report different names for the same concepts. This
mixture of concepts and terminology obscures the TL field and hinders its
proper development. In this paper we present a
review of the literature on the majority of classification TL methods, and also
a distribution-based categorization of TL with a common nomenclature suitable
to classification problems. Under this perspective, three main TL categories
are presented, discussed, and illustrated with examples.
Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network
Because of their effectiveness in broad practical applications, LSTM networks
have received a wealth of coverage in scientific journals, technical blogs, and
implementation guides. However, in most articles, the inference formulas for
the LSTM network and its parent, RNN, are stated axiomatically, while the
training formulas are omitted altogether. In addition, the technique of
"unrolling" an RNN is routinely presented without justification throughout the
literature. The goal of this paper is to explain the essential RNN and LSTM
fundamentals in a single document. Drawing from concepts in signal processing,
we formally derive the canonical RNN formulation from differential equations.
We then propose and prove a precise statement, which yields the RNN unrolling
technique. We also review the difficulties with training the standard RNN and
address them by transforming the RNN into the "Vanilla LSTM" network through a
series of logical arguments. We provide all equations pertaining to the LSTM
system together with detailed descriptions of its constituent entities. Albeit
unconventional, our choice of notation and the method for presenting the LSTM
system emphasizes ease of understanding. As part of the analysis, we identify
new opportunities to enrich the LSTM system and incorporate these extensions
into the Vanilla LSTM network, producing the most general LSTM variant to date.
The target reader has already been exposed to RNNs and LSTM networks through
numerous available resources and is open to an alternative pedagogical
approach. A Machine Learning practitioner seeking guidance for implementing our
new augmented LSTM model in software for experimentation and research will find
the insights and derivations in this tutorial valuable as well.
Comment: 43 pages, 10 figures, 78 references
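The standard gate equations of the Vanilla LSTM cell discussed in this abstract can be written as a short forward pass; the f/i/o-gate convention and variable names below are the common textbook formulation, not necessarily the paper's own notation, and the parameter shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W: (4n, d), U: (4n, n), b: (4n,),
    with the four gate blocks stacked in order f, i, o, g."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    f = sigmoid(z[0*n:1*n])          # forget gate
    i = sigmoid(z[1*n:2*n])          # input gate
    o = sigmoid(z[2*n:3*n])          # output gate
    g = np.tanh(z[3*n:4*n])          # candidate cell update
    c_new = f * c + i * g            # cell state: gated memory update
    h_new = o * np.tanh(c_new)       # hidden state exposed downstream
    return h_new, c_new

# "unrolling" the RNN over a sequence is just repeated application
rng = np.random.default_rng(0)
d, n = 3, 2
W = rng.standard_normal((4*n, d))
U = rng.standard_normal((4*n, n))
b = np.zeros(4*n)
h, c = np.zeros(n), np.zeros(n)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(d), h, c, W, U, b)
```

Because the hidden state is a sigmoid-gated tanh, every entry of `h` stays strictly inside (-1, 1), while the cell state `c` carries the longer-range memory that makes the LSTM trainable where the standard RNN struggles.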