Training Deep Neural Networks via Direct Loss Minimization
Supervised training of deep neural nets typically relies on minimizing
cross-entropy. However, in many domains, we are interested in performing well
on metrics specific to the application. In this paper we propose a direct loss
minimization approach to train deep neural networks, which provably minimizes
the application-specific loss function. This is often non-trivial, since these
functions are neither smooth nor decomposable and thus are not amenable to
optimization with standard gradient-based methods. We demonstrate the
effectiveness of our approach in the context of maximizing average precision
for ranking problems. Towards this goal, we develop a novel dynamic programming
algorithm that can efficiently compute the weight updates. Our approach proves
superior to a variety of baselines in the context of action classification and
object detection, especially in the presence of label noise.
Comment: ICML 2016
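For intuition, here is a minimal sketch of a direct loss minimization update in a simple multiclass setting, assuming a PyTorch model score_net that outputs one score per class; the finite-difference step eps and the enumerated task_loss are illustrative stand-ins, not the paper's AP-specific dynamic program.

```python
import torch

def direct_loss_step(score_net, x, y_true, task_loss, eps=0.1, lr=1e-2):
    scores = score_net(x)                     # one score per class
    y_pred = scores.argmax().item()           # standard inference
    # Loss-augmented inference: argmax of score + eps * task loss.
    losses = torch.tensor([task_loss(y, y_true) for y in range(scores.numel())])
    y_direct = (scores + eps * losses).argmax().item()
    # Finite-difference estimate of the task-loss gradient.
    obj = (scores[y_direct] - scores[y_pred]) / eps
    score_net.zero_grad()
    obj.backward()
    with torch.no_grad():
        for p in score_net.parameters():
            p -= lr * p.grad                  # descend the estimated gradient
```

For structured outputs such as rankings, the enumeration above is infeasible, which is exactly where the paper's dynamic programming algorithm for the loss-augmented argmax comes in.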
Deep Structured Energy-Based Image Inpainting
In this paper, we propose a structured image inpainting method employing an
energy-based model. In order to learn the structural relationship between patterns
observed in images and their missing regions, we employ an energy-based
structured prediction method. The structural relationship is learned by
minimizing an energy function which is defined by a simple convolutional neural
network. The experimental results on various benchmark datasets show that our
proposed method significantly outperforms the state-of-the-art methods which
use Generative Adversarial Networks (GANs). We obtained 497.35 mean squared
error (MSE) on the Olivetti face dataset compared to 833.0 MSE provided by the
state-of-the-art method. Moreover, we obtained 28.4 dB peak signal to noise
ratio (PSNR) on the SVHN dataset and 23.53 dB on the CelebA dataset, compared
to 22.3 dB and 21.3 dB, provided by the state-of-the-art methods, respectively.
The code is publicly available.
Comment: Accepted to the 24th International Conference on Pattern Recognition (ICPR 2018). 6 pages, 7 figures.
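To make the mechanism concrete, here is a minimal sketch of energy-descent inpainting at inference time, assuming a trained CNN energy_net that assigns one scalar energy per image; the optimizer, step count, and pixel range are assumptions.

```python
import torch

def inpaint(energy_net, image, mask, steps=200, lr=0.05):
    # `mask` is 1 on missing pixels and 0 on observed ones.
    z = torch.rand_like(image, requires_grad=True)    # free variables
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        # Compose observed pixels with the current guess for missing ones.
        candidate = image * (1 - mask) + z * mask
        energy = energy_net(candidate).sum()
        opt.zero_grad()
        energy.backward()                             # descend the energy
        opt.step()
    with torch.no_grad():
        return (image * (1 - mask) + z * mask).clamp(0, 1)
```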
Learning Discriminators as Energy Networks in Adversarial Learning
We propose a novel framework for structured prediction via adversarial
learning. Existing adversarial learning methods involve two separate networks,
i.e., the structured prediction models and the discriminative models, in the
training. The information captured by discriminative models complements that in
the structured prediction models, but little existing work has studied how to
utilize such information to improve structured prediction models at the
inference stage. In this work, we propose to refine the predictions of
structured prediction models by effectively integrating discriminative models
into the prediction. Discriminative models are treated as energy-based models.
As in adversarial learning, discriminative models are trained to
estimate scores which measure the quality of predicted outputs, while
structured prediction models are trained to predict contrastive outputs with
maximal energy scores. In this way, the vanishing-gradient problem is
ameliorated, and we are able to perform inference by following the gradient
ascent directions of the discriminative models to refine the outputs of
structured prediction models. The proposed method handles a range of tasks, e.g.,
multi-label classification and image segmentation. Empirical results on these
two tasks validate the effectiveness of our learning method.
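A minimal sketch of the inference-time refinement described above, assuming trained predictor and discriminator networks; the step size, step count, and [0, 1] clamping of the relaxed output are illustrative.

```python
import torch

def refine(predictor, discriminator, x, steps=20, lr=0.1):
    y = predictor(x).detach()              # initial structured output
    y.requires_grad_(True)
    for _ in range(steps):
        score = discriminator(x, y).sum()  # quality score of the (x, y) pair
        grad, = torch.autograd.grad(score, y)
        with torch.no_grad():
            y += lr * grad                 # ascend the discriminator's score
            y.clamp_(0.0, 1.0)             # keep the relaxed output valid
    return y.detach()
```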
Learning Surrogate Losses
The minimization of loss functions is the heart and soul of Machine Learning.
In this paper, we propose an off-the-shelf optimization approach that can
minimize virtually any non-differentiable and non-decomposable loss function
(e.g., Misclassification Rate, AUC, F1, Jaccard Index, Matthews Correlation
Coefficient) seamlessly. Our strategy learns smooth relaxations
of the true losses by approximating them through a surrogate neural network.
The proposed loss networks are set-wise models which are invariant to the order
of mini-batch instances. Ultimately, the surrogate losses are learned jointly
with the prediction model via bilevel optimization. Empirical results on
multiple datasets with diverse real-life loss functions, compared with
state-of-the-art baselines, demonstrate the efficiency of learning surrogate
losses.
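A minimal sketch of one round of updates, assuming a predictor model, a set-wise surrogate network mapping (predictions, labels) to a scalar, and a non-differentiable true_metric such as 1 - F1; the simple alternation below only approximates the paper's bilevel optimization.

```python
import torch
import torch.nn.functional as F

def train_step(model, surrogate, opt_model, opt_surr, x, y, true_metric):
    preds = model(x)
    # 1) Fit the surrogate to the true, non-differentiable metric.
    target = torch.tensor(float(true_metric(preds.detach(), y)))
    fit_loss = F.mse_loss(surrogate(preds.detach(), y), target)
    opt_surr.zero_grad(); fit_loss.backward(); opt_surr.step()
    # 2) Train the predictor through the now-smooth surrogate loss.
    loss = surrogate(preds, y)
    opt_model.zero_grad(); loss.backward(); opt_model.step()
```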
Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
We approach structured output prediction by optimizing a deep value network
(DVN) to precisely estimate the task loss on different output configurations
for a given input. Once the model is trained, we perform inference by gradient
descent on the continuous relaxations of the output variables to find outputs
with promising scores from the value network. When applied to image
segmentation, the value network takes an image and a segmentation mask as
inputs and predicts a scalar estimating the intersection over union between the
input and ground truth masks. For multi-label classification, the DVN's
objective is to correctly predict the F1 score for any potential label
configuration. The DVN framework achieves state-of-the-art results on
multi-label prediction and image segmentation benchmarks.
Comment: Published at ICML 2017
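A minimal sketch of DVN-style inference, assuming a trained value_net(x, y) that estimates the task metric (e.g. IoU) of a relaxed output y; the initialization, step size, and final thresholding are illustrative.

```python
import torch

def dvn_inference(value_net, x, y_shape, steps=30, lr=0.5):
    # Start from a maximally uncertain relaxed output in [0, 1].
    y = torch.full(y_shape, 0.5, requires_grad=True)
    for _ in range(steps):
        value = value_net(x, y).sum()      # predicted task score, e.g. IoU
        grad, = torch.autograd.grad(value, y)
        with torch.no_grad():
            y += lr * grad                 # move toward higher predicted score
            y.clamp_(0.0, 1.0)
    return (y > 0.5).float()               # discretize the relaxation
```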
Deep Structured Prediction with Nonlinear Output Transformations
Deep structured models are widely used for tasks like semantic segmentation,
where explicit correlations between variables provide important prior
information which generally helps to reduce the data needs of deep nets.
However, current deep structured models are restricted by an often very local
neighborhood structure, which cannot be enlarged for computational complexity
reasons, and by the fact that the output configuration, or a representation
thereof, cannot be transformed further. Very recent approaches which address
those issues include graphical model inference inside deep nets so as to permit
subsequent non-linear output space transformations. However, optimization of
those formulations is challenging and not well understood. Here, we develop a
novel model which generalizes existing approaches, such as structured
prediction energy networks, and discuss a formulation which maintains
applicability of existing inference techniques.
Comment: Appearing in NIPS 2018
Few-Shot Learning Through an Information Retrieval Lens
Few-shot learning refers to understanding new concepts from only a few
examples. We propose an information retrieval-inspired approach for this
problem that is motivated by the increased importance of maximally leveraging
all the available information in this low-data regime. We define a training
objective that aims to extract as much information as possible from each
training batch by effectively optimizing over all relative orderings of the
batch points simultaneously. In particular, we view each batch point as a
'query' that ranks the remaining ones based on its predicted relevance to them
and we define a model within the framework of structured prediction to optimize
mean Average Precision over these rankings. Our method achieves impressive
results on the standard few-shot classification benchmarks while also being
capable of few-shot retrieval.
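For concreteness, a minimal sketch of the batch-as-queries view, used here only for evaluation: each embedded point ranks all others by cosine similarity and the resulting Average Precision values are averaged; the paper additionally optimizes this quantity with a structured-prediction surrogate, which is not shown.

```python
import torch

def batch_mean_ap(embeddings, labels):
    z = torch.nn.functional.normalize(embeddings, dim=1)
    sims = z @ z.t()                          # pairwise cosine similarities
    n = z.size(0)
    ap_total = 0.0
    for q in range(n):
        order = sims[q].argsort(descending=True)
        order = order[order != q]             # a query does not rank itself
        rel = (labels[order] == labels[q]).float()
        hits = rel.cumsum(0)                  # relevant items seen so far
        ranks = torch.arange(1, n, dtype=torch.float32)
        ap = ((hits / ranks) * rel).sum() / rel.sum().clamp(min=1)
        ap_total += ap.item()
    return ap_total / n
```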
Learning to Teach with Dynamic Loss Functions
Teaching is critical to human society: it is through teaching that students
are educated and human civilization is inherited and advanced. A
good teacher not only provides his/her students with suitable teaching
materials (e.g., textbooks), but also sets up appropriate learning objectives
(e.g., course projects and exams) considering different situations of a
student. When it comes to artificial intelligence, treating machine learning
models as students, the loss functions that are optimized act as perfect
counterparts of the learning objective set by the teacher. In this work, we
explore the possibility of imitating human teaching behaviors by dynamically
and automatically outputting appropriate loss functions to train machine
learning models. Different from typical learning settings in which the loss
function of a machine learning model is predefined and fixed, in our framework,
the loss function of a machine learning model (we call it student) is defined
by another machine learning model (we call it teacher). The ultimate goal of
the teacher model is to cultivate the student to achieve better performance, as
measured on a development dataset. Towards that end, similar to human teaching, the teacher,
a parametric model, dynamically outputs different loss functions that will be
used and optimized by its student model at different training stages. We
develop an efficient learning method for the teacher model that makes
gradient-based optimization possible, avoiding ineffective solutions such as policy
optimization. We name our method "learning to teach with dynamic loss
functions" (L2T-DLF for short). Extensive experiments on real-world tasks
including image classification and neural machine translation demonstrate that
our method significantly improves the quality of various student models.
Comment: NIPS 2018
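As a heavily simplified sketch: a tiny teacher network that outputs per-class weights for the student's cross-entropy loss as a function of training progress. The teacher's input, the weighted-cross-entropy loss family, and the omission of the paper's gradient-based teacher update on the dev set are all assumptions.

```python
import torch
import torch.nn as nn

class Teacher(nn.Module):
    """Maps training progress to per-class loss weights (illustrative)."""
    def __init__(self, num_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                                 nn.Linear(32, num_classes))

    def forward(self, progress):
        # `progress` is the fraction of training completed, in [0, 1].
        t = torch.tensor([[float(progress)]])
        return torch.softmax(self.net(t), dim=-1).squeeze(0)

def student_loss(teacher, logits, targets, progress):
    weights = teacher(progress)                  # dynamic, time-varying weights
    logp = torch.log_softmax(logits, dim=1)
    picked = logp[torch.arange(len(targets)), targets]
    return -(weights[targets] * picked).mean()   # weighted cross-entropy
```

Because this loss is differentiable in the teacher's parameters, a development-set objective can be backpropagated through an unrolled student update, which is the role of the paper's gradient-based teacher optimization.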
Task-based End-to-end Model Learning in Stochastic Optimization
With the increasing popularity of machine learning techniques, it has become
common to see prediction algorithms operating within some larger process.
However, the criteria by which we train these algorithms often differ from the
ultimate criteria on which we evaluate them. This paper proposes an end-to-end
approach for learning probabilistic machine learning models in a manner that
directly captures the ultimate task-based objective for which they will be
used, within the context of stochastic programming. We present three
experimental evaluations of the proposed approach: a classical inventory stock
problem, a real-world electrical grid scheduling task, and a real-world energy
storage arbitrage task. We show that the proposed approach can outperform both
traditional modeling and purely black-box policy optimization approaches in
these applications.
Comment: In NIPS 2017. Code available at
https://github.com/locuslab/e2e-model-learning
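A minimal sketch of the idea on a newsvendor-style inventory problem: the network predicts a demand distribution, the decision is the cost-optimal quantile of that prediction, and the realized task cost is backpropagated end-to-end. The Gaussian parameterization, cost constants, and two-output network are illustrative, not the paper's exact formulation.

```python
import torch
from torch.distributions import Normal

def task_loss_step(net, opt, x, demand, c_under=5.0, c_over=1.0):
    mu, log_sigma = net(x).unbind(dim=1)     # two outputs per example
    sigma = log_sigma.exp()
    # Critical-fractile solution of the newsvendor problem.
    fractile = c_under / (c_under + c_over)
    q = mu + sigma * Normal(0.0, 1.0).icdf(torch.tensor(fractile))
    # Realized task cost: overage plus underage.
    cost = (c_over * torch.relu(q - demand)
            + c_under * torch.relu(demand - q)).mean()
    opt.zero_grad(); cost.backward(); opt.step()
    return cost.item()
```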
Dissimilarity Coefficient based Weakly Supervised Object Detection
We consider the problem of weakly supervised object detection, where the
training samples are annotated using only image-level labels that indicate the
presence or absence of an object category. In order to model the uncertainty in
the location of the objects, we employ a dissimilarity coefficient based
probabilistic learning objective. The learning objective minimizes the
difference between an annotation agnostic prediction distribution and an
annotation aware conditional distribution. The main computational challenge is
the complex nature of the conditional distribution, which consists of terms
over hundreds or thousands of variables. The complexity of the conditional
distribution rules out the possibility of explicitly modeling it. Instead, we
exploit the fact that deep learning frameworks rely on stochastic optimization.
This allows us to use a state-of-the-art discrete generative model that can
provide annotation consistent samples from the conditional distribution.
Extensive experiments on the PASCAL VOC 2007 and 2012 datasets demonstrate the
efficacy of our proposed approach.
Comment: Preprint