Practical recommendations for gradient-based training of deep architectures
Learning algorithms related to artificial neural networks and in particular
for Deep Learning may seem to involve many bells and whistles, called
hyper-parameters. This chapter is meant as a practical guide with
recommendations for some of the most commonly used hyper-parameters, in
particular in the context of learning algorithms based on back-propagated
gradient and gradient-based optimization. It also discusses how to deal with
the fact that more interesting results can be obtained when one is allowed to
adjust many hyper-parameters. Overall, it describes elements of the practice
used to successfully and efficiently train and debug large-scale and often deep
multi-layer neural networks. It closes with open questions about the training
difficulties observed with deeper architectures.
Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients
Recent work has established an empirically successful framework for adapting
learning rates for stochastic gradient descent (SGD). This effectively removes
all need for tuning, while automatically reducing learning rates over time on
stationary problems, and permitting learning rates to grow appropriately in
non-stationary tasks. Here, we extend the idea in three directions: addressing
proper minibatch parallelization, including reweighted updates for sparse or
orthogonal gradients, and improving robustness on non-smooth loss functions, in
the process replacing the diagonal Hessian estimation procedure, which may not
always be available, with a robust finite-difference approximation. The final algorithm
integrates all these components, has linear complexity and is hyper-parameter
free. Comment: Published at the First International Conference on Learning
Representations (ICLR-2013). Public reviews are available at
http://openreview.net/document/c14f2204-fd66-4d91-bed4-153523694041#c14f2204-fd66-4d91-bed4-15352369404
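As a rough illustration of the finite-difference idea mentioned in the abstract above, the following Python sketch estimates per-parameter curvature from two gradient evaluations. Everything in it is assumed for illustration and is not taken from the paper: the toy separable quadratic loss, the names `toy_grad` and `fd_diag_curvature`, and the probe step `delta`. It is not the paper's algorithm, only a minimal instance of approximating curvature without an explicit Hessian.

```python
def toy_grad(theta, curvatures):
    """Gradient of the assumed toy loss 0.5 * sum(a_i * theta_i**2)."""
    return [a * t for a, t in zip(curvatures, theta)]


def fd_diag_curvature(grad_fn, theta, delta):
    """Estimate per-parameter curvature from a finite difference of gradients.

    Evaluates the gradient at theta and at theta + delta, then divides the
    change in each gradient component by the corresponding probe step.
    """
    g0 = grad_fn(theta)
    g1 = grad_fn([t + d for t, d in zip(theta, delta)])
    return [abs(b - a) / abs(d) for a, b, d in zip(g0, g1, delta)]


# Hypothetical problem: three parameters with known curvatures.
curvatures = [1.0, 4.0, 0.25]
theta = [0.5, -1.0, 2.0]
delta = [1e-3, 1e-3, 1e-3]

estimate = fd_diag_curvature(lambda t: toy_grad(t, curvatures), theta, delta)

# One simple (assumed) use: scale each parameter's step by inverse curvature.
step_sizes = [1.0 / h for h in estimate]
print(estimate, step_sizes)
```

On a separable quadratic the estimate recovers the true curvatures up to rounding. For a loss with interacting parameters, the same difference mixes in off-diagonal terms, so it should be read only as a cheap approximation.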
Toward a Robust Sparse Data Representation for Wireless Sensor Networks
Compressive sensing has been successfully used for optimized operations in
wireless sensor networks. However, raw data collected by sensors may be neither
originally sparse nor easily transformed into a sparse data representation.
This paper addresses the problem of transforming source data collected by
sensor nodes into a sparse representation with a few nonzero elements. Our
contributions, which address three major issues, include: 1) an effective method
that extracts population sparsity of the data, 2) a sparsity ratio guarantee
scheme, and 3) a customized learning algorithm for the sparsifying dictionary.
We introduce an unsupervised neural network to extract an intrinsic sparse
coding of the data. The sparse codes are generated at the activation of the
hidden layer using a sparsity nomination constraint and a shrinking mechanism.
Our analysis using real data samples shows that the proposed method outperforms
conventional sparsity-inducing methods. Comment: 8 page
Adversarial Dropout for Supervised and Semi-supervised Learning
Recently, training with adversarial examples, which are generated by
adding a small but worst-case perturbation to input examples, has been shown
to improve the generalization performance of neural networks. In contrast to
perturbing individual inputs to improve generalization, this paper introduces
adversarial dropout: a minimal set of dropouts that maximizes the
divergence between the outputs of the network with the dropouts and the
training supervision. The identified adversarial dropouts are used to
reconfigure the neural network for training, and we demonstrate that training on
the reconfigured sub-network improves the generalization performance of
supervised and semi-supervised learning tasks on MNIST and CIFAR-10. We
analyzed the trained model to explain the performance improvement and found
that adversarial dropout increases the sparsity of neural networks more than
standard dropout does. Comment: submitted to AAAI-1
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning.