Stochastic Training of Neural Networks via Successive Convex Approximations
This paper proposes a new family of algorithms for training neural networks
(NNs). These are based on recent developments in the field of non-convex
optimization, going under the general name of successive convex approximation
(SCA) techniques. The basic idea is to iteratively replace the original
(non-convex, high-dimensional) learning problem with a sequence of (strongly
convex) approximations, which are both accurate and simple to optimize.
Unlike similar approaches (e.g., quasi-Newton algorithms), the
approximations can be constructed using only first-order information about the
neural network function, in a stochastic fashion, while exploiting the overall
structure of the learning problem for faster convergence. We discuss several
use cases, based on different choices for the loss function (e.g., squared loss
and cross-entropy loss), and for the regularization of the NN's weights. We
experiment on several medium-sized benchmark problems, and on a large-scale
dataset involving simulated physical data. The results show how the algorithm
outperforms state-of-the-art techniques, providing faster convergence to a
better minimum. Additionally, we show how the algorithm can be easily
parallelized over multiple computational units without hindering its
performance. In particular, each computational unit can optimize a tailored
surrogate function defined on a randomly assigned subset of the input
variables, whose dimension can be selected based entirely on the available
computational power.
Comment: Preprint submitted to IEEE Transactions on Neural Networks and Learning Systems.
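The surrogate idea described in the abstract can be sketched in a few lines. The toy example below is not the authors' implementation; the test function, constants, and all names are illustrative. It linearizes a non-convex loss at the current iterate, adds a proximal term so the surrogate is strongly convex, keeps a convex L2 regularizer exact (the "exploit the problem structure" idea), solves the surrogate in closed form, and takes a diminishing step toward its minimizer:

```python
import numpy as np

def sca_minimize(grad_loss, w0, lam=0.1, tau=20.0, gamma0=0.9, iters=300):
    """Sketch of a successive convex approximation (SCA) iteration.

    At each step the (possibly non-convex) loss is replaced by a strongly
    convex surrogate: its first-order expansion at w_k plus a proximal term
    (tau/2)*||w - w_k||^2, while the convex regularizer (lam/2)*||w||^2 is
    kept exactly.  The surrogate minimizer has the closed form
        w_hat = (tau * w_k - grad_loss(w_k)) / (tau + lam),
    and the iterate moves toward it with a diminishing step gamma_k.
    Fixed points satisfy grad_loss(w) + lam*w = 0, i.e. stationarity of
    the regularized objective.
    """
    w = np.asarray(w0, dtype=float)
    for k in range(iters):
        g = grad_loss(w)
        w_hat = (tau * w - g) / (tau + lam)   # exact surrogate minimizer
        gamma = gamma0 / (1.0 + 0.1 * k)      # diminishing step size
        w = w + gamma * (w_hat - w)
    return w

# Toy non-convex loss f(w) = sum(w_i^4 - 3*w_i^2); its gradient is 4w^3 - 6w.
grad = lambda w: 4.0 * w**3 - 6.0 * w
w_star = sca_minimize(grad, w0=np.array([2.0, -1.5]))
# Stationary points of the regularized objective solve 4w^3 - 6w + 0.1w = 0.
```

The parallel variant in the abstract would let each worker solve such a closed-form subproblem over its own random subset of coordinates.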
Practical recommendations for gradient-based training of deep architectures
Learning algorithms related to artificial neural networks, and in particular
to Deep Learning, may seem to involve many bells and whistles, called
hyper-parameters. This chapter is meant as a practical guide with
recommendations for some of the most commonly used hyper-parameters, in
particular in the context of learning algorithms based on back-propagated
gradient and gradient-based optimization. It also discusses how to deal with
the fact that more interesting results can be obtained when allowing one to
adjust many hyper-parameters. Overall, it describes elements of the practice
used to successfully and efficiently train and debug large-scale and often deep
multi-layer neural networks. It closes with open questions about the training
difficulties observed with deeper architectures.
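As a concrete illustration of the kind of hyper-parameters such a guide covers, the sketch below runs minibatch SGD with a decaying learning-rate schedule of the form eps_t = eps0 * tau / max(t, tau) on a synthetic regression problem. The data, the schedule constants, and the specific values are assumptions for illustration, not recommendations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data (illustrative stand-in for a real task).
X = rng.normal(size=(200, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.01 * rng.normal(size=200)

# Hyper-parameters of the kind discussed: initial learning rate eps0,
# decay constant tau for the schedule eps_t = eps0 * tau / max(t, tau),
# minibatch size, and number of epochs.  Values here are guesses.
eps0, tau, batch, epochs = 0.05, 200, 20, 50

w = np.zeros(5)
t = 0
for _ in range(epochs):
    order = rng.permutation(len(X))                  # reshuffle each epoch
    for start in range(0, len(X), batch):
        b = order[start:start + batch]
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)   # minibatch gradient
        eps_t = eps0 * tau / max(t, tau)             # decaying learning rate
        w -= eps_t * grad
        t += 1
```

In practice each of these quantities (eps0, tau, batch size) would itself be tuned, which is exactly the kind of search the chapter gives recommendations for.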
Semi-supervised and target-dependent sentiment classification for micro-blogs
The wealth of opinions expressed in micro-blogs, such as tweets, motivated researchers to develop techniques for automatic opinion detection.
However, the accuracy of such techniques is still limited. Moreover, current techniques focus on detecting sentiment polarity regardless of the topic (target) discussed. Detecting sentiment towards a specific target, referred to as target-dependent sentiment classification, has not received adequate attention from researchers. A literature review has shown that all target-dependent approaches use supervised learning techniques. Such techniques require a large amount of labeled data. However, labeling data in social media is cumbersome and error prone. The research presented in this paper addresses this issue by employing semi-supervised learning techniques for target-dependent sentiment classification. Semi-supervised learning techniques make use of labeled as well as unlabeled data. In this paper, we present a new semi-supervised learning technique that uses fewer labeled micro-blogs than supervised learning techniques. Experimental results show that the proposed technique provides comparable accuracy.
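The abstract does not spell out the algorithm, but a generic self-training loop, one common semi-supervised scheme, conveys the core idea of exploiting unlabeled data: fit a classifier on the few labeled examples, pseudo-label the unlabeled points it is most confident about, and refit. The sketch below uses toy Gaussian data and a nearest-centroid classifier; everything in it is an illustrative assumption, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for micro-blog feature vectors: two well-separated Gaussian
# classes in 2-D.  Only four points start out labeled.
n = 100
X = np.vstack([rng.normal(-2, 1, size=(n, 2)), rng.normal(2, 1, size=(n, 2))])
y = np.repeat([0, 1], n)
labeled = np.array([0, 1, n, n + 1])
unlabeled = np.setdiff1d(np.arange(2 * n), labeled)
y_known = {i: y[i] for i in labeled}           # labels gathered so far

for _ in range(10):                            # self-training rounds
    # Fit a nearest-centroid classifier on everything labeled so far.
    idx = np.array(sorted(y_known))
    lab = np.array([y_known[i] for i in idx])
    c0 = X[idx[lab == 0]].mean(axis=0)
    c1 = X[idx[lab == 1]].mean(axis=0)
    d0 = np.linalg.norm(X[unlabeled] - c0, axis=1)
    d1 = np.linalg.norm(X[unlabeled] - c1, axis=1)
    conf_mask = np.abs(d0 - d1) > 2.0          # distance margin as confidence
    pred = (d1 < d0).astype(int)
    for i, p in zip(unlabeled[conf_mask], pred[conf_mask]):
        y_known[i] = int(p)                    # accept confident pseudo-labels
    unlabeled = unlabeled[~conf_mask]

acc = np.mean([y_known[i] == y[i] for i in y_known])
```

The loop stops adding points whose margin stays below the confidence threshold, so low-confidence examples are simply never pseudo-labeled.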
DC Proximal Newton for Non-Convex Optimization Problems
We introduce a novel algorithm for solving learning problems where both the
loss function and the regularizer are non-convex but belong to the class of
difference of convex (DC) functions. Our contribution is a new general purpose
proximal Newton algorithm that is able to deal with such a situation. The
algorithm consists of obtaining a descent direction from an approximation of
the loss function and then in performing a line search to ensure sufficient
descent. A theoretical analysis is provided showing that the iterates of the
proposed algorithm admit, as limit points, stationary points of the DC
objective function. Numerical experiments show that our approach is more
efficient than the current state of the art for a problem with a convex loss
function and a non-convex regularizer. We also illustrate the benefit of
our algorithm on a high-dimensional transductive learning problem where both the
loss function and the regularizer are non-convex.
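The difference-of-convex structure this abstract relies on can be illustrated with the basic DC algorithm (DCA), of which the paper's proximal Newton method is a more sophisticated variant. The sketch below is illustrative only; the capped-L1 penalty and all constants are assumptions. It minimizes a least-squares term plus the penalty by linearizing the concave part at each iterate and solving the remaining convex subproblem in closed form:

```python
import numpy as np

def soft_threshold(v, t):
    """Closed-form prox of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def dca_capped_l1(a, lam=1.0, theta=1.5, iters=20):
    """DC-algorithm sketch for min_x 0.5*||x - a||^2 + lam*sum(min(|x_i|, theta)).

    The capped-L1 regularizer is a difference of convex functions:
        min(|x|, theta) = |x| - max(|x| - theta, 0),
    so g(x) = 0.5*||x - a||^2 + lam*||x||_1 is convex and prox-friendly,
    and h(x) = lam*sum(max(|x_i| - theta, 0)) is convex.  Each iteration
    replaces h by its linearization at x_k (via a subgradient s) and solves
    the resulting convex surrogate exactly by soft-thresholding.
    """
    x = np.array(a, dtype=float)
    for _ in range(iters):
        s = lam * np.sign(x) * (np.abs(x) > theta)   # subgradient of h at x_k
        x = soft_threshold(a + s, lam)               # exact convex subproblem
    return x

# Small entries are shrunk to zero; large entries escape the (capped) penalty.
x = dca_capped_l1(np.array([0.5, 3.0, -2.5]))
```

The paper's method additionally builds a second-order (proximal Newton) model of the smooth part and adds a line search; the decomposition step above is the shared ingredient.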