1,688 research outputs found

    Stochastic Training of Neural Networks via Successive Convex Approximations

    This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA) techniques. The basic idea is to iteratively replace the original (non-convex, high-dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Unlike similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information about the neural network function, in a stochastic fashion, while exploiting the overall structure of the learning problem for faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss) and for the regularization of the NN's weights. We experiment on several medium-sized benchmark problems and on a large-scale dataset involving simulated physical data. The results show that the algorithm outperforms state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power.
    Comment: Preprint submitted to IEEE Transactions on Neural Networks and Learning Systems
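
    To make the surrogate idea concrete, here is a minimal sketch of one SCA update for an l2-regularized loss, assuming a hypothetical grad_fn that returns a (possibly minibatch, hence stochastic) gradient. It follows the generic SCA template rather than this paper's exact construction: linearize the loss around the current iterate, add a strongly convex proximal term, solve the surrogate in closed form, and average toward the solution.

    import numpy as np

    def sca_step(w, grad_fn, lam=1e-3, tau=1.0, gamma=0.5):
        # Strongly convex surrogate around the current iterate w:
        #   s(u) = grad_f(w)^T (u - w) + (tau/2)*||u - w||^2 + (lam/2)*||u||^2
        # Its minimizer has a closed form, so the inner problem is trivial.
        g = grad_fn(w)                       # first-order information only
        w_hat = (tau * w - g) / (tau + lam)  # argmin_u s(u)
        return w + gamma * (w_hat - w)       # averaging step toward w_hat

    # Toy usage on a least-squares loss f(w) = 0.5*||Xw - y||^2; tau is set
    # to a curvature estimate so the surrogate majorizes the loss.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(100, 10)), rng.normal(size=100)
    w = np.zeros(10)
    tau = np.linalg.norm(X, 2) ** 2
    for _ in range(200):
        w = sca_step(w, lambda u: X.T @ (X @ u - y), tau=tau)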

    Practical recommendations for gradient-based training of deep architectures

    Learning algorithms for artificial neural networks, and for Deep Learning in particular, may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradients and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when many hyper-parameters are allowed to be adjusted. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale, often deep, multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
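
    As a flavor of the advice such a guide covers, below is a hedged sketch of an O(1/t) learning-rate decay schedule together with illustrative starting values for common hyper-parameters. The specific constants are assumptions for a first run, not the chapter's prescriptions, and should be tuned per task.

    def lr_schedule(t, eps0=0.01, tau=10_000):
        # Hold the base rate eps0 for the first tau updates, then decay as
        # O(1/t); a common schedule family for stochastic gradient descent.
        return eps0 * tau / max(t, tau)

    # Illustrative (not prescriptive) starting points for a first run.
    hyperparams = {
        "initial_learning_rate": 0.01,  # typical SGD default to tune first
        "minibatch_size": 32,
        "momentum": 0.9,
        "l2_weight_decay": 1e-4,
    }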

    Semi-supervised and target-dependent sentiment classification for micro-blogs

    The wealth of opinions expressed in micro-blogs, such as tweets, has motivated researchers to develop techniques for automatic opinion detection. However, the accuracy of such techniques is still limited. Moreover, current techniques focus on detecting sentiment polarity regardless of the topic (target) discussed. Detecting sentiment towards a specific target, referred to as target-dependent sentiment classification, has not received adequate attention from researchers. A literature review shows that all target-dependent approaches use supervised learning techniques. Such techniques need a large amount of labeled data; however, labeling data in social media is cumbersome and error-prone. The research presented in this paper addresses this issue by employing semi-supervised learning techniques, which make use of labeled as well as unlabeled data, for target-dependent sentiment classification. We present a new semi-supervised learning technique that uses fewer labeled micro-blogs than supervised learning techniques require. Experimental results show that the proposed technique provides comparable accuracy.
    Facultad de Informática
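
    The paper's specific technique is not reproduced here; as a generic illustration of learning from labeled plus unlabeled data, the sketch below implements plain self-training, one common semi-supervised pattern, assuming a scikit-learn-style classifier (predict_proba and classes_ are the assumed interface). Confident predictions on unlabeled micro-blogs are pseudo-labeled and folded back into the training set.

    import numpy as np

    def self_train(clf, X_lab, y_lab, X_unlab, threshold=0.9, rounds=5):
        # Fit on labeled data, pseudo-label the unlabeled examples the
        # model is confident about, retrain on the enlarged set, repeat.
        X, y = X_lab, y_lab
        for _ in range(rounds):
            clf.fit(X, y)
            if len(X_unlab) == 0:
                break
            proba = clf.predict_proba(X_unlab)
            keep = proba.max(axis=1) >= threshold   # confidence filter
            if not keep.any():
                break
            pseudo = clf.classes_[proba[keep].argmax(axis=1)]
            X = np.vstack([X, X_unlab[keep]])
            y = np.concatenate([y, pseudo])
            X_unlab = X_unlab[~keep]
        return clf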


    DC Proximal Newton for Non-Convex Optimization Problems

    We introduce a novel algorithm for solving learning problems where both the loss function and the regularizer are non-convex but belong to the class of difference-of-convex (DC) functions. Our contribution is a new general-purpose proximal Newton algorithm that is able to deal with such a situation. The algorithm consists of obtaining a descent direction from an approximation of the loss function and then performing a line search to ensure sufficient descent. A theoretical analysis is provided showing that the limit points of the iterates of the proposed algorithm are stationary points of the DC objective function. Numerical experiments show that our approach is more efficient than the current state of the art for a problem with a convex loss function and a non-convex regularizer. We also illustrate the benefit of our algorithm on a high-dimensional transductive learning problem where both the loss function and the regularizer are non-convex.
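
    A hedged sketch of the DC idea under simplifying assumptions: take F(w) = f(w) + lam*||w||_1 - r2(w) with f smooth and r2 convex, linearize the concave part through a subgradient of r2 at each outer iteration, and replace the paper's full proximal Newton metric with a scalar-diagonal surrogate h*I so the proximal step reduces to a closed-form soft-threshold; a backtracking line search then enforces sufficient descent. The names grad_f, F, and subgrad_r2 are placeholders for problem-specific callables.

    import numpy as np

    def soft_threshold(x, t):
        # Proximal operator of t*||.||_1.
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def dc_prox_newton(w, grad_f, F, subgrad_r2, lam=0.1, h=1.0,
                       max_iter=50, beta=0.5, sigma=1e-4):
        for _ in range(max_iter):
            v = subgrad_r2(w)                  # linearize the concave part
            # Proximal step on the convex surrogate with metric h*I; with a
            # full Hessian this step would need an inner solver instead.
            u = soft_threshold(w - (grad_f(w) - v) / h, lam / h)
            d = u - w                          # candidate descent direction
            t, F_w = 1.0, F(w)
            while F(w + t * d) > F_w - sigma * t * np.dot(d, d):
                t *= beta                      # backtracking line search
                if t < 1e-10:
                    return w                   # no sufficient descent found
            w = w + t * d
        return w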