1,265 research outputs found

    Semi-supervised and target-dependent sentiment classification for micro-blogs

    The wealth of opinions expressed in micro-blogs, such as tweets, has motivated researchers to develop techniques for automatic opinion detection. However, the accuracy of such techniques is still limited. Moreover, current techniques focus on detecting sentiment polarity regardless of the topic (target) discussed. Detecting sentiment towards a specific target, referred to as target-dependent sentiment classification, has not received adequate attention from researchers. A literature review has shown that all target-dependent approaches use supervised learning techniques. Such techniques need a large amount of labeled data. However, labeling data in social media is cumbersome and error-prone. The research presented in this paper addresses this issue by employing semi-supervised learning techniques for target-dependent sentiment classification. Semi-supervised learning techniques make use of labeled as well as unlabeled data. In this paper, we present a new semi-supervised learning technique that uses fewer labeled micro-blogs than supervised learning techniques require. Experimental results have shown that the proposed technique provides comparable accuracy. Facultad de Informátic

    DC Proximal Newton for Non-Convex Optimization Problems

    We introduce a novel algorithm for solving learning problems where both the loss function and the regularizer are non-convex but belong to the class of difference of convex (DC) functions. Our contribution is a new general-purpose proximal Newton algorithm that is able to deal with such a situation. The algorithm consists in obtaining a descent direction from an approximation of the loss function and then performing a line search to ensure sufficient descent. A theoretical analysis is provided showing that the iterates of the proposed algorithm admit as limit points stationary points of the DC objective function. Numerical experiments show that our approach is more efficient than the current state of the art for a problem with a convex loss function and a non-convex regularizer. We have also illustrated the benefit of our algorithm on a high-dimensional transductive learning problem where both the loss function and the regularizer are non-convex.
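
    As a small illustration of the DC class mentioned in this abstract (an example chosen here, not taken from the paper), the capped-l1 penalty, a common non-convex regularizer, can be written as the difference of two convex functions. The threshold \theta is a hypothetical parameter introduced only for this illustration.

```latex
\min(|w|, \theta) \;=\; \underbrace{|w|}_{g(w)} \;-\; \underbrace{\max(|w| - \theta,\, 0)}_{h(w)}, \qquad \theta > 0 .
```

    Both g and h are convex. For |w| \le \theta the second term vanishes and the expression equals |w|; for |w| > \theta the difference equals \theta.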

    Regularized Optimal Transport and the Rot Mover's Distance

    This paper presents a unified framework for smooth convex regularization of discrete optimal transport problems. In this context, the regularized optimal transport turns out to be equivalent to a matrix nearness problem with respect to Bregman divergences. Our framework thus naturally generalizes a previously proposed regularization based on the Boltzmann-Shannon entropy, which is related to the Kullback-Leibler divergence and solved with the Sinkhorn-Knopp algorithm. We call the regularized optimal transport distance the rot mover's distance in reference to the classical earth mover's distance. We develop two generic schemes, which we respectively call the alternate scaling algorithm and the non-negative alternate scaling algorithm, to efficiently compute the regularized optimal plans depending on whether the domain of the regularizer lies within the non-negative orthant or not. These schemes are based on Dykstra's algorithm with alternate Bregman projections, and further exploit the Newton-Raphson method when applied to separable divergences. We enhance the separable case with a sparse extension to deal with high data dimensions. We also instantiate our proposed framework and discuss the inherent specificities for well-known regularizers and statistical divergences in the machine learning and information geometry communities. Finally, we demonstrate the merits of our methods with experiments using synthetic data to illustrate the effect of different regularizers and penalties on the solutions, as well as real-world data for a pattern recognition application to audio scene classification.
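
    For orientation, the entropic special case that this abstract generalizes can be sketched in a few lines. The following is a minimal NumPy sketch of the standard Sinkhorn-Knopp scaling iteration, not the paper's alternate scaling algorithms; the function name `sinkhorn`, the parameters `reg`, `n_iters`, and `tol`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.1, n_iters=500, tol=1e-9):
    """Entropy-regularized optimal transport via Sinkhorn-Knopp scaling.

    a, b  : source and target histograms (non-negative, same total mass)
    cost  : pairwise cost matrix of shape (len(a), len(b))
    reg   : strength of the entropic regularization (illustrative default)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Gibbs kernel associated with the cost matrix.
    K = np.exp(-np.asarray(cost, dtype=float) / reg)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u_prev = u
        v = b / (K.T @ u)   # rescale so the column marginals match b
        u = a / (K @ v)     # rescale so the row marginals match a
        if np.max(np.abs(u - u_prev)) < tol:
            break
    # Transport plan: diag(u) K diag(v).
    return u[:, None] * K * v[None, :]

# Toy problem: two source bins, two target bins.
a = np.array([0.5, 0.5])
b = np.array([0.25, 0.75])
cost = np.array([[0.0, 1.0], [1.0, 0.0]])
plan = sinkhorn(a, b, cost, reg=0.5)
```

    The returned plan is non-negative, and its row and column sums approximate `a` and `b`. A smaller `reg` brings the plan closer to the unregularized optimum but makes the kernel entries very small, which is one practical reason to study other regularizers.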

    Deep Generative Models for Reject Inference in Credit Scoring

    Credit scoring models based on accepted applications may be biased, and their consequences can have a statistical and economic impact. Reject inference is the process of attempting to infer the creditworthiness status of the rejected applications. In this research, we use deep generative models to develop two new semi-supervised Bayesian models for reject inference in credit scoring, in which we model the data-generating process as dependent on a Gaussian mixture. The goal is to improve the classification accuracy of credit scoring models by adding rejected applications. Our proposed models infer the unknown creditworthiness of the rejected applications by exact enumeration of the two possible outcomes of the loan (default or non-default). The efficient stochastic gradient optimization technique used in deep generative models makes our models suitable for large data sets. Finally, the experiments in this research show that our proposed models perform better than classical and alternative machine learning models for reject inference in credit scoring.