
    Bivariate Beta-LSTM

    Long Short-Term Memory (LSTM) infers long-term dependencies through a cell state maintained by the input and forget gate structures, which model a gate output as a value in [0,1] through a sigmoid function. However, owing to the gradual, fixed shape of the sigmoid function, the sigmoid gate is not flexible in representing multi-modality or skewness. In addition, previous models do not capture the correlation between the gates, which could serve as an inductive bias for the relationship between the previous and current inputs. This paper proposes a new gate structure based on the bivariate Beta distribution. The proposed gate structure enables probabilistic modeling of the gates within the LSTM cell, so that modelers can customize the cell state flow with priors and distributions. Moreover, we theoretically show a higher upper bound on the gradient compared to the sigmoid function, and we empirically observe that the bivariate Beta gate structure yields larger gradient values during training. We demonstrate the effectiveness of the bivariate Beta gate structure on sentence classification, image classification, polyphonic music modeling, and image caption generation. Comment: AAAI 2020
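
    The gate construction described above can be sketched in code. Below is a minimal PyTorch sketch, assuming gates drawn from independent reparameterized Beta distributions whose concentration parameters are produced by the cell; the paper's actual bivariate construction additionally correlates the input and forget gates, and the class name BetaGateLSTMCell and all parameter choices here are illustrative, not the authors' implementation.

```python
# Illustrative sketch only: Beta-distributed input/forget gates in an LSTM cell.
# The paper's bivariate Beta construction (which correlates the two gates) is
# more involved; here the gates are modeled as independent Beta variables.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaGateLSTMCell(nn.Module):  # hypothetical name
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Produce Beta concentration parameters (alpha, beta) for the input
        # and forget gates, plus the usual candidate and output pre-activations.
        self.proj = nn.Linear(input_size + hidden_size, 6 * hidden_size)

    def forward(self, x, state):
        h, c = state
        z = self.proj(torch.cat([x, h], dim=-1))
        a_i, b_i, a_f, b_f, g, o = z.chunk(6, dim=-1)
        # Softplus keeps the Beta concentrations strictly positive.
        beta_i = torch.distributions.Beta(F.softplus(a_i) + 1e-4,
                                          F.softplus(b_i) + 1e-4)
        beta_f = torch.distributions.Beta(F.softplus(a_f) + 1e-4,
                                          F.softplus(b_f) + 1e-4)
        # Reparameterized samples in [0, 1] replace the sigmoid gate outputs.
        i_gate = beta_i.rsample()
        f_gate = beta_f.rsample()
        c_new = f_gate * c + i_gate * torch.tanh(g)
        h_new = torch.sigmoid(o) * torch.tanh(c_new)
        return h_new, c_new
```

    At evaluation time, the Beta mean alpha / (alpha + beta) could be substituted for the sample to make the gating deterministic.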

    Improving Negative Sampling for Word Representation using Self-embedded Features

    Although the word-popularity-based negative sampler has shown superb performance in the skip-gram model, the theoretical motivation behind oversampling popular (non-observed) words as negative samples is still not well understood. In this paper, we start from an investigation of the gradient vanishing issue in the skip-gram model without a proper negative sampler. By performing an insightful analysis from the stochastic gradient descent (SGD) learning perspective, we demonstrate, both theoretically and intuitively, that negative samples with larger inner product scores are more informative than those with lower scores for the SGD learner, in terms of both convergence rate and accuracy. Building on this, we propose an alternative sampling algorithm that dynamically selects informative negative samples during each SGD update. More importantly, the proposed sampler accounts for multi-dimensional self-embedded features during the sampling process, which essentially makes it more effective than the original popularity-based (one-dimensional) sampler. Empirical experiments further verify our observations and show that our fine-grained samplers gain significant improvements over the existing ones without increasing computational complexity. Comment: Accepted in WSDM 201
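
    As a rough illustration of score-based negative selection, the sketch below ranks a random candidate pool by inner product with the current target embedding and keeps the highest-scoring candidates. The function name, the candidate-pool size, and the plain top-k rule are assumptions for illustration, not the authors' exact sampler.

```python
# Illustrative sketch: pick negatives whose inner product with the current
# target embedding is large, i.e. the "informative" negatives.
import numpy as np

def sample_informative_negatives(target_vec, ctx_embeddings, k=5, pool=200,
                                 rng=np.random.default_rng()):
    """Draw a random candidate pool, then keep the k candidates whose
    inner product scores with the target vector are largest."""
    vocab_size = ctx_embeddings.shape[0]
    candidates = rng.choice(vocab_size, size=pool, replace=False)
    scores = ctx_embeddings[candidates] @ target_vec   # inner products
    top = candidates[np.argsort(scores)[-k:]]          # highest-scoring
    return top

# Usage with random embeddings (dimensions are arbitrary for the example):
emb = np.random.default_rng(0).normal(size=(10000, 100))
negs = sample_informative_negatives(emb[42], emb, k=5)
```

    High-score negatives produce larger gradient magnitudes in the skip-gram loss, which is the sense in which they are more informative for the SGD learner.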

    Using Item response models to investigate attitudes towards divorce

    Item Response Theory (IRT) is a form of latent structure analysis used to analyze binary or ordinal response data. IRT models evaluate the relationships between the latent trait of interest and the items measuring that trait. Several IRT models will be fitted to assess the factors that lead to divorce in the Maltese Islands. The 1-PL and 2-PL logistic Rasch models are used for dichotomous responses, whereas the 1-PL rating scale and 1-PL partial credit models are used for polytomous responses. All the models are fitted using the generalized linear latent and mixed modeling (GLLAMM) framework; the gllamm program estimates parameters by maximum likelihood using adaptive quadrature (Rabe-Hesketh, Skrondal, and Pickles 2002, 2005). In the 1-PL Rasch model, the probability that a person agrees with a divorce-related item is modeled as a function of subject ability and item difficulty parameters. The major weakness of this model is that all items share the same discrimination parameter. In the 2-PL Birnbaum model, an item-specific weight is added so that the slope of the item response function varies between items. The 1-PL rating scale model specifies that the items share the same rating scale structure, while the 1-PL partial credit model specifies a distinct rating scale structure for each item. Peer-reviewed.
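
    For reference, the standard 1-PL (Rasch) and 2-PL (Birnbaum) item response functions mentioned above are shown in the short sketch below; parameter names follow common IRT notation (theta for person ability, b for item difficulty, a for item discrimination), and the numbers in the example call are arbitrary.

```python
# Standard dichotomous item response functions for the models named above.
import numpy as np

def rasch_1pl(theta, b):
    """1-PL (Rasch): P(agree) as a function of person ability theta and
    item difficulty b; all items share a common slope."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def birnbaum_2pl(theta, a, b):
    """2-PL (Birnbaum): adds an item-specific discrimination a, so the
    slope of the item response function varies across items."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Example: a person of average ability (theta = 0) responding to an item of
# difficulty 0.5, with discrimination 1.5 under the 2-PL model.
print(rasch_1pl(0.0, 0.5), birnbaum_2pl(0.0, 1.5, 0.5))
```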

    Automatic domain ontology extraction for context-sensitive opinion mining

    Automated analysis of the sentiments expressed in online consumer feedback can facilitate both organizations' business strategy development and individual consumers' comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context-sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based context-sensitive opinion mining system. Our novel ontology extraction mechanism, underpinned by a variant of Kullback-Leibler divergence, can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows a remarkable performance improvement over the context-free baseline.
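
    The abstract does not spell out the divergence variant, so the sketch below only illustrates the general idea: scoring candidate terms by their pointwise contribution to the Kullback-Leibler divergence between a product-domain corpus and a background corpus, and keeping the top-ranked terms as ontology candidates. The function name, smoothing constant, and toy corpora are all illustrative, not the paper's method.

```python
# Illustrative sketch: rank candidate terms by their pointwise contribution
# to KL(P_domain || P_background). Plain KL is used here, not the paper's variant.
import math
from collections import Counter

def kl_term_scores(domain_tokens, background_tokens, smoothing=1e-6):
    dom, bg = Counter(domain_tokens), Counter(background_tokens)
    n_dom, n_bg = sum(dom.values()), sum(bg.values())
    scores = {}
    for term in set(dom) | set(bg):
        p = dom[term] / n_dom + smoothing
        q = bg[term] / n_bg + smoothing
        scores[term] = p * math.log(p / q)   # pointwise KL contribution
    return scores

# Usage: terms far more frequent in the domain corpus than in the background
# corpus score highest and become ontology candidates.
domain = "battery life battery screen camera battery".split()
background = "price shipping battery service screen".split()
top = sorted(kl_term_scores(domain, background).items(),
             key=lambda kv: kv[1], reverse=True)[:3]
print(top)
```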