Bivariate Beta-LSTM
Long Short-Term Memory (LSTM) infers the long term dependency through a cell
state maintained by the input and the forget gate structures, which models a
gate output as a value in [0,1] through a sigmoid function. However, due to the
graduality of the sigmoid function, the sigmoid gate is not flexible in
representing multi-modality or skewness. Moreover, previous models do not
model the correlation between the gates, which would offer a new way to
introduce an inductive bias on the relationship between previous and current input.
This paper proposes a new gate structure with the bivariate Beta distribution.
The proposed gate structure enables probabilistic modeling on the gates within
the LSTM cell so that the modelers can customize the cell state flow with
priors and distributions. Moreover, we theoretically show a higher upper
bound on the gradient compared to the sigmoid function, and we empirically
observe that the bivariate Beta distribution gate structure provides higher
gradient values in training. We demonstrate the effectiveness of the bivariate
Beta gate structure on sentence classification, image classification, polyphonic
music modeling, and image caption generation.
Comment: AAAI 202
Improving Negative Sampling for Word Representation using Self-embedded Features
Although the word-popularity based negative sampler has shown superb
performance in the skip-gram model, the theoretical motivation behind
oversampling popular (non-observed) words as negative samples is still not well
understood. In this paper, we start from an investigation of the gradient
vanishing issue in the skip-gram model without a proper negative sampler. By
performing an insightful analysis from the stochastic gradient descent (SGD)
learning perspective, we demonstrate that, both theoretically and intuitively,
negative samples with larger inner product scores are more informative than
those with lower scores for the SGD learner in terms of both convergence rate
and accuracy. Understanding this, we propose an alternative sampling algorithm
that dynamically selects informative negative samples during each SGD update.
More importantly, the proposed sampler accounts for multi-dimensional
self-embedded features during the sampling process, which essentially makes it
more effective than the original popularity-based (one-dimensional) sampler.
Empirical experiments further verify our observations and show that our
fine-grained samplers achieve significant improvements over the existing ones
without increasing computational complexity.
Comment: Accepted in WSDM 201
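The idea of preferring negatives with larger inner product scores can be illustrated with a minimal sketch. This is not the authors' sampler: the function name select_negatives, the uniform candidate pool, and the parameters k and pool_size are assumptions made only for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def select_negatives(center_vec, context_vecs, observed, k, pool_size, rng):
    """Pick k non-observed words whose context vectors score highest.

    Assumption: a uniform candidate pool is drawn first, then ranked by
    inner product with the center word. The paper's sampler may differ.
    """
    candidates = [w for w in range(len(context_vecs)) if w not in observed]
    pool = rng.sample(candidates, min(pool_size, len(candidates)))
    # Higher inner product = more informative negative, per the abstract.
    pool.sort(key=lambda w: dot(center_vec, context_vecs[w]), reverse=True)
    return pool[:k]


# Toy embeddings (hypothetical values); word 0 is an observed context word.
center = [1.0, 0.0]
contexts = [[0.9, 0.0], [0.1, 0.0], [-0.5, 0.0], [0.5, 0.0]]
negatives = select_negatives(center, contexts, observed={0}, k=2,
                             pool_size=10, rng=random.Random(0))
```

Because the scores depend on the current embeddings, the selected negatives change as training proceeds, which is the sense in which such a sampler is dynamic.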
Using Item response models to investigate attitudes towards divorce
Item Response Theory (IRT) is a form of latent structure
analysis that is used to analyze binary or ordinal response
data. IRT models are used to evaluate the relationships
between the latent trait of interest and the items measuring
the trait. Several IRT models will be fitted to assess the
factors that lead to divorce in the Maltese Islands. The 1-PL
and 2-PL logistic Rasch models are used for dichotomous
responses, whereas the 1-PL rating scale and 1-PL partial-credit
models are used for polytomous responses. All the
models are fitted using the generalized linear latent and
mixed modeling (GLLAMM) framework. The gllamm
directive estimates parameters by maximum likelihood
using adaptive quadrature (Rabe-Hesketh, Skrondal, and
Pickles 2002; 2005).
In the 1-PL Rasch model, the probability that a person
agrees with a divorce-related item is modeled as a function
of subject ability and item difficulty parameters. The major
weakness of this model is that the items have the same
discrimination parameter. In the 2-PL Birnbaum model, an
item-specific weight is added so that the slope of the item
response function varies between the items. The 1-PL rating
scale model specifies that the items share the same rating
scale structure, while the 1-PL partial credit model specifies
a distinct rating scale structure for each item.
peer-reviewed
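The difference between the 1-PL and 2-PL models for dichotomous responses can be sketched with the standard logistic item response function. The function name and the numeric values below are illustrative assumptions; they are not taken from the study, and this is not the gllamm estimation procedure.

```python
import math


def irt_probability(theta, difficulty, discrimination=1.0):
    """Probability that a person agrees with an item.

    theta: the person's latent trait (subject ability).
    difficulty: the item difficulty parameter.
    discrimination: item-specific slope; fixed at 1.0 this is the 1-PL
    (Rasch) form, and letting it vary by item gives the 2-PL form.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# 1-PL: every item shares the same slope.
p_rasch = irt_probability(theta=1.0, difficulty=0.0)

# 2-PL: a steeper item separates people around its difficulty more sharply.
p_steep = irt_probability(theta=1.0, difficulty=0.0, discrimination=2.0)
```

When the latent trait equals the item difficulty, the probability is 0.5 in both models; the discrimination parameter only changes how quickly the curve rises around that point.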
Automatic domain ontology extraction for context-sensitive opinion mining
Automated analysis of the sentiments expressed in online consumer feedback can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context-sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel context-sensitive opinion mining system based on a fuzzy domain ontology. Our novel ontology extraction mechanism, underpinned by a variant of Kullback-Leibler divergence, can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline