Predicting Complex Word Emotions and Topics through a Hierarchical Bayesian Network
In this paper, we present a Word Emotion Topic (WET) model that predicts complex word emotion information from text and discovers the distribution of emotions among different topics. A complex emotion is defined as a combination of one or more singular emotions from the following eight basic emotion categories: joy, love, expectation, surprise, anxiety, sorrow, anger and hate. We use a hierarchical Bayesian network to model the emotions and topics in the text. Both the complex emotions and the topics are drawn from raw text, without considering any complicated language features. Our experiments show promising results for word emotion prediction, outperforming traditional parsing methods such as the Hidden Markov Model and Conditional Random Fields (CRFs) on raw text. We also explore the topic distribution by examining the emotion topic variation in an emotion topic diagram.
Computing the Cramer-Rao bound of Markov random field parameters: Application to the Ising and the Potts models
This report considers the problem of computing the Cramer-Rao bound for the
parameters of a Markov random field. Computation of the exact bound is not
feasible for most fields of interest because their likelihoods are intractable
and have intractable derivatives. We show here how it is possible to formulate
the computation of the bound as a statistical inference problem that can be
solved approximately, but with arbitrarily high accuracy, by using a Monte Carlo
method. The proposed methodology is successfully applied to the Ising and the
Potts models.
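As a rough illustration of how such a bound can be approximated by Monte Carlo, the sketch below treats a single-parameter Ising model $p(x \mid \beta) \propto \exp(\beta S(x))$, where $S(x)$ sums the products of neighbouring spins. For an exponential family of this form, the Fisher information equals the variance of $S(X)$ under the model, so the Cramer-Rao bound for unbiased estimators of $\beta$ is its reciprocal. The Gibbs sampler, the free boundary conditions, the lattice size and all function names are assumptions made for this example; this is not the estimator described in the report.

```python
import numpy as np

def ising_stat(x):
    """Sufficient statistic S(x): sum of x_i * x_j over nearest-neighbour pairs."""
    return float(np.sum(x[:, :-1] * x[:, 1:]) + np.sum(x[:-1, :] * x[1:, :]))

def gibbs_sweep(x, beta, rng):
    """One full Gibbs sweep over the lattice (free boundary, updates x in place)."""
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            s = 0
            if i > 0:
                s += x[i - 1, j]
            if i < n - 1:
                s += x[i + 1, j]
            if j > 0:
                s += x[i, j - 1]
            if j < m - 1:
                s += x[i, j + 1]
            # P(x_ij = +1 | neighbours) for p(x) proportional to exp(beta * S(x))
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * s))
            x[i, j] = 1 if rng.random() < p_up else -1

def crb_monte_carlo(beta, shape=(8, 8), n_samples=400, burn_in=100, seed=0):
    """Return (approximate Cramer-Rao bound, approximate Fisher information)."""
    rng = np.random.default_rng(seed)
    x = rng.choice(np.array([-1, 1]), size=shape)
    stats = []
    for t in range(burn_in + n_samples):
        gibbs_sweep(x, beta, rng)
        if t >= burn_in:
            stats.append(ising_stat(x))
    fisher = float(np.var(stats, ddof=1))  # Var of S(X) estimates the Fisher information
    return 1.0 / fisher, fisher
```

Increasing `n_samples` tightens the Monte Carlo estimate at the cost of run time. Near the critical temperature, successive sweeps are strongly correlated, so far more samples (or a better sampler) would be needed than this sketch uses.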
Two-parameter Poisson-Dirichlet measures and reversible exchangeable fragmentation-coalescence processes
We show that for $0<\alpha<1$ and $\theta>-\alpha$, the Poisson-Dirichlet
distribution with parameter $(\alpha,\theta)$ is the unique reversible
distribution of a rather natural fragmentation-coalescence process. This
completes earlier results in the literature for certain split and merge
transformations and the parameter $\alpha=0$.
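For readers unfamiliar with the two-parameter Poisson-Dirichlet distribution named in the title, one standard way to simulate it is stick-breaking: draw independent $V_k \sim \mathrm{Beta}(1-\alpha, \theta + k\alpha)$ for $k = 1, 2, \dots$, set $P_k = V_k \prod_{j<k}(1 - V_j)$, and rank the $P_k$ in decreasing order. The sketch below truncates this construction at a finite number of sticks, so it is only an approximation; the function name and the truncation level are assumptions for the example, and it does not simulate the fragmentation-coalescence process studied in the paper.

```python
import numpy as np

def sample_pd(alpha, theta, n_sticks=1000, seed=0):
    """Approximate draw from PD(alpha, theta) via truncated stick-breaking."""
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = np.random.default_rng(seed)
    k = np.arange(1, n_sticks + 1)
    # V_k ~ Beta(1 - alpha, theta + k * alpha), independent across k
    v = rng.beta(1.0 - alpha, theta + k * alpha)
    # Length of stick remaining before the k-th break: prod_{j<k} (1 - V_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    weights = v * remaining
    # Ranking the weights in decreasing order gives the Poisson-Dirichlet sample
    return np.sort(weights)[::-1]
```

The returned weights sum to slightly less than 1 because of the truncation; the missing mass shrinks as `n_sticks` grows.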
Learning Reputation in an Authorship Network
The problem of searching for experts in a given academic field is hugely
important in both industry and academia. We study exactly this issue with
respect to a database of authors and their publications. The idea is to use
Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) to perform
topic modelling in order to find authors who have worked in a query field. We
then construct a coauthorship graph and motivate the use of influence
maximisation and a variety of graph centrality measures to obtain a ranked list
of experts. The ranked lists are further improved using a Markov Chain-based
rank aggregation approach. The complete method is readily scalable to large
datasets. To demonstrate the efficacy of the approach we report on an extensive
set of computational simulations using the Arnetminer dataset. An improvement
in mean average precision is demonstrated over the baseline case of simply
using the order of authors found by the topic models.
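The abstract does not say which Markov chain is used for rank aggregation, so the following is only a sketch of one well-known heuristic of this kind (in the spirit of the MC4 chain of Dwork et al.): from the current item, pick another item uniformly at random and move to it if a majority of the input lists rank it higher; the stationary distribution then orders the items. The damping term, the function name and the toy author names are assumptions added for illustration.

```python
import numpy as np

def mc_rank_aggregate(rankings, damping=0.95, n_iter=200):
    """Aggregate ranked lists (best first) with a majority-vote Markov chain."""
    items = sorted({a for r in rankings for a in r})
    idx = {a: k for k, a in enumerate(items)}
    n = len(items)
    # Position of each item within each input list (lower is better)
    pos = [{a: p for p, a in enumerate(r)} for r in rankings]
    P = np.zeros((n, n))
    for a in items:
        for b in items:
            if a == b:
                continue
            both = [p for p in pos if a in p and b in p]
            wins = sum(p[b] < p[a] for p in both)
            # Move from a to b only if most lists containing both prefer b
            if both and wins > len(both) / 2:
                P[idx[a], idx[b]] = 1.0 / n
        P[idx[a], idx[a]] = 1.0 - P[idx[a]].sum()  # otherwise stay at a
    # Uniform teleportation keeps the chain ergodic (an added assumption)
    P = damping * P + (1.0 - damping) / n
    pi = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        pi = pi @ P  # power iteration towards the stationary distribution
    return sorted(items, key=lambda a: -pi[idx[a]])
```

For example, `mc_rank_aggregate([["ann", "bob", "cy"], ["ann", "cy", "bob"], ["bob", "ann", "cy"]])` combines three hypothetical expert lists, which could come from different centrality measures, into a single ordering.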
Learning loopy graphical models with latent variables: Efficient methods and guarantees
The problem of structure estimation in graphical models with latent variables
is considered. We characterize conditions for tractable graph estimation and
develop efficient methods with provable guarantees. We consider models where
the underlying Markov graph is locally tree-like, and the model is in the
regime of correlation decay. For the special case of the Ising model, the
number of samples $n$ required for structural consistency of our method scales
as $n = \Omega(\theta_{\min}^{-\delta\eta(\eta+1)-2} \log p)$, where $p$ is the
number of variables, $\theta_{\min}$ is the minimum edge potential, $\delta$ is
the depth (i.e., distance from a hidden node to the nearest observed nodes),
and $\eta$ is a parameter which depends on the bounds on node and edge
potentials in the Ising model. Necessary conditions for structural consistency
under any algorithm are derived and our method nearly matches the lower bound
on sample requirements. Further, the proposed method is practical to implement
and provides flexibility to control the number of latent variables and the
cycle lengths in the output graph.
Comment: Published at http://dx.doi.org/10.1214/12-AOS1070 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)