Stochastic Modeling of Semantic Structures of Online Movie Reviews
Given the enormous volumes of data available today, we aim to extract useful information by properly modeling and characterizing the data. In this thesis, we focus on one particular type of semantic data --- online movie reviews, which can be found on all major movie websites. Our objective is to mine movie review data for quantifiable patterns between reviews of the same movie, or reviews from the same reviewer. A novel approach is presented in this thesis to achieve this goal. The key idea is to convert a movie review text into a list of tuples, each containing four elements: a feature word, the category of the feature word, an opinion word, and the polarity of the opinion word. Each tuple is then further converted into an 18-dimensional vector. Given a multinomial distribution representing a movie review, we can systematically and consistently quantify the similarity and dependence between reviews made by the same or different reviewers using metrics including KL distance and distance correlation, respectively. Such comparisons allow us to find reviewers whose generated multinomial distributions are similar, or that exhibit correlation patterns to a certain extent. Among the identified pairs of frequent reviewers, we further investigate the category-wise dependency relationships between two reviewers, which are captured by our proposed ordinary least squares estimation models. The proposed data processing approaches, as well as the corresponding modeling framework, could be further leveraged to develop classification, prediction, and common randomness extraction algorithms for semantic movie review data.
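The two comparison metrics named in the abstract can be sketched in a few lines (function names are illustrative, and the thesis's exact KL variant is not specified here; a symmetrised form is assumed so the measure behaves as a "distance"):

```python
import numpy as np

def kl_distance(p, q, eps=1e-12):
    """Symmetrised KL divergence between two multinomial parameter
    vectors (e.g. the 18-dimensional review representations).
    `eps` guards against log(0) for empty categories."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

def distance_correlation(x, y):
    """Sample distance correlation between two paired samples
    (rows are observations); 0 iff independent in the population,
    1 for an exact linear relationship."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    # Pairwise Euclidean distance matrices.
    a = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    b = np.sqrt(((y[:, None, :] - y[None, :, :]) ** 2).sum(-1))
    # Double-centre each matrix.
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))
```

A reviewer pair with near-zero KL distance has similar review-category profiles, while a high distance correlation flags a dependency worth modeling further.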
Latent Tree Language Model
In this paper we introduce the Latent Tree Language Model (LTLM), a novel
approach to language modeling that encodes the syntax and semantics of a given
sentence as a tree of word roles.
The learning phase iteratively updates the trees by moving nodes according to
Gibbs sampling. We introduce two algorithms to infer a tree for a given
sentence. The first is based on Gibbs sampling: it is fast, but is not
guaranteed to find the most probable tree. The second is based on dynamic
programming: it is slower, but is guaranteed to find the most probable tree. We
provide a comparison of both algorithms.
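The Gibbs-sampling side of this trade-off can be illustrated generically (the conditional scorer `score` and the role inventory are hypothetical placeholders, not the paper's actual model): each latent assignment is resampled in turn from its conditional distribution given all the others.

```python
import random

def gibbs_sweep(roles, score, n_roles, rng=random):
    """One Gibbs sweep over latent word roles: resample each position
    from its conditional distribution given all other assignments.
    `score(i, r, roles)` is a hypothetical unnormalised conditional
    weight for putting role r at position i; `roles` is mutated."""
    for i in range(len(roles)):
        weights = [score(i, r, roles) for r in range(n_roles)]
        total = sum(weights)
        u = rng.random() * total
        acc = 0.0
        for r, w in enumerate(weights):
            acc += w
            if u <= acc:
                roles[i] = r
                break
    return roles
```

Each sweep is linear in sentence length, which is why sampling is fast; but because moves are stochastic and local, a finite number of sweeps may miss the globally most probable tree, unlike exact dynamic programming.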
We combine LTLM with a 4-gram Modified Kneser-Ney language model via linear
interpolation. Our experiments with English and Czech corpora show significant
perplexity reductions (up to 46% for English and 49% for Czech) compared with
the standalone 4-gram Modified Kneser-Ney language model. Comment: Accepted to EMNLP 201
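The interpolation and the perplexity evaluation behind these numbers are simple to sketch (function names and the grid search over the interpolation weight are illustrative assumptions; the paper's own weight-tuning procedure is not described here):

```python
import math

def interpolate(p_ltlm, p_kn, lam):
    """Linear interpolation of two language-model probabilities
    for the same token: lam * P_LTLM + (1 - lam) * P_KN."""
    return lam * p_ltlm + (1.0 - lam) * p_kn

def perplexity(probs):
    """Perplexity of a corpus given per-token model probabilities."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)

def best_lambda(ltlm_probs, kn_probs, grid):
    """Pick the interpolation weight minimising held-out perplexity
    (a plain grid search; an assumption, not the paper's method)."""
    return min(
        grid,
        key=lambda lam: perplexity(
            [interpolate(p, q, lam) for p, q in zip(ltlm_probs, kn_probs)]
        ),
    )
```

Because interpolation mixes probabilities token by token, the combined model can only help where LTLM assigns higher probability than the n-gram model, which is where the reported perplexity reductions come from.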
Anaphora and Discourse Structure
We argue in this paper that many common adverbial phrases generally taken to
signal a discourse relation between syntactically connected units within
discourse structure, instead work anaphorically to contribute relational
meaning, with only indirect dependence on discourse structure. This allows a
simpler discourse structure to provide scaffolding for compositional semantics,
and reveals multiple ways in which the relational meaning conveyed by adverbial
connectives can interact with that associated with discourse structure. We
conclude by sketching out a lexicalised grammar for discourse that facilitates
discourse interpretation as a product of compositional rules, anaphor
resolution and inference. Comment: 45 pages, 17 figures. Revised resubmission to Computational
Linguistics