73 research outputs found
Latent Dirichlet Markov Allocation for Sentiment Analysis
In recent years, probabilistic topic models have gained tremendous attention in data mining and natural language processing research. In the field of information retrieval for text mining, a variety of probabilistic topic models have been used to analyse the content of documents. A topic model is a generative model for documents: it specifies a probabilistic procedure by which documents can be generated. All topic models share the idea that documents are mixtures of topics, where a topic is a probability distribution over words. In this paper we describe the Latent Dirichlet Markov Allocation model (LDMA), a new generative probabilistic topic model based on Latent Dirichlet Allocation (LDA) and the Hidden Markov Model (HMM), which emphasizes extracting multi-word topics from text data. LDMA is a four-level hierarchical Bayesian model in which topics are associated with documents, words are associated with topics, and topics can be represented by single- or multi-word terms. To evaluate the performance of LDMA, we report results in the field of aspect detection in sentiment analysis, comparing it to the basic LDA model.
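The shared generative idea the abstract describes (documents as mixtures of topics, each topic a distribution over words) can be sketched as follows. This is a minimal illustration of the standard LDA generative process, not of LDMA itself; the toy vocabulary, topics, and prior are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and two topics, each a probability distribution over words.
vocab = ["price", "cheap", "screen", "bright", "battery", "life"]
topics = np.array([
    [0.4, 0.4, 0.05, 0.05, 0.05, 0.05],   # a "price" aspect topic
    [0.05, 0.05, 0.3, 0.3, 0.15, 0.15],   # a "display/battery" aspect topic
])
alpha = 0.5  # symmetric Dirichlet prior over per-document topic proportions

def generate_document(n_words):
    """Generate one document: draw topic proportions from the Dirichlet prior,
    then for each word draw a topic and then a word from that topic."""
    theta = rng.dirichlet([alpha] * len(topics))  # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(len(topics), p=theta)      # topic assignment for this word
        w = rng.choice(len(vocab), p=topics[z])   # word drawn from topic z
        words.append(vocab[w])
    return theta, words

theta, doc = generate_document(8)
print(theta, doc)
```

LDMA extends this picture with an HMM-style dependency between consecutive words so that multi-word terms can share a topic; that extension is not reproduced here.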
Language Models
Context Modeling for Ranking and Tagging Bursty Features in Text Streams
Bursty features in text streams are very useful in many text mining applications. Most existing studies detect bursty features based purely on term frequency changes, without taking into account the semantic contexts of terms; as a result, the detected bursty features may not always be interesting or easy to interpret. In this paper we propose to model the contexts of bursty features using a language modeling approach. We then propose a novel topic diversity-based metric that uses the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of a stream of news articles, we show quantitatively that the proposed context language models for bursty features can effectively help rank bursty features by their newsworthiness and assign meaningful tags to annotate them. © 2010 ACM.
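The two ingredients the abstract contrasts (frequency-change burstiness versus a context model over co-occurring terms) can be sketched on toy data. This is an illustrative simplification, not the paper's actual metric; the documents, the add-one smoothing, and the tag-selection rule are assumptions made for the example.

```python
from collections import Counter

# Toy stream: a background window of documents vs. a current (bursty) window.
background = ["stocks fell today", "weather mild today", "stocks rose slightly"]
current = ["earthquake hits city", "earthquake damage severe", "city earthquake relief"]

def term_counts(docs):
    c = Counter()
    for d in docs:
        c.update(d.split())
    return c

bg, cur = term_counts(background), term_counts(current)

def burstiness(term):
    """Frequency-change burstiness: how much more often a term appears in the
    current window than in the background (add-one smoothed background count)."""
    return cur[term] / (bg[term] + 1)

def context_tags(feature, k=2):
    """Context model of a bursty feature: distribution over terms that co-occur
    with it in the current window; the top terms serve as candidate tags."""
    ctx = Counter()
    for d in current:
        words = d.split()
        if feature in words:
            ctx.update(w for w in words if w != feature)
    total = sum(ctx.values())
    return [(w, n / total) for w, n in ctx.most_common(k)]

top = max(cur, key=burstiness)
print(top, context_tags(top))
```

A pure frequency approach stops at `burstiness`; the paper's point is that the context model (`context_tags` here) is what makes the detected feature interpretable.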
Modeling Documents as Mixtures of Persons for Expert Finding
In this paper we address the problem of searching for knowledgeable persons within the enterprise, known as the expert finding (or expert search) task. We present a probabilistic algorithm based on the assumption that terms in documents are produced by the people who are mentioned in them. We represent the documents retrieved for a query as mixtures of candidate experts' language models. Two methods for extracting personal language models are proposed, as well as a way of combining them with other evidence of expertise. Experiments conducted on the TREC Enterprise collection demonstrate the superiority of our approach over the best of the existing solutions.
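The core modeling assumption (terms are produced by the people mentioned in a document, so each candidate gets a personal language model) can be sketched with query-likelihood scoring. This is a minimal illustration under assumed toy data and Jelinek-Mercer smoothing, not the paper's two extraction methods.

```python
from collections import Counter

# Toy corpus: (candidate mentioned in the document, document text).
docs = [
    ("alice", "neural retrieval models and ranking"),
    ("alice", "language models for retrieval"),
    ("bob",   "database indexing and storage engines"),
]

def personal_lm(candidate):
    """Personal language model: pool the text of all documents that mention
    the candidate, reflecting the assumption that mentioned people produce
    the terms."""
    c = Counter()
    for person, text in docs:
        if person == candidate:
            c.update(text.split())
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

def collection_lm():
    """Background model over the whole collection, used for smoothing."""
    c = Counter()
    for _, text in docs:
        c.update(text.split())
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

LAM = 0.8  # Jelinek-Mercer interpolation weight (assumed)

def score(candidate, query):
    """Query likelihood under the candidate's smoothed language model."""
    lm, bg = personal_lm(candidate), collection_lm()
    s = 1.0
    for q in query.split():
        s *= LAM * lm.get(q, 0.0) + (1 - LAM) * bg.get(q, 0.0)
    return s

query = "retrieval models"
best = max({p for p, _ in docs}, key=lambda c: score(c, query))
print(best)
```

Ranking candidates by `score` is the expert-finding step; the paper additionally combines such models with other evidence of expertise.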
Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity
A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models for measuring the diversity of documents is suboptimal due to generality and impurity. General topics only include common information from a background corpus and are assigned to most of the documents in the collection. Impure topics contain words that are not related to the topic; impurity lowers the interpretability of topic models, and impure topics are likely to be assigned to documents erroneously. We propose a hierarchical re-estimation approach for topic models to combat generality and impurity; the proposed approach operates at three levels: words, topics, and documents. Our re-estimation approach for measuring documents' topical diversity outperforms the state of the art on the PubMed dataset, which is commonly used for diversity experiments.
Comment: Proceedings of the 39th European Conference on Information Retrieval (ECIR 2017)
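To make "topical diversity of a document" concrete: one simple way to quantify it is the entropy of the document's topic distribution, as assigned by a topic model. This is a generic illustration of the quantity being measured, not the paper's re-estimation method, and the two example distributions are invented.

```python
import math

# Per-document topic distributions, as a topic model would assign them.
doc_focused = [0.9, 0.05, 0.03, 0.02]   # dominated by a single topic
doc_diverse = [0.25, 0.25, 0.25, 0.25]  # spread evenly across topics

def topical_diversity(theta):
    """Shannon entropy of the topic distribution: higher means the document
    mixes more topics more evenly, i.e. is more topically diverse."""
    return -sum(p * math.log(p) for p in theta if p > 0)

print(topical_diversity(doc_focused), topical_diversity(doc_diverse))
```

The paper's contribution is upstream of this step: re-estimating the topic model at the word, topic, and document levels so that general and impure topics do not distort whatever diversity measure is applied.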
Knowledge-based Query Expansion in Real-Time Microblog Search
Since the length of microblog texts, such as tweets, is strictly limited to 140 characters, traditional Information Retrieval techniques suffer severely from the vocabulary mismatch problem and cannot yield good performance in the microblogosphere. To address this critical challenge, in this paper we propose a new language modeling approach for microblog retrieval that infers various types of context information. In particular, we expand the query using knowledge terms derived from Freebase, so that the expanded query better reflects users' search intent. In addition, to further satisfy users' real-time information needs, we incorporate temporal evidence into the expansion method, which boosts recent tweets in the retrieval results with respect to a given topic. Experimental results on two official TREC Twitter corpora demonstrate the significant superiority of our approach over baseline methods.
Comment: 9 pages, 9 figures
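The two ideas the abstract combines (knowledge-based query expansion plus a recency boost) can be sketched on toy data. Everything here is an assumption made for illustration: the hard-coded expansion terms stand in for terms that the paper derives from Freebase, and the exponential decay rate is invented.

```python
import math

# Toy tweets: (text, age in hours). In the paper the expansion terms come
# from Freebase; here they are hard-coded hypothetical knowledge terms.
tweets = [
    ("olympics opening ceremony tonight", 2),
    ("olympics schedule released", 50),
    ("weather sunny today", 1),
]
query = ["olympics"]
expansion_terms = ["ceremony", "schedule"]  # hypothetical knowledge terms

DECAY = 0.05  # exponential recency decay rate per hour (assumed)

def score(tweet_words, age, terms):
    """Overlap with the expanded query, weighted by a recency boost so that
    recent tweets rank higher for the same term overlap."""
    overlap = sum(1 for t in terms if t in tweet_words)
    return overlap * math.exp(-DECAY * age)

expanded = query + expansion_terms
ranked = sorted(tweets, key=lambda tw: score(tw[0].split(), tw[1], expanded), reverse=True)
print(ranked[0][0])
```

With expansion alone, the first two tweets tie on overlap; the temporal weight is what breaks the tie in favor of the recent one, which is the real-time behavior the abstract describes.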