29,176 research outputs found
Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time
Dynamic topic modeling facilitates the identification of topical trends over
time in temporal collections of unstructured documents. We introduce a novel
unsupervised neural dynamic topic model named as Recurrent Neural
Network-Replicated Softmax Model (RNNRSM), where the discovered topics at each
time influence the topic discovery in the subsequent time steps. We account for
the temporal ordering of documents by explicitly modeling a joint distribution
of latent topical dependencies over time, using distributional estimators with
temporal recurrent connections. Applying RNN-RSM to 19 years of articles on NLP
research, we demonstrate that compared to state-of-the art topic models, RNNRSM
shows better generalization, topic interpretation, evolution and trends. We
also introduce a metric (named as SPAN) to quantify the capability of dynamic
topic model to capture word evolution in topics over time.Comment: In Proceedings of the 16th Annual Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language
Technologies (NAACL-HLT 2018
Recommended from our members
New topic detection in microblogs and topic model evaluation using topical alignment
textThis thesis deals with topic model evaluation and new topic detection in microblogs. Microblogs are short and thus may not carry any contextual clues. Hence it becomes challenging to apply traditional natural language processing algorithms on such data. Graphical models have been traditionally used for topic discovery and text clustering on sets of text-based documents. Their unsupervised nature allows topic models to be trained easily on datasets meant for specific domains. However the advantage of not requiring annotated data comes with a drawback with respect to evaluation difficulties. The problem aggravates when the data comprises microblogs which are unstructured and noisy.
We demonstrate the application of three types of such models to microblogs - the Latent Dirichlet Allocation, the Author-Topic and the Author-Recipient-Topic model. We extensively evaluate these models under different settings, and our results show that the Author-Recipient-Topic model extracts the most coherent topics. We also addressed the problem of topic modeling on short text by using clustering techniques. This technique helps in boosting the performance of our models.
Topical alignment is used for large scale assessment of topical relevance by comparing topics to manually generated domain specific concepts. In this thesis we use this idea to evaluate topic models by measuring misalignments between topics. Our study on comparing topic models reveals interesting traits about Twitter messages, users and their interactions and establishes that joint modeling on author-recipient pairs and on the content of tweet leads to qualitatively better topic discovery.
This thesis gives a new direction to the well known problem of topic discovery in microblogs. Trend prediction or topic discovery for microblogs is an extensive research area. We propose the idea of using topical alignment to detect new topics by comparing topics from the current week to those of the previous week. We measure correspondence between a set of topics from the current week and a set of topics from the previous week to quantify five types of misalignments: \textit{junk, fused, missing} and \textit{repeated}. Our analysis compares three types of topic models under different settings and demonstrates how our framework can detect new topics from topical misalignments. In particular so-called \textit{junk} topics are more likely to be new topics and the \textit{missing} topics are likely to have died or die out.
To get more insights into the nature of microblogs we apply topical alignment to hashtags. Comparing topics to hashtags enables us to make interesting inferences about Twitter messages and their content. Our study revealed that although a very small proportion of Twitter messages explicitly contain hashtags, the proportion of tweets that discuss topics related to hashtags is much higher.Computer Science
iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling
Researchers in the Digital Humanities and journalists need to monitor,
collect and analyze fresh online content regarding current events such as the
Ebola outbreak or the Ukraine crisis on demand. However, existing focused
crawling approaches only consider topical aspects while ignoring temporal
aspects and therefore cannot achieve thematically coherent and fresh Web
collections. Especially Social Media provide a rich source of fresh content,
which is not used by state-of-the-art focused crawlers. In this paper we
address the issues of enabling the collection of fresh and relevant Web and
Social Web content for a topic of interest through seamless integration of Web
and Social Media in a novel integrated focused crawler. The crawler collects
Web and Social Media content in a single system and exploits the stream of
fresh Social Media content for guiding the crawler.Comment: Published in the Proceedings of the 15th ACM/IEEE-CS Joint Conference
on Digital Libraries 201
Analysis of Computational Science Papers from ICCS 2001-2016 using Topic Modeling and Graph Theory
This paper presents results of topic modeling and network models of topics
using the International Conference on Computational Science corpus, which
contains domain-specific (computational science) papers over sixteen years (a
total of 5695 papers). We discuss topical structures of International
Conference on Computational Science, how these topics evolve over time in
response to the topicality of various problems, technologies and methods, and
how all these topics relate to one another. This analysis illustrates
multidisciplinary research and collaborations among scientific communities, by
constructing static and dynamic networks from the topic modeling results and
the keywords of authors. The results of this study give insights about the past
and future trends of core discussion topics in computational science. We used
the Non-negative Matrix Factorization topic modeling algorithm to discover
topics and labeled and grouped results hierarchically.Comment: Accepted by International Conference on Computational Science (ICCS)
2017 which will be held in Zurich, Switzerland from June 11-June 1
- …