Exploratory topic modeling with distributional semantics
As we continue to collect and store textual data in a multitude of domains,
we are regularly confronted with material whose largely unknown thematic
structure we want to uncover. With unsupervised, exploratory analysis, no prior
knowledge about the content is required and highly open-ended tasks can be
supported. In the past few years, probabilistic topic modeling has emerged as a
popular approach to this problem. Nevertheless, the representation of the
latent topics as aggregations of semi-coherent terms limits their
interpretability and level of detail.
This paper presents an alternative approach to topic modeling that maps
topics as a network for exploration, based on distributional semantics using
learned word vectors. From the granular level of terms and their semantic
similarity relations, global topic structures emerge as clustered regions and
gradients of concepts. Moreover, the paper discusses the visual interactive
representation of the topic map, which plays an important role in supporting
its exploration. Comment: Conference: The Fourteenth International Symposium on Intelligent
Data Analysis (IDA 2015)
Topic Map Generation Using Text Mining
Starting from text corpus analysis with linguistic and statistical algorithms, an infrastructure for text mining is described which uses collocation analysis as a central tool. This text mining method may be applied to different domains as well as languages. Some examples taken from large reference databases motivate the applicability to knowledge management using declarative standards of information structuring and description. The ISO/IEC Topic Map standard is introduced as a candidate for rich metadata description of information resources, and it is shown how text mining can be used for automatic topic map generation.
Mapping Topics and Topic Bursts in PNAS
Scientific research is highly dynamic. New areas of science continually
evolve; others gain or lose importance, merge or split. Due to the steady
increase in the number of scientific publications, it is hard to keep an
overview of the structure and dynamic development of one's own field of
science, much less all scientific domains. However, knowledge of hot topics,
emergent research frontiers, or change of focus in certain areas is a critical
component of resource allocation decisions in research labs, governmental
institutions, and corporations. This paper demonstrates the utilization of
Kleinberg's burst detection algorithm, co-word occurrence analysis, and graph
layout techniques to generate maps that support the identification of major
research topics and trends. The approach was applied to analyze and map the
complete set of papers published in the Proceedings of the National Academy of
Sciences (PNAS) in the years 1982-2001. Six domain experts examined and
commented on the resulting maps in an attempt to reconstruct the evolution of
major research areas covered by PNAS.
Analyzing the Language of Food on Social Media
We investigate the predictive power behind the language of food on social
media. We collect a corpus of over three million food-related posts from
Twitter and demonstrate that many latent population characteristics can be
directly predicted from this data: overweight rate, diabetes rate, political
leaning, and home geographical location of authors. For all tasks, our
language-based models significantly outperform the majority-class baselines.
Performance is further improved with more complex natural language processing,
such as topic modeling. We analyze which textual features have most predictive
power for these datasets, providing insight into the connections between the
language of food, geographic locale, and community characteristics. Lastly, we
design and implement an online system for real-time query and visualization of
the dataset. Visualization tools, such as geo-referenced heatmaps,
semantics-preserving wordclouds and temporal histograms, allow us to discover
more complex, global patterns mirrored in the language of food. Comment: An extended abstract of this paper will appear in IEEE Big Data 201
Automated construction and analysis of political networks via open government and media sources
We present a tool to generate real-world political networks from user-provided lists of politicians and news sites. Additional output includes visualizations, interactive tools, and maps that allow a user to better understand the politicians and their surrounding environments as portrayed by the media. As a case study, we construct a comprehensive list of current Texas politicians, select news sites that convey a spectrum of political viewpoints covering Texas politics, and examine the results. We propose a “Combined” co-occurrence distance metric to better reflect the relationship between two entities. A topic modeling technique is also proposed as a novel, automated way of labeling communities that exist within a politician’s “extended” network. Peer Reviewed. Postprint (author's final draft)