Topic Similarity Networks: Visual Analytics for Large Document Sets
We investigate ways in which to improve the interpretability of LDA topic
models by better analyzing and visualizing their outputs. We focus on examining
what we refer to as topic similarity networks: graphs in which nodes represent
latent topics in text collections and links represent similarity among topics.
We describe efficient and effective approaches to both building and labeling
such networks. Visualizations of topic models based on these networks are shown
to be a powerful means of exploring, characterizing, and summarizing large
collections of unstructured text documents. They help to "tease out"
non-obvious connections among different sets of documents and provide insights
into how topics form larger themes. We demonstrate the efficacy and
practicality of these approaches through two case studies: 1) NSF grants for
basic research spanning a 14 year period and 2) the entire English portion of
Wikipedia.
Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData 2014)
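As a rough illustration of the graph described in this abstract (nodes are latent topics, links represent similarity among topics), the sketch below links topics whose word distributions are close. The abstract does not say how similarity is measured or how links are selected, so the cosine measure, the `threshold` parameter, and the function names are assumptions made for this example only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def topic_similarity_network(topic_word, threshold=0.5):
    """Return edges (i, j, similarity) for topic pairs at or above threshold.

    topic_word: list of per-topic word-probability vectors, e.g. the rows of
    an LDA topic-word matrix. The threshold is a hypothetical cut-off.
    """
    edges = []
    for i in range(len(topic_word)):
        for j in range(i + 1, len(topic_word)):
            sim = cosine(topic_word[i], topic_word[j])
            if sim >= threshold:
                edges.append((i, j, sim))
    return edges

# Toy topic-word distributions over a 4-word vocabulary (made-up numbers).
topics = [
    [0.7, 0.2, 0.1, 0.0],
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
]
edges = topic_similarity_network(topics, threshold=0.5)
for i, j, sim in edges:
    print(f"topic {i} -- topic {j}: similarity {sim:.3f}")
```

With these toy values only topics 0 and 1 are linked, since the third topic places its probability mass on different words.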
mARC: Memory by Association and Reinforcement of Contexts
This paper introduces the memory by Association and Reinforcement of Contexts
(mARC). mARC is a novel data modeling technology rooted in the second
quantization formulation of quantum mechanics. It is an all-purpose incremental
and unsupervised data storage and retrieval system which can be applied to all
types of signal or data, structured or unstructured, textual or not. mARC can
be applied to a wide range of information classification and retrieval
problems like e-Discovery or contextual navigation. It can also be formulated in
the artificial life framework, a.k.a. Conway's "Game of Life" theory. In contrast
to Conway's approach, the objects evolve in a massively multidimensional space.
In order to start evaluating the potential of mARC we have built a mARC-based
Internet search engine demonstrator with contextual functionality. We compare
the behavior of the mARC demonstrator with Google search both in terms of
performance and relevance. In the study we find that the mARC search engine
demonstrator outperforms Google search by an order of magnitude in response
time while providing more relevant results for some classes of queries.
On the Role of Social Identity and Cohesion in Characterizing Online Social Communities
Two prevailing theories for explaining social group or community structure
are cohesion and identity. The social cohesion approach posits that social
groups arise out of an aggregation of individuals that have mutual
interpersonal attraction as they share common characteristics. These
characteristics can range from common interests to kinship ties and from social
values to ethnic backgrounds. In contrast, the social identity approach posits
that an individual is likely to join a group based on an intrinsic
self-evaluation at a cognitive or perceptual level. In other words, group
members typically share an awareness of a common category membership.
In this work we seek to understand the role of these two contrasting theories
in explaining the behavior and stability of social communities in Twitter. A
specific focal point of our work is to understand the role of these theories in
disparate contexts ranging from disaster response to socio-political activism.
We extract social identity and social cohesion features-of-interest for large
scale datasets of five real-world events and examine the effectiveness of such
features in capturing behavioral characteristics and the stability of groups.
We also propose a novel measure of social group sustainability based on the
divergence in group discussion. Our main findings are: 1) Sharing of social
identities (especially physical location) among group members has a positive
impact on group sustainability, 2) Structural cohesion (represented by high
group density and low average shortest path length) is a strong indicator of
group sustainability, and 3) Event characteristics play a role in shaping group
sustainability, as social groups in transient events behave differently from
groups in events that last longer.
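The two structural cohesion indicators named in finding 2, group density and average shortest path length, can be computed for an undirected group graph roughly as sketched below. This uses the standard textbook definitions; the abstract does not describe how the authors compute them, and the adjacency-dictionary representation and function names are assumptions for illustration.

```python
from collections import deque

def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def average_shortest_path_length(adj):
    """Mean hop distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives hop distances from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Two toy groups of four members each (hypothetical data).
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
clique = {u: {v for v in "abcd" if v != u} for u in "abcd"}

for name, group in (("chain", chain), ("clique", clique)):
    print(name, density(group), average_shortest_path_length(group))
```

The fully connected toy group has higher density and shorter average paths than the chain, which is the pattern the abstract associates with structural cohesion.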