2,993 research outputs found
LDAExplore: Visualizing Topic Models Generated Using Latent Dirichlet Allocation
We present LDAExplore, a tool to visualize topic distributions in a given
document corpus that are generated using Topic Modeling methods. Latent
Dirichlet Allocation (LDA) is one of the basic methods that is predominantly
used to generate topics. One of the problems with methods like LDA is that
users who apply them may not understand the topics that are generated. Also,
users may find it difficult to search correlated topics and correlated
documents. LDAExplore, tries to alleviate these problems by visualizing topic
and word distributions generated from the document corpus and allowing the user
to interact with them. The system is designed for users, who have minimal
knowledge of LDA or Topic Modelling methods. To evaluate our design, we run a
pilot study which uses the abstracts of 322 Information Visualization papers,
where every abstract is considered a document. The topics generated are then
explored by users. The results show that users are able to find correlated
documents and group them based on topics that are similar
Extracting corpus specific knowledge bases from Wikipedia
Thesauri are useful knowledge structures for assisting information retrieval. Yet their production is labor-intensive, and few domains have comprehensive thesauri that cover domain-specific concepts and contemporary usage. One approach, which has been attempted without much success for decades, is to seek statistical natural language processing algorithms that work on free text. Instead, we propose to replace costly professional indexers with thousands of dedicated amateur volunteers--namely, those that are producing Wikipedia. This vast, open encyclopedia represents a rich tapestry of topics and semantics and a huge investment of human effort and judgment. We show how this can be directly exploited to provide WikiSauri: manually-defined yet inexpensive thesaurus structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We also offer concrete evidence of the effectiveness of WikiSauri for assisting information retrieval
- …