Topic Similarity Networks: Visual Analytics for Large Document Sets
We investigate ways in which to improve the interpretability of LDA topic
models by better analyzing and visualizing their outputs. We focus on examining
what we refer to as topic similarity networks: graphs in which nodes represent
latent topics in text collections and links represent similarity among topics.
We describe efficient and effective approaches to both building and labeling
such networks. Visualizations of topic models based on these networks are shown
to be a powerful means of exploring, characterizing, and summarizing large
collections of unstructured text documents. They help to "tease out"
non-obvious connections among different sets of documents and provide insights
into how topics form larger themes. We demonstrate the efficacy and
practicality of these approaches through two case studies: 1) NSF grants for
basic research spanning a 14-year period and 2) the entire English portion of
Wikipedia.
Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData 2014)
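The abstract does not spell out how the network is built. A minimal sketch, assuming each LDA topic is given as a word-probability vector over a shared vocabulary and that two topics are linked when their cosine similarity reaches a fixed threshold (both assumptions, as is the hypothetical top-words labeling helper), might look like this:

```python
import math
from itertools import combinations


def cosine(p, q):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def topic_similarity_network(topic_word, threshold=0.5):
    """Return weighted edges (i, j, sim) between topics with similarity >= threshold.

    topic_word: list of K rows, each a word-probability vector over the same
    vocabulary (e.g. an LDA topic-word matrix). Cosine similarity and the fixed
    threshold are illustrative assumptions, not the paper's exact construction.
    """
    edges = []
    for i, j in combinations(range(len(topic_word)), 2):
        sim = cosine(topic_word[i], topic_word[j])
        if sim >= threshold:
            edges.append((i, j, sim))
    return edges


def label_topic(row, vocab, n=3):
    """Hypothetical labeling heuristic: join the n highest-probability words."""
    top = sorted(range(len(row)), key=lambda w: row[w], reverse=True)[:n]
    return " ".join(vocab[w] for w in top)
```

A fixed global threshold keeps the sketch simple; keeping each node's top-k neighbours or using a divergence such as Jensen-Shannon would be equally plausible design choices.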
E3: Keyphrase-Based News Event Exploration Engine
This paper presents E3, a novel system that extracts keyphrases from news
content to give the news audience a broad overview of news events, especially
events with a high volume of content. Given an input query, E3 extracts
keyphrases and enriches them by tagging, ranking, and finding roles for
frequently associated keyphrases. E3 also measures the novelty and activeness
of keyphrases using news publication dates, in order to identify the most
interesting and informative keyphrases.
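The abstract says only that novelty and activeness are derived from publication dates. As a hypothetical illustration (the scoring formulas, the 7-day window, and the product used for ranking are all assumptions, not E3's definitions), the two signals could be computed from each keyphrase's mention dates:

```python
from datetime import date


def novelty(mention_dates, today):
    """Hypothetical novelty score in (0, 1]: keyphrases first seen recently score higher."""
    age_days = (today - min(mention_dates)).days
    return 1.0 / (1 + age_days)


def activeness(mention_dates, today, window_days=7):
    """Hypothetical activeness score: share of mentions in the last `window_days` days."""
    recent = sum(1 for d in mention_dates if (today - d).days < window_days)
    return recent / len(mention_dates)


def rank_keyphrases(mentions, today, window_days=7):
    """Rank keyphrases by novelty * activeness (an illustrative combination)."""
    scores = {
        kp: novelty(ds, today) * activeness(ds, today, window_days)
        for kp, ds in mentions.items()
        if ds  # skip keyphrases with no dated mentions
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Multiplying the two scores favours keyphrases that are both new and still being reported; a weighted sum would be an equally reasonable choice.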
Crowdsourced real-world sensing: sentiment analysis and the real-time web
The advent of the real-time web is proving both challenging and disruptive for a
number of research areas, notably information retrieval and web data mining. As an
area of research reaching maturity, sentiment analysis offers a promising direction
for modelling the text content available in real-time streams. This paper reviews
the real-time web as a new area of focus for sentiment analysis and discusses the
motivations and challenges behind such a direction.
Terminology mining in social media
The highly variable and dynamic word usage in social media presents serious challenges for both research and commercial applications geared towards blogs or other user-generated, non-editorial texts. This paper discusses and exemplifies a terminology mining approach for dealing with the productive character of the textual environment in social media. We explore the challenges of practically acquiring new terminology and of modeling the similarity and relatedness of terms from realistic amounts of observed data. We also discuss semantic evolution and density, and investigate novel measures for characterizing the preconditions for terminology mining.
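One common way to model term similarity from observed data, offered here as a generic distributional sketch rather than the paper's own measure, is to compare the context-word counts of two terms. The window size and the toy tokenized input are assumptions:

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count, for each term, the words appearing within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(c * v[w] for w, c in u.items())  # missing keys count as 0
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0
```

Under this scheme a newly coined term, such as a slang spelling, ends up close to an established term when both tend to appear in the same contexts.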
Unsupervised Extraction of Representative Concepts from Scientific Literature
This paper studies the automated categorization and extraction of scientific
concepts from titles of scientific articles, in order to gain a deeper
understanding of their key contributions and facilitate the construction of a
generic academic knowledge base. Towards this goal, we propose an unsupervised,
domain-independent, and scalable two-phase algorithm to type and extract key
concept mentions into aspects of interest (e.g., Techniques, Applications,
etc.). In the first phase of our algorithm, we propose PhraseType, a
probabilistic generative model which exploits textual features and limited POS
tags to broadly segment text snippets into aspect-typed phrases. We extend this
model to simultaneously learn aspect-specific features and identify academic
domains in multi-domain corpora, since the two tasks mutually enhance each
other. In the second phase, we propose an approach based on adaptor grammars to
extract fine-grained concept mentions from the aspect-typed phrases without the
need for any external resources or human effort, in a purely data-driven
manner. We apply our technique to study literature from diverse scientific
domains and show significant gains over state-of-the-art concept extraction
techniques. We also present a qualitative analysis of the results obtained.
Comment: Published as a conference paper at CIKM 201
Automatic Taxonomy Generation - A Use-Case in the Legal Domain
A key challenge in the legal domain is adapting and representing the legal
knowledge expressed in texts so that legal practitioners and researchers can
access this information more easily and quickly when dealing with
compliance-related issues. One way to approach this goal is through a
taxonomy of legal concepts. While this task usually requires domain experts to
construct terms and their relations manually, this paper describes a
methodology to automatically generate a taxonomy of legal noun concepts. We
apply and compare two approaches on a corpus consisting of statutory
instruments for the laws of the UK, Wales, Scotland and Northern Ireland.
Comment: 9 pages
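The abstract does not name its two approaches. As an illustration of one widely used baseline for noun-concept taxonomies (head-noun subsumption, swapped in here and not necessarily either of the paper's methods), a multi-word concept can be attached to its longest proper suffix that is itself a known concept:

```python
def head_noun_taxonomy(concepts):
    """Map each multi-word concept to its nearest known parent concept.

    Heuristic (an assumption, not the paper's method): English noun phrases are
    usually head-final, so "statutory instrument" is taken to be a kind of
    "instrument". Leading modifiers are stripped one at a time until the
    remaining suffix is itself a known concept.
    """
    known = {c.lower() for c in concepts}
    parents = {}
    for concept in known:
        words = concept.split()
        for k in range(1, len(words)):
            candidate = " ".join(words[k:])
            if candidate in known:
                parents[concept] = candidate
                break  # longest matching suffix wins
    return parents
```

Single-word concepts receive no parent and become roots; a real system would also need lemmatization and a way to link synonyms that share no head word.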