    An Analysis of the Coherence of Descriptors in Topic Modeling

    In recent years, topic modeling has become an established method in the analysis of text corpora, with probabilistic techniques such as latent Dirichlet allocation (LDA) commonly employed for this purpose. However, it might be argued that adequate attention is often not paid to the issue of topic coherence, that is, the semantic interpretability of the top terms usually used to describe discovered topics. Nevertheless, a number of studies have proposed measures for analyzing such coherence, but these have largely focused on topics found by LDA, with matrix decomposition techniques such as Non-negative Matrix Factorization (NMF) being somewhat overlooked in comparison. This motivates the current work, where we compare and analyze topics found by popular variants of both NMF and LDA in multiple corpora in terms of both their coherence and associated generality, using a combination of existing and new measures, including one based on distributional semantics. Two out of three coherence measures find NMF to regularly produce more coherent topics, with higher levels of generality and redundancy observed with the LDA topic descriptors. In all cases, we observe that the associated term weighting strategy plays a major role. The results observed with NMF suggest that this may be a more suitable topic modeling method when analyzing certain corpora, such as those associated with niche or non-mainstream domains. Science Foundation Irelan
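To make the notion of descriptor coherence concrete, the following is a minimal sketch of one commonly used co-occurrence measure, mean pairwise normalized pointwise mutual information (NPMI) over a topic's top terms. The abstract does not say which three measures the authors used, so this choice, the function name `npmi_coherence`, and the toy corpus are all assumptions for illustration only.

```python
import math
from itertools import combinations


def npmi_coherence(top_terms, documents):
    """Mean pairwise NPMI of a topic descriptor (hypothetical helper).

    Probabilities are estimated from document-level co-occurrence:
    p(a) is the fraction of documents that contain term a.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*terms):
        # Fraction of documents containing every term in `terms`.
        return sum(1 for d in doc_sets if all(t in d for t in terms)) / n_docs

    scores = []
    for a, b in combinations(top_terms, 2):
        p_a, p_b, p_ab = prob(a), prob(b), prob(a, b)
        if p_ab == 0:
            # The pair never co-occurs: lowest NPMI value by convention.
            scores.append(-1.0)
        elif p_ab == 1.0:
            # Both terms occur in every document: NPMI is 0/0, so this
            # sketch assumes a neutral score of 0.
            scores.append(0.0)
        else:
            pmi = math.log(p_ab / (p_a * p_b))
            scores.append(pmi / -math.log(p_ab))
    return sum(scores) / len(scores)


# Toy tokenized corpus (made up for this example).
docs = [
    ["nmf", "matrix", "topic"],
    ["nmf", "matrix"],
    ["lda", "topic", "prior"],
    ["lda", "prior"],
]

coherent = npmi_coherence(["nmf", "matrix"], docs)  # terms always co-occur
mixed = npmi_coherence(["nmf", "lda"], docs)        # terms never co-occur
print(coherent, mixed)
```

A descriptor whose terms tend to appear in the same documents scores close to 1, while one that mixes unrelated terms scores close to -1. In practice the co-occurrence counts would come from a large reference corpus rather than four toy documents.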

    Comparison of Two-pass Algorithms for Dynamic Topic Modelling Based on Matrix Decompositions

    In this paper we present a two-pass algorithm based on different matrix decompositions, such as LSI, PCA, ICA and NMF, which allows tracking of the evolution of topics over time. The proposed dynamic topic models produce an easily interpreted overview of the topics found in a sequentially organized set of documents, and this output requires no further processing. Each topic is represented by a user-specified number of top terms. Applied to, for example, a news article data set, this approach to topic modeling can be convenient and useful for economists, sociologists, and political scientists. The proposed approach achieves results comparable to those obtained using complex probabilistic models, such as LDA.

    Natural Language Generation and Fuzzy Sets : An Exploratory Study on Geographical Referring Expression Generation

    This work was supported by the Spanish Ministry for Economy and Competitiveness (grant TIN2014-56633-C3-1-R) and by the European Regional Development Fund (ERDF/FEDER) and the Galician Ministry of Education (grants GRC2014/030 and CN2012/151). Alejandro Ramos-Soto is supported by the Spanish Ministry for Economy and Competitiveness (FPI Fellowship Program) under grant BES-2012-051878. Postprin

    Understanding PubMed Search Results using Topic Models and Interactive Information Visualization

    With data increasing exponentially, extracting and understanding information, themes and relationships from large collections of documents is becoming more and more important to researchers in many areas. PubMed, which comprises more than 25 million citations, uses Medical Subject Headings (MeSH) to index articles to better facilitate their management, searching and indexing. However, researchers are still challenged to find and then get a meaningful overview of a set of documents in a specific area of interest. This is due in part to several limitations of MeSH terms, including: the need to monitor and expand the vocabulary; the lack of concept coverage for newly developing areas; human inconsistency in assigning codes; and the time required to manually index an exponentially growing corpus. Another reason for this challenge is that neither PubMed itself nor its related Web tools can help users see high-level themes and hidden semantic structures in the biomedical literature. Topic models are a class of statistical machine learning algorithms that, when given a set of natural language documents, extract the semantic themes (topics) from the set, describe the topics of each document, and describe the semantic similarity of topics and documents. Researchers have shown that these latent themes can help humans better understand and search documents. Unlike MeSH terms, which are created based on important concepts throughout the literature, topics extracted from a subset of documents are specific to those documents. Thus they can find document-specific themes that may not exist in MeSH terms. They may give a subject-area-specific set of themes for browsing search results and provide a broader overview of those results.
The first part of this dissertation presents the TopicalMeSH representation, which exploits the ‘correspondence’ between topics generated using latent Dirichlet allocation (LDA) and MeSH terms to create new document representations that combine MeSH terms and latent topic vectors. In an evaluation with 15 systematic drug review corpora, TopicalMeSH performed better than MeSH in both document retrieval and classification tasks. The second part of this work introduces the “Hybrid Topic”, an alternative LDA approach that uses a ‘bag-of-MeSH&words’ approach, instead of just ‘bag-of-words’, to test whether the addition of labels (e.g. MeSH descriptors) can improve the quality and facilitate the interpretation of LDA-generated topics. An evaluation of this approach on the quality and interpretability of topics demonstrated that the coherence of ‘hybrid topics’ is higher than that of regular bag-of-words topics in both a specialized corpus and a general corpus. The last part of this dissertation presents a visualization tool based on the ‘hybrid topics’ model that could allow users to interactively use topic models and MeSH terms to efficiently and effectively retrieve relevant information from large sets of PubMed search results. A preliminary user study has been conducted with 6 participants. All of them agree that this tool can quickly help them understand PubMed search results and identify target articles.
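As a rough illustration of the ‘bag-of-MeSH&words’ idea, the sketch below merges the words of an abstract and its MeSH descriptors into a single token-count bag that a topic model could consume. The abstract does not describe how the dissertation combines the two, so the function name `bag_of_mesh_and_words`, the `MESH:` prefix, the tokenization, and the example text are assumptions made for this example.

```python
import re
from collections import Counter


def bag_of_mesh_and_words(text, mesh_terms, mesh_prefix="MESH:"):
    """Build one token-count bag from words plus MeSH descriptors.

    Hypothetical helper: each descriptor becomes a single prefixed token,
    so a multi-word heading stays intact and cannot be confused with an
    ordinary word of the same spelling.
    """
    # Simple lowercase alphanumeric tokenization (an assumption).
    words = re.findall(r"[a-z0-9]+", text.lower())
    labels = [
        mesh_prefix + term.strip().lower().replace(" ", "_")
        for term in mesh_terms
    ]
    return Counter(words + labels)


# Made-up abstract text and descriptors, for illustration only.
abstract = "Aspirin reduces the risk of stroke. Aspirin is widely available."
descriptors = ["Aspirin", "Stroke", "Platelet Aggregation Inhibitors"]

bag = bag_of_mesh_and_words(abstract, descriptors)
print(bag.most_common(5))
```

Keeping the descriptors as separate prefixed tokens means a fitted topic can list both ordinary words and MeSH labels among its top terms, which is one plausible way the labels could aid interpretation.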

    Interaction between high-level and low-level image analysis for semantic video object extraction

    Authors of articles published in EURASIP Journal on Advances in Signal Processing are the copyright holders of their articles and have granted to any third party, in advance and in perpetuity, the right to use, reproduce or disseminate the article, according to the SpringerOpen copyright and license agreement (http://www.springeropen.com/authors/license)

    A model-based approach to hypermedia design.

    This paper introduces the MESH approach to hypermedia design, which combines established entity-relationship and object-oriented abstractions with proprietary concepts into a formal hypermedia data model. Uniform layout and link typing specifications can be attributed and inherited in a static node typing hierarchy, whereas both nodes and links can be submitted dynamically to multiple complementary classifications. In addition, the data model's support for a context-based navigation paradigm, as well as a platform-independent implementation framework, are briefly discussed. Data; Model; Specifications; Classification;