
    ANTM: An Aligned Neural Topic Model for Exploring Evolving Topics

    This paper presents an algorithmic family of dynamic topic models called Aligned Neural Topic Models (ANTM), which combine novel data mining algorithms into a modular framework for discovering evolving topics. ANTM maintains the temporal continuity of evolving topics by extracting time-aware features from documents with pre-trained Large Language Models (LLMs) and by clustering documents sequentially with an overlapping sliding window algorithm. This algorithm allows the number of topics to differ from one time frame to the next and aligns semantically similar document clusters across time periods. The process captures emerging and fading trends across periods and yields a more interpretable representation of evolving topics. Experiments on four distinct datasets show that ANTM outperforms probabilistic dynamic topic models on topic coherence and diversity metrics. Moreover, it improves the scalability and flexibility of dynamic topic models by being accessible and adaptable to different types of algorithms. Additionally, a Python package has been developed for researchers and scientists who wish to study trends and evolving patterns of topics in large-scale textual data.
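    The overlapping sliding window idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the ANTM package's API: the function name `overlapping_windows` and the parameters `window_length` and `overlap` are hypothetical, and the sketch covers only the segmentation of documents into time frames, not the clustering or alignment steps.

```python
from typing import List, Sequence, Tuple


def overlapping_windows(
    timestamps: Sequence[float], window_length: float, overlap: float
) -> List[Tuple[float, float, List[int]]]:
    """Split documents into overlapping time frames.

    Hypothetical helper, not part of any published package. Returns a list
    of (start, end, document_indices) tuples, where each frame covers the
    half-open interval [start, end).
    """
    if window_length <= 0 or not 0 <= overlap < window_length:
        raise ValueError("need window_length > 0 and 0 <= overlap < window_length")
    if not timestamps:
        return []

    step = window_length - overlap  # how far each frame advances
    t_max = max(timestamps)
    start = min(timestamps)
    frames = []
    while start <= t_max:
        end = start + window_length
        # Documents whose timestamp falls inside the current frame.
        members = [i for i, t in enumerate(timestamps) if start <= t < end]
        frames.append((start, end, members))
        start += step
    return frames


# Six documents stamped 0..5, frames of length 3 that overlap by 1.
frames = overlapping_windows([0, 1, 2, 3, 4, 5], window_length=3, overlap=1)
```

    Because consecutive frames share documents, clusters found in neighbouring frames have common members, which is one plausible basis for aligning them across time.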

    Unifying Community Detection Across Scales from Genomes to Landscapes

    Biodiversity science encompasses multiple disciplines and biological scales from molecules to landscapes. Nevertheless, biodiversity data are often analyzed separately with discipline-specific methodologies, constraining resulting inferences to a single scale. To overcome this, we present a topic modeling framework to analyze community composition in cross-disciplinary datasets, including those generated from metagenomics, metabolomics, field ecology and remote sensing. Using topic models, we demonstrate how community detection in different datasets can inform the conservation of interacting plants and herbivores. We show how topic models can identify members of molecular, organismal and landscape-level communities that relate to wildlife health, from gut microbes to forage quality. We conclude with a future vision for how topic modeling can be used to design cross-scale studies that promote a holistic approach to detecting, monitoring and managing biodiversity.

    Topic Modeling on Health Journals with Regularized Variational Inference

    Topic modeling enables exploration and compact representation of a corpus. The CaringBridge (CB) dataset is a massive collection of journals written by patients and caregivers during a health crisis. Topic modeling on the CB dataset, however, is challenging due to the asynchronous nature of multiple authors writing about their health journeys. To overcome this challenge, we introduce the Dynamic Author-Persona topic model (DAP), a probabilistic graphical model designed for temporal corpora with multiple authors. The novelty of the DAP model lies in its representation of authors by a persona, where personas capture the propensity to write about certain topics over time. Further, we present a regularized variational inference algorithm, which we use to encourage the DAP model's personas to be distinct. Our results show significant improvements over competing topic models, particularly after regularization, and highlight the DAP model's unique ability to capture common journeys shared by different authors.
    Comment: Published in Thirty-Second AAAI Conference on Artificial Intelligence, February 2018, New Orleans, Louisiana, US

    Bitcoin Volatility Forecasting with a Glimpse into Buy and Sell Orders

    In this paper, we study short-term prediction of fluctuations in the Bitcoin exchange price against the United States dollar. We use realized volatility data collected from one of the largest Bitcoin digital trading offices in 2016 and 2017, together with order information. Experiments are performed to evaluate a variety of statistical and machine learning approaches.
    Comment: Full version of the paper published at IEEE International Conference on Data Mining (ICDM), 201
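    For readers unfamiliar with the term, realized volatility over an interval is commonly computed as the square root of the sum of squared log returns. The sketch below uses that common definition as an assumption; the abstract does not state which estimator or sampling frequency the paper uses, and the function name `realized_volatility` is hypothetical.

```python
import math
from typing import Sequence


def realized_volatility(prices: Sequence[float]) -> float:
    """Square root of the sum of squared log returns over a price series.

    Assumes the textbook definition; the paper's exact estimator is not
    given in the abstract. Prices must be strictly positive.
    """
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    # Log return between each pair of consecutive observations.
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return math.sqrt(sum(r * r for r in log_returns))


# A flat price series has zero realized volatility.
flat = realized_volatility([100.0, 100.0, 100.0])
# A single jump by a factor of e gives one log return of 1.
jump = realized_volatility([1.0, math.e])
```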