13 research outputs found

    Unlocking Insights into Business Trajectories with Transformer-based Spatio-temporal Data Analysis

    The world of business is constantly evolving, and staying ahead of the curve requires a deep understanding of market trends and performance. This article addresses this requirement by modeling business trajectories using news article data. Comment: Presented at the conference Spatial Analysis and GEOmatics 2023 SAGE

    ANTM: An Aligned Neural Topic Model for Exploring Evolving Topics

    This paper presents an algorithmic family of dynamic topic models called Aligned Neural Topic Models (ANTM), which combine novel data mining algorithms to provide a modular framework for discovering evolving topics. ANTM maintains the temporal continuity of evolving topics by extracting time-aware features from documents using advanced pre-trained Large Language Models (LLMs) and employing an overlapping sliding window algorithm for sequential document clustering. This overlapping sliding window algorithm identifies a different number of topics within each time frame and aligns semantically similar document clusters across time periods. This process captures emerging and fading trends across different periods and allows for a more interpretable representation of evolving topics. Experiments on four distinct datasets show that ANTM outperforms probabilistic dynamic topic models in terms of topic coherence and diversity metrics. Moreover, it improves the scalability and flexibility of dynamic topic models by being accessible and adaptable to different types of algorithms. Additionally, a Python package has been developed for researchers and scientists who wish to study the trends and evolving patterns of topics in large-scale textual data.
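
    The overlapping sliding window mentioned in the abstract can be illustrated with a minimal sketch. This is not the ANTM package's code: the function names `overlapping_windows` and `assign_documents`, the integer time stamps, and the half-open window convention are all assumptions made for illustration, and the clustering and cluster-alignment steps are omitted.

```python
from typing import List, Sequence, Tuple


def overlapping_windows(start: int, end: int, size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return half-open [lo, hi) time windows of length `size` sharing `overlap` units.

    Hypothetical helper: the abstract does not specify how windows are parameterised.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    windows = []
    lo = start
    while lo < end:
        windows.append((lo, min(lo + size, end)))
        if lo + size >= end:  # the last window already reaches the end
            break
        lo += step
    return windows


def assign_documents(docs: Sequence[Tuple[int, str]],
                     windows: Sequence[Tuple[int, int]]) -> List[List[str]]:
    """Group (timestamp, text) pairs by window; a document may fall into several windows."""
    return [[text for t, text in docs if lo <= t < hi] for lo, hi in windows]


windows = overlapping_windows(2000, 2010, size=4, overlap=2)
groups = assign_documents([(2001, "a"), (2003, "b"), (2007, "c")], windows)
```

    Because consecutive windows share documents, clusters found in neighbouring windows have a common basis for comparison, which is presumably what makes the cross-period alignment described above feasible.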

    India nudges to contain COVID-19 pandemic: A reactive public policy analysis using machine-learning based topic modelling.

    India locked down 1.3 billion people on March 25, 2020, in the wake of the COVID-19 pandemic. The economic cost was estimated at USD 98 billion, while the social costs are still unknown. This study investigated how the government formed reactive policies to fight coronavirus across its policy sectors. Primary data was collected from the Press Information Bureau (PIB) in the form of press releases on government plans, policies, programme initiatives and achievements. A text corpus of 260,852 words was created from 396 PIB documents. Unsupervised topic modelling using the Latent Dirichlet Allocation (LDA) algorithm was performed on the text corpus to extract high-probability topics in the policy sectors. The extracted topics were interpreted through a nudge-theoretic lens to derive the critical policy heuristics of the government. Results showed that most interventions were targeted to generate endogenous nudges by using external triggers. Notably, the nudges from the Prime Minister of India were critical in creating a herd effect on lockdown and social distancing norms across the nation. A similar effect was also observed around public health (e.g., masks in public spaces; Yoga and Ayurveda for immunity), transport (e.g., old trains converted to isolation wards), micro, small and medium enterprises (e.g., rapid production of PPE and masks), science and technology (e.g., diagnostic kits, robots and nano-technology), home affairs (e.g., surveillance and lockdown), urban affairs (e.g., drones, GIS tools) and education (e.g., online learning). The study concludes that leveraging these heuristics is crucial for lockdown easement planning.

    A query-driven topic model

    Topic modeling is an unsupervised method for revealing the hidden semantic structure of a corpus. It has been increasingly adopted as a tool in the social sciences, including political science, digital humanities and sociological research in general. One desirable property of topic models is to allow users to find topics describing a specific aspect of the corpus. A possible solution is to incorporate domain-specific knowledge into topic modeling, but this requires a specification from domain experts. We propose a novel query-driven topic model that allows users to specify a simple query in words or phrases and returns query-related topics, thus avoiding tedious work from domain experts. Our proposed approach is particularly attractive when the user-specified query has a low occurrence in a text corpus, making it difficult for traditional topic models built on word co-occurrence patterns to identify relevant topics. Experimental results demonstrate the effectiveness of our model in comparison with both classical topic models and neural topic models.

    Grounded reality meets machine learning: A deep-narrative analysis framework for energy policy research

    Text-based data sources like narratives and stories have become increasingly popular as critical insight generators in energy research and social science. However, their implications in policy application usually remain superficial and fail to fully explo

    Mobility in Unsupervised Word Embeddings for Knowledge Extraction—The Scholars’ Trajectories across Research Topics

    In the knowledge discovery field of the Big Data domain, the analysis of geographic positioning and mobility information plays a key role. At the same time, in the Natural Language Processing (NLP) domain, pre-trained models such as BERT and word embedding algorithms such as Word2Vec have enabled a rich encoding of words that allows mapping textual data into points of an arbitrary multi-dimensional space, in which the notion of proximity reflects an association among terms or topics. The main contribution of this paper is to show how analytical tools, traditionally adopted to deal with geographic data to measure the mobility of an agent in a time interval, can also be effectively applied to extract knowledge in a semantic realm, such as a semantic space of words and topics, looking for latent trajectories that can benefit the properties of neural network latent representations. As a case study, the Scopus database was queried about works of highly cited researchers in recent years. On this basis, we performed a dynamic analysis, measuring the Radius of Gyration as an index of the mobility of researchers across scientific topics. The semantic space is built from the automatic analysis of the paper abstracts of each author. In particular, we evaluated two different methodologies to build the semantic space and found that Word2Vec embeddings perform better than the BERT ones for this task. Finally, the scholars’ trajectories show some latent properties of this model, which also represent new scientific contributions of this work. These properties include (i) the correlation between scientific mobility and the achievement of scientific results, measured through the H-index; (ii) differences in the behavior of researchers working in different countries and subjects; and (iii) some interesting similarities between mobility patterns in this semantic realm and those typically observed in the case of human mobility.
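
    For readers unfamiliar with the Radius of Gyration, the following is a minimal sketch of the common unweighted definition used in human mobility studies: the root-mean-square distance of visited points from their centre of mass. Whether the paper uses this exact variant (for example, with time weighting) is not stated in the abstract, and the function name and the toy 2-D points standing in for embedding vectors are assumptions.

```python
import math
from typing import List, Sequence


def radius_of_gyration(points: Sequence[Sequence[float]]) -> float:
    """Root-mean-square distance of the points from their centroid.

    Each point could be, for instance, the embedding of one paper abstract,
    so the result measures how widely an author moves in the semantic space.
    """
    n = len(points)
    if n == 0:
        raise ValueError("at least one point is required")
    dim = len(points[0])
    # Centre of mass of the trajectory, computed per coordinate.
    centre: List[float] = [sum(p[d] for p in points) / n for d in range(dim)]
    # Mean squared Euclidean distance to the centre.
    mean_sq = sum(
        sum((p[d] - centre[d]) ** 2 for d in range(dim)) for p in points
    ) / n
    return math.sqrt(mean_sq)


rg_pair = radius_of_gyration([(0.0, 0.0), (2.0, 0.0)])
rg_single = radius_of_gyration([(3.0, 4.0)])
```

    An author whose abstracts all land near the same point gets a value close to zero, while one who moves between distant topics gets a larger value.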

    Can linguistic features extracted from geo-referenced tweets help building function classification in remote sensing?

    The fusion of two or more different data sources is a widely accepted technique in remote sensing and is becoming increasingly important due to the availability of big Earth Observation satellite data. As a complementary source of geo-information to satellite data, massive text messages from social media form a temporally quasi-seamless, spatially multi-perspective stream, but with unknown and diverse quality. Despite the uncontrolled quality, can linguistic features extracted from geo-referenced tweets support remote sensing tasks? This work presents a straightforward decision fusion framework for very high-resolution remote sensing images and Twitter text messages. We apply our proposed fusion framework to a land-use classification task, the building function classification task, in which we classify building functions like commercial or residential based on linguistic features derived from tweets and remote sensing images. Using building tags from OpenStreetMap (OSM), we labeled tweets and very high-resolution (VHR) images from Google Maps. We collected English tweets from San Francisco, New York City, Los Angeles, and Washington D.C. and trained a stacked bi-directional LSTM neural network with these tweets. For the aerial images, we predicted building functions with state-of-the-art Convolutional Neural Network (CNN) architectures fine-tuned from ImageNet on the given task. After predicting each modality separately, we combined the prediction probabilities of both models building-wise at a decision level. We show that the proposed fusion framework can improve the classification results of the building type classification task. To the best of our knowledge, we are the first to use the semantic contents of Twitter messages and fuse them with remote sensing images to classify building functions at a single building level.
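
    Decision-level fusion of per-building class probabilities can be sketched as below. The abstract does not state the actual combination rule, so the choices here (averaging tweet-level probabilities per building, then taking a weighted mean with the image model's probabilities) are assumptions, as are the function name `fuse_building` and the `weight` parameter.

```python
from typing import List, Sequence


def fuse_building(image_probs: Sequence[float],
                  tweet_probs: Sequence[Sequence[float]],
                  weight: float = 0.5) -> List[float]:
    """Combine one building's image-model and text-model class probabilities.

    image_probs: class probabilities from the image classifier.
    tweet_probs: one probability vector per tweet linked to the building.
    weight:      share given to the image model (hypothetical parameter).
    """
    if not tweet_probs:
        # No tweets for this building: fall back to the image prediction.
        return list(image_probs)
    k = len(image_probs)
    # Average the text model's output over all tweets of the building.
    text_mean = [sum(tp[c] for tp in tweet_probs) / len(tweet_probs) for c in range(k)]
    # Weighted mean of the two modalities, class by class.
    return [weight * image_probs[c] + (1 - weight) * text_mean[c] for c in range(k)]


fused = fuse_building([0.6, 0.4], [[0.2, 0.8], [0.4, 0.6]])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

    In this toy case the image model alone favours class 0, while the fused probabilities favour class 1, which shows how the text modality can change a building-level decision.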

    Social media and GIScience: Collection, analysis, and visualization of user-generated spatial data

    Over the last decade, social media platforms have eclipsed the height of popular culture and communication technology, which, in combination with widespread access to GIS-enabled hardware (i.e. mobile phones), has resulted in the continuous creation of massive amounts of user-generated spatial data. This thesis explores how social media data have been utilized in GIS research and provides a commentary on the impacts of this next iteration of technological change with respect to GIScience. First, the roots of GIS technology are traced to set the stage for the examination of social media as a technological catalyst for change in GIScience. Next, a scoping review is conducted to gather and synthesize a summary of methods used to collect, analyze, and visualize these data. Finally, a case study exploring the spatio-temporality of crowdfunding behaviours in Canada during the COVID-19 pandemic is presented to demonstrate the utility of social media data in spatial research.

    Geo-Information Harvesting from Social Media Data

    As unconventional sources of geo-information, massive imagery and text messages from open platforms and social media form a temporally quasi-seamless, spatially multi-perspective stream, but with unknown and diverse quality. Due to its complementarity to remote sensing data, geo-information from these sources offers promising perspectives, but harvesting is not trivial due to its data characteristics. In this article, we address key aspects in the field, including data availability, analysis-ready data preparation and data management, geo-information extraction from social media text messages and images, and the fusion of social media and remote sensing data. We then showcase some exemplary geographic applications. In addition, we present the first extensive discussion of ethical considerations of social media data in the context of geo-information harvesting and geographic applications. With this effort, we wish to stimulate curiosity and lay the groundwork for researchers who intend to explore social media data for geo-applications. We encourage the community to join forces by sharing their code and data. Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine

    Community-Based Behavioral Understanding of Mobility Trends and Public Attitude through Transportation User and Agency Interactions on Social Media in the Emergence of Covid-19

    The increased availability of technology-enabled transportation options and modern communication devices (smartphones, in particular) is transforming travel-related decision-making in the population differently at different places, points in time, modes of transportation, and socio-economic groups. The emergence of COVID-19 made the dynamics of passenger travel behavior more complex, forcing a worldwide, unparalleled change in human travel behavior and introducing a new normal into their existence. This dissertation explores the potential of social media platforms (SMPs) as a viable alternative to traditional approaches (e.g., travel surveys) to understand the complex dynamics of people’s mobility patterns in the emergence of COVID-19. In this dissertation, we focus on three objectives. First, a novel approach to developing comparative infographics of emerging transportation trends is introduced by natural language processing and data-driven techniques using large-scale social media data. Second, a methodology has been developed to model community-based travel behavior under different socioeconomic and demographic factors at the community level in the emergence of COVID-19 on Twitter, inferring users’ demographics to overcome sampling bias. Third, the communication patterns of different transportation agencies on Twitter regarding message kinds, communication sufficiency, consistency, and coordination were examined by applying text mining techniques and dynamic network analysis. The methodologies and findings of the dissertation will allow real-time monitoring of transportation trends by agencies, researchers, and professionals.
    Potential applications of the work may include: (1) identifying spatial diversity of public mobility needs and concerns through social media platforms; (2) developing new policies that would satisfy the diverse needs at different locations; (3) introducing new plans to support and celebrate equity, diversity, and inclusion in the transportation sector that would improve the efficient flow of goods and services; (4) designing new methods to model community-based travel behavior at different scales (e.g., census block, zip code, etc.) using social media data inferring users’ socio-economic and demographic properties; and (5) implementing efficient policies to improve existing communication plans, critical information dissemination efficacy, and coordination of different transportation actors to raise awareness among passengers in general and during unprecedented health crises in the fragmented communication world.