84 research outputs found
Exploring the Power of Topic Modeling Techniques in Analyzing Customer Reviews: A Comparative Analysis
The exponential growth of online social network platforms and applications
has led to a staggering volume of user-generated textual content, including
comments and reviews. Consequently, users often face difficulties in extracting
valuable insights or relevant information from such content. To address this
challenge, machine learning and natural language processing algorithms have
been deployed to analyze the vast amount of textual data available online. In
recent years, topic modeling techniques have gained significant popularity in
this domain. In this study, we comprehensively examine and compare six
frequently used topic modeling methods specifically applied to customer
reviews. The methods under investigation are latent semantic analysis (LSA),
latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF),
pachinko allocation model (PAM), Top2Vec, and BERTopic. By practically
demonstrating their benefits in detecting important topics, we aim to highlight
their efficacy in real-world scenarios. To evaluate the performance of these
topic modeling methods, we carefully select two textual datasets. The
evaluation is based on standard statistical evaluation metrics such as topic
coherence score. Our findings reveal that BERTopic consistently yields more
meaningful extracted topics and achieves favorable results.
Comment: 13 pages
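The abstract names topic coherence as an evaluation metric but does not say which variant was used. As a rough illustration only, the sketch below computes the UMass coherence of a single topic from document co-occurrence counts. The function name, the toy reviews, and the choice of UMass are assumptions made for this example, not details taken from the study.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence: sum of log((D(w_m, w_l) + 1) / D(w_l)) over ordered word pairs."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # skip words that never occur, to avoid dividing by zero
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / d_l)
    return score


# Hypothetical, already tokenized customer reviews.
reviews = [
    ["battery", "life", "great"],
    ["battery", "life", "poor"],
    ["screen", "bright"],
    ["screen", "battery"],
]

coherent = umass_coherence(["battery", "life"], reviews)   # words that co-occur
mixed = umass_coherence(["battery", "bright"], reviews)    # words that never co-occur
print(coherent, mixed)
```

Scores closer to zero indicate top words that tend to appear in the same documents, so the first topic scores higher than the second here.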
A Sustainable West? Analyzing Clusters of Public Opinion in Sustainability Western Discourses in a Collection of Multilingual Newspapers (1999-2018)
In this article, we analyze the temporal and geographic evolution of sustainability-related discourses over a time frame of twenty years (1999-2018). We use a collection of multilingual newspapers in English, French, German, Spanish, and Italian as a proxy. We filter documents using four key terms (sustainable development, climate change, environment, and pollution) to explore how different newspapers encode the same message and to detect points of contact (agreement) and rupture (polarity). Our methodology includes Topic Modelling (Pachinko Allocation [1]), word embeddings [2], Ward’s hierarchical cluster analysis [3], and network analysis [4]. Our results show a progressive simplification of semantic fields over time, reflecting less polarizing views across countries and, therefore, increasing agreement on sustainability-related discourses in our contemporary societies. We also notice little variation in newspapers' rhetoric over time. This article therefore also contributes a meta-reflection on newspapers' behaviour as information containers.
Topic Modeling for Analysing Similarity between Users in Twitter
Data mining on social networks is gaining importance because it enables more precise marketing campaigns. For example, Google analyzes all of our data (the videos we watch, the terms we search for, the web pages we visit, the applications we download, etc.) to get to know us better and show us personalized advertising.
LDA is a generative statistical model for modeling documents. Several algorithms exist that, given a set of documents, produce an LDA model that could have generated those documents. With that model it is possible to observe the topics used in those documents and the most relevant words for each topic.
This work aims to provide a first approach to data mining on Twitter. To that end, tweets from various users and their followers were downloaded using the Twitter API. These tweets were then processed to generate documents, and the Gensim implementation of the Online LDA algorithm was applied to obtain the topics of the documents. Finally, the topics of the users were compared with those of their followers.
An analysis of the state of the art in Twitter data mining is also provided.
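The abstract does not state how the topics of users and followers were compared. One possible way, shown here purely as a sketch, is to represent each account by its topic distribution (as an LDA model would produce) and measure the Hellinger distance between distributions. The distance measure, the function names, and the toy distributions below are all assumptions for illustration.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def mean_follower_distance(user_topics, follower_topics):
    """Average distance between one user's topic mix and each follower's topic mix."""
    return sum(hellinger(user_topics, f) for f in follower_topics) / len(follower_topics)


# Hypothetical topic distributions over three topics (each sums to 1).
user = [0.7, 0.2, 0.1]
followers = [
    [0.6, 0.3, 0.1],  # similar interests to the user
    [0.1, 0.1, 0.8],  # mostly a different topic
]

for follower in followers:
    print(round(hellinger(user, follower), 3))
print(round(mean_follower_distance(user, followers), 3))
```

A smaller distance suggests that a follower writes about a similar mix of topics as the user they follow.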
SocialVisTUM: An Interactive Visualization Toolkit for Correlated Neural Topic Models on Social Media Opinion Mining
Recent research in opinion mining proposed word embedding-based topic
modeling methods that provide superior coherence compared to traditional topic
modeling. In this paper, we demonstrate how these methods can be used to
display correlated topic models on social media texts using SocialVisTUM, our
proposed interactive visualization toolkit. It displays a graph with topics as
nodes and their correlations as edges. Further details are displayed
interactively to support the exploration of large text collections, e.g.,
representative words and sentences of topics, topic and sentiment
distributions, hierarchical topic clustering, and customizable, predefined
topic labels. The toolkit optimizes automatically on custom data for optimal
coherence. We show a working instance of the toolkit on data crawled from
English social media discussions about organic food consumption. The
visualization confirms findings of a qualitative consumer research study.
SocialVisTUM and its training procedures are accessible online.
Comment: Demo paper accepted for publication at RANLP 2021; 8 pages, 5
figures, 1 table
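The abstract describes a graph with topics as nodes and their correlations as edges, without giving the computation. A minimal sketch of one way to build such an edge list follows: take per-document topic proportions, compute the Pearson correlation for every topic pair, and keep pairs whose absolute correlation reaches a threshold. The function names, the threshold, the topic labels, and the data are hypothetical and do not reflect the SocialVisTUM implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def topic_graph(doc_topics, labels, threshold=0.5):
    """Return (topic_a, topic_b, correlation) edges with |correlation| >= threshold."""
    columns = list(zip(*doc_topics))  # one column of proportions per topic
    edges = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            r = pearson(columns[i], columns[j])
            if abs(r) >= threshold:
                edges.append((labels[i], labels[j], round(r, 3)))
    return edges


# Hypothetical topic proportions for four documents over three topics.
doc_topics = [
    [0.10, 0.15, 0.75],
    [0.20, 0.25, 0.55],
    [0.30, 0.35, 0.35],
    [0.40, 0.45, 0.15],
]
labels = ["price", "health", "taste"]

edges = topic_graph(doc_topics, labels)
for edge in edges:
    print(edge)
```

The sign of each edge weight is kept, so a viewer could distinguish topics that rise together from topics that displace each other.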