The Real-World Experiences of Persons With Multiple Sclerosis During the First COVID-19 Lockdown: Application of Natural Language Processing
BACKGROUND
The increasing availability of "real-world" data in the form of written text holds promise for deepening our understanding of societal and health-related challenges. Textual data constitute a rich source of information, allowing the capture of lived experiences through a broad range of signals (eg, content and emotional tone). Interviews are the "gold standard" for gaining qualitative insights into individual experiences and perspectives. However, conducting interviews on a large scale is not always feasible, and standardized quantitative assessment suitable for large-scale application may miss important information. Surveys that include open-text assessments can combine the advantages of both methods and are well suited for the application of natural language processing (NLP) methods. While innovations in NLP have made large-scale text analysis more accessible, the analysis of real-world textual data is still complex and requires several consecutive steps.
OBJECTIVE
We developed and subsequently examined the utility and scientific value of an NLP pipeline for extracting real-world experiences from textual data to provide guidance for applied researchers.
METHODS
We applied the NLP pipeline to large-scale textual data collected by the Swiss Multiple Sclerosis (MS) registry. Such textual data constitute an ideal use case for the study of real-world text data. Specifically, we examined 639 text reports on the experienced impact of the first COVID-19 lockdown from the perspectives of persons with MS. The pipeline has been implemented in Python and complemented by analyses with the "Linguistic Inquiry and Word Count" (LIWC) software. It consists of the following 5 interconnected analysis steps: (1) text preprocessing; (2) sentiment analysis; (3) descriptive text analysis; (4) unsupervised learning (topic modeling); and (5) results interpretation and validation.
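The 5-step pipeline described above can be sketched in Python. This is a minimal, illustrative sketch using scikit-learn, not the registry's actual implementation; the example reports, the number of topics, and the use of `CountVectorizer`/`LatentDirichletAllocation` are assumptions, and the LIWC-based sentiment step (step 2) is commercial software and therefore only noted in a comment.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Illustrative stand-ins for the free-text lockdown reports
reports = [
    "missed contact with friends and family during lockdown",
    "working from home was difficult but manageable",
    "daily errands and shopping became a routine challenge",
    "communication with my social environment broke down",
]

# Step 1: text preprocessing (lowercase, strip non-letter characters)
def preprocess(text):
    return re.sub(r"[^a-z\s]", " ", text.lower())

cleaned = [preprocess(t) for t in reports]

# Step 2 (sentiment) used the commercial LIWC software and is omitted here.

# Step 3: descriptive text analysis (here: simple token counts per report)
lengths = [len(t.split()) for t in cleaned]

# Step 4: unsupervised topic modeling with LDA
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(cleaned)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)  # one topic distribution per report
```

Step 5 (interpretation and validation) would then inspect the top words per topic and the per-document topic weights in `doc_topics`.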
RESULTS
A topic modeling analysis identified the following 4 distinct groups based on the topics participants were mainly concerned with: "contacts/communication;" "social environment;" "work;" and "errands/daily routines." Notably, the sentiment analysis revealed that the "contacts/communication" group was characterized by a pronounced negative emotional tone underlying the text reports. This observed heterogeneity in emotional tonality underlying the reported experiences of the first COVID-19-related lockdown is likely to reflect differences in emotional burden, individual circumstances, and ways of coping with the pandemic, which is in line with previous research on this matter.
CONCLUSIONS
This study illustrates the timely and efficient applicability of an NLP pipeline and thereby serves as a precedent for applied researchers. Our study contributes to both the dissemination of NLP techniques in applied health sciences and the identification of previously unknown experiences and burdens of persons with MS during the pandemic, which may be relevant for future treatment.
New topic detection in microblogs and topic model evaluation using topical alignment
This thesis deals with topic model evaluation and new topic detection in microblogs. Microblogs are short and thus may not carry any contextual clues, so it is challenging to apply traditional natural language processing algorithms to such data. Graphical models have traditionally been used for topic discovery and text clustering on sets of text-based documents. Their unsupervised nature allows topic models to be trained easily on datasets meant for specific domains. However, the advantage of not requiring annotated data comes with a drawback with respect to evaluation. The problem is aggravated when the data comprise microblogs, which are unstructured and noisy.
We demonstrate the application of three such models to microblogs: the Latent Dirichlet Allocation, the Author-Topic, and the Author-Recipient-Topic models. We extensively evaluate these models under different settings, and our results show that the Author-Recipient-Topic model extracts the most coherent topics. We also address the problem of topic modeling on short text by using clustering techniques, which helps to boost the performance of our models.
Topical alignment is used for large-scale assessment of topical relevance by comparing topics to manually generated domain-specific concepts. In this thesis we use this idea to evaluate topic models by measuring misalignments between topics. Our study on comparing topic models reveals interesting traits about Twitter messages, users, and their interactions, and establishes that joint modeling of author-recipient pairs and of the content of tweets leads to qualitatively better topic discovery.
This thesis gives a new direction to the well-known problem of topic discovery in microblogs. Trend prediction or topic discovery for microblogs is an extensive research area. We propose the idea of using topical alignment to detect new topics by comparing topics from the current week to those of the previous week. We measure correspondence between a set of topics from the current week and a set of topics from the previous week to quantify four types of misalignments: "junk," "fused," "missing," and "repeated." Our analysis compares three types of topic models under different settings and demonstrates how our framework can detect new topics from topical misalignments. In particular, so-called "junk" topics are more likely to be new topics, and "missing" topics are likely to have died out.
To get more insights into the nature of microblogs we apply topical alignment to hashtags. Comparing topics to hashtags enables us to make interesting inferences about Twitter messages and their content. Our study revealed that although a very small proportion of Twitter messages explicitly contain hashtags, the proportion of tweets that discuss topics related to hashtags is much higher.
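The week-over-week alignment idea above can be sketched as a topic-to-topic similarity comparison: a current-week topic with no close match in the previous week is a candidate new topic. The topic-word matrices, the cosine measure, and the 0.8 threshold below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two topic-word distributions
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rows are topics, columns are word probabilities (toy 4-word vocabulary)
prev_topics = np.array([[0.7, 0.2, 0.1, 0.0],
                        [0.1, 0.1, 0.4, 0.4]])
curr_topics = np.array([[0.68, 0.22, 0.10, 0.00],   # aligns with prev topic 0
                        [0.00, 0.90, 0.05, 0.05]])  # no close match: candidate new topic

THRESHOLD = 0.8  # illustrative alignment cutoff
new_topics = []
for i, topic in enumerate(curr_topics):
    best_match = max(cosine(topic, prev) for prev in prev_topics)
    if best_match < THRESHOLD:
        new_topics.append(i)  # "junk"-like topic: likely new
```

Running this flags the second current-week topic as unaligned, mirroring how "junk" topics in the thesis are more likely to be genuinely new.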
Técnicas big data: análisis de textos a gran escala para la investigación científica y periodística
Big data techniques: Large-scale text analysis for scientific and journalistic research. This paper conceptualizes the term big data and describes its relevance in social research and journalistic practices. We explain large-scale text analysis techniques such as automated content analysis, data mining, machine learning, topic modeling, and sentiment analysis, which may help scientific discovery in social sciences and news production in journalism. We explain the required e-infrastructure for big data analysis with the use of cloud computing, and we assess the use of the main packages and libraries for information retrieval and analysis in commercial software and programming languages such as Python or
Deep Belief Nets for Topic Modeling
Applying traditional collaborative filtering to digital publishing is challenging because user data are very sparse due to the high volume of documents relative to the number of users. Content-based approaches, on the other hand, are attractive because textual content is often very informative. In this paper we describe large-scale content-based collaborative filtering for digital publishing. To solve the digital publishing recommender problem we compare two approaches, latent Dirichlet allocation (LDA) and deep belief nets (DBN), that both find low-dimensional latent representations for documents. Efficient retrieval can be carried out in the latent representation. We work both on public benchmarks and on digital media content provided by Issuu, an online publishing platform. This article also comes with a newly developed deep belief nets toolbox for topic modeling, tailored towards performance evaluation of the DBN model and comparisons to the LDA model. Comment: Accepted to the ICML-2014 Workshop on Knowledge-Powered Deep Learning for Text Mining
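The retrieval idea in the abstract above can be sketched with the LDA half of the comparison: map documents to low-dimensional topic vectors, then find neighbors in that latent space rather than in word space. This is an illustrative scikit-learn sketch with a toy corpus, not the Issuu system or the DBN toolbox.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.neighbors import NearestNeighbors

# Toy corpus standing in for a digital publishing catalogue
docs = [
    "deep learning for image recognition",
    "convolutional networks for vision tasks",
    "bayesian inference in topic models",
    "gibbs sampling for latent dirichlet allocation",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)

# Low-dimensional latent representations for documents
lda = LatentDirichletAllocation(n_components=2, random_state=0)
Z = lda.fit_transform(X)

# Efficient retrieval: nearest neighbours in the latent space
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(Z)
_, idx = nn.kneighbors(Z[0:1])  # query with the first document
```

A DBN would play the same role as `lda` here, producing a different (learned, nonlinear) latent representation over which the same neighbor search runs.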
Learning Topic Models by Belief Propagation
Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model for probabilistic topic modeling, which has attracted worldwide interest and touches on many important applications in text mining, computer vision, and computational biology. This paper represents LDA as a factor graph within the Markov random field (MRF) framework, which enables the classic loopy belief propagation (BP) algorithm for approximate inference and parameter estimation. Although the two commonly used approximate inference methods, variational Bayes (VB) and collapsed Gibbs sampling (GS), have achieved great success in learning LDA, the proposed BP is competitive in both speed and accuracy, as validated by encouraging experimental results on four large-scale document data sets. Furthermore, the BP algorithm has the potential to become a generic learning scheme for variants of LDA-based topic models. To this end, we show how to learn two typical variants of LDA-based topic models, namely author-topic models (ATM) and relational topic models (RTM), using BP based on the factor graph representation. Comment: 14 pages, 17 figures
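All the inference schemes mentioned above (VB, collapsed Gibbs sampling, and the proposed BP) target the same underlying generative model. A minimal sketch of that standard LDA generative process is below; the vocabulary size, topic count, and hyperparameter values are illustrative, and this simulates the model rather than performing any of the inference algorithms from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 2, 5           # number of topics, vocabulary size (toy values)
n_docs, doc_len = 3, 10
alpha, beta = 0.5, 0.1  # illustrative Dirichlet hyperparameters

# Topic-word distributions phi_k ~ Dirichlet(beta)
phi = rng.dirichlet([beta] * V, size=K)

docs = []
for _ in range(n_docs):
    theta = rng.dirichlet([alpha] * K)   # per-document topic mixture
    words = []
    for _ in range(doc_len):
        z = rng.choice(K, p=theta)       # draw a topic assignment
        w = rng.choice(V, p=phi[z])      # draw a word from that topic
        words.append(int(w))
    docs.append(words)
```

VB, Gibbs sampling, and loopy BP all attempt to invert this process: given only the observed words, they recover posterior estimates of the hidden `theta`, `phi`, and `z` variables.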
A Fuzzy Approach Model for Uncovering Hidden Latent Semantic Structure in Medical Text Collections
One of the challenges for text analysis in the medical domain, including clinical notes and research papers, is analyzing large-scale medical documents. As a consequence, finding relevant documents has become more difficult, and previous work has also shown unique problems of medical documents. The themes in documents help to retrieve documents on the same topic, with and without a query. One of the popular methods to retrieve information based on discovering the themes in documents is topic modeling. In this paper we describe a novel approach to topic modeling, FATM, using fuzzy clustering. To assess the value of FATM, we experiment with two text datasets of medical documents. The quantitative evaluation carried out through log-likelihood on held-out data shows that FATM produces superior performance to LDA. This research contributes to the emerging field of understanding the characteristics of medical documents and how to account for them in text mining.
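The held-out log-likelihood evaluation used to compare FATM with LDA can be sketched for the LDA baseline with scikit-learn, whose `score()` returns an approximate held-out log-likelihood and `perplexity()` the corresponding perplexity. The toy corpus below is illustrative; FATM itself is not publicly packaged and is not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Illustrative stand-ins for medical documents
train = [
    "patient reported chest pain",
    "clinical notes describe fever",
    "topic models for medical text",
    "retrieval of research papers",
]
heldout = ["medical research on patient fever"]

vec = CountVectorizer()
X_train = vec.fit_transform(train)
X_test = vec.transform(heldout)  # held-out data, same vocabulary

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_train)
ll = lda.score(X_test)        # approximate held-out log-likelihood
ppl = lda.perplexity(X_test)  # higher log-likelihood -> lower perplexity
```

A model like FATM would be judged "superior" in this setup if it assigned a higher held-out log-likelihood (equivalently, lower perplexity) than the LDA baseline on the same held-out documents.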