The Real-World Experiences of Persons With Multiple Sclerosis During the First COVID-19 Lockdown: Application of Natural Language Processing
BACKGROUND
The increasing availability of "real-world" data in the form of written text holds promise for deepening our understanding of societal and health-related challenges. Textual data constitute a rich source of information, allowing the capture of lived experiences through a broad range of signals (eg, content and emotional tone). Interviews are the "gold standard" for gaining qualitative insights into individual experiences and perspectives. However, conducting interviews on a large scale is not always feasible, and standardized quantitative assessment suitable for large-scale application may miss important information. Surveys that include open-text assessments can combine the advantages of both methods and are well suited for the application of natural language processing (NLP) methods. While innovations in NLP have made large-scale text analysis more accessible, the analysis of real-world textual data is still complex and requires several consecutive steps.
OBJECTIVE
We developed and subsequently examined the utility and scientific value of an NLP pipeline for extracting real-world experiences from textual data to provide guidance for applied researchers.
METHODS
We applied the NLP pipeline to large-scale textual data collected by the Swiss Multiple Sclerosis (MS) registry. Such textual data constitute an ideal use case for the study of real-world text data. Specifically, we examined 639 text reports on the experienced impact of the first COVID-19 lockdown from the perspectives of persons with MS. The pipeline has been implemented in Python and complemented by analyses with the "Linguistic Inquiry and Word Count" (LIWC) software. It consists of the following 5 interconnected analysis steps: (1) text preprocessing; (2) sentiment analysis; (3) descriptive text analysis; (4) unsupervised learning (topic modeling); and (5) results interpretation and validation.
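The 5-step pipeline described above can be sketched in Python. This is a minimal, illustrative sketch using scikit-learn, not the registry's actual implementation; the example reports, the number of topics, and the use of `CountVectorizer`/`LatentDirichletAllocation` are assumptions, and the LIWC-based sentiment step (step 2) is commercial software and therefore only noted in a comment.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Illustrative stand-ins for the free-text lockdown reports
reports = [
    "missed contact with friends and family during lockdown",
    "working from home was difficult but manageable",
    "daily errands and shopping became a routine challenge",
    "communication with my social environment broke down",
]

# Step 1: text preprocessing (lowercase, strip non-letter characters)
def preprocess(text):
    return re.sub(r"[^a-z\s]", " ", text.lower())

cleaned = [preprocess(t) for t in reports]

# Step 2 (sentiment) used the commercial LIWC software and is omitted here.

# Step 3: descriptive text analysis (here: simple token counts per report)
lengths = [len(t.split()) for t in cleaned]

# Step 4: unsupervised topic modeling with LDA
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(cleaned)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)  # one topic distribution per report
```

Step 5 (interpretation and validation) would then inspect the top words per topic and the per-document topic weights in `doc_topics`.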
RESULTS
A topic modeling analysis identified the following 4 distinct groups based on the topics participants were mainly concerned with: "contacts/communication;" "social environment;" "work;" and "errands/daily routines." Notably, the sentiment analysis revealed that the "contacts/communication" group was characterized by a pronounced negative emotional tone underlying the text reports. This observed heterogeneity in emotional tonality underlying the reported experiences of the first COVID-19-related lockdown is likely to reflect differences in emotional burden, individual circumstances, and ways of coping with the pandemic, which is in line with previous research on this matter.
CONCLUSIONS
This study illustrates the timely and efficient applicability of an NLP pipeline and thereby serves as a precedent for applied researchers. Our study contributes to both the dissemination of NLP techniques in applied health sciences and the identification of previously unknown experiences and burdens of persons with MS during the pandemic, which may be relevant for future treatment.
New topic detection in microblogs and topic model evaluation using topical alignment
This thesis deals with topic model evaluation and new topic detection in microblogs. Microblogs are short and thus may not carry any contextual clues, so it is challenging to apply traditional natural language processing algorithms to such data. Graphical models have traditionally been used for topic discovery and text clustering on sets of text-based documents. Their unsupervised nature allows topic models to be trained easily on datasets meant for specific domains. However, the advantage of not requiring annotated data comes with a drawback with respect to evaluation. The problem is aggravated when the data comprise microblogs, which are unstructured and noisy.
We demonstrate the application of three such models to microblogs: the Latent Dirichlet Allocation, the Author-Topic, and the Author-Recipient-Topic models. We extensively evaluate these models under different settings, and our results show that the Author-Recipient-Topic model extracts the most coherent topics. We also address the problem of topic modeling on short text by using clustering techniques, which helps to boost the performance of our models.
Topical alignment is used for large-scale assessment of topical relevance by comparing topics to manually generated domain-specific concepts. In this thesis we use this idea to evaluate topic models by measuring misalignments between topics. Our study on comparing topic models reveals interesting traits about Twitter messages, users, and their interactions, and establishes that joint modeling of author-recipient pairs and of the content of tweets leads to qualitatively better topic discovery.
This thesis gives a new direction to the well-known problem of topic discovery in microblogs. Trend prediction or topic discovery for microblogs is an extensive research area. We propose the idea of using topical alignment to detect new topics by comparing topics from the current week to those of the previous week. We measure correspondence between a set of topics from the current week and a set of topics from the previous week to quantify four types of misalignments: "junk," "fused," "missing," and "repeated." Our analysis compares three types of topic models under different settings and demonstrates how our framework can detect new topics from topical misalignments. In particular, so-called "junk" topics are more likely to be new topics, and "missing" topics are likely to have died out.
To get more insights into the nature of microblogs we apply topical alignment to hashtags. Comparing topics to hashtags enables us to make interesting inferences about Twitter messages and their content. Our study revealed that although a very small proportion of Twitter messages explicitly contain hashtags, the proportion of tweets that discuss topics related to hashtags is much higher.
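The week-over-week alignment idea above can be sketched as a topic-to-topic similarity comparison: a current-week topic with no close match in the previous week is a candidate new topic. The topic-word matrices, the cosine measure, and the 0.8 threshold below are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two topic-word distributions
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rows are topics, columns are word probabilities (toy 4-word vocabulary)
prev_topics = np.array([[0.7, 0.2, 0.1, 0.0],
                        [0.1, 0.1, 0.4, 0.4]])
curr_topics = np.array([[0.68, 0.22, 0.10, 0.00],   # aligns with prev topic 0
                        [0.00, 0.90, 0.05, 0.05]])  # no close match: candidate new topic

THRESHOLD = 0.8  # illustrative alignment cutoff
new_topics = []
for i, topic in enumerate(curr_topics):
    best_match = max(cosine(topic, prev) for prev in prev_topics)
    if best_match < THRESHOLD:
        new_topics.append(i)  # "junk"-like topic: likely new
```

Running this flags the second current-week topic as unaligned, mirroring how "junk" topics in the thesis are more likely to be genuinely new.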
Técnicas big data: análisis de textos a gran escala para la investigación científica y periodística
Big data techniques: Large-scale text analysis for scientific and journalistic research. This paper conceptualizes the term big data and describes its relevance in social research and journalistic practices. We explain large-scale text analysis techniques such as automated content analysis, data mining, machine learning, topic modeling, and sentiment analysis, which may help scientific discovery in social sciences and news production in journalism. We explain the required e-infrastructure for big data analysis with the use of cloud computing, and we assess the use of the main packages and libraries for information retrieval and analysis in commercial software and programming languages such as Python or
Deep Belief Nets for Topic Modeling
Applying traditional collaborative filtering to digital publishing is challenging because user data are very sparse due to the high volume of documents relative to the number of users. Content-based approaches, on the other hand, are attractive because textual content is often very informative. In this paper we describe large-scale content-based collaborative filtering for digital publishing. To solve the digital publishing recommender problem we compare two approaches, latent Dirichlet allocation (LDA) and deep belief nets (DBN), that both find low-dimensional latent representations for documents. Efficient retrieval can be carried out in the latent representation. We work both on public benchmarks and on digital media content provided by Issuu, an online publishing platform. This article also comes with a newly developed deep belief nets toolbox for topic modeling, tailored towards performance evaluation of the DBN model and comparisons to the LDA model. Comment: Accepted to the ICML-2014 Workshop on Knowledge-Powered Deep Learning for Text Mining
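The retrieval idea in the abstract above can be sketched with the LDA half of the comparison: map documents to low-dimensional topic vectors, then find neighbors in that latent space rather than in word space. This is an illustrative scikit-learn sketch with a toy corpus, not the Issuu system or the DBN toolbox.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.neighbors import NearestNeighbors

# Toy corpus standing in for a digital publishing catalogue
docs = [
    "deep learning for image recognition",
    "convolutional networks for vision tasks",
    "bayesian inference in topic models",
    "gibbs sampling for latent dirichlet allocation",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)

# Low-dimensional latent representations for documents
lda = LatentDirichletAllocation(n_components=2, random_state=0)
Z = lda.fit_transform(X)

# Efficient retrieval: nearest neighbours in the latent space
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(Z)
_, idx = nn.kneighbors(Z[0:1])  # query with the first document
```

A DBN would play the same role as `lda` here, producing a different (learned, nonlinear) latent representation over which the same neighbor search runs.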
Learning Topic Models by Belief Propagation
Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model for probabilistic topic modeling, which has attracted worldwide interest and touches on many important applications in text mining, computer vision, and computational biology. This paper represents LDA as a factor graph within the Markov random field (MRF) framework, which enables the classic loopy belief propagation (BP) algorithm for approximate inference and parameter estimation. Although the two commonly used approximate inference methods, variational Bayes (VB) and collapsed Gibbs sampling (GS), have achieved great success in learning LDA, the proposed BP is competitive in both speed and accuracy, as validated by encouraging experimental results on four large-scale document data sets. Furthermore, the BP algorithm has the potential to become a generic learning scheme for variants of LDA-based topic models. To this end, we show how to learn two typical variants of LDA-based topic models, namely author-topic models (ATM) and relational topic models (RTM), using BP based on the factor graph representation. Comment: 14 pages, 17 figures
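All the inference schemes mentioned above (VB, collapsed Gibbs sampling, and the proposed BP) target the same underlying generative model. A minimal sketch of that standard LDA generative process is below; the vocabulary size, topic count, and hyperparameter values are illustrative, and this simulates the model rather than performing any of the inference algorithms from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 2, 5           # number of topics, vocabulary size (toy values)
n_docs, doc_len = 3, 10
alpha, beta = 0.5, 0.1  # illustrative Dirichlet hyperparameters

# Topic-word distributions phi_k ~ Dirichlet(beta)
phi = rng.dirichlet([beta] * V, size=K)

docs = []
for _ in range(n_docs):
    theta = rng.dirichlet([alpha] * K)   # per-document topic mixture
    words = []
    for _ in range(doc_len):
        z = rng.choice(K, p=theta)       # draw a topic assignment
        w = rng.choice(V, p=phi[z])      # draw a word from that topic
        words.append(int(w))
    docs.append(words)
```

VB, Gibbs sampling, and loopy BP all attempt to invert this process: given only the observed words, they recover posterior estimates of the hidden `theta`, `phi`, and `z` variables.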
A Fuzzy Approach Model for Uncovering Hidden Latent Semantic Structure in Medical Text Collections
One of the challenges for text analysis in the medical domain, including clinical notes and research papers, is analyzing large-scale medical documents. As a consequence, finding relevant documents has become more difficult, and previous work has also shown unique problems of medical documents. The themes in documents help to retrieve documents on the same topic, with and without a query. One of the popular methods to retrieve information based on discovering the themes in documents is topic modeling. In this paper we describe a novel approach to topic modeling, FATM, using fuzzy clustering. To assess the value of FATM, we experiment with two text datasets of medical documents. The quantitative evaluation carried out through log-likelihood on held-out data shows that FATM produces superior performance to LDA. This research contributes to the emerging field of understanding the characteristics of medical documents and how to account for them in text mining.
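The held-out log-likelihood evaluation used to compare FATM with LDA can be sketched for the LDA baseline with scikit-learn, whose `score()` returns an approximate held-out log-likelihood and `perplexity()` the corresponding perplexity. The toy corpus below is illustrative; FATM itself is not publicly packaged and is not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Illustrative stand-ins for medical documents
train = [
    "patient reported chest pain",
    "clinical notes describe fever",
    "topic models for medical text",
    "retrieval of research papers",
]
heldout = ["medical research on patient fever"]

vec = CountVectorizer()
X_train = vec.fit_transform(train)
X_test = vec.transform(heldout)  # held-out data, same vocabulary

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_train)
ll = lda.score(X_test)        # approximate held-out log-likelihood
ppl = lda.perplexity(X_test)  # higher log-likelihood -> lower perplexity
```

A model like FATM would be judged "superior" in this setup if it assigned a higher held-out log-likelihood (equivalently, lower perplexity) than the LDA baseline on the same held-out documents.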