34 research outputs found

    Categorizing Vaccine Confidence With a Transformer-Based Machine Learning Model: Analysis of Nuances of Vaccine Sentiment in Twitter Discourse.

    BACKGROUND: Social media has become an established platform for individuals to discuss and debate various subjects, including vaccination. With growing conversations on the web and lower-than-desired maternal vaccination uptake rates, these conversations could provide useful insights to inform future interventions. However, owing to the volume of web-based posts, manual annotation and analysis are difficult and time consuming. Automated processes for this type of analysis, such as natural language processing, have faced challenges in extracting complex stances such as attitudes toward vaccination from large amounts of text. OBJECTIVE: The aim of this study is to build upon recent advances in transformer-based machine learning methods and test whether they could be used as a tool to assess the stance expressed in social media posts toward vaccination during pregnancy. METHODS: A total of 16,604 tweets posted between November 1, 2018, and April 30, 2019, were selected using keyword searches related to maternal vaccination. After excluding irrelevant tweets, the remaining tweets were coded by 3 individual researchers into the categories Promotional, Discouraging, Ambiguous, and Neutral or No Stance. After creating a final data set of 2722 unique tweets, multiple machine learning techniques were trained on a part of this data set and then tested and compared with the human annotators. RESULTS: We found the accuracy of the machine learning techniques to be 81.8% (F score=0.78) compared with the agreed score among the 3 annotators. For comparison, the accuracies of the individual annotators compared with the final score were 83.3%, 77.9%, and 77.5%. CONCLUSIONS: This study demonstrates that we are able to achieve close to the same accuracy in categorizing tweets using our machine learning models as could be expected from a single human coder. The potential to use this automated process, which is reliable and accurate, could free valuable time and resources for conducting this analysis, in addition to informing potentially effective and necessary interventions.
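
    The comparison described in this abstract (model labels scored against a label agreed on by 3 annotators) can be sketched roughly as follows. This is only an illustration: the abstract does not say how the agreed label was reached or which F score variant was reported, so the majority vote, the tie fallback to Ambiguous, and the macro-averaged F1 are all assumptions, and the function names are hypothetical.

```python
from collections import Counter

# Category names taken from the abstract.
LABELS = ["Promotional", "Discouraging", "Ambiguous", "Neutral or No Stance"]


def majority_label(votes, fallback="Ambiguous"):
    """Return the most frequent vote; fall back when the top count is tied (assumed rule)."""
    (top, count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == count:
        return fallback
    return top


def accuracy(gold, pred):
    """Fraction of items where the prediction equals the agreed label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-label F1 (assumed variant; the abstract does not specify)."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        if tp + fp + fn == 0:
            continue  # label appears in neither gold nor pred
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)
```

    Scoring each annotator's own labels with the same accuracy function against the agreed label would give the kind of per-annotator baseline the abstract reports.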

    Language complexity in on-line health information retrieval

    The number of people searching for on-line health information has been growing steadily over the years, so it is crucial to understand their specific requirements in order to help them find the information they are looking for quickly and easily. Although generic search engines are typically used by health information seekers as the starting point for searching information, they have been shown to be limited and unsatisfactory because they make generic searches, often overloading the user with the number of results provided. Moreover, they are not able to provide specific information to different types of users. At the same time, specific search engines mostly work on medical literature and provide extracts from medical journals that are mainly useful for medical researchers and experts but not for non-experts. A question then arises: Is it possible to facilitate the search of on-line health/medical information based on specific user requirements? In this paper, after analysing the main characteristics and requirements of on-line health seeking, we provide a first answer to this question by exploiting the Web structured data for the health domain and presenting a system that allows different types of users, i.e., non-medical experts and medical experts, to retrieve Web pages with language complexity levels suitable to their expertise. Furthermore, we apply our methodology to the results of a generic search engine, such as Google, in order to re-rank them and provide different users with the proper health/medical Web pages in terms of language complexity.
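
    A rough sketch of re-ranking results by language complexity is given below. The abstract does not state which complexity measure the system uses, so this example substitutes the LIX readability index (average sentence length plus the percentage of words longer than 6 letters) as a stand-in; the result dictionaries, the "snippet" field, and the function names are hypothetical.

```python
import re


def lix(text):
    """LIX readability index: words per sentence + percentage of words longer than 6 letters."""
    words = re.findall(r"[A-Za-z]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    long_words = sum(len(w) > 6 for w in words)
    return len(words) / len(sentences) + 100 * long_words / len(words)


def rerank(results, expert):
    """Order results simplest-first for lay users, most-complex-first for experts."""
    return sorted(results, key=lambda r: lix(r["snippet"]), reverse=expert)


# Hypothetical search results; only the "snippet" text is scored.
results = [
    {"url": "b", "snippet": "Influenza pathogenesis involves haemagglutinin-mediated "
                            "endocytosis and subsequent ribonucleoprotein translocation."},
    {"url": "a", "snippet": "Flu is a common illness. Rest and drink water."},
]
for_lay_user = rerank(results, expert=False)
for_expert = rerank(results, expert=True)
```

    Sorting on a single score keeps the original engine's order among pages of equal complexity, because Python's sort is stable.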

    Understanding the vaccine stance of Italian tweets and addressing language changes through the COVID-19 pandemic: Development and validation of a machine learning model

    Social media is increasingly being used to express opinions and attitudes toward vaccines. The vaccine stance of social media posts can be classified in almost real-time using machine learning. We describe the use of a Transformer-based machine learning model for analyzing the vaccine stance of Italian tweets, and demonstrate the need to address changes over time in vaccine-related language through periodic model retraining. Vaccine-related tweets were collected through a platform developed for the European Joint Action on Vaccination. Two datasets were collected, the first between November 2019 and June 2020, the second from April to September 2021. The tweets were manually categorized by three independent annotators. After cleaning, the total dataset consisted of 1,736 tweets with 3 categories (promotional, neutral, and discouraging). The manually classified tweets were used to train and test various machine learning models. The model that classified the data most similarly to humans was XLM-Roberta-large, a multilingual version of the Transformer-based model RoBERTa. The model hyper-parameters were tuned and then the model was run five times. The fine-tuned model with the best F-score over the validation dataset was selected. Running the selected fine-tuned model on just the first test dataset resulted in an accuracy of 72.8% (F-score 0.713). Using this model on the second test dataset resulted in a 10% drop in accuracy to 62.1% (F-score 0.617), indicating that the model recognized a difference in language between the datasets. On the combined test datasets the accuracy was 70.1% (F-score 0.689). Retraining the model using data from the first and second datasets increased the accuracy over the second test dataset to 71.3% (F-score 0.713), a 9% improvement from when using just the first dataset for training. The accuracy over the first test dataset remained the same at 72.8% (F-score 0.721). The accuracy over the combined test datasets was then 72.4% (F-score 0.720), a 2% improvement. Through fine-tuning a machine-learning model on task-specific data, the accuracy achieved in categorizing tweets was close to that expected by a single human annotator. Regular training of machine-learning models with recent data is advisable to maximize accuracy.
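
    The changes quoted in this abstract as "10%", "9%" and "2%" appear to be differences in percentage points rather than relative changes. The short sketch below recomputes them from the accuracies given in the abstract; the helper names are hypothetical and nothing here comes from the authors' code.

```python
def point_change(before, after):
    """Absolute change in percentage points, rounded to one decimal."""
    return round(after - before, 1)


def relative_change(before, after):
    """Change relative to the starting accuracy, in percent, rounded to one decimal."""
    return round(100 * (after - before) / before, 1)


# Accuracies (%) as reported in the abstract.
drift_drop = point_change(72.8, 62.1)       # first test set vs second, original model
retrain_gain = point_change(62.1, 71.3)     # second test set, before vs after retraining
combined_gain = point_change(70.1, 72.4)    # combined test sets, before vs after retraining
drift_drop_relative = relative_change(72.8, 62.1)

print(drift_drop, retrain_gain, combined_gain, drift_drop_relative)
```

    Read this way, the drop on the second dataset is about 10.7 points, which corresponds to a relative loss of roughly 14.7% of the original accuracy.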

    Italian hospitals on the web: a cross-sectional analysis of official websites

    BACKGROUND: Although the use of the Internet for health purposes has increased steadily in the last decade, only a few studies have explored the information provided by the websites of health institutions and no studies on the on-line activities of Italian hospitals have been performed to date. The aim of this study was to explore the characteristics of the contents and the user-orientation of Italian hospital websites. METHODS: The cross-sectional analysis considered all the Italian hospitals with a working website between December 2008 and February 2009. The websites were coded using an ad hoc Codebook, comprising eighty-nine items divided into five sections: technical characteristics, hospital information and facilities, medical services, interactive on-line services and external activities. We calculated a website evaluation score, on the basis of the items satisfied, to compare private (PrHs) and public hospitals, the latter divided into ones with their own website (PubHs-1) and ones with a section on the website of their Local Health Authority (PubHs-2). Lastly, a descriptive analysis of each item was carried out. RESULTS: Out of the 1265 hospitals in Italy, we found that 419 of the 652 public hospitals (64.3%) and 344 of the 613 PrHs (56.1%) had a working website (p = 0.01). The mean website evaluation score was 41.9 for PubHs-1, 21.2 for PubHs-2 and 30.8 for PrHs (p < 0.001). Only 5 hospitals out of 763 (< 1%) provided specific clinical performance indicators, such as the nosocomial infection rate or the surgical mortality rates. Regarding interactive on-line services, although nearly 80% of both public and private hospitals enabled users to communicate on-line, less than 18% allowed the reservation of medical services, and only 8 websites (1%) provided a health-care forum. CONCLUSIONS: A high percentage of hospitals did not provide an official website and the majority of the websites found had several limitations. Very few hospitals provided information to increase the credibility of the hospital and user confidence in the institution. This study suggests that Italian hospital websites are more a source of information on admissions and services than a means of communication between user and hospital.

    Surfing the internet for health information: an Italian survey on use and population choices

    BACKGROUND: Recent international sources have described how the rapid expansion of the Internet has precipitated an increase in its use by the general population to search for medical information. Most studies on e-health use investigated either the prevalence of such use and the social and income patterns of users in selected populations, or the psychological consequences and satisfaction experienced by patients with particular diseases. Few studies have been carried out in Europe that have tried to identify the behavioral consequences of Internet use for health-related purposes in the general population. The aims of this study are to provide information about the prevalence of Internet use for health-related purposes in Italy according to demographic and socio-cultural features, to investigate the impact of the information found on health-related behaviors and choices and to analyze any differences based on health condition, self-rated health and relationships with health professionals and facilities. METHODS: A multicenter survey was designed within six representative Italian cities. Data were collected through a validated questionnaire administered in hospital laboratories by physicians. Respondents were questioned about their generic condition, their use of the Internet and their health behaviors and choices related to Internet use. Data were analyzed using descriptive statistics and logistic regression to assess any differences by socio-demographic and health-related variables. RESULTS: The sample included 3018 individuals between the ages of 18 and 65 years. Approximately 65% of respondents reported using the Internet, and 57% of them reported using it to search for health-related information. The main reasons for searching on the Internet were faster access and a greater amount of information. People using the Internet more for health-related purposes were younger, female and affected by chronic diseases. CONCLUSIONS: A large number of Internet users search for health information and subsequently modify their health behaviors and relationships with their medical providers. This may suggest a strong public health impact with consequences in all European countries, and it would be prudent to plan educational and prevention programs. However, it could be important to investigate the quality of health-related websites to protect and inform users.

    COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter

    INTRODUCTION: This study presents COVID-Twitter-BERT (CT-BERT), a transformer-based model that is pre-trained on a large corpus of COVID-19 related Twitter messages. CT-BERT is specifically designed to be used on COVID-19 content, particularly from social media, and can be utilized for various natural language processing tasks such as classification, question-answering, and chatbots. This paper aims to evaluate the performance of CT-BERT on different classification datasets and compare it with BERT-LARGE, its base model. METHODS: The study utilizes CT-BERT, which is pre-trained on a large corpus of COVID-19 related Twitter messages. The authors evaluated the performance of CT-BERT on five different classification datasets, including one in the target domain. The model's performance is compared to its base model, BERT-LARGE, to measure the marginal improvement. The authors also provide detailed information on the training process and the technical specifications of the model. RESULTS: The results indicate that CT-BERT outperforms BERT-LARGE with a marginal improvement of 10-30% on all five classification datasets. The largest improvements are observed in the target domain. The authors provide detailed performance metrics and discuss the significance of these results. DISCUSSION: The study demonstrates the potential of pre-trained transformer models, such as CT-BERT, for COVID-19 related natural language processing tasks. The results indicate that CT-BERT can improve the classification performance on COVID-19 related content, especially on social media. These findings have important implications for various applications, such as monitoring public sentiment and developing chatbots to provide COVID-19 related information. The study also highlights the importance of using domain-specific pre-trained models for specific natural language processing tasks. Overall, this work provides a valuable contribution to the development of COVID-19 related NLP models.