1,777 research outputs found

    Social Media Analysis for Social Good

    Data on social media is abundant and offers valuable information that can be utilised for a range of purposes. Users share their experiences and opinions on various topics, ranging from their personal lives to the community and the world, in real time. Compared with conventional data sources, social media data is cost-effective to obtain, is up to date, and reaches a larger audience. Analysing this rich data source can contribute to solving societal issues and promote social impact in an equitable manner. In this thesis, I present my research exploring innovative applications of natural language processing (NLP) and machine learning to identify patterns and extract actionable insights from social media data, with the ultimate aim of making a positive impact on society. First, I evaluate the impact of an intervention program aimed at promoting inclusive and equitable learning opportunities for underrepresented communities using social media data. Second, I develop EmoBERT, an emotion-based variant of the BERT model, for detecting fine-grained emotions to gauge the well-being of a population during significant disease outbreaks. Third, to improve public health surveillance on social media, I demonstrate how emotions expressed in social media posts can be incorporated into health mention classification using an intermediate-task fine-tuning and multi-feature fusion approach. I also propose a multi-task learning framework that models the literal meanings of disease and symptom words to enhance the classification of health mentions. Fourth, I create a new health mention dataset to address the imbalance in health data availability between developing and developed countries, providing a benchmark alternative to the traditional standards used in digital health research. Finally, I leverage pretrained language models to analyse religious activities, recognised as social determinants of health, during disease outbreaks.

    Social analytics for health integration, intelligence, and monitoring

    Patient-generated social health data are now abundant, and healthcare is changing from an authoritative, provider-centric model to collaborative, patient-oriented care. The aim of this dissertation is to provide a Social Health Analytics framework that uses social data to address interdisciplinary research challenges in Big Data Science and Health Informatics. Specific research issues and objectives are described below. The first objective is semantic integration of heterogeneous health data sources, which range from structured to unstructured and include patient-generated social data as well as authoritative data. An information seeker has to spend time selecting information from many websites and integrating it into a coherent mental model. An integrated health data model is designed to accommodate data features from different sources. The model uses semantic linked data for lightweight integration and supports a set of analytics and inferences over the data sources. A prototype analytical and reasoning tool called “Social InfoButtons”, which can be linked from existing EHR systems, is developed to help doctors understand and take into consideration the behaviors, patterns, or trends of patients’ healthcare practices during a patient’s care. The tool can also offer insights that help public health officials make better-informed policy decisions. The second objective is near-real-time monitoring of disease outbreaks using social media. Research on epidemic detection based on search query terms entered by millions of users is limited by the fact that query terms are not easily accessible to non-affiliated researchers. Publicly available Twitter data is exploited to develop the Epidemics Outbreak and Spread Detection System (EOSDS). EOSDS provides four visual analytics tools for monitoring epidemics, i.e., Instance Map, Distribution Map, Filter Map, and Sentiment Trend, to investigate public health threats in space and time.
The third objective is to capture, analyze, and quantify public health concerns through sentiment classification of Twitter data. Traditional public health surveillance systems struggle to detect and monitor health-related concerns and changes in public attitudes to health-related issues because of their expense and significant time delays. A two-step sentiment classification model is built to measure the concern. In the first step, Personal tweets are distinguished from Non-Personal tweets. In the second step, Personal Negative tweets are further separated from Personal Non-Negative tweets. In the proposed classification, training data is labeled by an emotion-oriented, clue-based method, and three Machine Learning models are trained and tested. A Measure of Concern (MOC) is computed based on the number of Personal Negative sentiment tweets. A timeline trend of the MOC is also generated to monitor public concern levels, which is important for health emergency resource allocation and policy making. The fourth objective is predicting medical condition incidence and progression trajectories by using patients’ self-reported data on PatientsLikeMe. Some medical conditions are correlated with each other to a measurable degree (“comorbidities”). A prediction model is provided to predict the comorbidities, rank future conditions by their likelihood, and predict the possible progression trajectories given an observed medical condition. The novel models for trajectory prediction of medical conditions are validated to cover the comorbidities reported in the medical literature.
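
The two-step classification and the Measure of Concern described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the clue word lists, the function names, and the normalisation of MOC by the daily tweet count are all hypothetical. The abstract only says that MOC is based on the number of Personal Negative tweets, and that the clue-based method is used to label training data for machine learning models rather than to classify tweets directly.

```python
from collections import defaultdict

# Hypothetical clue lists, for illustration only (not taken from the source).
PERSONAL_CLUES = {"i", "me", "my", "we", "our"}
NEGATIVE_CLUES = {"afraid", "scared", "worried", "sick", "terrible"}


def tokens(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    return [w.strip(".,!?;:\"'()").lower() for w in text.split()]


def classify(text):
    """Two-step, clue-based labelling of a single tweet."""
    words = tokens(text)
    # Step 1: Personal vs. Non-Personal.
    if not any(w in PERSONAL_CLUES for w in words):
        return "non_personal"
    # Step 2: Personal Negative vs. Personal Non-Negative.
    if any(w in NEGATIVE_CLUES for w in words):
        return "personal_negative"
    return "personal_non_negative"


def measure_of_concern(tweets):
    """Return {day: MOC} for an iterable of (day, text) pairs.

    Assumption: MOC is taken here as the share of Personal Negative
    tweets among all tweets of that day.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for day, text in tweets:
        totals[day] += 1
        if classify(text) == "personal_negative":
            negatives[day] += 1
    return {day: negatives[day] / totals[day] for day in totals}


sample = [
    ("day1", "I am worried about the outbreak"),
    ("day1", "Outbreak news update from officials"),
    ("day2", "My family is fine"),
    ("day2", "We are scared and sick"),
    ("day2", "Flu season is here"),
]
moc_by_day = measure_of_concern(sample)
```

Plotting `moc_by_day` over consecutive days would give a timeline trend of the kind the abstract mentions; in practice the two clue-based steps would be replaced by trained classifiers.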

    Knowledge-aware Assessment of Severity of Suicide Risk for Early Intervention

    Mental illness such as depression is a significant risk factor for suicidal ideation, behaviors, and attempts. A report by the Substance Abuse and Mental Health Services Administration (SAMHSA) shows that 80% of patients suffering from Borderline Personality Disorder (BPD) exhibit suicidal behavior, 5-10% of whom commit suicide. While multiple initiatives have been developed and implemented for suicide prevention, a key challenge has been the social stigma associated with mental disorders, which deters patients from seeking help or sharing their experiences directly with others, including clinicians. This is particularly true for teenagers and younger adults, for whom suicide is the second highest cause of death in the US. Prior research involving surveys and questionnaires (e.g. PHQ-9) for suicide risk prediction failed to provide a quantitative assessment of risk that informed timely clinical decision-making for intervention. Our interdisciplinary study concerns the use of Reddit as an unobtrusive data source for gleaning information about suicidal tendencies and other related mental health conditions afflicting depressed users. We provide details of our learning framework that incorporates domain-specific knowledge to predict the severity of suicide risk for an individual. Our approach involves developing a suicide risk severity lexicon using medical knowledge bases and a suicide ontology to detect cues relevant to suicidal thoughts and actions. We also use language modeling, medical entity recognition and normalization, and negation detection to create a dataset of 2181 Redditors who have discussed or implied suicidal ideation, behavior, or attempt. Given the importance of clinical knowledge, our gold standard dataset of 500 Redditors (out of 2181) was developed by four practicing psychiatrists following the guidelines outlined in the Columbia Suicide Severity Rating Scale (C-SSRS), with a pairwise annotator agreement of 0.79 and a group-wise agreement of 0.73.
Compared to the existing four-label classification scheme (no risk, low risk, moderate risk, and high risk), our proposed C-SSRS-based 5-label classification scheme distinguishes people who are supportive from those who show different severities of suicidal tendency. Our 5-label classification scheme outperforms the state-of-the-art schemes by improving the graded recall by 4.2% and reducing the perceived risk measure by 12.5%. A convolutional neural network (CNN) provided the best performance in our scheme due to the discriminative features and use of domain-specific knowledge resources, in comparison to SVM-L, which has been used in state-of-the-art tools on similar datasets.

    How dramatic events can affect emotionality in social posting: The impact of COVID-19 on Reddit

    The COVID-19 outbreak impacted almost all aspects of ordinary life. In this context, social networks quickly started playing the role of a sounding board for the content produced by people. Studying how dramatic events affect the way people interact with each other and react to poorly known situations is recognized as a relevant research task. Since automatically identifying country-based COVID-19 social posts on generalized social networks, like Twitter and Facebook, is a difficult task, in this work we concentrate on Reddit megathreads, which provide a unique opportunity to study focused reactions of people by both topic and country. We analyze specific reactions and compare them with a “normal” period not affected by the pandemic; in particular, we consider structural variations in social posting behavior, emotional reactions under the Plutchik model of basic emotions, and emotional reactions under unconventional emotions, such as skepticism, that are particularly relevant in the COVID-19 context.

    An Investigation of Autism Support Groups on Facebook

    Autism-affected users, such as autism patients, caregivers, parents, family members, and researchers, currently seek informational support and social support from communities on social media. To reveal the information needs of autism-affected users, this study examines users’ interactions and information sharing within autism communities on social media. It aims to understand how autism-affected users utilize support groups on Facebook. A systematic method was proposed to aid in the data analysis, including social network analysis, topic modeling, sentiment analysis, and inferential analysis. Social network analysis was adopted to reveal the interaction patterns appearing in the groups, and topic modeling was employed to uncover the discussion themes that users were concerned with in their daily lives. Sentiment analysis helped characterize the emotional content that users expressed in the groups. Inferential analysis was applied to compare the similarities and differences among different autism support groups found on Facebook. This study collected user-generated content from five sampled support groups (an awareness group, a treatment group, a parents group, a research group, and a local support group) on Facebook. Findings show that the discussion topics varied across groups. Influential users in each Facebook support group were identified through the analysis of the interaction network. The results indicated that the influential users not only attracted more attention from other group members but also led the discussion topics in the group. In addition, the analysis showed that autism support groups on Facebook offered a supportive emotional atmosphere for group members. The findings of this study revealed the characteristics of user interactions and information exchanges in autism support groups on social media.
Theoretically, the findings demonstrated the significance of social media for autism-affected users. The unique contribution of this study is to identify support groups on Facebook as a source of informational, social, and emotional support for autism-affected users. The methodology applied in this study presented a systematic approach to evaluating the information exchange in health-related support groups on social media. Further, the study investigated the potential role of technology in the social lives of autism-affected users. The outcomes of this study can contribute to improving online intervention programs by highlighting effective communication approaches.

    Health Misinformation in Search and Social Media

    People increasingly rely on the Internet in order to search for and share health-related information. Indeed, searching for and sharing information about medical treatments are among the most frequent uses of online data. While this is a convenient and fast method to collect information, online sources may contain incorrect information that has the potential to cause harm, especially if people believe what they read without further research or professional medical advice. The goal of this thesis is to address the misinformation problem in two of the most commonly used online services: search engines and social media platforms. We examined how people use these platforms to search for and share health information. To achieve this, we designed controlled laboratory user studies and employed large-scale social media data analysis tools. The solutions proposed in this thesis can be used to build systems that better support people's health-related decisions. The techniques described in this thesis addressed online searching and social media sharing in the following manner. First, with respect to search engines, we aimed to determine the extent to which people can be influenced by search engine results when trying to learn about the efficacy of various medical treatments. We conducted a controlled laboratory study wherein we biased the search results towards either correct or incorrect information. We then asked participants to determine the efficacy of different medical treatments. Results showed that people were significantly influenced both positively and negatively by search results bias. More importantly, when the subjects were exposed to incorrect information, they made more incorrect decisions than when they had no interaction with the search results. Following from this work, we extended the study to gain insights into strategies people use during this decision-making process, via the think-aloud method. 
We found that, even with verbalization, people were strongly influenced by the search results bias. We also noted that people paid attention to what the majority states, authoritativeness, and content quality when evaluating online content. Understanding the effects of cognitive biases that can arise during online search is a complex undertaking because of the presence of unconscious biases (such as the search results ranking) that the think-aloud method fails to show. Moving to social media, we first proposed a solution to detect and track misinformation in social media. Using Zika as a case study, we developed a tool for tracking misinformation on Twitter. We collected 13 million tweets regarding the Zika outbreak and tracked rumors outlined by the World Health Organization and the Snopes fact-checking website. We combined health professionals, crowdsourcing, and machine learning to capture health-related rumors as well as clarification communications. In this way, we illustrated the insights that the proposed tools provide into potentially harmful information on social media, allowing public health researchers and practitioners to respond with targeted and timely action. Moving on from identifying rumor-bearing tweets, we examined individuals on social media who post questionable health-related information, in particular those promoting cancer treatments that have been shown to be ineffective. Specifically, we studied 4,212 Twitter users who have posted about one of 139 ineffective “treatments” and compared them to a baseline of users generally interested in cancer. Considering features that capture user attributes, writing style, and sentiment, we built a classifier that is able to identify users prone to propagating such misinformation. This classifier achieved an accuracy of over 90%, providing a potential tool for public health officials to identify such individuals for preventive intervention.

    Automatic Detection of Emotions and Distress in Textual Data

    Online data can be analyzed for many purposes, including stock market prediction and business and political planning. Online data can also be used to develop systems for automatic emotion detection and mental health assessment of users. These systems can be used as complementary measures in monitoring online forums by detecting users who are in need of attention. In this thesis, we first present a new approach for contextual emotion detection, i.e. emotion detection in short conversations. The approach is based on a neural feature extractor, composed of a recurrent neural network with an attention mechanism, followed by a final classifier that can be neural or SVM-based. The results from our experiments showed that, by providing higher and more robust performance, an SVM can act as a better final classifier than a feed-forward neural network. We then extended our model for emotion detection and created an ensemble approach for the task of distress detection from online data. This extended approach utilizes several attention-based neural sub-models to extract features and predict class probabilities, which are later used as input features to a Support Vector Machine (SVM) that makes the final classification. Our experiments show that an ensemble approach that makes use of different sub-models accessing diverse sources of information can improve classification in the absence of a large annotated dataset. The extended model was evaluated on two shared tasks, CLPsych and eRisk 2019, which aim at suicide risk assessment and early risk detection of anorexia, respectively. The model ranked first in tasks A and C of CLPsych 2019 (with macro-average F1 scores of 0.481 and 0.268, respectively), and ranked first in the first task of eRisk 2019 in terms of F1 and latency-weighted F1 scores (0.71 and 0.69, respectively).

    Social Media Recruitment in Online Survey Research: A Systematic Literature Review

    The growing percentage of the population on social media creates new and expanded opportunities for survey researchers. Recently, a growing number of studies have been using social media to recruit survey respondents. Many social media platforms have powerful targeting capabilities that can be used to recruit even rare or hard-to-reach populations. However, thus far, the survey research literature lacks a comprehensive overview of potentials and limitations. This literature review aims 1) to provide an overview of the current literature on the use of social media as a recruitment tool, 2) to highlight the potential advantages and disadvantages for survey research, 3) to identify current research gaps, and finally, 4) to provide practical guidance for researchers interested in integrating social media recruitment into their research.