305 research outputs found

    Developing a Prototype System for Syndromic Surveillance and Visualization Using Social Media Data.

    Syndromic surveillance of emerging diseases is crucial for timely planning and execution of epidemic response by both local and global authorities. Traditional sources of information employed by surveillance systems are not only slow but also impractical for developing countries. The Internet and social media provide a free source of large amounts of data that can be used for syndromic surveillance. We propose developing a prototype system for gathering, storing, filtering and presenting data collected from Twitter (a popular social media platform). Since social media data is inherently noisy, we describe ways to preprocess the gathered data and use an SVM (Support Vector Machine) to identify tweets relating to influenza-like symptoms. The filtered data is presented in a web application, which allows the user to explore the underlying data in both spatial and temporal dimensions.
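The abstract does not list its preprocessing steps, so the following is only a minimal sketch of the kind of noise removal such a pipeline might apply before classification. The specific steps (lowercasing, dropping URLs and mentions, keeping hashtag words) and the function name `preprocess_tweet` are assumptions for illustration, not the authors' method.

```python
import re

# Patterns for common tweet noise (assumed; the abstract does not specify them).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")


def preprocess_tweet(text):
    """Return a list of lowercase word tokens with URLs, mentions and punctuation removed."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # drop links
    text = MENTION_RE.sub(" ", text)   # drop @user mentions
    text = text.replace("#", " ")      # keep the hashtag word, drop the symbol
    text = NON_WORD_RE.sub(" ", text)  # drop remaining punctuation and symbols
    return text.split()


tokens = preprocess_tweet("Ugh, FEVER and chills again... @nurse_jo #flu http://t.co/abc")
print(tokens)
```

The resulting token lists could then be turned into feature vectors for a classifier such as the SVM mentioned in the abstract.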

    Global disease monitoring and forecasting with Wikipedia

    Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data such as social media and search queries are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with $r^2$ up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art. Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein and adjust novelty claims accordingly; revise title; various revisions for clarity
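To make the reported $r^2$ figure concrete, here is a minimal sketch of fitting a linear model and scoring it, with an optional lag so that earlier article views are paired with later case counts. It assumes a single predictor and invented numbers; the abstract does not give the model form or data, and the function names are hypothetical.

```python
def lagged_pairs(views, cases, lag):
    """Pair views at time t with cases at time t + lag (lag >= 0, in time steps)."""
    if lag == 0:
        return list(views), list(cases)
    return list(views[:-lag]), list(cases[lag:])


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented weekly series, for illustration only.
views = [120, 150, 210, 330, 400, 380, 260]
cases = [10, 14, 17, 25, 38, 44, 41]

xs, ys = lagged_pairs(views, cases, lag=1)
slope, intercept = fit_line(xs, ys)
print(r_squared(xs, ys, slope, intercept))
```

Repeating the fit for several values of `lag` would show how far ahead the views remain informative, which is one simple way to read "forecasting value".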

    Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance

    We investigate the use of Twitter data to deliver signals for syndromic surveillance in order to assess its ability to augment existing syndromic surveillance efforts and give a better understanding of symptomatic people who do not seek healthcare advice directly. We focus on a specific syndrome: asthma/difficulty breathing. We outline data collection using the Twitter streaming API as well as analysis and pre-processing of the collected data. Even with keyword-based data collection, many of the tweets collected are not relevant because they represent chatter, or talk of awareness instead of an individual suffering a particular condition. In light of this, we set out to identify relevant tweets to collect a strong and reliable signal. For this, we investigate text classification techniques, and in particular we focus on semi-supervised classification techniques since they enable us to use more of the Twitter data collected while doing only minimal labelling. In this paper, we propose a semi-supervised approach to symptomatic tweet classification and relevance filtering. We also propose alternative techniques to popular deep learning approaches. Additionally, we highlight the use of emojis and other special features capturing the tweet’s tone to improve the classification performance. Our results show that negative emojis and those that denote laughter provide the best classification performance in conjunction with a simple word-level n-gram approach. We obtain good performance in classifying symptomatic tweets with both supervised and semi-supervised algorithms and find that the proposed semi-supervised algorithms preserve more of the relevant tweets and may be advantageous in the context of a weak signal. Finally, we find some correlation (r = 0.414, p = 0.0004) between the Twitter signal generated with the semi-supervised system and data from consultations for related health conditions.
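As a rough illustration of combining word-level n-grams with emoji tone features, the sketch below builds a feature dictionary for a single tweet. The emoji sets, the feature names, and the choice of unigrams plus bigrams are assumptions made here; the abstract does not specify them.

```python
import re
from collections import Counter

# Assumed emoji groupings; the paper's actual lists are not given in the abstract.
LAUGHTER_EMOJIS = {"😂", "🤣"}
NEGATIVE_EMOJIS = {"😷", "😢", "😞", "😫"}

WORD_RE = re.compile(r"[a-z0-9']+")


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=2):
    """Count word n-grams (n = 1..max_n) and add binary emoji tone flags."""
    tokens = WORD_RE.findall(text.lower())
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(word_ngrams(tokens, n))
    features["has_laughter_emoji"] = int(any(ch in LAUGHTER_EMOJIS for ch in text))
    features["has_negative_emoji"] = int(any(ch in NEGATIVE_EMOJIS for ch in text))
    return dict(features)


print(extract_features("Can't breathe 😫 asthma again"))
```

Feature dictionaries of this shape can be vectorized and passed to either a supervised or a semi-supervised classifier.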

    Identification of Consumer Adverse Drug Reaction Messages on Social Media

    The prevalence of social media has resulted in spikes of data on the Internet that could assist in many aspects of human life. One prospective use of the data is in the development of an early warning system to monitor consumer Adverse Drug Reactions (ADRs). The direct reporting of ADRs by consumers is playing an increasingly important role in the world of pharmacovigilance. Social media provides patients a platform to exchange their experiences regarding the use of certain drugs. However, the messages posted on those social media networks contain both ADR-related messages (positive examples) and non-ADR-related messages (negative examples). In this paper, we integrate text mining and partially supervised learning methods to automatically extract and classify messages posted on social media networks into positive and negative examples. Our findings can provide managerial insights into how social media analytics can improve not only postmarketing surveillance, but also other problem domains where large quantities of user-generated content are available.