21 research outputs found

    The Early Bird Catches The Term: Combining Twitter and News Data For Event Detection and Situational Awareness

    Twitter updates now represent an enormous stream of information originating from a wide variety of formal and informal sources, much of which is relevant to real-world events. In this paper we adapt existing bio-surveillance algorithms to detect localised spikes in Twitter activity corresponding to real events with a high level of confidence. We then develop a methodology to automatically summarise these events, both by providing the tweets which fully describe the event and by linking to highly relevant news articles. We apply our methods to outbreaks of illness and events strongly affecting sentiment. In both case studies we are able to detect events verifiable by third-party sources and produce high-quality summaries.
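
The abstract above does not say which bio-surveillance algorithm was adapted, so the following is only a generic illustration of spike detection on a count series, not the paper's method. It flags a time step when its count exceeds the mean of the preceding window by more than a chosen number of standard deviations. The function name, window size, threshold, standard-deviation floor, and sample data are all assumptions made for this sketch.

```python
from statistics import mean, stdev
from typing import List, Sequence


def detect_spikes(counts: Sequence[float], window: int = 7,
                  threshold: float = 3.0, min_sigma: float = 1.0) -> List[int]:
    """Return indices whose count is an outlier relative to the preceding window.

    Hypothetical helper: a moving-baseline z-score, used here as a stand-in
    for the unspecified bio-surveillance algorithm. `window` must be >= 2
    because the sample standard deviation needs at least two points.
    """
    spikes = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        # Floor the deviation so a flat baseline does not cause division by zero.
        sigma = max(stdev(baseline), min_sigma)
        if (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Made-up hourly tweet counts for one location (illustrative only).
hourly_counts = [10, 12, 11, 9, 10, 11, 10, 50, 11, 10]
print(detect_spikes(hourly_counts))  # prints [7]
```

Because the baseline uses only earlier observations, a spike inflates the baseline for the steps that follow it, so a second event shortly after the first is harder to flag. Outbreak-detection methods typically handle this more carefully than this sketch does.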

    Analisis Data Twitter: Ekstraksi dan Analisis Data Geospasial (Twitter Data Analysis: Extraction and Analysis of Geospatial Data)

    Geospatial data on the social media platform Twitter can be used to obtain spatial (location) information showing where public perception of an issue on social media originates. The large volume of geospatial data produced by Twitter offers a major opportunity for various parties to derive more valuable information through Twitter data analytics. Using Twitter geospatial data begins with extracting spatial information in the form of Twitter users' coordinates, which are obtained when users share their location. Extracting and analysing geospatial data on Twitter requires knowledge of, and a framework for, social media analytics (SMA). This study extracts and analyses Twitter geospatial data on a developing public issue and develops a software prototype for obtaining the geospatial data available on Twitter. Extraction and analysis proceed in four stages: data retrieval (crawling), storage (storing), analysis (analyzing), and visualization (visualizing). The study is exploratory and focuses on developing extraction and analysis techniques for Twitter geospatial data.

    Predicting User Engagement in Twitter with Collaborative Ranking

    Collaborative Filtering (CF) is a core component of popular web-based services such as Amazon, YouTube, Netflix, and Twitter. Most applications use CF to recommend a small set of items to the user. For instance, YouTube presents to a user a list of top-n videos she would likely watch next based on her rating and viewing history. Current methods of CF evaluation have been focused on assessing the quality of a predicted rating or the ranking performance for top-n recommended items. However, restricting the recommender system evaluation to these two aspects is rather limiting and neglects other dimensions that could better characterize a well-perceived recommendation. In this paper, instead of optimizing rating or top-n recommendation, we focus on the task of predicting which items generate the highest user engagement. In particular, we use Twitter as our testbed and cast the problem as a Collaborative Ranking task where the rich features extracted from the metadata of the tweets help to complement the transaction information limited to user ids, item ids, ratings and timestamps. We learn a scoring function that directly optimizes the user engagement in terms of nDCG@10 on the predicted ranking. Experiments conducted on an extended version of the MovieTweetings dataset, released as part of the RecSys Challenge 2014, show the effectiveness of our approach. Comment: RecSysChallenge'14 at RecSys 2014, October 10, 2014, Foster City, CA, US
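
For readers unfamiliar with the nDCG@10 metric named in the abstract above, here is a minimal sketch of one common formulation (exponential gain, logarithmic position discount). The abstract does not state which variant the challenge used, so the gain function, the function names, and the sample engagement values are assumptions for illustration.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k ranked items.

    Assumes the exponential-gain variant: (2^rel - 1) / log2(position + 1),
    with positions starting at 1.
    """
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical engagement values for one user's tweets, in predicted rank order.
predicted_order = [0, 3, 1, 0, 2]
print(round(ndcg_at_k(predicted_order, k=10), 4))
```

A ranking that already lists the items in decreasing order of engagement scores 1.0, and a user with no engagement at all is given 0.0 here by convention.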

    Pulse of the Pandemic: Iterative Topic Filtering for Clinical Information Extraction from Social Media

    The rapid evolution of the COVID-19 pandemic has underscored the need to quickly disseminate the latest clinical knowledge during a public-health emergency. One surprisingly effective platform for healthcare professionals (HCPs) to share knowledge and experiences from the front lines has been social media (for example, the "#medtwitter" community on Twitter). However, identifying clinically-relevant content in social media without manual labeling is a challenge because of the sheer volume of irrelevant data. We present an unsupervised, iterative approach to mine clinically relevant information from social media data, which begins by heuristically filtering for HCP-authored texts and incorporates topic modeling and concept extraction with MetaMap. This approach identifies granular topics and tweets with high clinical relevance from a set of about 52 million COVID-19-related tweets from January to mid-June 2020. We also show that because the technique does not require manual labeling, it can be used to identify emerging topics on a week-to-week basis. Our method can aid in future public-health emergencies by facilitating knowledge transfer among healthcare workers in a rapidly-changing information environment, and by providing an efficient and unsupervised way of highlighting potential areas for clinical research. Comment: 24 pages, 5 figures. To be published in the Journal of Biomedical Informatics

    Estimating county health statistics with Twitter

    Understanding the relationships among environment, behavior, and health is a core concern of public health researchers. While a number of recent studies have investigated the use of social media to track infectious diseases such as influenza, little work has been done to determine if other health concerns can be inferred. In this paper, we present a large-scale study of 27 health-related statistics, including obesity, health insurance coverage, access to healthy foods, and teen birth rates. We perform a linguistic analysis of the Twitter activity in the top 100 most populous counties in the U.S., and find a significant correlation with 6 of the 27 health statistics. When compared to traditional models based on demographic variables alone, we find that augmenting models with Twitter-derived information improves predictive accuracy for 20 of 27 statistics, suggesting that this new methodology can complement existing approaches.

    A novel approach to track public emotions related to epidemics in multilingual data

    The emergence of new epidemics and the reappearance of older diseases have a great impact on public health. Survey-based techniques, which are costly and time-consuming, are the most popular methods for measuring public-health information and are used in decision making. Early monitoring of these epidemics supports rapid decision making. Social media platforms provide a rich source of public-health information in the form of blogs, tweets, public posts, etc., but these data are unstructured and contain words from multiple languages. This research focuses on developing an automatic system for detecting public emotions related to epidemics in multilingual unstructured data, to gain a deeper understanding of public emotions and health-related information. The approach gives timely information on epidemics, their symptoms, prevention techniques, and awareness, which can help government and health agencies make rapid decisions. Experimental analysis of the data set yields results that significantly beat the baseline term-counting methods used for sentiment analysis.
