The Early Bird Catches The Term: Combining Twitter and News Data For Event Detection and Situational Awareness
Twitter updates now represent an enormous stream of information originating
from a wide variety of formal and informal sources, much of which is relevant
to real-world events. In this paper we adapt existing bio-surveillance
algorithms to detect localised spikes in Twitter activity corresponding to real
events with a high level of confidence. We then develop a methodology to
automatically summarise these events, both by providing the tweets which fully
describe the event and by linking to highly relevant news articles. We apply
our methods to outbreaks of illness and events strongly affecting sentiment. In
both case studies we are able to detect events verifiable by third-party
sources and produce high-quality summaries.
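The abstract above does not say which bio-surveillance algorithm was adapted, so the following is only a minimal sketch of the general idea: flag a time step when its tweet count exceeds a moving-baseline mean by several standard deviations. The function name `detect_spikes`, the 7-step baseline, the threshold of 3, and the `min_sigma` floor are all illustrative assumptions, not the authors' method.

```python
from statistics import mean, stdev


def detect_spikes(counts, baseline=7, threshold=3.0, min_sigma=1.0):
    """Return indices whose count exceeds baseline mean + threshold * std.

    Hypothetical helper: a simple moving-baseline rule, not the
    algorithm used in the paper.
    """
    spikes = []
    for t in range(baseline, len(counts)):
        # Baseline is the `baseline` observations immediately before t.
        window = counts[t - baseline:t]
        mu = mean(window)
        # Floor the deviation so a flat baseline does not flag tiny changes.
        sigma = max(stdev(window), min_sigma)
        if counts[t] > mu + threshold * sigma:
            spikes.append(t)
    return spikes


# Daily tweet counts for one location (made-up numbers for illustration).
daily_counts = [10, 12, 11, 9, 10, 11, 10, 50, 10]
print(detect_spikes(daily_counts))  # prints [7]
```

A real deployment would run one such series per location and keyword, which is what makes the detected spikes "localised".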
Twitter Data Analysis: Extraction and Analysis of Geospatial Data
Geospatial data on the social media platform Twitter can be used to obtain spatial (location) information that indicates where public perception of an issue on social media originates. The large volume of geospatial data produced by Twitter offers a great opportunity for various parties to derive more valuable information through Twitter Data Analytics. Using Twitter geospatial data begins with extracting spatial information in the form of Twitter users' coordinates. These coordinates are obtained from location sharing by Twitter users. Extracting and analyzing geospatial data on Twitter requires knowledge of, and a framework for, social media analytics (SMA). In this study, we extract and analyze Twitter geospatial data on a developing public issue and develop a software prototype for obtaining the geospatial data available on Twitter. Extraction and analysis proceed through four stages: data retrieval (crawling), storage (storing), analysis (analyzing), and visualization (visualizing). This research is exploratory and focuses on developing techniques for extracting and analyzing Twitter geospatial data.
Predicting User Engagement in Twitter with Collaborative Ranking
Collaborative Filtering (CF) is a core component of popular web-based
services such as Amazon, YouTube, Netflix, and Twitter. Most applications use
CF to recommend a small set of items to the user. For instance, YouTube
presents to a user a list of top-n videos she would likely watch next based on
her rating and viewing history. Current methods of CF evaluation have been
focused on assessing the quality of a predicted rating or the ranking
performance for top-n recommended items. However, restricting the recommender
system evaluation to these two aspects is rather limiting and neglects other
dimensions that could better characterize a well-perceived recommendation. In
this paper, instead of optimizing rating or top-n recommendation, we focus on
the task of predicting which items generate the highest user engagement. In
particular, we use Twitter as our testbed and cast the problem as a
Collaborative Ranking task where the rich features extracted from the metadata
of the tweets help to complement the transaction information limited to user
ids, item ids, ratings and timestamps. We learn a scoring function that
directly optimizes the user engagement in terms of nDCG@10 on the predicted
ranking. Experiments conducted on an extended version of the MovieTweetings
dataset, released as part of the RecSys Challenge 2014, show the effectiveness
of our approach.
Comment: RecSysChallenge'14 at RecSys 2014, October 10, 2014, Foster City, CA, US
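For readers unfamiliar with the metric named in the abstract above, here is a minimal sketch of nDCG@10 for a single ranked list. It assumes the linear-gain form of DCG (gain divided by log2 of the rank plus one); the abstract does not state which variant or implementation the authors used, and the function names `dcg_at_k` and `ndcg_at_k` are illustrative.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k items of a ranked list."""
    # enumerate() is 0-based, so rank 0 gets discount log2(2) = 1.
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the predicted order divided by DCG of the ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Engagement values of one user's tweets, listed in predicted rank order
# (made-up numbers for illustration).
predicted_order = [0, 3, 1, 0, 2]
print(round(ndcg_at_k(predicted_order), 3))
```

A score of 1.0 means the predicted ranking already places the most engaging items first; a list with no engagement at all is scored 0.0 here by convention.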
Pulse of the Pandemic: Iterative Topic Filtering for Clinical Information Extraction from Social Media
The rapid evolution of the COVID-19 pandemic has underscored the need to
quickly disseminate the latest clinical knowledge during a public-health
emergency. One surprisingly effective platform for healthcare professionals
(HCPs) to share knowledge and experiences from the front lines has been social
media (for example, the "#medtwitter" community on Twitter). However,
identifying clinically-relevant content in social media without manual labeling
is a challenge because of the sheer volume of irrelevant data. We present an
unsupervised, iterative approach to mine clinically relevant information from
social media data, which begins by heuristically filtering for HCP-authored
texts and incorporates topic modeling and concept extraction with MetaMap. This
approach identifies granular topics and tweets with high clinical relevance
from a set of about 52 million COVID-19-related tweets from January to mid-June
2020. We also show that because the technique does not require manual labeling,
it can be used to identify emerging topics on a week-to-week basis. Our method
can aid in future public-health emergencies by facilitating knowledge transfer
among healthcare workers in a rapidly-changing information environment, and by
providing an efficient and unsupervised way of highlighting potential areas for
clinical research.
Comment: 24 pages, 5 figures. To be published in the Journal of Biomedical Informatics
Estimating county health statistics with Twitter
Understanding the relationships among environment, behavior, and health is a core concern of public health researchers. While a number of recent studies have investigated the use of social media to track infectious diseases such as influenza, little work has been done to determine if other health concerns can be inferred. In this paper, we present a large-scale study of 27 health-related statistics, including obesity, health insurance coverage, access to healthy foods, and teen birth rates. We perform a linguistic analysis of the Twitter activity in the top 100 most populous counties in the U.S., and find a significant correlation with 6 of the 27 health statistics. When compared to traditional models based on demographic variables alone, we find that augmenting models with Twitter-derived information improves predictive accuracy for 20 of 27 statistics, suggesting that this new methodology can complement existing approaches.
A novel approach to track public emotions related to epidemics in multilingual data
The emergence of new epidemics and the reappearance of older diseases have a great impact on public health. Survey-based techniques, which are costly and time-consuming, are the most popular methods for measuring public-health information and are used in decision making. Early monitoring of these epidemics supports rapid decision making. Social media platforms provide a rich source of public-health information in the form of blogs, tweets, public posts, etc., but these data are unstructured and contain words from multiple languages. This research focuses on developing an automatic system for detecting public emotions related to epidemics in multilingual unstructured data, to gain a deeper understanding of public emotions and health-related information. This approach gives timely information about epidemics, their symptoms, prevention techniques, and awareness, which can help government and health agencies make rapid decisions. Experimental analysis of the data set yields results that significantly beat the baseline term-counting methods used for sentiment analysis.