
221 research outputs found

A customisable pipeline for continuously harvesting socially-minded Twitter users

Author: C Bobel
DN Fisher
F Riquelme
G Lotan
I Bizid
L Sousa
LA Overbey
M Kardara
M Rosvall
N Booth
P Bonacich
P Missier
T Poell
WL Youmans
WX Zhao
Publication date: 01/01/2019

On social media platforms, and Twitter in particular, specific classes of users such as influencers have been given satisfactory operational definitions in terms of network and content metrics. Others, for instance online activists, are no less important, but their characterisation still requires experimentation. We hypothesise that such interesting users can be found within temporally and spatially localised contexts, i.e., small but topical fragments of the network containing interactions about social events or campaigns with a significant footprint on Twitter. To explore this hypothesis, we have designed a continuous user profile discovery pipeline that produces an ever-growing dataset of user profiles by harvesting and analysing contexts from the Twitter stream. The profiles dataset includes key network and content-based user metrics, enabling experimentation with user-defined score functions that characterise specific classes of online users. The paper describes the design and implementation of the pipeline and its empirical evaluation on a case study consisting of healthcare-related campaigns in the UK, showing how it supports the operational definitions of online activism by comparing three experimental ranking functions. The code is publicly available. Comment: Procs. ICWE 2019, June 2019, Kore

arXiv.org e-Print Archive

Crossref

University of Birmingham Research Portal

Influence in Social Networks: Exploring Its Perspectives and Analysing Tools for Computing It

Author: ATHANASAKOU ANTONIA
ΑΘΑΝΑΣΑΚΟΥ ΑΝΤΩΝΙΑ
Publication date: 01/01/2019

The purpose of this dissertation is to study systems and algorithms that calculate the influence of users and/or content in social networks, and to present the implementation of a new influence computation system for Twitter [1]. Influence can be interpreted in various ways. Some existing systems equate it with popularity: in this view, an influencer is a user with a large number of followers. In other cases, a user's influence is tied to the level of activity that the user can stimulate in other users. Similarly, an influential topic or piece of content is one that appears in many tweets that have received numerous likes and retweets. Other systems consider a topic's influence to be inseparable from the interest it arouses in users. From studying various systems, we deduce that most of them tend to use similar parameters to calculate influence. More specifically, relying only on the number of followers appears to be rejected, and characteristics such as the number of likes, the number of retweets and, in some cases, the number of links (URLs) in a tweet and the length of the tweet are taken into account. Up until now, the algorithm used by the Twitter platform [1] to infer a user's influence takes into account only the number of his/her followers. However, many studies and experiments have shown that such an algorithm is not as efficient as one that also considers the aforementioned characteristics. The new system proposed and implemented in this work infers the influence of health-related hashtags (e.g. #breastcancerawareness, #diabetes, #leukaemia), of the tweets that contain them, and of the users who posted them. The influence of a hashtag is calculated from the number of tweets that contain it and the total number of likes and retweets that those tweets received. The influence of a tweet in relation to a hashtag is estimated from the tweet's own likes and retweets, in combination with the parameters used for the hashtag's influence. Lastly, a user's influence in relation to a specific hashtag is derived from the number of likes and retweets received by his/her tweets that contain the hashtag, compared with the number of likes and retweets of all the tweets that contain it. In addition, the new system considers the number of the user's followers in relation to the number of users he/she follows (followees). Weighting coefficients were applied to each parameter. To evaluate the implemented algorithm, experiments were run with different weights for the parameters, and comparisons were made with other influence calculation systems and algorithms.
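The abstract describes the user-level score only in words, so the following is a minimal sketch of one way such a weighted score could be assembled. The `Tweet` type, the function names, the default weights and the bounded follower ratio `followers / (followers + followees)` are all assumptions made for illustration; the dissertation's actual formulas and coefficients are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    """Hypothetical record for one tweet that contains the hashtag of interest."""
    author: str
    likes: int
    retweets: int


def hashtag_totals(tweets):
    """Return (tweet count, total likes, total retweets) for one hashtag."""
    return (
        len(tweets),
        sum(t.likes for t in tweets),
        sum(t.retweets for t in tweets),
    )


def user_influence(user, tweets, followers, followees,
                   w_likes=0.4, w_retweets=0.4, w_ratio=0.2):
    """Weighted user score for one hashtag (illustrative weights, not the author's).

    Combines the user's share of all likes, the user's share of all retweets,
    and a follower/followee term normalised to the range [0, 1].
    """
    _, total_likes, total_retweets = hashtag_totals(tweets)
    own = [t for t in tweets if t.author == user]
    like_share = sum(t.likes for t in own) / total_likes if total_likes else 0.0
    retweet_share = (
        sum(t.retweets for t in own) / total_retweets if total_retweets else 0.0
    )
    # Assumed normalisation: the abstract only says followers are considered
    # "in relation to" followees.
    audience = followers + followees
    ratio = followers / audience if audience else 0.0
    return w_likes * like_share + w_retweets * retweet_share + w_ratio * ratio


sample = [Tweet("a", 30, 10), Tweet("b", 10, 10), Tweet("a", 0, 0)]
score_a = user_influence("a", sample, followers=300, followees=100)
```

Varying `w_likes`, `w_retweets` and `w_ratio` mirrors the kind of weight experiments the abstract mentions, without claiming to reproduce them.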

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens

Mining Twitter for crisis management: realtime floods detection in the Arabian Peninsula

Author: Alabbas Waleed
Publication venue: University of Bedfordshire
Publication date: 01/04/2018

A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of Doctor of Philosophy. In recent years, large amounts of data have been made available on microblog platforms such as Twitter; however, it is difficult to filter and extract information and knowledge from such data because of its high volume and the noise it contains. On Twitter, the general public are able to report real-world events such as floods in real time, and so act as social sensors. Consequently, it is beneficial to have a method that can detect flood events automatically in real time to help governmental bodies, such as crisis management authorities, to detect the event and make decisions during its early stages. This thesis proposes a real-time flood detection system that mines Arabic tweets using machine learning and data mining techniques. The proposed system comprises five main components: data collection, pre-processing, flood event extraction, location inference, location named entity linking, and flood event visualisation. An effective method of flood detection from Arabic tweets is presented and evaluated using supervised learning techniques. Furthermore, this work presents a location named entity inference method based on the Learning to Search method. The results show that the proposed method outperformed existing systems, with significantly higher accuracy in inferring flood locations from tweets written in colloquial Arabic. For location named entity linking, a method has been designed that uses Google API services as a knowledge base to extract accurate geocode coordinates associated with the location named entities mentioned in tweets. The results show that the proposed location linking method locates 56.8% of tweets within a distance of 0–10 km from the actual location. Further analysis has shown that the accuracy in locating tweets in the actual city and region is 78.9% and 84.2% respectively.
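The "within 0–10 km" figure implies a distance measure between the predicted and the actual coordinates, which the abstract does not specify. As a rough sketch, assuming great-circle (haversine) distance on a spherical Earth, the share of tweets located within a given radius could be computed as follows; the function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def share_within(pairs, max_km=10.0):
    """Fraction of (predicted, actual) coordinate pairs no farther apart than max_km."""
    if not pairs:
        return 0.0
    hits = sum(1 for predicted, actual in pairs
               if haversine_km(*predicted, *actual) <= max_km)
    return hits / len(pairs)


# Hypothetical (predicted, actual) pairs: one exact match, one far miss.
pairs = [((24.7, 46.7), (24.7, 46.7)), ((24.7, 46.7), (21.5, 39.2))]
fraction = share_within(pairs)
```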

University of Bedfordshire Repository


Concerns expressed by Chinese social media users during the COVID-19 pandemic: Content analysis of sina weibo microblogging data

Author: Evans R
Wang J
Zhang W
Zhou Y
Zhu C
Publication venue: 'JMIR Publications Inc.'
Publication date: 26/11/2020

© Junze Wang, Ying Zhou, Wei Zhang, Richard Evans, Chengyan Zhu. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 26.11.2020. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.

Background: The COVID-19 pandemic has created a global health crisis that is affecting economies and societies worldwide. During times of uncertainty and unexpected change, people have turned to social media platforms as communication tools and primary information sources. Platforms such as Twitter and Sina Weibo have allowed communities to share discussion and emotional support; they also play important roles for individuals, governments, and organizations in exchanging information and expressing opinions. However, research that studies the main concerns expressed by social media users during the pandemic is limited.

Objective: The aim of this study was to examine the main concerns raised and discussed by citizens on Sina Weibo, the largest social media platform in China, during the COVID-19 pandemic.

Methods: We used a web crawler tool and a set of predefined search terms (New Coronavirus Pneumonia, New Coronavirus, and COVID-19) to investigate concerns raised by Sina Weibo users. Textual information and metadata (number of likes, comments, retweets, publishing time, and publishing location) of microblog posts published between December 1, 2019, and July 32, 2020, were collected. After segmenting the words of the collected text, we used a topic modeling technique, latent Dirichlet allocation (LDA), to identify the most common topics posted by users. We analyzed the emotional tendencies of the topics, calculated the proportional distribution of the topics, performed user behavior analysis on the topics using data collected from the number of likes, comments, and retweets, and studied the changes in user concerns and differences in participation between citizens living in different regions of mainland China.

Results: Based on the 203,191 eligible microblog posts collected, we identified 17 topics and grouped them into 8 themes. These topics were pandemic statistics, domestic epidemic, epidemics in other countries worldwide, COVID-19 treatments, medical resources, economic shock, quarantine and investigation, patients' outcry for help, work and production resumption, psychological influence, joint prevention and control, material donation, epidemics in neighboring countries, vaccine development, fueling and saluting antiepidemic action, detection, and study resumption. The mean sentiment was positive for 11 topics and negative for 6 topics. The topic with the highest mean of retweets was domestic epidemic, while the topic with the highest mean of likes was quarantine and investigation.

Conclusions: Concerns expressed by social media users are highly correlated with the evolution of the global pandemic. During the COVID-19 pandemic, social media has provided a platform for Chinese government departments and organizations to better understand public concerns and demands. Similarly, social media has provided channels to disseminate information about epidemic prevention and has influenced public attitudes and behaviors. Government departments, especially those related to health, can create appropriate policies in a timely manner through monitoring social media platforms to guide public opinion and behavior during epidemics.

This work has been partially supported by the National Natural Science Foundation of China (Award # 61602198) and the National Natural Science Foundation of China (Award # 72042016).

Brunel University Research Archive

Extracting Actionable Knowledge from Domestic Violence Discourses on Social Media

Author: O'Connor M
Subramani Sudha
Publication venue: 'European Alliance for Innovation n.o.'
Publication date: 01/05/2018

Domestic Violence (DV) is considered a major social issue, and there is a strong relationship between DV and public health. Existing research has used social media to track and analyse real-world events and topics such as emerging trends, natural disasters, user sentiment, political opinions, and health care. However, less attention has been given to social welfare issues like DV and their impact on public health. Recently, victims of DV have turned to social media platforms to express their feelings in posts and to seek social and emotional support, sympathetic encouragement, compassion, and empathy from the public. However, it is difficult to mine actionable knowledge from large conversational social media datasets because the data are high-dimensional, short, noisy, high in volume, and high in velocity. Hence, this paper proposes a novel framework to model and discover the various themes related to DV from the public domain. The proposed framework could provide unprecedentedly valuable information to public health researchers, national family health organizations, government, and the public, with data enrichment and consolidation to improve the social welfare of the community. It thus provides actionable knowledge by monitoring and analysing continuous and rich user-generated content.

arXiv.org e-Print Archive

Directory of Open Access Journals

Victoria University Eprints Repository

When Infodemic Meets Epidemic: a Systematic Literature Review

Author: Asaad Chaimae
Baïna Karim
Ghogho Mounir
Khaouja Imane
Publication date: 03/10/2022

Epidemics and outbreaks present arduous challenges requiring both individual and communal efforts. Social media offer significant amounts of data that can be leveraged for bio-surveillance. They also provide a platform to quickly and efficiently reach a sizeable percentage of the population, hence their potential impact on various aspects of epidemic mitigation. The general objective of this systematic literature review is to provide a methodical overview of the integration of social media in different epidemic-related contexts. Three research questions were conceptualized for this review, resulting in over 10,000 publications collected in the first PRISMA stage, 129 of which were selected for inclusion. A thematic, method-oriented synthesis was undertaken and identified 5 main themes related to social-media-enabled epidemic surveillance, misinformation management, and mental health. Findings uncover a need for more robust applications of the lessons learned from epidemic post-mortem documentation. A vast gap exists between retrospective analysis of epidemic management and result integration in prospective studies. Harnessing the full potential of social media in epidemic-related tasks requires streamlining the results of epidemic forecasting, public opinion understanding and misinformation propagation, all while keeping abreast of potential mental health implications. Pro-active prevention has thus become vital for epidemic curtailment and containment.

arXiv.org e-Print Archive

Building a Test Collection for Significant-Event Detection in Arabic Tweets

Author: Almerekhi Hind Ali
Publication date: 01/01/2016

With the increasing popularity of microblogging services like Twitter, researchers discovered a rich medium for tackling real-life problems like event detection. However, event detection in Twitter is often obstructed by the lack of public evaluation mechanisms such as test collections (sets of tweets, labels, and queries to measure the effectiveness of an information retrieval system). The problem is more evident when non-English languages, e.g., Arabic, are concerned. With the recent surge of significant events in the Arab world, news agencies and decision makers rely on Twitter's microblogging service to obtain recent information on events. In this thesis, we address the problem of building a test collection of Arabic tweets (named EveTAR) for the task of event detection. To build EveTAR, we first adopted an adequate definition of an event, which is a significant occurrence that takes place at a certain time. An occurrence is significant if there are news articles about it. We collected Arabic tweets using Twitter's streaming API. Then, we identified a set of events from the Arabic data collection using Wikipedia's current events portal. Corresponding tweets were extracted by querying the Arabic data collection with a set of manually-constructed queries. To obtain relevance judgments for those tweets, we leveraged CrowdFlower's crowdsourcing platform. Over a period of 4 weeks, we crawled over 590M tweets, from which we identified 66 events that cover 8 different categories and gathered more than 134k relevance judgments. Each event contains an average of 779 relevant tweets. Over all events, we got an average Kappa of 0.6, which is a substantially acceptable value. EveTAR was used to evaluate three state-of-the-art event detection algorithms. The best performing algorithms achieved 0.60 in F1 measure and 0.80 in both precision and recall. We plan to make our test collection available for research, including events description, manually-crafted queries to extract potentially-relevant tweets, and all judgments per tweet. EveTAR is the first Arabic test collection built from scratch for the task of event detection. Additionally, we show in our experiments that it supports other tasks like ad-hoc search.
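The abstract reports an average Kappa of 0.6 without saying which kappa statistic was used. Purely as an illustration, assuming Cohen's kappa between two annotators over binary relevance labels, the agreement score could be computed as sketched below; the function name and the sample labels are hypothetical and are not drawn from EveTAR.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical relevance judgments (1 = relevant, 0 = not relevant).
annotator_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
annotator_2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
```

With more than two judges per tweet, as is common on crowdsourcing platforms, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.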

Qatar University Institutional Repository