
    A customisable pipeline for continuously harvesting socially-minded Twitter users

    On social media platforms, and Twitter in particular, specific classes of users such as influencers have been given satisfactory operational definitions in terms of network and content metrics. Others, for instance online activists, are no less important, but their characterisation still requires experimentation. We hypothesise that such interesting users can be found within temporally and spatially localised contexts, i.e., small but topical fragments of the network containing interactions about social events or campaigns with a significant footprint on Twitter. To explore this hypothesis, we have designed a continuous user profile discovery pipeline that produces an ever-growing dataset of user profiles by harvesting and analysing contexts from the Twitter stream. The profiles dataset includes key network and content-based user metrics, enabling experimentation with user-defined score functions that characterise specific classes of online users. The paper describes the design and implementation of the pipeline and its empirical evaluation on a case study consisting of healthcare-related campaigns in the UK, showing how it supports the operational definitions of online activism by comparing three experimental ranking functions. The code is publicly available. Comment: Procs. ICWE 2019, June 2019, Kore

    Influence in Social Networks: Exploring Its Perspectives and Analysing Tools for Computing It (Επιρροή στα Κοινωνικά Δίκτυα: Διερεύνηση των Οπτικών της και Ανάλυση Εργαλείων Υπολογισμού της)

    The purpose of this dissertation is to study systems and algorithms that calculate user and/or content influence in social networks, and to present the implementation of a new influence computation system for Twitter [1]. Influence can be interpreted in various ways. Some existing systems equate it with popularity: in this view, an influencer is a user with a large number of followers. In other cases, influence is tied to the level of activity that a user can stimulate in other users. Similarly, an influential topic or piece of content is one that appears in many tweets that have received numerous likes and retweets. Other systems consider a topic's influence to be inseparable from the interest it arouses in users. A study of various systems shows that most of them tend to use similar parameters to calculate influence. More specifically, using the number of followers alone appears to be rejected, and characteristics such as the number of likes, the number of retweets, in some cases the number of links (URLs) in a tweet, and the length of the tweet are taken into account. Up until now, the algorithm used by the Twitter platform [1] to infer a user's influence has taken into account only the number of followers. However, many studies and experiments have shown that such an algorithm is not as effective as one that also considers the aforementioned characteristics. In this work, we propose a new system, which was implemented to infer the influence of health-related hashtags (e.g., #breastcancerawareness, #diabetes, #leukaemia), of the tweets that contain them, and of the users who posted them. The influence of a hashtag is calculated from the number of tweets that contain it and the total number of likes and retweets those tweets received. The influence of a tweet in relation to a hashtag is estimated from the tweet's own likes and retweets, in combination with the parameters used for the hashtag's influence. Lastly, a user's influence in relation to a specific hashtag is derived from the number of likes and retweets received by the user's tweets that contain the hashtag, compared with the number of likes and retweets of all tweets that contain it. In addition, the system takes into account the number of the user's followers relative to the number of accounts the user follows (followees). Weighting coefficients were also used in each formula. To evaluate the implemented algorithm, experiments were run with different weights, and comparisons were made with other influence calculation systems and algorithms.
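As a rough illustration of the hashtag-relative user score described in the abstract above, the following Python sketch combines a user's share of likes, share of retweets, and follower/followee balance with weights. The weight values, the way the follower/followee counts are normalised, and every name in the code (`Tweet`, `user_influence`, `w_likes`, and so on) are assumptions made for illustration; the dissertation's actual formulas are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    """A tweet that contains the hashtag of interest (hypothetical record)."""
    user: str
    likes: int
    retweets: int


def user_influence(user, tweets, followers, followees,
                   w_likes=0.4, w_retweets=0.4, w_ratio=0.2):
    """Score `user` relative to one hashtag.

    `tweets` holds every tweet containing the hashtag. The weights and the
    follower/followee normalisation are illustrative assumptions only.
    """
    total_likes = sum(t.likes for t in tweets)
    total_retweets = sum(t.retweets for t in tweets)
    user_likes = sum(t.likes for t in tweets if t.user == user)
    user_retweets = sum(t.retweets for t in tweets if t.user == user)

    # Share of the hashtag's likes and retweets earned by this user's tweets.
    like_share = user_likes / total_likes if total_likes else 0.0
    retweet_share = user_retweets / total_retweets if total_retweets else 0.0

    # Follower/followee balance mapped into [0, 1] (assumed normalisation).
    connections = followers + followees
    ratio = followers / connections if connections else 0.0

    return w_likes * like_share + w_retweets * retweet_share + w_ratio * ratio


if __name__ == "__main__":
    sample = [Tweet("a", 30, 10), Tweet("b", 10, 10), Tweet("a", 10, 0)]
    print(user_influence("a", sample, followers=300, followees=100))
```

Varying `w_likes`, `w_retweets`, and `w_ratio` mirrors, in spirit, the experiments with different weights that the abstract mentions.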

    Mining Twitter for crisis management: realtime floods detection in the Arabian Peninsula

    A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of Doctor of Philosophy. In recent years, large amounts of data have been made available on microblog platforms such as Twitter; however, it is difficult to filter and extract information and knowledge from such data because of its high volume and noise. On Twitter, the general public are able to report real-world events such as floods in real time, and so act as social sensors. Consequently, it is beneficial to have a method that can detect flood events automatically in real time, to help governmental bodies such as crisis management authorities to detect the event and make decisions during its early stages. This thesis proposes a real-time flood detection system that mines Arabic tweets using machine learning and data mining techniques. The proposed system comprises the following main components: data collection, pre-processing, flooding event extraction, location inference, location named entity linking, and flooding event visualisation. An effective method of flood detection from Arabic tweets is presented and evaluated using supervised learning techniques. Furthermore, this work presents a location named entity inference method based on the Learning to Search method. The results show that the proposed method outperformed existing systems, with significantly higher accuracy in inferring flood locations from tweets written in colloquial Arabic. For location named entity linking, a method has been designed that uses Google API services as a knowledge base to extract accurate geocode coordinates associated with the location named entities mentioned in tweets. The results show that the proposed location linking method locates 56.8% of tweets within 0–10 km of the actual location. Further analysis has shown that the accuracies in locating tweets in the actual city and region are 78.9% and 84.2% respectively.

    Extracting Actionable Knowledge from Domestic Violence Discourses on Social Media

    Domestic Violence (DV) is considered a major social issue, and there is a strong relationship between DV and public health. Existing research has used social media to track and analyse real-world events and topics such as emerging trends, natural disasters, user sentiment, political opinions, and health care. However, less attention has been given to social welfare issues such as DV and their impact on public health. Recently, victims of DV have turned to social media platforms to express their feelings in posts and to seek social and emotional support, sympathetic encouragement, compassion, and empathy from the public. However, it is difficult to mine actionable knowledge from large conversational social media datasets, because such data is high-dimensional, short, noisy, high-volume, and high-velocity, among other characteristics. Hence, this paper proposes a novel framework to model and discover the various themes related to DV from the public domain. The proposed framework could provide unprecedented, valuable information to public health researchers, national family health organizations, government, and the public, with data enrichment and consolidation to improve the social welfare of the community. It thus provides actionable knowledge by monitoring and analysing continuous and rich user-generated content.

    When Infodemic Meets Epidemic: a Systematic Literature Review

    Epidemics and outbreaks present arduous challenges requiring both individual and communal efforts. Social media offer significant amounts of data that can be leveraged for bio-surveillance. They also provide a platform to quickly and efficiently reach a sizeable percentage of the population, hence their potential impact on various aspects of epidemic mitigation. The general objective of this systematic literature review is to provide a methodical overview of the integration of social media in different epidemic-related contexts. Three research questions were conceptualized for this review, resulting in over 10000 publications collected in the first PRISMA stage, 129 of which were selected for inclusion. A thematic method-oriented synthesis was undertaken and identified 5 main themes related to social media enabled epidemic surveillance, misinformation management, and mental health. Findings uncover a need for more robust applications of the lessons learned from epidemic post-mortem documentation. A vast gap exists between retrospective analysis of epidemic management and result integration in prospective studies. Harnessing the full potential of social media in epidemic related tasks requires streamlining the results of epidemic forecasting, public opinion understanding and misinformation propagation, all while keeping abreast of potential mental health implications. Pro-active prevention has thus become vital for epidemic curtailment and containment

    Building a Test Collection for Significant-Event Detection in Arabic Tweets

    With the increasing popularity of microblogging services like Twitter, researchers discovered a rich medium for tackling real-life problems like event detection. However, event detection in Twitter is often obstructed by the lack of public evaluation mechanisms such as test collections (set of tweets, labels, and queries to measure the effectiveness of an information retrieval system). The problem is more evident when non-English languages, e.g., Arabic, are concerned. With the recent surge of significant events in the Arab world, news agencies and decision makers rely on Twitter's microblogging service to obtain recent information on events. In this thesis, we address the problem of building a test collection of Arabic tweets (named EveTAR) for the task of event detection. To build EveTAR, we first adopted an adequate definition of an event, which is a significant occurrence that takes place at a certain time. An occurrence is significant if there are news articles about it. We collected Arabic tweets using Twitter's streaming API. Then, we identified a set of events from the Arabic data collection using Wikipedia's current events portal. Corresponding tweets were extracted by querying the Arabic data collection with a set of manually-constructed queries. To obtain relevance judgments for those tweets, we leveraged CrowdFlower's crowdsourcing platform. Over a period of 4 weeks, we crawled over 590M tweets, from which we identified 66 events that cover 8 different categories and gathered more than 134k relevance judgments. Each event contains an average of 779 relevant tweets. Over all events, we got an average Kappa of 0.6, which is a substantially acceptable value. EveTAR was used to evaluate three state-of-the-art event detection algorithms. The best performing algorithms achieved 0.60 in F1 measure and 0.80 in both precision and recall. We plan to make our test collection available for research, including events description, manually-crafted queries to extract potentially-relevant tweets, and all judgments per tweet. EveTAR is the first Arabic test collection built from scratch for the task of event detection. Additionally, we show in our experiments that it supports other tasks like ad-hoc search.