
    Social media for crisis management: clustering approaches for sub-event detection

    Social media is becoming increasingly important for crisis management, as it enables the public to provide information in different forms (text, image and video) that can be valuable to crisis managers. Such information is usually spatial and time-oriented, and is useful for understanding emergency needs, making decisions and supporting learning and training after the emergency. Because of the huge amount of data gathered during a crisis, automatic processing is needed to support crisis management. One way of automating the process is to uncover sub-events (i.e., special hotspots) in the data collected from social media, which gives a better understanding of the crisis. In this paper we propose clustering approaches for sub-event detection that operate on Flickr and YouTube data, since multimedia data is of particular importance for understanding the situation. Different clustering algorithms are assessed using the textual annotations (i.e., title, tags and description) and additional metadata such as time and location. The empirical study shows in particular that social multimedia combined with clustering is worth using for detecting sub-events in the context of crisis management. It serves to integrate social media into crisis management without cumbersome manual monitoring.

    Engineering Crowdsourced Stream Processing Systems

    A crowdsourced stream processing (CSP) system is a system that incorporates crowdsourced tasks in the processing of a data stream. This can be seen as enabling crowdsourcing work to be applied to a sample of large-scale data at high speed, or equivalently, enabling stream processing to employ human intelligence. It also leads to a substantial expansion of the capabilities of data processing systems. Engineering a CSP system requires the combination of human and machine computation elements. From a general systems theory perspective, this means taking into account inherited as well as emerging properties from both these elements. In this paper, we position CSP systems within a broader taxonomy, outline a series of design principles and evaluation metrics, present an extensible framework for their design, and describe several design patterns. We showcase the capabilities of CSP systems by performing a case study that applies our proposed framework to the design and analysis of a real system (AIDR) that classifies social media messages during time-critical crisis events. Results show that, compared to a pure stream processing system, AIDR can achieve a higher data classification accuracy, while compared to a pure crowdsourcing solution, the system makes better use of human workers by requiring much less manual work effort.

    Scalable distributed event detection for Twitter

    Social media streams, such as Twitter, have shown themselves to be useful sources of real-time information about what is happening in the world. Automatic detection and tracking of events identified in these streams have a variety of real-world applications, e.g. identifying and automatically reporting road accidents for emergency services. However, to be useful, events need to be identified within the stream with very low latency. This is challenging due to the high volume of posts within these social streams. In this paper, we propose a novel event detection approach that can both effectively detect events within social streams like Twitter and scale to thousands of posts every second. Through experimentation on a large Twitter dataset, we show that our approach can process the equivalent of the full Twitter Firehose stream, while maintaining event detection accuracy and outperforming an alternative distributed event detection system.

    Crowdsourced Rumour Identification During Emergencies

    When a significant event occurs, many social media users leverage platforms such as Twitter to track that event. Moreover, emergency response agencies are increasingly looking to social media as a source of real-time information about such events. However, false information and rumours are often spread during such events, which can influence public opinion and limit the usefulness of social media for emergency management. In this paper, we present an initial study into rumour identification during emergencies using crowdsourcing. In particular, through an analysis of three tweet datasets relating to emergency events from 2014, we propose a taxonomy of tweets relating to rumours. We then perform a crowdsourced labeling experiment to determine whether crowd assessors can identify rumour-related tweets and where such labeling can fail. Our results show that, overall, agreement over the tweet labels produced was high (0.7634 Fleiss' kappa), indicating that crowd-based rumour labeling is possible. However, not all tweets are of equal difficulty to assess. Indeed, we show that tweets containing disputed/controversial information tend to be some of the most difficult to identify.
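
    The agreement statistic quoted in this abstract, Fleiss' kappa, can be illustrated with a short sketch. This is not the authors' code: it is the textbook formula written in plain Python, and the function name `fleiss_kappa` and the input layout (one row per tweet, one column per label, each cell holding the number of assessors who chose that label) are assumptions made for illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Assumes every item was rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)
```

    A value of 1 means the assessors always agree, and values near 0 mean agreement is no better than chance.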

    Sub-Event Detection Using Social Media [Εύρεση Υπό-Γεγονότων Χρησιμοποιώντας Μέσα Κοινωνικής Δικτύωσης]

    This bachelor thesis is based on the paper “Automatic Sub-Event Detection in Emergency Management using Social Media”. Its purpose is to study experimentally the stages of a mechanism that automatically detects sub-events within a larger event using social media, as described in the paper but with a few alterations. The mechanism consists of the following stages: data collection, data preprocessing, clustering, and analysis of the resulting clusters. The data are tweets collected from Twitter, one of the largest social media platforms, posted by its users within a specific period of time. We import the data into the WEKA tool and preprocess them, applying a series of operations until they reach the form we need. We then run the k-means clustering algorithm and export the resulting clusters for analysis. Our setup differs from the paper mainly in the data source and the clustering algorithm: the paper collects data from YouTube and Flickr rather than Twitter, and uses the Self-Organizing Map (SOM) algorithm rather than k-means. Despite these differences, our experiments show that our results are close to those presented in the paper, which means that during an emergency we can use data from large social media platforms to detect smaller significant sub-events and react to them.
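
    The preprocessing and clustering stages described in this abstract can be sketched in a few lines. The thesis uses WEKA; the plain-Python version below is only an illustration of the idea, not a reproduction of that setup. The function names, the bag-of-words representation, the choice of the first k points as initial centroids, and the four sample messages are all assumptions.

```python
import math
from collections import Counter


def vectorize(texts):
    """Turn each text into a bag-of-words count vector over a shared vocabulary."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    vectors = []
    for text in texts:
        counts = Counter(text.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vectors


def kmeans(points, k, iterations=20):
    """Plain k-means with Euclidean distance.

    Initial centroids are simply the first k points (an assumption made so
    the sketch is deterministic; real tools usually pick them at random).
    Returns the cluster index assigned to each point.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical messages, invented for this sketch.
sample_texts = [
    "flood water rising downtown",
    "fire smoke near station",
    "flood water street closed",
    "fire smoke spreading fast",
]
sample_labels = kmeans(vectorize(sample_texts), k=2)
```

    In this toy input the two flood messages end up in one cluster and the two fire messages in the other, which is the sense in which clusters stand in for sub-events.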