
    Preprocessing Techniques to Support Event Detection Data Fusion on Social Media Data

    This thesis focuses on the collection and preprocessing of streaming social media feeds, covering metadata as well as visual and textual information. News media have long been the main source of immediate reports on events, large and small. However, the information conveyed by these news sources is delayed because of their lack of proximity to, and general knowledge of, the event. News outlets have therefore started relying on social media sources for initial knowledge of these events. Previous work focused on textual data captured from social media as a data source for detecting events. This preprocessing framework aims to facilitate the data fusion of images and text for event detection. Results from the preprocessing techniques explained in this work show that the collected textual and visual data can be processed into a workable format for further processing. Moreover, the collected textual and visual data are transformed into bag-of-words vectors for future data fusion and event detection.
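The bag-of-words step mentioned in the abstract above can be illustrated with a minimal sketch for the textual side only. The tokenizer, the function names (`tokenize`, `build_vocabulary`, `bag_of_words`) and the two sample posts are assumptions made for illustration; the thesis does not specify its implementation, and its visual bag-of-words step is not covered here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = {tok for doc in documents for tok in tokenize(doc)}
    return {tok: i for i, tok in enumerate(sorted(tokens))}


def bag_of_words(text, vocabulary):
    """Return a count vector for text; out-of-vocabulary tokens are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical posts, used only to exercise the functions.
posts = ["Fire downtown, fire crews arriving", "Crews close the downtown bridge"]
vocab = build_vocabulary(posts)
vectors = [bag_of_words(p, vocab) for p in posts]
```

Each post becomes a fixed-length count vector over a shared vocabulary, which is the property that makes later fusion with other fixed-length feature vectors straightforward.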

    Information spreading during emergencies and anomalous events

    The most critical time for information to spread is in the aftermath of a serious emergency, crisis, or disaster. Individuals affected by such situations can now turn to an array of communication channels, from mobile phone calls and text messages to social media posts, when alerting social ties. These channels drastically improve the speed of information in a time-sensitive event, and provide extant records of human dynamics during and after the event. Retrospective analysis of such anomalous events provides researchers with a class of "found experiments" that may be used to better understand social spreading. In this chapter, we study information spreading due to a number of emergency events, including the Boston Marathon Bombing and a plane crash at a western European airport. We also contrast the different information that may be gleaned from social media data compared with mobile phone data, and we estimate the rate of anomalous events in a mobile phone dataset using a proposed anomaly detection method. Comment: 19 pages, 11 figures

    Sensing real-world events using Arabic Twitter posts

    In recent years, there has been increased interest in event detection using data posted to social media sites. Automatically transforming user-generated content into information relating to events is a challenging task due to the short informal language used within the content and the variety of topics discussed on social media. Recent advances in detecting real-world events in English and other languages have been published. However, the detection of events in the Arabic language has been limited to date. To address this task, we present an end-to-end event detection framework which comprises six main components: data collection, pre-processing, classification, feature selection, topic clustering and summarization. Large-scale experiments over millions of Arabic Twitter messages show the effectiveness of our approach for detecting real-world event content from Twitter posts.

    Scaling DBSCAN-like algorithms for event detection systems in Twitter

    The increasing use of mobile social networks has lately transformed news media. Real-world events are nowadays reported in social networks much faster than in traditional channels. As a result, the autonomous detection of events from networks like Twitter has gained a lot of interest in both research and media groups. DBSCAN-like algorithms constitute a well-known clustering approach to retrospective event detection. However, scaling such algorithms to geographically large regions and temporally long periods presents two major shortcomings. First, detecting real-world events from the vast amount of tweets can no longer be performed on a single machine. Second, the tweeting activity varies a lot within these broad space-time regions, limiting the use of global parameters. Against this background, we propose to scale DBSCAN-like event detection techniques by parallelizing and distributing them through a novel density-aware MapReduce scheme. The proposed scheme partitions tweet data according to its spatial and temporal features and tailors local DBSCAN parameters to local tweet densities. We implement the scheme in Apache Spark and evaluate its performance on a dataset composed of geo-located tweets in the Iberian Peninsula during the course of several football matches. The results point to the benefits of our proposal over other state-of-the-art techniques in terms of speed-up and detection accuracy. Peer Reviewed. Postprint (author's final draft)
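For readers unfamiliar with the clustering step that the abstract above builds on, the following is a minimal single-machine sketch of DBSCAN over space-time points. It is not the paper's density-aware MapReduce scheme: the separate spatial and temporal radii (`eps_space`, `eps_time`), the planar coordinates and the sample points are all assumptions chosen for illustration, and the parameters are global rather than tailored per partition.

```python
import math

NOISE = -1


def neighbors(points, i, eps_space, eps_time):
    """Indices of points within eps_space (planar) and eps_time of point i."""
    xi, yi, ti = points[i]
    return [
        j
        for j, (x, y, t) in enumerate(points)
        if math.hypot(x - xi, y - yi) <= eps_space and abs(t - ti) <= eps_time
    ]


def dbscan(points, eps_space, eps_time, min_pts):
    """Label each (x, y, t) point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(points, i, eps_space, eps_time)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(points, j, eps_space, eps_time)
            if len(nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbrs)
    return labels


# Hypothetical tweets as (x_km, y_km, minutes since start).
points = [
    (0.0, 0.0, 0), (0.1, 0.0, 2), (0.0, 0.1, 4), (0.1, 0.1, 5),  # burst A
    (5.0, 5.0, 1), (5.1, 5.0, 3), (5.0, 5.1, 6),                 # burst B
    (0.05, 0.05, 300),                                           # same place, hours later
    (9.0, 1.0, 50),                                              # isolated tweet
]
labels = dbscan(points, eps_space=0.5, eps_time=10, min_pts=3)
```

Because every neighborhood query here scans all points, the cost grows quadratically with the number of tweets, which hints at why partitioning and distribution become necessary at scale.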

    Event Detection from Social Media Stream: Methods, Datasets and Opportunities

    Social media streams contain a large and diverse amount of information, ranging from daily-life stories to the latest global and local events and news. Twitter, especially, allows a fast spread of events happening in real time, and enables individuals and organizations to stay informed of the events happening now. Event detection from social media data poses different challenges from traditional text and is a research area that has attracted much attention in recent years. In this paper, we survey a wide range of event detection methods for Twitter data streams, helping readers understand the recent developments in this area. We present the datasets available to the public. Furthermore, a few research opportunities. Comment: 8 pages

    Service quality monitoring in confined spaces through mining Twitter data

    Promoting public transport depends on adapting effective tools for concurrent monitoring of perceived service quality. Social media feeds, in general, provide an opportunity to ubiquitously look for service quality events, but when applied to a confined geographic area such as a transport node, the sparsity of concurrent social media data leads to two major challenges. Both the limited number of social media messages--leading to biased machine learning--and the capturing of bursty events in the study period considerably reduce the effectiveness of general event detection methods. In contrast to previous work and to face these challenges, this paper presents a hybrid solution based on a novel fine-tuned BERT language model and aspect-based sentiment analysis. BERT enables extracting aspects from a limited context, where traditional methods such as topic modeling and word embedding fail. Moreover, leveraging aspect-based sentiment analysis improves the sensitivity of event detection. Finally, the efficacy of event detection is further improved by proposing a statistical approach to combine frequency-based and sentiment-based solutions. Experiments on a real-world case study demonstrate that the proposed solution improves the effectiveness of event detection compared to state-of-the-art approaches.

    Embed2Detect: temporally clustered embedded words for event detection in social media

    Social media is becoming a primary medium to discuss what is happening around the world. Therefore, the data generated by social media platforms contain rich information which describes ongoing events. Further, the timeliness associated with these data is capable of facilitating immediate insights. However, considering the dynamic nature and high volume of data production in social media data streams, it is impractical to filter the events manually, and therefore automated event detection mechanisms are invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection has focused only on statistical and syntactical features in data and lacked the involvement of underlying semantics, which are important for effective information retrieval from text since they represent the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media by combining the characteristics of word embeddings and hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantical features into event detection and overcome a major limitation inherent in previous approaches. We evaluated our method on two recent real social media data sets, which represent the sports and political domains, and also compared the results to several state-of-the-art methods. The obtained results show that Embed2Detect is capable of effective and efficient event detection and that it outperforms recent event detection methods. For the sports data set, Embed2Detect achieved a 27% higher F-measure than the best-performing baseline, and for the political data set, the increase was 29%.
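The combination of word embeddings and hierarchical agglomerative clustering named in the abstract above can be pictured with a small sketch. This is not the Embed2Detect algorithm itself, which the abstract does not detail: the two-dimensional vectors in `toy` are made up rather than learned, and the average-linkage criterion, cosine distance and stopping `threshold` are assumptions chosen for illustration. Only the clustering step is shown, not any comparison across time windows.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def average_linkage(c1, c2, vectors):
    """Mean pairwise cosine distance between the words of two clusters."""
    total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(vectors, threshold):
    """Merge the closest pair of clusters until none is within threshold."""
    clusters = [[w] for w in sorted(vectors)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], vectors)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Made-up 2-D "embeddings"; real embeddings would be learned from the stream.
toy = {
    "goal": (0.9, 0.1),
    "penalty": (0.8, 0.2),
    "striker": (0.95, 0.05),
    "vote": (0.1, 0.9),
    "ballot": (0.2, 0.8),
}
clusters = agglomerate(toy, threshold=0.2)
```

With these toy vectors the sports-related words and the election-related words end up in separate groups, which illustrates how semantic proximity in the embedding space, rather than raw term frequency, drives the grouping.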

    Text Embedding-based Event Detection for Social and News Media

    Today, social and news media are the leading platforms that distribute newsworthy content, and most internet users access them regularly to get information. However, due to the data’s unstructured nature and vast volume, manual analyses to extract information require enormous effort. Thus, automated intelligent mechanisms have become crucial. The literature presents several emerging approaches for social and news media event detection, along with distinct evolutions, mainly due to the variations in the media. However, most available social media event detection approaches primarily rely on data statistics, ignoring linguistics, making them vulnerable to information loss. Also, the available news media event detection approaches mostly fail to capture long-range text dependencies and to support predictions for low-resource languages (i.e. languages with relatively less data). The possibility of utilising interconnections between different data levels to improve final predictions has also not been adequately explored. This research investigates how the characteristics of text embeddings built using prediction-based models, which have proven capabilities to capture linguistics, can be used in event detection while overcoming existing limitations. Initially, it redefines the problem of event detection based on two data granularities, coarse- and fine-grained levels, to allow systems to tackle different information requirements. Mainly, the coarse-grained level targets the notification of event occurrences and the fine-grained level targets the provision of event details. Following the new definition, this research proposes two novel approaches for coarse- and fine-grained level event detection on social media, Embed2Detect and WhatsUp, mainly utilising linguistics captured by self-learned word embeddings and their hierarchical relationships in dendrograms. For news media event detection, this research proposes a TRansformer-based Event Document classification architecture (TRED) involving long-sequence and cross-lingual transformer encoders and a novel learning strategy, Two-phase Transfer Learning (TTL), supporting the capturing of long-range dependencies and data level interconnections. All the proposed approaches have been evaluated on recent real datasets, covering four aspects crucial for event detection: accuracy, efficiency, expandability and scalability. Social media data from two diverse domains and news media data from four high- and low-resource languages are mainly involved. The obtained results reveal that the proposed approaches outperform the state-of-the-art methods despite the data diversities, proving their accuracy and expandability. Additionally, the evaluations on efficiency and scalability adequately confirm the methods’ appropriateness for (near) real-time processing and ability to handle large data volumes. In summary, the achievement of all crucial requirements evidences the potential and utility of the proposed approaches for event detection in social and news media.

    Event detection in location-based social networks

    With the advent of social networks and the rise of mobile technologies, users have become ubiquitous sensors capable of monitoring various real-world events in a crowd-sourced manner. Location-based social networks have proven to be faster than traditional media channels in reporting and geo-locating breaking news; e.g., Osama Bin Laden’s death was first confirmed on Twitter even before the announcement from the communication department at the White House. However, the deluge of user-generated data on these networks requires intelligent systems capable of identifying and characterizing such events in a comprehensive manner. The data mining community coined the term event detection to refer to the task of uncovering emerging patterns in data streams. Nonetheless, most data mining techniques do not reproduce the underlying data generation process, which hampers their ability to self-adapt in fast-changing scenarios. Because of this, we propose a probabilistic machine learning approach to event detection which explicitly models the data generation process and enables reasoning about the discovered events. With the aim of setting forth the differences between both approaches, we present two techniques for the problem of event detection in Twitter: a data mining technique called Tweet-SCAN and a machine learning technique called Warble. We assess and compare both techniques on a dataset of tweets geo-located in the city of Barcelona during its annual festivities. Last but not least, we present the algorithmic changes and data processing frameworks to scale up the proposed techniques to big data workloads. This work is partially supported by Obra Social “la Caixa”, by the Spanish Ministry of Science and Innovation under contract (TIN2015-65316), by the Severo Ochoa Program (SEV2015-0493), by SGR programs of the Catalan Government (2014-SGR-1051, 2014-SGR-118), Collectiveware (TIN2015-66863-C2-1-R) and BSC/UPC NVIDIA GPU Center of Excellence. We would also like to thank the reviewers for their constructive feedback. Peer Reviewed. Postprint (author's final draft)