
    Local News And Event Detection In Twitter

    Twitter, one of the most popular micro-blogging services, allows users to publish short messages, called tweets, on a wide variety of subjects such as news, events, stories, ideas, and opinions. The popularity of Twitter arises, to some extent, from its capability of letting users promptly and conveniently contribute tweets that convey diverse information. In particular, as people discuss what is happening in the real world by posting tweets, Twitter captures invaluable information about real-world news and events, spanning a wide scale from large national or international stories, like a presidential election, to small local stories, such as a local farmers market. Detecting and extracting small news and events for a local place is a challenging problem and is the focus of this thesis. We explore several directions for detecting and extracting local news and events from tweets: a) how to identify locally influential people on Twitter as potential news seeders; b) how to recognize unusualness in tweet volume as a signal of potential local events; c) how to overcome the sparsity of local tweets to detect more, and smaller, ongoing local news and events. Additionally, we try to uncover implicit correlations between location, time, and text in tweets by learning embeddings for them in a shared semantic space.

    In the first part, we investigate how to measure the spatial influence of Twitter users through their interactions and thereby identify locally influential users, who in practice are usually good news and event seeders. To do so, we build a large-scale directed interaction graph of Twitter users. Such a graph allows us to apply PageRank-based ranking procedures to select top locally influential people, after incorporating geographical distance into the transition matrix used for the random walk.

    In the second part, we study how to recognize unusualness in tweet volume at a local place as a signal of potential ongoing local events. The intuition is that a sudden, abnormal change in the number of tweets at a location (e.g., a significant increase) may imply a potential local event. We therefore present DeLLe, a methodology for automatically Detecting Latest Local Events from geotagged tweet streams (i.e., tweets that contain GPS points). With the help of novel spatiotemporal tweet count prediction models, DeLLe first finds unusual locations that have aggregated an unexpected number of tweets in the latest time period and then calculates, for each such location, a ranking score that identifies the ones most likely to host ongoing local events, taking into account temporal burstiness, spatial burstiness, and topical coherence.

    In the third part, we explore how to overcome the sparsity of local tweets when trying to discover more, and smaller, local news or events. Local tweets are those whose locations fall inside a local place; they are very sparse on Twitter, which hinders the detection of small local news or events that generate only a handful of tweets. We propose a system, called Firefly, that enhances the local live tweet stream by tracking the tweets of a large body of local people and then performs locality-aware, keyword-based clustering for event detection. The intuition is that local tweets are published by local people, so tracking their tweets naturally yields a source of local tweets. In practice, however, only about 20% of Twitter users provide information about where they come from, so a social-network-based geotagging procedure is proposed to estimate locations for users whose locations are missing.

    Finally, to discover correlations between location, time, and text in geotagged tweets, e.g., “find which locations are most related to given topics” and “find which locations are similar to a given location”, we present LeGo, a methodology for Learning embeddings of Geotagged tweets with respect to entities such as locations, time units (hour-of-day and day-of-week), and textual words. The resulting compact vector representations make it easy to measure the relatedness between locations, times, and words in tweets. LeGo comprises two working modes, cross-modal search (LeGo-CM) and location-similarity search (LeGo-LS), to answer these two types of queries. In LeGo-CM, we first build a graph of entities extracted from tweets, in which each edge is weighted by the co-occurrence count of its two entities. The embeddings of graph nodes are then learned in the same latent space, guided by approximating the stationary residing probabilities between nodes, computed using personalized random walk procedures. In LeGo-LS, we additionally supplement edges between locations to capture their spatial proximity and topical similarity, supporting location-similarity search queries.
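    The most concrete technical step in this abstract is folding geographical distance into the PageRank transition matrix. The sketch below illustrates one way this could look; the exponential distance decay, the equirectangular distance approximation, and all names are our assumptions, not the thesis's exact formulation.

```python
import numpy as np

def geo_pagerank(adj, coords, alpha=0.85, decay_km=50.0, iters=100):
    """Rank users by spatial influence: PageRank whose transition
    probabilities are down-weighted by geographical distance.

    adj:    (n, n) 0/1 matrix, adj[i, j] = 1 if user i interacts with j
    coords: (n, 2) latitude/longitude of each user's home location
    """
    n = adj.shape[0]
    # Pairwise distances via an equirectangular approximation, chosen
    # here for brevity (an assumption, not the thesis's formula).
    lat = np.radians(coords[:, 0])[:, None]
    lon = np.radians(coords[:, 1])[:, None]
    dlat = lat - lat.T
    dlon = (lon - lon.T) * np.cos((lat + lat.T) / 2)
    dist_km = 6371.0 * np.sqrt(dlat**2 + dlon**2)

    # Edge weights decay with distance, so nearby interactions count more.
    w = adj * np.exp(-dist_km / decay_km)
    row_sums = w.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # dangling rows keep zero mass (simplification)
    P = w / row_sums               # row-stochastic transition matrix

    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - alpha) / n + alpha * (r @ P)
    return r  # higher score = more locally influential
```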

    Image Analysis Enhanced Event Detection from Geo-tagged Tweet Streams

    Events detected from social media streams often include early signs of accidents, crimes, or disasters; therefore, they can be used by the parties concerned for a timely and efficient response. Although significant progress has been made on event detection from tweet streams, most existing methods do not consider the images posted in tweets, which provide richer information than the text and can potentially be a reliable indicator of whether an event occurs or not. In this paper, we design an event detection algorithm that combines textual, statistical, and image information, following an unsupervised machine learning approach. Specifically, the algorithm starts with semantic and statistical analyses to obtain a list of tweet clusters, each of which corresponds to an event candidate, and then performs image analysis to separate events from non-events: a convolutional autoencoder is trained for each cluster as an anomaly detector, where part of the images are used as training data and the remaining images as test instances. Our experiments on multiple datasets verify that when an event occurs, the mean reconstruction errors of the training and test images are much closer than when the candidate is a non-event cluster. Based on this finding, the algorithm rejects a candidate if the difference exceeds a threshold. Experimental results over millions of tweets demonstrate that this image-analysis-enhanced approach can significantly increase precision with minimal impact on recall.
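    A minimal sketch of the per-cluster anomaly check described above, in PyTorch. The architecture, image size (64x64 RGB tensors in [0, 1]), epoch count, and threshold are illustrative placeholders; the paper's actual network and decision rule may differ.

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    """Small convolutional autoencoder; reconstruction error is the anomaly score."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),     # 16 -> 32
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),   # 32 -> 64
        )

    def forward(self, x):
        return self.dec(self.enc(x))

def is_event(train_imgs, test_imgs, threshold=0.05, epochs=20):
    """Train on part of a cluster's images; accept the cluster as an event
    when train/test mean reconstruction errors stay close (per the paper's
    finding). `threshold` is an illustrative value, not from the paper."""
    model, loss_fn = ConvAE(), nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(train_imgs), train_imgs)
        loss.backward()
        opt.step()
    with torch.no_grad():
        err_train = loss_fn(model(train_imgs), train_imgs).item()
        err_test = loss_fn(model(test_imgs), test_imgs).item()
    return abs(err_test - err_train) <= threshold
```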

    Service quality monitoring in confined spaces through mining Twitter data

    Promoting public transport depends on adopting effective tools for concurrent monitoring of perceived service quality. Social media feeds in general provide an opportunity to ubiquitously look for service quality events, but when applied to a confined geographic area, such as a transport node, the sparsity of concurrent social media data leads to two major challenges: the limited number of social media messages, which biases machine learning, and the difficulty of capturing bursty events in the study period, both of which considerably reduce the effectiveness of general event detection methods. To face these challenges, and in contrast to previous work, this paper presents a hybrid solution based on a novel fine-tuned BERT language model and aspect-based sentiment analysis. BERT enables extracting aspects from limited context, where traditional methods such as topic modeling and word embedding fail. Moreover, leveraging aspect-based sentiment analysis improves the sensitivity of event detection. Finally, the efficacy of event detection is further improved by a statistical approach that combines frequency-based and sentiment-based solutions. Experiments on a real-world case study demonstrate that the proposed solution improves the effectiveness of event detection compared to state-of-the-art approaches.
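    The abstract does not spell out how the frequency-based and sentiment-based signals are combined statistically, so the toy sketch below uses Stouffer's method over per-window z-scores purely as an assumed stand-in; all names are ours.

```python
import math

def fused_event_score(msg_counts, sentiments, window):
    """Fuse a message-frequency z-score with a negative-sentiment z-score
    for one time window (Stouffer's method; an illustrative choice, the
    paper's exact statistical combination is not given in the abstract)."""
    def zscore(series, value):
        mean = sum(series) / len(series)
        var = sum((x - mean) ** 2 for x in series) / max(len(series) - 1, 1)
        return (value - mean) / math.sqrt(var) if var > 0 else 0.0

    z_freq = zscore(msg_counts, msg_counts[window])
    # More negative aspect sentiment -> stronger event signal, hence the sign flip.
    z_sent = -zscore(sentiments, sentiments[window])
    return (z_freq + z_sent) / math.sqrt(2)

# A window would be flagged when the fused score exceeds a chosen cutoff, e.g. 2.0.
```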

    Mining Twitter for crisis management: realtime floods detection in the Arabian Peninsula

    A thesis submitted to the University of Bedfordshire in partial fulfilment of the requirements for the degree of Doctor of Philosophy. In recent years, large amounts of data have been made available on microblog platforms such as Twitter; however, it is difficult to filter and extract information and knowledge from such data because of its high volume and noise. On Twitter, the general public can report real-world events such as floods in real time, acting as social sensors. Consequently, it is beneficial to have a method that detects flood events automatically in real time, helping governmental bodies such as crisis management authorities to detect an event and make decisions during its early stages. This thesis proposes a real-time flood detection system that mines Arabic tweets using machine learning and data mining techniques. The proposed system comprises six main components: data collection, pre-processing, flood event extraction, location inference, location named entity linking, and flood event visualisation. An effective method of flood detection from Arabic tweets is presented and evaluated using supervised learning techniques. Furthermore, this work presents a location named entity inference method based on the Learning to Search approach; the results show that the proposed method outperforms existing systems, with significantly higher accuracy in inferring flood locations from tweets written in colloquial Arabic. For location named entity linking, a method is designed that uses Google API services as a knowledge base to extract accurate geographic coordinates associated with the location named entities mentioned in tweets. The results show that the proposed linking method locates 56.8% of tweets within 0–10 km of the actual location. Further analysis shows that the accuracies of locating tweets at the correct city and region levels are 78.9% and 84.2%, respectively.
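    The reported 0–10 km figure implies a distance-range evaluation of predicted against actual coordinates. Below is a small helper for that kind of check using the standard haversine formula; the function names and interfaces are ours, not the thesis's.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def within_range(predicted, actual, max_km=10.0):
    """Fraction of (lat, lon) predictions within max_km of the ground truth."""
    hits = sum(haversine_km(*p, *a) <= max_km for p, a in zip(predicted, actual))
    return hits / len(predicted)
```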

    A Big Data Analytics Method for Tourist Behaviour Analysis

    Big data generated across social media sites have created numerous opportunities for bringing more insights to decision-makers. Few studies on big data analytics, however, have demonstrated support for strategic decision-making, and a formal method for analysing social-media-generated big data for decision support is yet to be developed, particularly in the tourism sector. Using a design science research approach, this study designs and evaluates a ‘big data analytics’ method to support strategic decision-making in tourism destination management. Using geotagged photos uploaded by tourists to the photo-sharing site Flickr, the method's applicability in helping destination management organisations analyse and predict tourist behavioural patterns at specific destinations is shown, with Melbourne, Australia, as a representative case. Utility was confirmed both by applying the method to another destination and directly with stakeholder audiences. The developed artefact demonstrates a method for analysing unstructured big data to enhance strategic decision-making within a real problem domain. The proposed method is generic, and its applicability to other big data streams is discussed.
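    The abstract does not name its analysis technique, but a common way to turn geotagged photos into destination hotspots is density-based clustering. Below is a sketch with scikit-learn's DBSCAN under a haversine metric, offered as an assumed illustration rather than the paper's actual method.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def photo_hotspots(lat_lon_deg, radius_km=0.5, min_photos=20):
    """Cluster geotagged photo coordinates into destination hotspots.
    DBSCAN with a haversine metric expects coordinates in radians and
    eps as an angular distance (radius divided by Earth's radius)."""
    coords = np.radians(np.asarray(lat_lon_deg))
    labels = DBSCAN(eps=radius_km / 6371.0,
                    min_samples=min_photos,
                    metric="haversine").fit_predict(coords)
    return labels  # -1 marks noise; other labels index hotspots
```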

    Using Physical and Social Sensors in Real-Time Data Streaming for Natural Hazard Monitoring and Response

    Technological breakthroughs in computing over the last few decades have resulted in important advances in natural hazards analysis. In particular, the integration of a wide variety of information sources, including observations from spatially referenced physical sensors and new social media sources, enables better estimates of real-time hazard. The main goal of this work is to use innovative streaming algorithms for improved real-time seismic hazard analysis by integrating different data sources and processing tools into cloud applications. In streaming algorithms, a sequence of items from physical and social sensors can be processed in as little as one pass, with no need to store the data locally. Massive data volumes can be analyzed in near-real time with reasonable limits on storage space, an important advantage for natural hazard analysis. Seismic hazard maps are used by policymakers to set earthquake-resistant construction standards, by insurance companies to set insurance rates, and by civil engineers to estimate stability and damage potential. This research first focuses on improving probabilistic seismic hazard map production. The result is a series of maps for different frequency bands at significantly increased resolution and much lower latency, including a range of high-resolution sensitivity tests. Second, a method is developed for real-time earthquake intensity estimation using joint streaming analysis of physical and social sensors. Automatically calculated intensity estimates from physical sensors such as seismometers use empirical relationships between ground motion and intensity, while those from social sensors employ questionnaires that evaluate ground shaking levels based on personal observations. Neither is always sufficiently precise or timely. Results demonstrate that joint processing can significantly reduce the response time to a damaging earthquake and estimate preliminary intensity levels during the first ten minutes after an event. The combination of social media and network sensor data, in conjunction with innovative computing algorithms, provides a new paradigm for real-time earthquake detection, facilitating rapid and inexpensive risk reduction. In particular, streaming algorithms are an efficient method that addresses three major problems in hazard estimation: improving resolution, decreasing processing latency to near-real-time standards, and providing more accurate results through the integration of multiple data sets.
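    In the one-pass spirit described above, here is a minimal streaming sketch: Welford's online update maintains a running mean and variance per sensor stream without storing samples and flags anomalous readings. The class name, threshold, and the fusion rule in the closing comment are illustrative assumptions, not this work's algorithms.

```python
class StreamStat:
    """One-pass running mean/variance (Welford's algorithm); no samples
    are stored, matching the streaming constraint described above."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def is_anomaly(self, x, k=4.0):
        """Flag readings more than k standard deviations from the running mean."""
        if self.n < 2:
            return False
        std = (self.m2 / (self.n - 1)) ** 0.5
        return std > 0 and abs(x - self.mean) > k * std

# Joint use (illustrative rule): keep one StreamStat per physical sensor and
# one per social-sensor feed (e.g., report counts per minute); declare an
# event when both streams flag anomalies in the same short time window.
```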

    A Process Evaluation of Intelligence Gathering Using Social Media for Emergency Management Organizations in California

    When responding to an emergency, correct and timely information is often the difference between a successful response and a potential disaster. The information that emergency managers in California receive from the public often dictates how agencies respond to emergencies. The emergence of social media has presented several benefits to emergency managers regarding intelligence gathering during the emergency response process. Simultaneously, it has raised several concerns for the stakeholders involved. One major issue is inaccurate information circulating on social media platforms during ongoing disasters: if emergency managers cannot discern incorrect information from correct information, disaster response may be less effective. Rumors and misinformation tend to circulate before, during, and after emergencies. Although incorrect information circulating on social media cannot be stopped entirely, emergency managers can use cutting-edge technology and strategies to discern and counteract false information, and new intelligence-gathering tools can serve as a source of intelligence for relaying lifesaving information to the public. Past examples of inaccurate information on social media influencing stakeholder decision-making motivate the central question of this research: How can emergency management agencies in California leverage the flow of valid information on social media during crisis conditions?