74 research outputs found

    Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages

    Principal component analysis (PCA) and related techniques have been successfully employed in natural language processing. Text mining applications in the age of online social media (OSM) face new challenges due to properties specific to these use cases (e.g. spelling issues specific to texts posted by users, the presence of spammers and bots, service announcements, etc.). In this paper, we employ a Robust PCA technique to separate typical outliers and highly localized topics from the low-dimensional structure present in language use in online social networks. Our focus is on identifying geospatial features among the messages posted by the users of the Twitter microblogging service. Using a dataset of over 200 million geolocated tweets collected over the course of a year, we investigate whether the information present in word usage frequencies can be used to identify regional features of language use and topics of interest. Using the PCA pursuit method, we are able to identify important low-dimensional features, which constitute smoothly varying functions of the geographic location.
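
The abstract above does not state its optimization problem. For orientation only, the standard principal component pursuit formulation of Robust PCA is sketched here; the symbols M (data matrix, e.g. word frequencies by region), L (low-rank part), S (sparse outlier part), and the weight λ are assumptions of this sketch, not notation taken from the paper.

```latex
% Principal component pursuit: split M into low-rank L plus sparse S.
% \|L\|_* is the nuclear norm (sum of singular values);
% \|S\|_1 is the entrywise l1 norm (sum of absolute values).
\begin{aligned}
\min_{L,\,S} \quad & \|L\|_{*} + \lambda \,\|S\|_{1} \\
\text{subject to} \quad & L + S = M
\end{aligned}
% A commonly suggested default for M of size n_1 x n_2:
% \lambda = 1 / \sqrt{\max(n_1, n_2)}
```

In this reading, L would carry the smoothly varying regional structure and S the outliers and highly localized topics.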

    Determine the User Country of a Tweet

    On the widely used messaging platform Twitter, about 2% of tweets contain a geographical location in the form of exact GPS coordinates (latitude and longitude). Knowing the location of a tweet is useful for many data analytics questions. This research looks at determining the location of tweets that do not contain GPS coordinates. An accuracy of 82% was achieved using a Naive Bayes model trained on features such as the user's timezone, the user's language, and the parsed user location. The classifier performs well on active Twitter countries such as the Netherlands and the United Kingdom. An analysis of errors made by the classifier shows that mistakes were made due to limited information and shared properties between countries, such as a shared timezone. A feature analysis was performed in order to see the effect of different features. The timezone and parsed user location were the most informative features. Comment: CTIT Technical Report, University of Twente
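
As a rough illustration of the kind of classifier this abstract describes, here is a minimal categorical Naive Bayes sketch with add-one smoothing. It is not the authors' implementation: the function names `train` and `predict`, the feature keys, and the four toy rows are all hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict


def train(rows):
    """Count classes and per-class feature values.

    rows: list of (features_dict, country) pairs (toy format, assumed here).
    """
    class_counts = Counter()
    feat_counts = defaultdict(Counter)  # (country, feature name) -> value counts
    vocab = defaultdict(set)            # feature name -> values seen in training
    for feats, country in rows:
        class_counts[country] += 1
        for name, value in feats.items():
            feat_counts[(country, name)][value] += 1
            vocab[name].add(value)
    return class_counts, feat_counts, vocab


def predict(model, feats, alpha=1.0):
    """Return the country with the highest smoothed log-posterior."""
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for country, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for name, value in feats.items():
            count = feat_counts[(country, name)][value]
            # Additive smoothing; the extra "+ 1" reserves mass for unseen values.
            denom = n + alpha * (len(vocab[name]) + 1)
            score += math.log((count + alpha) / denom)
        if score > best_score:
            best, best_score = country, score
    return best


# Hypothetical toy data, not taken from the report.
rows = [
    ({"timezone": "Amsterdam", "language": "nl", "location": "Enschede"}, "NL"),
    ({"timezone": "Amsterdam", "language": "nl", "location": "Utrecht"}, "NL"),
    ({"timezone": "London", "language": "en", "location": "Leeds"}, "GB"),
    ({"timezone": "London", "language": "en", "location": "London"}, "GB"),
]
model = train(rows)
guess = predict(model, {"timezone": "Amsterdam", "language": "en", "location": "Utrecht"})
```

In this toy setup the timezone and location evidence outweigh the conflicting language value, which loosely mirrors the abstract's finding that those two features were the most informative.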

    The impact of using social media data in crime rate calculations: shifting hot spots and changing spatial patterns

    Crime rate is a statistic used to summarize the risk of criminal events. However, research has shown that choosing the appropriate denominator is non-trivial. Different crime types exhibit different spatial opportunities and so does the population at risk. The residential population is the most commonly used population at risk, but is unlikely to be suitable for crimes that involve mobile populations. In this article, we use "crowd-sourced" data in Leeds, England, to measure the population at risk, considering violent crime. These new data sources have the potential to represent mobile populations at higher spatial and temporal resolutions than other available data. Through the use of two local spatial statistics (Getis-Ord Gi* and the Geographical Analysis Machine) and visualization, we show that when the volume of social media messages, as opposed to the residential population, is used as a proxy for the population at risk, criminal event hot spots shift spatially. Specifically, the results indicate a significant shift in the city center, eliminating its hot spot. Consequently, if crime reduction/prevention efforts are based on resident-population-based crime rates, such efforts may not only be ineffective in reducing criminal event risk, but also be a waste of public resources.
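
To make the denominator argument in this abstract concrete, the following small sketch computes a rate per 1,000 with two different populations at risk. All area names and counts are invented for illustration and are not data from the article; the article's actual analysis uses local spatial statistics, which this sketch does not attempt.

```python
def rate_per_1000(events, population_at_risk):
    """Events per 1,000 units of the chosen population at risk."""
    return 1000.0 * events / population_at_risk


# Hypothetical counts: a city centre with few residents but heavy footfall
# (proxied by message volume), and a residential suburb.
areas = {
    "city_centre": {"crimes": 120, "residents": 2000, "messages": 60000},
    "suburb": {"crimes": 40, "residents": 10000, "messages": 4000},
}


def rank(denominator):
    """Order areas from highest to lowest rate for the given denominator key."""
    return sorted(
        areas,
        key=lambda a: rate_per_1000(areas[a]["crimes"], areas[a][denominator]),
        reverse=True,
    )


by_residents = rank("residents")  # centre: 60.0 per 1,000; suburb: 4.0 per 1,000
by_messages = rank("messages")    # centre: 2.0 per 1,000; suburb: 10.0 per 1,000
```

With these made-up numbers the city centre ranks highest under the residential denominator and lowest under the message-volume denominator, the same direction of shift the abstract reports for Leeds.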

    Beyond data collection: Objectives and methods of research using VGI and geo-social media for disaster management

    This paper investigates research using VGI and geo-social media in the disaster management context. Relying on the method of systematic mapping, it develops a classification schema that captures three levels of main category, focus, and intended use, and analyzes the relationships with the employed data sources and analysis methods. It restricts its scope to the pioneering field of disaster management, but the described approach and the developed classification schema are easily adaptable to different application domains or future developments. The results show that a hypothesized consolidation of research, characterized through the building of canonical bodies of knowledge and advanced application cases with refined methodology, has not yet happened. The majority of the studies investigate the challenges and potential solutions of data handling, with fewer studies focusing on socio-technological issues or advanced applications. This trend is currently showing no sign of change, highlighting that VGI research is still very much technology-driven as opposed to theory- or application-driven. From the results of the systematic mapping study, the authors formulate and discuss several research objectives for future work, which could lead to a stronger, more theory-driven treatment of the topic of VGI in GIScience. Carlos Granell has been partly funded by the Ramón y Cajal Programme (grant number RYC-2014-16913).

    How Do People Describe Locations During a Natural Disaster: An Analysis of Tweets from Hurricane Harvey

    Social media platforms, such as Twitter, have been increasingly used by people during natural disasters to share information and request help. Hurricane Harvey was a category 4 hurricane that devastated Houston, Texas, USA, in August 2017 and caused catastrophic flooding in the Houston metropolitan area. Hurricane Harvey also witnessed the widespread use of social media by the general public in response to this major disaster, and geographic locations are key pieces of information described in many of the social media messages. A geoparsing system, or a geoparser, can be utilized to automatically extract and locate the described locations, which can help first responders reach the people in need. While a number of geoparsers have already been developed, it is unclear how effective they are in recognizing and geo-locating the locations described by people during natural disasters. To fill this gap, this work seeks to understand how people describe locations during a natural disaster by analyzing a sample of tweets posted during Hurricane Harvey. We then identify the limitations of existing geoparsers in processing these tweets, and discuss possible approaches to overcoming these limitations.

    A Survey of Location Prediction on Twitter

    Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure