GeoCorpora: building a corpus to test and train microblog geoparsers
<p>In this article, we present the GeoCorpora corpus building framework and software tools as well as a geo-annotated Twitter corpus built with these tools to foster research and development in the areas of microblog/Twitter geoparsing and geographic information retrieval. The developed framework employs crowdsourcing and geovisual analytics to support the construction of large corpora of text in which the mentioned location entities are identified and geolocated to toponyms in existing geographical gazetteers. We describe how the approach has been applied to build a corpus of geo-annotated tweets that will be made freely available to the research community alongside this article to support the evaluation, comparison and training of geoparsers. Additionally, we report lessons learned related to corpus construction for geoparsing as well as insights about the notions of place and natural spatial language that we derive from application of the framework to building this corpus.</p>
Map illustrating the connectivity between districts and Eigenvector value illustrating the level of influence of each district.
<p>For clarity, we have illustrated only linkages with more than 100 connections and Eigenvector values greater than 0.50. Map created using ArcGIS 10.2.</p>
Connectivity between locations and human movement within Kenya.
<p>(A) Distances travelled daily, monthly and in total by each user, and (B) the proportion of users’ radius of gyration (solid line).</p>
Centrality values between districts in Kenya.
Movement patterns captured at different temporal scales illustrate connectivity between districts in Kenya within a 24-hour time period (N = 90,645 tracks) and during a ten-month time period (N = 17,900).
<p>Each line represents a movement segment. The long-distance tracks indicate population movements by plane or by train within the country. Maps created using ArcGIS 10.2.</p>
Summary of Twitter data used in this analysis that was collected for Kenya between June 2013 and March 2014 (N<sub>unique users</sub> = 28,335; N<sub>tweets</sub> = 720,149).
Maps illustrating (A) the distribution of geo-located tweets in the study area for users who crossed borders (N = 770), (B) connections between Kenya and the surrounding countries, and (C) a flow map showing the connectivity between different geographic locations by travel distance.
<p>Map created using ArcGIS 10.2.</p>
A framework and associated data sources useful for capturing human mobility in time and space.
<p>Movements are characterized in terms of their spatial and temporal scale, which are defined in terms of physical displacement (<i>spatial</i>) and time spent (<i>temporal</i>, frequency and duration) (Source: adapted from [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0129202#pone.0129202.ref045" target="_blank">45</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0129202#pone.0129202.ref057" target="_blank">57</a>]).</p>
Dendrogram View.
<p>Two layouts for visualizing the hierarchical structure of CONCOR results: a tree layout (left) and a radial layout (right). A slider bar controls the level of the CONCOR results shown.</p>
Validation results as a function of total connection numbers.