8 research outputs found
Overture POI data for the United Kingdom: a comprehensive, queryable open data product
Point of Interest data that is comprehensive, globally available and
open access is sparse, despite being an important input for research in a number
of application areas. New data from the Overture Maps Foundation offers
significant potential in this arena, but accessing the data relies on
computational resources beyond the skillset and capacity of the average
researcher. In this article, we provide a processed version of the Overture
places (POI) dataset for the UK, in a fully-queryable format, and provide
accompanying code through which to explore the data, and generate other
national subsets. In the article, we describe the construction and
characteristics of the dataset, before considering how reliable it is
(locational accuracy, attribute comprehensiveness), through direct comparison
with Geolytix supermarket data. This dataset can support new and important
research projects in a variety of different thematic areas, and foster a
network of researchers to further evaluate its advantages and limitations.
Comment: Main document: 6 pages, 1 figure, 2 tables. Supplementary: 2 pages, 2
figures, 1 table
Mapping Great Britain's semantic footprints through a large language model analysis of Reddit comments
Observed regional variation in geotagged social media text is often attributed to dialects, where features in language are assumed to exhibit region-specific properties. While dialects are seen as a key component in defining the identity of regions, there are a multitude of other geographic properties that may be captured within natural language text. In our work, we consider locational mentions that are directly embedded within comments on the social media website Reddit, providing a range of associated semantic information, and enabling deeper representations between locations to be captured. Using a large corpus of geoparsed Reddit comments from UK-related local discussion subreddits, we first extract embedded semantic information using a large language model, aggregated into local authority districts, representing the semantic footprint of these regions. These footprints broadly exhibit spatial autocorrelation, with clusters that conform with the national borders of Wales and Scotland. London, Wales, and Scotland also demonstrate notably different semantic footprints compared with the rest of Great Britain.
Transformer based named entity recognition for place name extraction from unstructured text
Place names embedded in online natural language text present a useful source of geographic information. Despite this, many methods for the extraction of place names from text use pre-trained models that were not explicitly designed for this task. Our paper builds five custom Named Entity Recognition (NER) models and evaluates them against three popular pre-built models for place name extraction. The models are evaluated on a set of manually annotated Wikipedia articles using the F1 score. Our best performing model achieves an F1 score of 0.939 compared with 0.730 for the best performing pre-built model. Our model is then used to extract all place names from Wikipedia articles in Great Britain, demonstrating the ability to more accurately capture unknown place names from volunteered sources of online geographic information.
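As an illustration of the F1 metric referred to above, the following is a minimal sketch of exact-match span scoring for place name extraction. It is not the paper's evaluation code: the function name `span_f1`, the `(start, end, label)` span representation, and the toy spans are all assumptions made for the example.

```python
def span_f1(predicted, gold):
    """Exact-match precision, recall and F1 over (start, end, label) spans.

    Hypothetical helper for illustration; a span counts as correct only if
    its offsets and label both match a gold annotation exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (assumed): three annotated place names, three model predictions,
# two of which match an annotation exactly.
gold = [(0, 6, "PLACE"), (20, 29, "PLACE"), (41, 48, "PLACE")]
predicted = [(0, 6, "PLACE"), (20, 29, "PLACE"), (60, 66, "PLACE")]

precision, recall, f1 = span_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact matching is the strictest convention; evaluations that give credit for partially overlapping spans would report different scores for the same predictions.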
Mapping Cognitive Place Associations within the United Kingdom through Online Discussion on Reddit
This repository contains the data and code required to replicate the analysis of our paper. Data may be found within the data/ directory in the uploaded zip file. Code is found within the scripts/ directory. Replicate Processing: Install project dependencies into a Python virtual environment using `poetry install`. Replicate our pipeline using `dvc repro -f`. NOTE: Consult the dvc.yaml file to see the processing pipeline. Several stages have been frozen as they require external data.
Geoparsing comments from Reddit to extract mental place connectivity within the United Kingdom
Place connectivity is explored between geographic locations extracted from comments on Reddit. Unlike formally structured geographic data, this corpus of unstructured text provides connections derived from co-occurring locations, capturing subconscious links between them, alongside inherent biases. Our work demonstrates the ability to link locations mentioned by unique users, building ‘mental’ place connections for over 50,000 unique locations in the United Kingdom. Sentiment regarding locations is compared against their levels of connectivity, demonstrating that user opinions regarding locations are likely drivers in mental place connectivity.
Mapping cognitive place associations within the United Kingdom through online discussion on Reddit
This paper explores cognitive place associations, conceptualised as a place-based mental model that derives subconscious links between geographic locations. Utilising a large corpus of online discussion data from the social media website Reddit, we experiment on the extraction of such geographic knowledge from unstructured text. First, we construct a system to identify place names found in Reddit comments, disambiguating each to a set of coordinates where possible. Following this, we build a collective picture of cognitive place associations in the United Kingdom, linking locations that co-occur in user comments and evaluating the effect of distance on the strength of these associations. Exploring these geographies nationally, associations were shown to be typically weaker over greater distances. This distance decay is also highly regional: rural areas typically have greater levels of distance decay, particularly in Wales and Scotland. When comparing major cities across the UK, we observe distinct distance decay patterns, influenced primarily by proximity to other cities.
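The idea of linking locations that co-occur in comments and relating association strength to distance can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' pipeline: the helper names `cooccurrence_counts` and `haversine_km`, the three toy comments, and the approximate city coordinates are all invented for the example.

```python
import math
from collections import Counter
from itertools import combinations


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def cooccurrence_counts(comments):
    """Count unordered place pairs mentioned together in the same comment."""
    counts = Counter()
    for places in comments:
        # Deduplicate within a comment and sort so each pair has one key.
        for pair in combinations(sorted(set(places)), 2):
            counts[pair] += 1
    return counts


# Assumed toy inputs: place names already extracted and disambiguated per comment,
# with approximate illustrative coordinates.
coords = {
    "Liverpool": (53.4084, -2.9916),
    "Manchester": (53.4808, -2.2426),
    "London": (51.5074, -0.1278),
}
comments = [
    ["Liverpool", "Manchester"],
    ["Manchester", "Liverpool", "London"],
    ["London", "Manchester"],
]

counts = cooccurrence_counts(comments)
for (a, b), n in sorted(counts.items()):
    print(f"{a} - {b}: {n} co-mentions, {haversine_km(coords[a], coords[b]):.0f} km apart")
```

With real data, the pairs of co-mention count and distance could then be fitted with a decay function per region; the choice of function is left open here because the abstract does not specify one.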