8 research outputs found

    Overture POI data for the United Kingdom: a comprehensive, queryable open data product

    Point of Interest (POI) data that is comprehensive, globally available and open access is scarce, despite being an important input for research in a number of application areas. New data from the Overture Maps Foundation offers significant potential in this arena, but accessing the data requires computational resources and skills beyond the capacity of the average researcher. In this article, we provide a processed version of the Overture places (POI) dataset for the UK in a fully queryable format, together with accompanying code for exploring the data and generating other national subsets. We describe the construction and characteristics of the dataset, then assess its reliability (locational accuracy, attribute comprehensiveness) through direct comparison with Geolytix supermarket data. This dataset can support new and important research projects in a variety of thematic areas, and foster a network of researchers to further evaluate its advantages and limitations. Comment: Main document: 6 pages, 1 figure, 2 tables. Supplementary: 2 pages, 2 figures, 1 table

    Mapping Great Britain's semantic footprints through a large language model analysis of Reddit comments

    Observed regional variation in geotagged social media text is often attributed to dialects, where features in language are assumed to exhibit region-specific properties. While dialects are seen as a key component in defining the identity of regions, there are a multitude of other geographic properties that may be captured within natural language text. In our work, we consider locational mentions that are directly embedded within comments on the social media website Reddit, providing a range of associated semantic information, and enabling deeper representations between locations to be captured. Using a large corpus of geoparsed Reddit comments from UK-related local discussion subreddits, we first extract embedded semantic information using a large language model, aggregated into local authority districts, representing the semantic footprint of these regions. These footprints broadly exhibit spatial autocorrelation, with clusters that conform to the national borders of Wales and Scotland. London, Wales, and Scotland also demonstrate notably different semantic footprints compared with the rest of Great Britain.
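The abstract does not say how spatial autocorrelation was measured. As a rough illustration only, the sketch below computes global Moran's I, one common statistic for this purpose, over a scalar value per region. The function name `morans_i`, the binary adjacency weights, and the reduction of each footprint to a single number are all assumptions made for the example, not the authors' method.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary, symmetric adjacency weights.

    values     -- one scalar per region (assumption: footprints reduced to a number)
    neighbours -- undirected pairs (i, j) of adjacent region indices
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    weight_sum = 0.0
    cross_product = 0.0
    for i, j in neighbours:
        # Each undirected pair stands for w_ij = w_ji = 1.
        weight_sum += 2
        cross_product += 2 * deviations[i] * deviations[j]

    variance_term = sum(d * d for d in deviations)
    return (n / weight_sum) * (cross_product / variance_term)


# Four hypothetical regions arranged in a line: 0-1-2-3.
chain = [(0, 1), (1, 2), (2, 3)]
smooth = morans_i([1, 2, 3, 4], chain)       # similar neighbours
alternating = morans_i([1, 4, 1, 4], chain)  # dissimilar neighbours
print(smooth, alternating)
```

Positive values indicate that neighbouring regions tend to hold similar values, and negative values indicate the opposite.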

    Transformer based named entity recognition for place name extraction from unstructured text

    Place names embedded in online natural language text present a useful source of geographic information. Despite this, many methods for the extraction of place names from text use pre-trained models that were not explicitly designed for this task. Our paper builds five custom Named Entity Recognition (NER) models and evaluates them against three popular pre-built models for place name extraction. The models are evaluated on a set of manually annotated Wikipedia articles using the F1 score. Our best performing model achieves an F1 score of 0.939 compared with 0.730 for the best performing pre-built model. Our model is then used to extract all place names from Wikipedia articles in Great Britain, demonstrating the ability to more accurately capture unknown place names from volunteered sources of online geographic information.
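For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it over exact-match entity spans. The abstract does not state whether the paper scores at token or entity level, so the span-level matching, the function name `f1_score`, and the sample annotations are assumptions for illustration.

```python
def f1_score(predicted, gold):
    """F1 over exact-match entity spans (assumed scoring scheme)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans: (start offset, end offset, surface text).
gold = {(0, 6, "London"), (10, 15, "Leeds"), (30, 37, "Cardiff")}
predicted = {(0, 6, "London"), (10, 15, "Leeds"), (20, 24, "Bath")}

print(f1_score(predicted, gold))
```

Here two of three predictions are correct and two of three gold spans are found, so precision and recall are both 2/3.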

    Mapping Cognitive Place Associations within the United Kingdom through Online Discussion on Reddit

    This repository contains the data and code required to replicate the analysis of our paper. Data may be found within the data/ directory in the uploaded zip file. Code is found within the scripts/ directory. Replicate processing: install project dependencies into a Python virtual environment using `poetry install`, then replicate our pipeline using `dvc repro -f`. NOTE: Consult the dvc.yaml file to see the processing pipeline. Several stages have been frozen as they require external data.

    Mapping cognitive place associations within the United Kingdom through online discussion on Reddit

    This paper explores cognitive place associations, conceptualised as a place‐based mental model that derives subconscious links between geographic locations. Utilising a large corpus of online discussion data from the social media website Reddit, we experiment on the extraction of such geographic knowledge from unstructured text. First we construct a system to identify place names found in Reddit comments, disambiguating each to a set of coordinates where possible. Following this, we build a collective picture of cognitive place associations in the United Kingdom, linking locations that co‐occur in user comments and evaluating the effect of distance on the strength of these associations. Exploring these geographies nationally, associations were shown to be typically weaker over greater distances. This distance decay is also highly regional: rural areas typically have greater levels of distance decay, particularly in Wales and Scotland. When comparing major cities across the UK, we observe distinct distance decay patterns, influenced primarily by proximity to other cities.
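The linking step described above can be pictured as counting how often two places appear in the same comment and pairing each count with the distance between them. The following is a minimal sketch under stated assumptions: the comments are already reduced to lists of resolved place names, the `COORDS` gazetteer holds approximate illustrative coordinates, and great-circle (haversine) distance stands in for whatever distance measure the paper uses. None of the names come from the authors' code.

```python
from collections import Counter
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

# Hypothetical gazetteer: approximate (latitude, longitude), for illustration only.
COORDS = {
    "London": (51.5074, -0.1278),
    "Manchester": (53.4808, -2.2426),
    "Cardiff": (51.4816, -3.1791),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def cooccurrence_counts(comments):
    """Count unordered place pairs that appear together in a comment."""
    counts = Counter()
    for places in comments:
        # Deduplicate within a comment so repeated mentions count once.
        for pair in combinations(sorted(set(places)), 2):
            counts[pair] += 1
    return counts


# Hypothetical comments, each already reduced to its resolved place names.
comments = [
    ["London", "Manchester"],
    ["London", "Cardiff"],
    ["London", "Manchester", "London"],
]

counts = cooccurrence_counts(comments)
for (place_a, place_b), n in sorted(counts.items()):
    distance = haversine_km(COORDS[place_a], COORDS[place_b])
    print(place_a, place_b, n, round(distance))
```

With a real corpus, plotting or regressing the counts against distance would expose the distance decay the abstract describes.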