18 research outputs found

    Automatically analysing large texts in a GIS environment: The Registrar General’s reports and cholera in the nineteenth century

    This is the peer-reviewed version of the following article: Murrieta-Flores, P., Baron, A., Gregory, I., Hardie, A., & Rayson, P. (2015). Automatically analysing large texts in a GIS environment: The Registrar General’s reports and cholera in the nineteenth century. Transactions in GIS, 19(2), 296-320. DOI: 10.1111/tgis.12106, which has been published in final form at http://onlinelibrary.wiley.com/doi/10.1111/tgis.12106/abstract. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.
    The aim of this article is to present new research showcasing how Geographic Information Systems, in combination with Natural Language Processing and Corpus Linguistics methods, can offer innovative avenues of research to analyze large textual collections in the Humanities, particularly in historical research. Using as examples parts of the collection of the Registrar General’s Reports, which contain more than 200,000 pages of descriptions, census data and vital statistics for the UK, we introduce newly developed automated textual tools and well-known spatial analyses, used in combination to investigate a case study of the references made to cholera and other diseases in these historical sources and their relationship to place-names during Victorian times. The integration of such techniques has allowed us to explore, in an automatic way, this historical source containing millions of words, to examine the geographies depicted in it, and to identify textual and geographic patterns in the corpus.

    Putting the Eighteenth Century on the Map

    Alts, Abbreviations, and AKAs: historical onomastic variation and automated named entity recognition

    The accurate automated identification of named places is a major concern for scholars in the digital humanities, and especially for those engaged in research that depends upon the gazetteer-led recognition of specific aspects. The field of onomastics examines the linguistic roots and historical development of names, which have for the most part only standardised into single officially recognised forms since the late nineteenth century. Even slight spelling variations can introduce errors in geotagging techniques, and these differences in place-name spellings are thus vital considerations when seeking high rates of correct geospatial identification in historical texts. This article offers an overview of typical name-based variation that can cause issues in the accurate geotagging of any historical resource. The article argues that the careful study and documentation of these variations can assist in the development of more complete onymic records, which in turn may inform geotaggers through a cycle of variational recognition. It demonstrates how patterns in regional naming variation and development, across both specific and generic name elements, can be identified through the historical records of each known location. The article uses examples taken from a digitised corpus of writing about the English Lake District, a collection of 80 texts that date from between 1622 and 1900. Four of the more complex spelling-based problems encountered during the creation of a manual gazetteer for this corpus are examined. Specifically, the article demonstrates how and why such variation must be expected, particularly in the years preceding the standardisation of place-name spellings. It suggests how procedural developments may be undertaken to account for such georeferential issues in the Named Entity Recognition strategies employed by future projects. 
    Similarly, the benefits of such multi-genre corpora in helping to complete onomastic records are shown through examples of new name forms discovered for prominent sites in the Lake District. This focus is accompanied by a discussion of the influence of literary works on place-name standardisation, an aspect not typically accounted for in traditional onomastic study, to illustrate the extent to which authorial interests in regional toponymic histories can influence linguistic development.
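The effect of spelling variation on gazetteer lookup can be illustrated with a minimal sketch. This is not the article's method: fuzzy string matching with Python's standard `difflib` module is swapped in as a simple stand-in, and the gazetteer entries, the function name `match_place` and the 0.8 similarity cutoff are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical mini-gazetteer of standardised modern forms (illustrative only).
GAZETTEER = ["Windermere", "Keswick", "Helvellyn", "Derwentwater"]


def match_place(token, gazetteer=GAZETTEER, cutoff=0.8):
    """Return the closest gazetteer entry for a historical spelling, or None.

    Matching is case-insensitive; `cutoff` is an assumed similarity threshold
    (0 to 1) below which a candidate is rejected.
    """
    lookup = {name.lower(): name for name in gazetteer}
    hits = difflib.get_close_matches(
        token.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


# An exact-match lookup would miss these variant spellings entirely.
for spelling in ["Keswicke", "Helvellin", "London"]:
    print(spelling, "->", match_place(spelling))
```

A lower cutoff recovers more variants but also admits more false matches, which is one reason documented variant lists of the kind the article describes remain useful alongside approximate matching.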

    Georeferencing the contents of documentary databases (Georreferenciação de conteúdos de bases de dados documentais)

    Integrated master's dissertation in Informatics Engineering. Georeferencing is the process of geographically locating a given spatial object by assigning coordinates. Georeferencing systems use automatic spatial processing performed by a computer, for example to place an entity on a map or provide a spatial resource. When this process is applied to collections of textual documents, it is described as a combination of named entity recognition. The Livro das Propriedades, also known as the “Tombo da Mitra”, contains information regarding land types, landforms, street names, owners, and biographical and genealogical notes on the various properties that the Archbishop’s table (mesa Arcebispal) of Braga owned in the 17th century.
    This dissertation aimed to design and implement a textual georeferencing system for the contents of the Livro das Propriedades, with particular focus on the places mentioned in it, so that scholars of these contents can obtain information about the geographical location of those elements.

    Ensemble Named Entity Recognition (NER): Evaluating NER Tools in the Identification of Place Names in Historical Corpora

    The field of Spatial Humanities has advanced substantially in recent years. The identification and extraction of toponyms and spatial information mentioned in historical text collections has allowed their use in innovative ways, making possible the application of spatial analysis and the mapping of these places with geographic information systems. For instance, automated place name identification is possible with Named Entity Recognition (NER) systems. Statistical NER methods based on supervised learning, in particular, are highly successful with modern datasets. However, there are still major challenges to address when dealing with historical corpora. These challenges include language changes over time, spelling variations, transliterations, OCR errors, and sources written in multiple languages, among others. In this article, considering a task of place name recognition over two collections of historical correspondence, we report an evaluation of five NER systems and an approach that combines these through a voting system. We found that although the individual performance of each NER system was corpus dependent, the ensemble combination was able to achieve consistent measures of precision and recall, outperforming the individual NER systems. In addition, the results showed that these NER systems are not strongly dependent on preprocessing and translation to Modern English.
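The idea of combining several NER systems through voting can be sketched as follows. The abstract does not state the voting rule, so simple majority voting over exact character spans is an assumption here, as are the function name `vote_place_names`, the `min_votes` parameter and the toy system outputs.

```python
from collections import Counter


def vote_place_names(system_outputs, min_votes=None):
    """Keep a (start, end) span if at least `min_votes` systems tagged it.

    `system_outputs` holds one collection of spans per NER system.
    By default a strict majority of systems is required (an assumed rule).
    """
    if min_votes is None:
        min_votes = len(system_outputs) // 2 + 1
    counts = Counter()
    for spans in system_outputs:
        counts.update(set(spans))  # each system votes at most once per span
    return sorted(span for span, votes in counts.items() if votes >= min_votes)


# Toy output of five hypothetical NER systems over the same text.
outputs = [
    {(0, 6), (20, 27)},
    {(0, 6)},
    {(0, 6), (20, 27), (40, 44)},
    {(20, 27)},
    {(0, 6)},
]

ensemble = vote_place_names(outputs)
print(ensemble)
```

Raising `min_votes` trades recall for precision: spans that only one or two systems propose are dropped, while spans most systems agree on survive.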

    Location Reference Recognition from Texts: A Survey and Comparison

    A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to recognizing location references from texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Further, there is a lack of a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and core step of geoparsing. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition by categorizing these approaches into four groups based on their underlying functional principle: rule-based, gazetteer matching–based, statistical learning–based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition based on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references worldwide. Results from this thorough evaluation can help inform future methodological developments and can help guide the selection of proper approaches based on application needs.

    Towards the Spatial Analysis of Vague and Imaginary Place and Space: Evolving the Spatial Humanities through Medieval Romance

    This study builds on Lefebvre’s original proposition that the understanding of space and place changes with time and culture, and delves into this issue using a group of medieval romances as a case study. It proposes a preliminary exploratory methodology that combines spatial technologies with a linguistic approach and aims to facilitate the analysis of medieval narratives, accounting for the spatial complexity portrayed, as well as integrating and expediting the exploration of geographical, vague, and imaginary space and place in Humanities-based fields.