5 research outputs found

    Linguistic and orthographical classic Portuguese variants. Challenges for NLP

    Recent years have seen substantial investment in transferring ancient Portuguese texts from physical to digital media. This transfer not only makes the texts accessible to the general public but also allows them to be read and processed by machines. NLP tools are aimed mainly at contemporary Portuguese, and applying them to classic texts raises several difficulties. Building large lexical corpora of forms that predate modern Portuguese is an opportunity for a multidisciplinary field of study: it broadens linguistic research and makes it possible to obtain, through NLP, validated corpora, collections, and ontologies that can serve as input to NLP tools for ancient Portuguese texts. In this work we briefly present the problem of lexical variation in the processing of classic Portuguese texts, the challenges that arise from it, and future perspectives for the work.

    Enriching the 1758 Portuguese Parish Memories (Alentejo) with Named Entities

    This work presents an enriched version of the Parish Memories (1758–1761), an essential, manually transcribed Portuguese historical source. It is enriched with annotations of named entities of the types PERSON, LOCATION, and ORGANIZATION. The whole collection was annotated automatically, and two researchers manually annotated a portion of it for evaluation purposes. In this dataset, we provide the tagged texts, the lists of extracted entities, and frequency counts. The corpus is useful for historians, allowing, for instance, comparative analyses between parishes and regions or the calculation of a locality's area of influence. The paper describes the creation and evaluation of the corpus and discusses its applications and limitations. This first release may be improved by other researchers interested in the historical source itself or in the technology employed in its annotation. FCT CEECIND/01997/2017, UIDB/00057/202

    Planear a normalização automática: tipologia de variação gráfica do corpus das Memórias Paroquiais (1758)

    Digital Humanities are now essential for studies on large-scale textual corpora, where transforming text into processable data about linguistic phenomena requires multidisciplinary treatment. In this article we present a Digital Humanities approach applied to an 18th-century Portuguese textual corpus, gathered from a set of documents of high historical and heritage value known as the Memórias Paroquiais [“The Parish Memoirs”]. We highlight some characteristics of how the corpus was constituted and questions concerning the considerable spelling variation observed in the texts. We propose a typology as a step towards a future automatic normalization of this textual corpus. FCT - Portugal - UIDB/00057/202

    As Memórias Paroquiais: do manuscrito ao digital

    This text aims to trace the history of the custody of the Parish Memories ("Memórias Paroquiais"), from the distribution of the surveys in 1758 to the current projects that aim to convert them into digital objects and data. Reflecting on this itinerary is also a way to evaluate and rethink working strategies for this collection. It should be noted that this is a relevant resource for understanding mid-eighteenth-century Portugal, of interest not only to historians but also to many other scholars and actors in many fields. Work carried out within the scope of projects UIDB/00057/2020 and PTDC/ART-HIS/32327/2017 - FCT – Portuga