20 research outputs found

    Automatic morphological analysis and interlinking of historical Irish cognate verb forms

    The main aim of the author’s research project is to use computational approaches to gain more insight into the historical development of Irish verbs. One of the objectives is to investigate how a link could be implemented between the electronic Dictionary of the Irish Language (eDIL), covering the period c. 700–c. 1700 but focussing on Early Irish (7th–12th centuries), and the nascent Foclóir Stairiúil na Gaeilge ‘The Historical Dictionary of Irish’, covering the period 1600–2000. Such a link will be hugely beneficial for scholars operating at the intersection of the medieval and modern periods (see Table 1), who currently lack a comprehensive lexical resource for the “intermediate” early modern period. This paper stems from research carried out during a Government of Ireland Postgraduate Scholarship (GOIPG/2017/1808) funded by the Irish Research Council. The author would also like to acknowledge the anonymous reviewer for helpful feedback and the editors for seeing this publication through. Peer reviewed.

    Cardamom: Comparative deep models for minority and historical languages

    This paper gives an overview of the Cardamom project, which aims to close the resource gap for minority and under-resourced languages by means of deep-learning-based natural language processing (NLP) and by exploiting similarities between closely related languages. The project further extends this idea to historical languages, which can be considered closely related to their modern forms, and as such aims to provide NLP through both space and time for languages that have been ignored by current approaches.

    Developing an inflectional lexicon for Old Irish

    While Old Irish (c. 600–900 A.D.) is extensively documented, it remains digitally under-resourced, lacking the range of digital resources available for other older Indo-European languages (e.g., Latin, see Pellegrini and Passarotti, 2018). We report on the development of a fully inflected lexicon of Old Irish nouns, provided in both phonemic and orthographic notation. This involved a computer-assisted, systematic, and reproducible grapheme-to-phoneme conversion pipeline and generating morphological forms through a finite-state transducer. The inflected lexicon we develop will better enable computational studies in Old Irish morphology, further research into diachronic developments, and have a wide range of Natural Language Processing (NLP) applications. We began by extracting noun lemmata from the Old Irish Würzburg glosses (Kavanagh, 2001) and the Corpus PalaeoHibernicum (CorPH) ‘Old Irish Corpus’ (Stifter et al., 2021). We then devised a set of rules for orthography-to-phonology conversion, subsequently implemented using the Python package Epitran (Mortensen, Dalmia, and Littell, 2018). The resulting transcriptions act as the data input for a finite-state transducer (FST) adapted from Fransen (2019), allowing us to generate inflected forms of Old Irish nouns. Finally, we derived orthographic forms (and their variants) by applying conversion rules to the generated forms. Old Irish presents considerable challenges for the development of a resource of this nature, given its opaque and inconsistent orthography, complex phonology, elaborate system of morphophonological alternations, and intricate patterns of morphological inflection (Anderson, 2016; Stifter, 2009; Thurneysen, 1946; Pedersen, 1909–1913). We report on how we dealt with these problems in the development of the inflectional lexicon.
    While this study focused on the Old Irish nouns in the Würzburg glosses, we intend to extend the lexicon by applying this pipeline to further corpora and other parts-of-speech. This inflected lexicon makes possible systematic studies in data-driven morphology and typology (Pellegrini, 2020; Beniamine, Bonami, and Luís, 2021; Beniamine, 2021), and facilitates future research into diachronic and diatopic variation in Irish and the development of further NLP applications for the language.

    References
    Anderson, Cormac (2016). “Consonant colour and vocalism in the history of Irish”. PhD thesis. Uniwersytet im. Adama Mickiewicza w Poznaniu. URL: https://hdl.handle.net/10593/14780.
    Beniamine, Sacha (2021). “One lexeme, many classes: inflection class systems as lattices”. In: One-to-Many Relations. Ed. by Berthold Crysmann and Manfred Sailer. Berlin: Language Science Press.
    Beniamine, Sacha, Olivier Bonami, and Ana R. Luís (2021). “The fine implicative structure of European Portuguese conjugation”. In: Isogloss 7.9, pp. 1–35. DOI: https://doi.org/10.5565/rev/isogloss.109.
    Fransen, Theodorus (2019). “Past, present and future: Computational approaches to mapping historical Irish cognate verb forms”. PhD thesis. Trinity College Dublin, The University of Dublin. URL: https://github.com/ThFransen84/OIfst.
    Kavanagh, Séamus (2001). A Lexicon of the Old Irish Glosses in the Würzburg Manuscript of the Epistles of St. Paul. Ed. by Dagmar S. Wodtko. Mitteilungen der Prähistorischen Kommission 45. + 1 CD-ROM. Wien: Verlag der Österreichischen Akademie der Wissenschaften. DOI: 10.1553/0x0001fb6e.
    Mortensen, David R., Siddharth Dalmia, and Patrick Littell (May 2018). “Epitran: Precision G2P for Many Languages”. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Ed. by Nicoletta Calzolari (Conference chair) et al. Miyazaki, Japan: European Language Resources Association (ELRA).
    Pedersen, Holger (1909–1913). Vergleichende Grammatik der keltischen Sprachen. 2 Vols. Göttingen: Vandenhoeck & Ruprecht.
    Pellegrini, Matteo (2020). “Using LatInfLexi for an Entropy-Based Assessment of Predictability in Latin Inflection”. In: Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages. Marseille, France: European Language Resources Association (ELRA), pp. 37–46. URL: https://aclanthology.org/2020.lt4hala-1.6.
    Pellegrini, Matteo and Marco Passarotti (2018). “LatInfLexi: an Inflected Lexicon of Latin Verbs”. In: Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018) (Turin, Italy, Dec. 10, 2018). Ed. by Elena Cabrio, Alessandro Mazzei, and Fabio Tamburini. Vol. 2253. CEUR Workshop Proceedings. Aachen. URL: http://ceur-ws.org/Vol-2253/paper23.pdf.
    Stifter, David (2009). “Early Irish”. In: The Celtic Languages. Ed. by Martin Ball and Nicole Müller. Hoboken: Routledge.
    Stifter, David et al. (2021). Corpus PalaeoHibernicum (CorPH) v1.0. URL: http://chronhib.maynoothuniversity.ie.
    Thurneysen, Rudolf (1946). A Grammar of Old Irish. Trans. by Daniel A. Binchy and Osborn Bergin. Revised and enlarged edition. Dublin: Dublin Institute for Advanced Studies. Repr. 1993, with supplement.
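
    The orthography-to-phonology step described in the abstract above can be pictured as an ordered list of rewrite rules. What follows is only a minimal sketch under stated assumptions: it uses plain Python regular expressions rather than Epitran, and the names `TOY_RULES` and `to_phonemic` are hypothetical. The rule set is deliberately tiny and simplified (digraphs before single letters, post-vocalic c/t/p read as voiced stops, length marks on vowels); it is not the rule set developed in the paper and it ignores palatalisation and many other features of Old Irish.

```python
import re
import unicodedata

VOWELS = "aeiouáéíóú"

# Hypothetical toy rules, applied in order; NOT the rule set from the paper.
TOY_RULES = [
    ("th", "θ"),                     # digraphs must be handled first
    ("ch", "x"),
    (f"(?<=[{VOWELS}])c", "g"),      # post-vocalic c, t, p read as voiced stops
    (f"(?<=[{VOWELS}])t", "d"),
    (f"(?<=[{VOWELS}])p", "b"),
    ("c", "k"),                      # any remaining c is voiceless
    ("á", "aː"), ("é", "eː"), ("í", "iː"), ("ó", "oː"), ("ú", "uː"),
]


def to_phonemic(word: str) -> str:
    """Apply the toy rules in order to one orthographic word."""
    # Normalise so that accented vowels are single code points.
    form = unicodedata.normalize("NFC", word.lower())
    for pattern, replacement in TOY_RULES:
        form = re.sub(pattern, replacement, form)
    return form


if __name__ == "__main__":
    for example in ["cath", "ech", "bec", "éc"]:
        print(example, "->", to_phonemic(example))
```

    Rule order matters in a sketch like this: the digraph rules run before the single-letter rules so that "th" is never split, and the vowel-length rules run last so that the lookbehind for a preceding vowel still sees the original accented letters.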

    Automatic morphological parsing of Old Irish verbs using finite-state transducers

    The topic of this paper constitutes the main part of a recently finished Ph.D. project carried out by the author, which investigates how computational methods can be employed to map cognate verb forms in Early Irish (ca. 7th–12th centuries A.D.) and Modern Irish (ca. 1200 onwards). This paper discusses the development of a finite-state morphological transducer using foma (Hulden, 2009) for the Old Irish language (ca. 7th–9th centuries A.D.), focusing on verbs. Two main challenges are discussed. First, different practices of word segmentation have significant repercussions for the encoding of dependencies both on and beyond the word level. A second challenge is complex verb stem formation and considerable stem allomorphy. This has been tackled by operating with “monolithic stem” entries for each verb lemma, i.e., synchronic, invariable hard-coded stems, representing a semi-surface-level base form.

    Towards a computational lexical resource for the diachronic study of Irish verbs

    In this paper, we propose a computational framework for a lexical resource that will better facilitate diachronic study of Irish verbs. The verbal system is subject to major morphological changes between Early Irish (c. 7th–12th centuries A.D.) and Modern Irish varieties (post-12th centuries) (McCone 1997). Moreover, whereas the literary output in the Old Irish period (c. 8th–9th centuries A.D.) points to a standardised language (Stifter 2009), all post-Old Irish historical varieties, except for bardic poetry (Early Modern Irish period, c. 13th–17th centuries A.D.), show a substantial degree of grammatical, orthographical and, particularly evident in the case of Early Modern Irish prose (Ó hUiginn 2013), stylistic variation (cf. contributions in McCone 1994). The available digital support is insufficient to systematically trace the linguistic change and variation. The research described here aims to mitigate the lack of digital support by creating and linking verb forms in morphologically annotated corpora, by using a morphological analyser for contemporary, standardised Irish (already in the process of being adapted for successively earlier Modern Irish texts; Uí Dhonnchadha et al. 2014), and by developing new tagging tools for Old Irish, to project forward to later forms. This paper will focus on the creation of a morphological analyser for Old Irish using finite-state morphology (Beesley and Karttunen 2003). Recognition rates for an Early Irish sample text and associated findings and challenges will be reported on. The paper concludes with an outlook on the implementation stage of the lexical resource, its benefits and potential further research. We will (a) discuss challenges in morphologically tagging and accurately linking verbal cognates across historical corpora, (b) explore the ways in which this resource can serve and advance (digital) scholarship in historical Irish philology and linguistics, and (c) address more general questions relating to the balance between computational methods and manual work in successfully linking cognate verb forms.

    Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-Resource Languages

    We present the findings of the LoResMT 2021 shared task, which focuses on machine translation (MT) of COVID-19 data for both low-resource spoken and sign languages. The task was organised as part of the fourth workshop on technologies for machine translation of low-resource languages (LoResMT). Parallel corpora are presented and made publicly available for the following directions: English↔Irish, English↔Marathi, and Taiwanese Sign Language↔Traditional Chinese. The training data consist of 8112, 20933 and 128608 segments, respectively. There are additional monolingual data sets for Marathi and English that consist of 21901 segments. The results presented here are based on entries from a total of eight teams. Three teams submitted systems for English↔Irish, while five teams submitted systems for English↔Marathi. Unfortunately, there were no system submissions for the Taiwanese Sign Language↔Traditional Chinese task. The maximum system performance, computed using BLEU, was 36.0 for English–Irish, 34.6 for Irish–English, 24.2 for English–Marathi, and 31.3 for Marathi–English.

    How Computers Can Future-Proof Minority Languages

    Dr. Theodorus Fransen and Dr. John McCrae explore how digital language tools can potentially resolve the underrepresentation of minority languages in digital technology and on the Web.