Vowel Harmony Viewed as Error-Correcting Code
Robustness reduces the risk of information loss. At present the notion of error-correcting codes (ECCs) is used to achieve robustness in technical fields only. Viewing fault-tolerant natural systems as systems equipped with error-correcting codes permits a formal comparison of natural and technical robustness.
Taking natural language (NL) as an instance, we show differences between technical and natural error-correcting approaches. By picking a specific grammatical phenomenon which some NLs exhibit, vowel harmony (VH), we show that (1) VH can be formalized as an ECC and (2) VH adds to the robustness of its NL. We provide both empirical and formal evidence for these claims. (3) Consequently, the example of VH shows that the notion of an ECC serves as a suitable formal model not only for technical but also for natural robustness.
Numerals and what counts
This study discusses how numerals and related expressions are currently annotated in the Universal Dependencies project, with a specific focus on the Uralic language family and only occasional references to other language groups. We analyse the differences in annotation conventions between individual treebanks, and aim to highlight some areas where further development work and systematization could prove beneficial. At the same time, the Universal Dependencies project already offers a wide range of conventions to mark nuanced variation in numerals and counting expressions, and the harmonization of conventions between different languages could be the next step to take. The discussion here makes specific reference to Universal Dependencies version 2.8, and some of the differences found may already have been harmonized in version 2.9. Regardless of whether this has taken place, we believe that the study still forms an important documentation of this period in the project. Peer reviewed.
An OCR system for the Unified Northern Alphabet
Partanen N, Rießler M. An OCR system for the Unified Northern Alphabet. In: Pirinen TA, Kaalep H-J, Tyers FM, Association for Computational Linguistics, eds. The fifth International Workshop on Computational Linguistics for Uralic Languages. Tartu: Association for Computational Linguistics; 2019: 77-89.
This paper presents experiments done in order to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 (Uralic and non-Uralic) minority languages spoken in the Soviet Union. The character accuracy of the developed model reaches more than 98% and clearly shows cross-linguistic applicability. The tests described here therefore also include general guidelines for the amount of training data needed to bootstrap an OCR system under similar conditions.
Dependency parsing of code-switching data with cross-lingual feature representations
Partanen N, KyungTae L, Rießler M, Poibeau T. Dependency parsing of code-switching data with cross-lingual feature representations. In: Pirinen TA, Rießler M, Rueter J, Trosterud T, Tyers FM, eds. Proceedings of the 4th International Workshop for Computational Linguistics for Uralic Languages. Helsinki: Association for Computational Linguistics; 2018: 1-17
Internship report in community pharmacy
Internship report carried out as part of the Integrated Master's in Pharmaceutical Sciences, presented to the Faculdade de Farmácia da Universidade de Coimbra.
Building language technology infrastructures to support a collaborative approach to language resource building
The publication will be available in March 2021.
Digital infrastructures are a vital part of the support that provides a research framework and platform for engineering digital lexicography and grammars and deploying them to end-users as real NLP software products.