43 research outputs found

    Corpus-based typology: Applications, challenges and some solutions

    Over the last few years, the number of corpora that can be used for language comparison has dramatically increased. The corpora are so diverse in their structure, size and annotation style that a novice might not know where to start. The present paper charts this new and changing territory, providing a few landmarks, warning signs and safe paths. Although no corpus at present can replace the traditional type of typological data based on language description in reference grammars, corpora can help with diverse tasks, being particularly well suited for investigating probabilistic and gradient properties of languages and for discovering and interpreting cross-linguistic generalizations based on processing and communicative mechanisms. At the same time, the use of corpora for typological purposes has not only advantages and opportunities, but also numerous challenges. This paper also contains an empirical case study addressing two pertinent problems: the role of text types in language comparison and the problem of the word as a comparative concept

    Partitive Determiners, Partitive Pronouns and Partitive Case

    The fine-grained morpho-syntactic and semantic variation displayed by partitive elements across European languages is far from being well-described, let alone well-understood. This volume focuses on Partitive Determiners, Partitive Pronouns and Partitive Case in European languages, their emergence and spread in diachrony, their acquisition by L2 speakers, and their syntax and interpretation in a cross-theoretical typological perspective

    Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

    Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data. This paper investigates the scenario where monolingual data is limited as well, finding that current unsupervised methods suffer in performance under this stricter setting. We find that the performance loss originates from the poor quality of the pretrained monolingual embeddings, and we propose using linguistic information in the embedding training scheme. To support this, we look at two linguistic features that may help improve alignment quality: dependency information and sub-word information. Using dependency-based embeddings results in a complementary word representation which offers a boost in performance of around 1.5 BLEU points compared to standard WORD2VEC when monolingual data is limited to 1 million sentences per language. We also find that the inclusion of sub-word information is crucial to improving the quality of the embedding

    Forgotten Laxdæla poetry : a study and an edition of Tyrfingur Finnsson's Vísur uppá Laxdæla sögu

    The paper discusses the metre and the diction of a previously unpublished short poem about characters of Laxdæla saga, composed in the 18th century. The stanzas are ostensibly in skaldic dróttkvætt; the analysis shows the poem to be an imitation of the classical metre, yet a remarkably successful one, implying an extraordinarily good grasp of dróttkvætt poetics on the part of a poet composing several hundred years after the end of the classical dróttkvætt period

    CLARIN

    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium