
    BME-HAS System for CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection


    Synonym acquisition from translation graph

    We present a language-independent method for acquiring synonyms from a large translation graph, and introduce a new WordNet-based precision-like measure for evaluating the results.
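The idea above can be illustrated with a toy sketch: in a translation graph, two words of the same language that share several translations in other languages are plausible synonym candidates. The graph, the `min_shared` threshold, and the function name below are illustrative assumptions, not the paper's actual method or its WordNet-based measure.

```python
from collections import defaultdict

# Hypothetical toy translation graph: edges connect (language, word) nodes.
edges = [
    (("en", "car"), ("de", "Auto")),
    (("en", "automobile"), ("de", "Auto")),
    (("en", "car"), ("fr", "voiture")),
    (("en", "automobile"), ("fr", "voiture")),
    (("en", "bank"), ("de", "Bank")),
]

def synonym_candidates(edges, lang, min_shared=2):
    """Pair up words of `lang` that share at least `min_shared` translations."""
    translations = defaultdict(set)
    for a, b in edges:
        for src, tgt in ((a, b), (b, a)):
            if src[0] == lang:
                translations[src[1]].add(tgt)
    words = sorted(translations)
    pairs = []
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            if len(translations[w1] & translations[w2]) >= min_shared:
                pairs.append((w1, w2))
    return pairs

print(synonym_candidates(edges, "en"))  # → [('automobile', 'car')]
```

Raising `min_shared` trades recall for precision, which is where an evaluation measure such as the paper's WordNet-based one would come in.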

    Building basic vocabulary across 40 languages

    The paper explores options for building bilingual dictionaries by automated methods. We define the notion of ‘basic vocabulary’ and investigate how well the conceptual units that make up this language-independent vocabulary are covered by language-specific bindings in 40 languages.

    Automatic punctuation restoration with BERT models

    We present an approach for automatic punctuation restoration with BERT models for English and Hungarian. For English, we conduct our experiments on Ted Talks, a commonly used benchmark for punctuation restoration, while for Hungarian we evaluate our models on the Szeged Treebank dataset. Our best models achieve a macro-averaged F1-score of 79.8 in English and 82.2 in Hungarian. Our code is publicly available.
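Punctuation restoration is commonly scored as per-token classification: each word position is labeled with the punctuation mark that follows it ("O" for none), and the macro-averaged F1 reported above averages the per-label F1 scores with equal weight. The tag set and the toy sequences below are illustrative, not the paper's actual data.

```python
# Toy gold and predicted label sequences for one sentence pair.
gold = ["O", "COMMA", "O", "O", "PERIOD", "O", "COMMA", "PERIOD"]
pred = ["O", "COMMA", "O", "COMMA", "PERIOD", "O", "O", "PERIOD"]

def macro_f1(gold, pred):
    """Average the per-label F1 over all labels seen in gold or pred."""
    labels = set(gold) | set(pred)
    f1s = []
    for lab in labels:
        tp = sum(g == p == lab for g, p in zip(gold, pred))
        fp = sum(p == lab and g != lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

print(round(macro_f1(gold, pred), 2))  # → 0.75
```

Macro averaging keeps rare marks (e.g. question marks) from being drowned out by the dominant "no punctuation" class.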

    Building word embeddings from dictionary definitions


    Investigation of epilithic biofilms in the River Danube


    Entitásorientált véleménykinyerés magyar nyelven (Entity-oriented opinion extraction in Hungarian)

    The amount of unstructured data available in digital form is growing continuously, which makes the automated analysis of the polarity of opinions about the entities mentioned in such data increasingly important. In this paper we present an application for extracting detailed author attitudes towards personal names, geographical names, and company names from Hungarian texts. We have released both the source code and the solution in virtualized form.

    The Role of Interpretable Patterns in Deep Learning for Morphology

    We examine the role of character patterns in three tasks: morphological analysis, lemmatization, and copy (reproducing the input). We use a modified version of the standard sequence-to-sequence model, where the encoder is a pattern-matching network. Each pattern scores all possible N-character-long subwords (substrings) on the source side, and the highest-scoring subword’s score is used to initialize the decoder as well as the input to the attention mechanism. This method allows learning which subwords of the input are important for generating the output. By training the models on the same source but different targets, we can compare which subwords are important for different tasks and how they relate to each other. We define a similarity metric, a generalized form of the Jaccard similarity, and assign a similarity score to each pair of the three tasks that work on the same source but may differ in target. We examine how these three tasks are related to each other in 12 languages. Our code is publicly available.
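One common way to generalize Jaccard similarity from sets to real-valued scores, such as the per-subword importance scores described above, is the weighted form J(a, b) = Σ min(aₖ, bₖ) / Σ max(aₖ, bₖ). Whether this matches the paper's exact definition is an assumption; the subword scores below are made up for illustration.

```python
def weighted_jaccard(a, b):
    """Weighted Jaccard similarity of two {key: score} mappings.

    Reduces to classic set Jaccard when all scores are 0 or 1.
    """
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0

# Hypothetical importance scores for the same source word under two tasks.
lemma_scores = {"walk": 0.9, "ed": 0.1, "ing": 0.0}
morph_scores = {"walk": 0.3, "ed": 0.8, "ing": 0.1}
print(round(weighted_jaccard(lemma_scores, morph_scores), 3))  # → 0.222
```

A score near 1 would mean two tasks attend to largely the same subwords of the source; a score near 0 would mean they rely on disjoint parts of the word.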