3 research outputs found

    OpusTools and Parallel Corpus Diagnostics

    The 12th edition of the Language Resources and Evaluation Conference was cancelled due to the Covid-19 pandemic. This paper introduces OpusTools, a package for downloading and processing parallel corpora included in the OPUS corpus collection. The package implements tools for accessing compressed data in their archived release format and makes it possible to easily convert between common formats. OpusTools also includes tools for language identification and data filtering, as well as tools for importing data from various sources into the OPUS format. We show the use of these tools in parallel corpus creation and data diagnostics. The latter is especially useful for identifying potential problems and errors in the extensive data set. Using these tools, we can now monitor the validity of data sets and improve the overall quality and consistency of the data collection. Peer reviewed.

    Finn–magyar fordítási párok kinyerése automatikus módszerekkel (Extracting Finnish–Hungarian translation pairs with automatic methods)


    Doktoranduszok tanulmányai az alkalmazott nyelvészet köréből 2021 (Studies by doctoral students in applied linguistics, 2021)
