
    Parking Assistant

    This thesis deals with the design and implementation of a parking assistant. It introduces the types of sensors used for distance measurement and the possibilities of using a camera system. The implementation relies on ultrasonic sensors, namely SRF08 rangefinders, together with web cameras. A user interface that combines the data from the individual sensors was also designed and implemented. The parking assistant provides edge detection, acoustic and graphical signalling of distance, and an automatic night mode.
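
    As an illustration of the sensor side, below is a minimal sketch of taking one distance reading from an SRF08 rangefinder over I2C. It is not code from the thesis; the smbus2 library, the factory-default 0x70 address, and the ~65 ms ranging delay are assumptions based on the SRF08 datasheet.

        # Minimal sketch: one distance reading from an SRF08 ultrasonic
        # rangefinder over I2C, assuming the smbus2 library and the
        # sensor's factory-default 7-bit address 0x70.
        import time

        from smbus2 import SMBus

        SRF08_ADDR = 0x70    # factory-default 7-bit I2C address
        CMD_REGISTER = 0x00  # writing a command here starts a ranging cycle
        RANGE_IN_CM = 0x51   # "range in centimetres" command
        RANGE_HIGH = 0x02    # high byte of the first echo
        RANGE_LOW = 0x03     # low byte of the first echo

        def read_distance_cm(bus: SMBus) -> int:
            """Trigger one ranging cycle and return the distance in cm."""
            bus.write_byte_data(SRF08_ADDR, CMD_REGISTER, RANGE_IN_CM)
            time.sleep(0.07)  # datasheet: ranging takes up to about 65 ms
            high = bus.read_byte_data(SRF08_ADDR, RANGE_HIGH)
            low = bus.read_byte_data(SRF08_ADDR, RANGE_LOW)
            return (high << 8) | low

        if __name__ == "__main__":
            with SMBus(1) as bus:  # the bus number depends on the host board
                print(f"distance: {read_distance_cm(bus)} cm")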

    Merged bilingual trees based on Universal Dependencies in Machine Translation

    In this paper, we present our new experimental system that merges the dependency representations of two parallel sentences into one dependency tree. All inner nodes of the merged tree represent source-target pairs of words; the extra words appear as leaf nodes. We use the Universal Dependencies annotation style, in which function words, whose usage often differs between languages, are annotated as leaves. The parallel treebank is parsed in a minimally supervised way, and unaligned words are automatically pushed to the leaves. We present a simple translation system trained on such merged trees and evaluate it on the WMT 2016 English-to-Czech and Czech-to-English translation tasks. Even though the model is still very simple and no language model or word-reordering model was used, the Czech-to-English variant reached a BLEU score similar to that of another established tree-based system.
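
    A merged tree of this kind can be pictured as an ordinary dependency tree whose inner nodes carry aligned word pairs, with unaligned words as leaves that have one side empty. A minimal sketch of such a structure follows; the class and field names are illustrative, not taken from the paper.

        # Minimal sketch of a merged bilingual dependency tree: every
        # inner node holds an aligned source-target word pair, while
        # words without a counterpart become leaves with one side empty.
        # Names are illustrative, not from the paper.
        from dataclasses import dataclass, field
        from typing import Optional

        @dataclass
        class MergedNode:
            source: Optional[str]     # source-language word, if aligned
            target: Optional[str]     # target-language word, if aligned
            deprel: str = "dep"       # Universal Dependencies relation
            children: list["MergedNode"] = field(default_factory=list)

            def is_leaf(self) -> bool:
                return not self.children

        # "He sleeps" / "On spí": both words are aligned; an unaligned
        # function word would sit as a leaf with one side set to None.
        root = MergedNode("sleeps", "spí", "root", [
            MergedNode("He", "On", "nsubj"),
        ])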

    Towards Parallel Czech-Russian Dependency Treebank

    Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora (AEPC 2010), edited by Lars Ahrenberg, Jörg Tiedemann, and Martin Volk. NEALT Proceedings Series, Vol. 10 (2010), pp. 44-52. © 2010 The editors and contributors. Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15893.

    Measuring Memorization Effect in Word-Level Neural Networks Probing

    Multiple studies have probed representations emerging in neural networks trained for end-to-end NLP tasks and examined what word-level linguistic information may be encoded in the representations. In classical probing, a classifier is trained on the representations to extract the target linguistic information. However, there is a threat of the classifier simply memorizing the linguistic labels for individual words, instead of extracting the linguistic abstractions from the representations, thus reporting false positive results. While considerable efforts have been made to minimize the memorization problem, the task of actually measuring the amount of memorization happening in the classifier has been understudied so far. In our work, we propose a simple general method for measuring the memorization effect, based on a symmetric selection of comparable sets of test words seen versus unseen in training. Our method can be used to explicitly quantify the amount of memorization happening in a probing setup, so that an adequate setup can be chosen and the results of the probing can be interpreted with a reliability estimate. We exemplify this by showcasing our method on a case study of probing for part of speech in a trained neural machine translation encoder. Comment: accepted to TSD 2020; to be published in Springer LNCS.
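
    The measurement itself reduces to comparing probing accuracy on test words that the classifier saw in training against accuracy on comparable unseen words. A simplified sketch of that comparison follows; it is not the authors' code, and their symmetric selection of comparable word sets is more careful than this plain split.

        # Simplified illustration of measuring the memorization effect in
        # a probing classifier: compare accuracy on test words whose
        # types were seen during probe training with accuracy on unseen
        # word types. A large gap suggests the probe relies on memorized
        # labels rather than information in the representations.

        def accuracy(pairs):
            return sum(gold == pred for gold, pred in pairs) / len(pairs)

        def memorization_gap(test_items, train_vocab):
            """test_items: iterable of (word, gold_label, predicted_label)."""
            seen = [(g, p) for w, g, p in test_items if w in train_vocab]
            unseen = [(g, p) for w, g, p in test_items if w not in train_vocab]
            return accuracy(seen) - accuracy(unseen)

        # Toy example: "glorp" was never seen by the probe in training.
        items = [("dog", "NOUN", "NOUN"), ("runs", "VERB", "VERB"),
                 ("glorp", "NOUN", "VERB")]
        print(memorization_gap(items, train_vocab={"dog", "runs"}))  # 1.0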

    Input Combination Strategies for Multi-Source Transformer Decoder

    In multi-source sequence-to-sequence tasks, the attention mechanism can be modeled in several ways. This topic has been thoroughly studied for recurrent architectures. In this paper, we extend the previous work to the encoder-decoder attention in the Transformer architecture. We propose four different input combination strategies for the encoder-decoder attention: serial, parallel, flat, and hierarchical. We evaluate our methods on tasks of multimodal translation and translation with multiple source languages. The experiments show that the models are able to use multiple sources and improve over single-source baselines. Comment: published at WMT18.
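
    For instance, the parallel strategy lets the decoder attend to each source encoder separately and then combines the resulting context vectors by summation. Below is a simplified PyTorch sketch of that one strategy; it is not the paper's implementation, and the module names and dimensions are assumptions.

        # Sketch of the "parallel" input combination strategy for the
        # encoder-decoder attention in a multi-source Transformer: the
        # decoder states attend to each source encoder separately and
        # the per-source context vectors are summed.
        import torch
        import torch.nn as nn

        class ParallelMultiSourceAttention(nn.Module):
            def __init__(self, d_model: int, n_heads: int, n_sources: int):
                super().__init__()
                self.attns = nn.ModuleList(
                    nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                    for _ in range(n_sources)
                )

            def forward(self, queries, encoder_states):
                # queries: (batch, tgt_len, d_model)
                # encoder_states: list of (batch, src_len_i, d_model)
                contexts = [
                    attn(queries, enc, enc)[0]  # context for source i
                    for attn, enc in zip(self.attns, encoder_states)
                ]
                return torch.stack(contexts).sum(dim=0)  # sum the contexts

        # Two sources, e.g. two source languages:
        layer = ParallelMultiSourceAttention(d_model=64, n_heads=4, n_sources=2)
        q = torch.randn(2, 5, 64)
        out = layer(q, [torch.randn(2, 7, 64), torch.randn(2, 9, 64)])
        print(out.shape)  # torch.Size([2, 5, 64])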

    Tokenization Impacts Multilingual Language Modeling: Assessing Vocabulary Allocation and Overlap Across Languages

    Multilingual language models have recently gained attention as a promising solution for representing multiple languages in a single model. In this paper, we propose new criteria to evaluate the quality of lexical representation and vocabulary overlap observed in sub-word tokenizers. Our findings show that the overlap of vocabulary across languages can actually be detrimental to certain downstream tasks (POS, dependency tree labeling). In contrast, NER and sentence-level tasks (cross-lingual retrieval, NLI) benefit from sharing vocabulary. We also observe that the coverage of language-specific tokens in the multilingual vocabulary significantly impacts word-level tasks. Our study offers a deeper understanding of the role of tokenizers in multilingual language models and guidelines for future model developers to choose the most suitable tokenizer for their specific application before undertaking costly model pre-training. Comment: in Findings of ACL 2023.
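
    As a rough illustration of the kind of quantity involved, the vocabulary overlap between two languages under a given tokenizer can be estimated as the Jaccard overlap of the sub-word types observed on each language's corpus. This is a simplified sketch only; the paper's allocation and overlap criteria are defined differently.

        # Rough sketch: how much sub-word vocabulary two languages share
        # under a given tokenizer, measured as the Jaccard overlap of
        # the token types observed on each corpus.

        def observed_vocab(tokenize, corpus):
            """Set of sub-word types the tokenizer produces on a corpus."""
            return {tok for sentence in corpus for tok in tokenize(sentence)}

        def vocab_overlap(tokenize, corpus_a, corpus_b):
            va = observed_vocab(tokenize, corpus_a)
            vb = observed_vocab(tokenize, corpus_b)
            return len(va & vb) / len(va | vb)

        # Whitespace splitting as a stand-in for a real sub-word model:
        en = ["the cat sat", "the dog ran"]
        de = ["die katze sass", "der hund lief"]
        print(vocab_overlap(str.split, en, de))  # 0.0 for these toy corpora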