
    RECOGNITION OF HANDWRITTEN TEXT BASED ON VECTOR MOTION

    The article reviews approaches to text recognition in handwriting-input technologies. The particular requirements for handwriting text-input software modules used on mobile devices are identified. The proposed method is described and the effectiveness of its components is analysed. Based on the experimental results, conclusions are drawn about the comparative effectiveness of the proposed and existing methods for recognizing handwritten text input on mobile devices.
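
    The abstract leaves the "vector motion" representation implicit. A common way to encode pen-trajectory motion for handwriting recognizers is to quantize the movement between consecutive pen samples into a small set of direction codes (a Freeman-chain-code-style feature). The Python sketch below illustrates that general idea only; the function name, bin count, and input format are assumptions, not the paper's actual method.

        import math

        def direction_features(points, n_bins=8):
            # Quantize the movement between consecutive pen samples into
            # n_bins direction classes (a Freeman-chain-code-style encoding).
            # points: list of (x, y) pen positions sampled along one stroke.
            codes = []
            for (x0, y0), (x1, y1) in zip(points, points[1:]):
                dx, dy = x1 - x0, y1 - y0
                if dx == 0 and dy == 0:
                    continue  # pen paused: no movement vector to encode
                angle = math.atan2(dy, dx) % (2 * math.pi)
                codes.append(int(angle / (2 * math.pi / n_bins)) % n_bins)
            return codes

        # A roughly L-shaped stroke: rightward codes followed by downward codes.
        stroke = [(0, 0), (1, 0), (2, 0), (2, -1), (2, -2)]
        print(direction_features(stroke))  # [0, 0, 6, 6] with 8 bins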

    Multimodal One-Shot Learning of Speech and Images

    Imagine a robot is shown new concepts visually together with spoken tags, e.g. "milk", "eggs", "butter". After seeing one paired audio-visual example per class, it is shown a new set of unseen instances of these objects, and asked to pick the "milk". Without receiving any hard labels, could it learn to match the new continuous speech input to the correct visual instance? Although unimodal one-shot learning has been studied, where one labelled example in a single modality is given per class, this example motivates multimodal one-shot learning. Our main contribution is to formally define this task, and to propose several baseline and advanced models. We use a dataset of paired spoken and visual digits to specifically investigate recent advances in Siamese convolutional neural networks. Our best Siamese model achieves twice the accuracy of a nearest neighbour model using pixel-distance over images and dynamic time warping over speech in 11-way cross-modal matching.
    Comment: 5 pages, 1 figure, 3 tables; accepted to ICASSP 2019
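
    The nearest-neighbour baseline named in the abstract can be sketched directly: dynamic time warping (DTW) aligns two variable-length speech feature sequences, and plain pixel distance compares images. The Python sketch below assumes speech is given as per-frame feature arrays (e.g. MFCCs) and images as equal-sized NumPy arrays; the function names and the support/test-set layout are illustrative, not the authors' code.

        import numpy as np

        def dtw_distance(a, b):
            # Dynamic time warping distance between feature sequences
            # a: (Ta, D) and b: (Tb, D), e.g. MFCC frames of two spoken words.
            Ta, Tb = len(a), len(b)
            D = np.full((Ta + 1, Tb + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, Ta + 1):
                for j in range(1, Tb + 1):
                    cost = np.linalg.norm(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[Ta, Tb]

        def cross_modal_match(query_speech, support_speech, support_images, test_images):
            # 1) Find the support speech example nearest the spoken query, by DTW.
            k = int(np.argmin([dtw_distance(query_speech, s) for s in support_speech]))
            # 2) Return the test image nearest that support pair's image, by pixel distance.
            ref = support_images[k].ravel()
            return int(np.argmin([np.linalg.norm(ref - t.ravel()) for t in test_images]))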

    Image and interpretation: using artificial intelligence to read ancient Roman texts

    The ink and stylus tablets discovered at the Roman Fort of Vindolanda are a unique resource for scholars of ancient history. However, the stylus tablets have proved particularly difficult to read. This paper describes a system that assists expert papyrologists in the interpretation of the Vindolanda writing tablets. A model-based approach is taken that relies on models of the written form of characters, and statistical modelling of language, to produce plausible interpretations of the documents. Fusion of the contributions from the language, character, and image feature models is achieved by utilizing the GRAVA agent architecture, which uses Minimum Description Length as the basis for information fusion across semantic levels. A system is developed that reads in image data and outputs plausible interpretations of the Vindolanda tablets.
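
    The fusion principle named in the abstract, Minimum Description Length, can be illustrated with a toy selection rule: score each candidate reading by the bits needed to encode it under the character/image model plus the bits needed under the language model, and keep the cheapest total. The Python sketch below is a deliberate simplification with assumed inputs; the GRAVA agent architecture performs this fusion across semantic levels and is far richer than this.

        import math

        def description_length(probability):
            # Shannon code length in bits for an event with the given probability.
            return -math.log2(probability)

        def best_interpretation(candidates):
            # candidates: list of (text, p_char, p_lang) tuples, where p_char is
            # the character/image model's probability of the strokes given `text`
            # and p_lang is the language model's probability of `text`.
            # (Hypothetical structure -- the paper's agents are richer.)
            return min(
                candidates,
                key=lambda c: description_length(c[1]) + description_length(c[2]),
            )[0]

        # Toy example: a slightly worse visual match that the language model
        # strongly prefers beats a visually closer but implausible reading.
        print(best_interpretation([
            ("claudius", 0.30, 0.0005),
            ("clavdius", 0.35, 0.00001),
        ]))  # -> "claudius"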

    Learning recurrent representations for hierarchical behavior modeling

    Get PDF
    We propose a framework for detecting action patterns from motion sequences and modeling the sensory-motor relationship of animals, using a generative recurrent neural network. The network has a discriminative part (classifying actions) and a generative part (predicting motion), whose recurrent cells are laterally connected, allowing higher levels of the network to represent high level phenomena. We test our framework on two types of data, fruit fly behavior and online handwriting. Our results show that 1) taking advantage of unlabeled sequences, by predicting future motion, significantly improves action detection performance when training labels are scarce, 2) the network learns to represent high level phenomena such as writer identity and fly gender, without supervision, and 3) simulated motion trajectories, generated by treating motion prediction as input to the network, look realistic and may be used to qualitatively evaluate whether the model has learnt generative control rules.
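
    To make the two-headed design concrete: a single recurrent trunk can feed both a discriminative head (per-frame action logits) and a generative head (next-step motion prediction), so unlabelled sequences still contribute a training signal through the motion loss. The PyTorch sketch below is a minimal illustration under those assumptions; the class name and layer sizes are invented, and a plain GRU stands in for the paper's laterally connected recurrent cells.

        import torch
        import torch.nn as nn

        class BehaviorRNN(nn.Module):
            # Recurrent trunk with a discriminative head (per-frame action
            # labels) and a generative head (next-step motion prediction).
            def __init__(self, motion_dim, hidden_dim, n_actions):
                super().__init__()
                self.rnn = nn.GRU(motion_dim, hidden_dim, num_layers=2, batch_first=True)
                self.action_head = nn.Linear(hidden_dim, n_actions)   # discriminative
                self.motion_head = nn.Linear(hidden_dim, motion_dim)  # generative

            def forward(self, motion):
                h, _ = self.rnn(motion)  # (batch, time, hidden_dim)
                return self.action_head(h), self.motion_head(h)

        model = BehaviorRNN(motion_dim=4, hidden_dim=64, n_actions=6)
        x = torch.randn(8, 100, 4)  # batch of motion sequences
        action_logits, next_motion = model(x)
        # Supervised loss applies only to labelled frames; the motion-prediction
        # loss is self-supervised and applies to every sequence:
        pred_loss = nn.functional.mse_loss(next_motion[:, :-1], x[:, 1:])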