RECOGNITION OF HANDWRITTEN TEXT BASED ON VECTOR MOVEMENT
The article reviews approaches to text recognition using handwriting-input technologies. The particular requirements for handwriting text-input software modules used in mobile devices are identified. A description of the designed method is presented, together with an analysis of the effectiveness of its components. Based on the experimental results, conclusions are drawn about the comparative effectiveness of the proposed and existing methods for recognizing handwritten text input on mobile devices.
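The abstract does not detail the method, but trajectory-based handwriting recognizers commonly start from the movement vectors between consecutive pen samples. The sketch below is an illustrative assumption of that preprocessing step, not the paper's actual algorithm; the function name and sample stroke are hypothetical.

```python
import numpy as np

def direction_vectors(points):
    """Compute unit movement vectors between consecutive pen samples.

    `points` is an (N, 2) array of (x, y) pen positions sampled during a
    stroke; the result is an (N - 1, 2) array of unit direction vectors,
    a common feature in trajectory-based handwriting recognition.
    """
    deltas = np.diff(points, axis=0)                   # displacement per step
    norms = np.linalg.norm(deltas, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                            # avoid division by zero
    return deltas / norms

# Hypothetical stroke: pen moves right, then diagonally up-right
stroke = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]])
vecs = direction_vectors(stroke)
```

Normalizing to unit vectors keeps the features invariant to writing speed, which matters on mobile devices where sampling rates vary.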
Multimodal One-Shot Learning of Speech and Images
Imagine a robot is shown new concepts visually together with spoken tags,
e.g. "milk", "eggs", "butter". After seeing one paired audio-visual example per
class, it is shown a new set of unseen instances of these objects, and asked to
pick the "milk". Without receiving any hard labels, could it learn to match the
new continuous speech input to the correct visual instance? Although unimodal
one-shot learning has been studied, where one labelled example in a single
modality is given per class, this example motivates multimodal one-shot
learning. Our main contribution is to formally define this task, and to propose
several baseline and advanced models. We use a dataset of paired spoken and
visual digits to specifically investigate recent advances in Siamese
convolutional neural networks. Our best Siamese model achieves twice the
accuracy of a nearest neighbour model using pixel-distance over images and
dynamic time warping over speech in 11-way cross-modal matching.
Comment: 5 pages, 1 figure, 3 tables; accepted to ICASSP 201
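The nearest-neighbour baseline mentioned above compares speech segments with dynamic time warping. The following is a minimal sketch of the standard DTW recurrence over scalar sequences; the paper's baseline would apply it to speech feature frames, and the example sequences are purely illustrative.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    Fills a DP table where cost[i][j] is the cheapest alignment of the
    first i elements of `a` with the first j elements of `b`.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

# A repeated element warps away at zero cost
zero = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

Because DTW absorbs local timing differences, two utterances of the same word by different speakers can still align cheaply, which is why it is a natural speech-side distance for this baseline.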
Image and interpretation using artificial intelligence to read ancient Roman texts
The ink and stylus tablets discovered at the Roman Fort of Vindolanda are a unique resource for scholars of ancient history. However, the stylus tablets have proved particularly difficult to read. This paper describes a system that assists expert papyrologists in the interpretation of the Vindolanda writing tablets. A model-based approach is taken that relies on models of the written form of characters, and statistical modelling of language, to produce plausible interpretations of the documents. Fusion of the contributions from the language, character, and image feature models is achieved by utilizing the GRAVA agent architecture, which uses Minimum Description Length as the basis for information fusion across semantic levels. A system is developed that reads in image data and outputs plausible interpretations of the Vindolanda tablets.
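Minimum Description Length fusion, as used by the GRAVA architecture, scores a candidate interpretation by the total number of bits needed to encode it under each model and picks the cheapest. The sketch below illustrates that principle only; the function names and probability values are hypothetical, not taken from the paper.

```python
import math

def description_length(p):
    """Shannon code length in bits for an event with probability p."""
    return -math.log2(p)

def best_interpretation(candidates):
    """Pick the reading with minimum total description length.

    `candidates` maps a reading to a (character-model probability,
    language-model probability) pair; the total code length sums the
    bits each model would need to encode that reading.
    """
    return min(candidates,
               key=lambda c: description_length(candidates[c][0])
                           + description_length(candidates[c][1]))

# Hypothetical probabilities for two readings of a damaged character:
# "lecio" fits the strokes slightly better, but is not a Latin word,
# so the language model makes it expensive to encode.
readings = {"legio": (0.4, 0.3), "lecio": (0.5, 0.01)}
chosen = best_interpretation(readings)
```

Summing code lengths is equivalent to multiplying model probabilities, so MDL gives a principled way to let a strong language prior override weak image evidence.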
Learning recurrent representations for hierarchical behavior modeling
We propose a framework for detecting action patterns from motion sequences
and modeling the sensory-motor relationship of animals, using a generative
recurrent neural network. The network has a discriminative part (classifying
actions) and a generative part (predicting motion), whose recurrent cells are
laterally connected, allowing higher levels of the network to represent high
level phenomena. We test our framework on two types of data, fruit fly behavior
and online handwriting. Our results show that 1) taking advantage of unlabeled
sequences, by predicting future motion, significantly improves action detection
performance when training labels are scarce, 2) the network learns to represent
high level phenomena such as writer identity and fly gender, without
supervision, and 3) simulated motion trajectories, generated by treating motion
prediction as input to the network, look realistic and may be used to
qualitatively evaluate whether the model has learnt generative control rules.
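The described network combines a discriminative head (action classification) and a generative head (motion prediction) on shared recurrent state. The sketch below is a minimal single-layer illustration of that dual-head idea, not the paper's laterally connected architecture; all layer sizes and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class BehaviorRNN:
    """Toy recurrent model with one shared hidden state feeding two heads:
    action logits (discriminative) and next-motion prediction (generative)."""

    def __init__(self, n_in=4, n_hidden=16, n_actions=3):
        self.Wx = rng.normal(0, 0.1, (n_hidden, n_in))
        self.Wh = rng.normal(0, 0.1, (n_hidden, n_hidden))
        self.Wa = rng.normal(0, 0.1, (n_actions, n_hidden))  # discriminative head
        self.Wm = rng.normal(0, 0.1, (n_in, n_hidden))       # generative head

    def step(self, x, h):
        h = np.tanh(self.Wx @ x + self.Wh @ h)   # update shared recurrent state
        action_logits = self.Wa @ h              # classify the current action
        next_motion = self.Wm @ h                # predict the next motion frame
        return h, action_logits, next_motion

net = BehaviorRNN()
h = np.zeros(16)
for t in range(5):
    x = rng.normal(size=4)                       # motion features at time t
    h, logits, pred = net.step(x, h)
```

Feeding `pred` back in as the next `x` is how such a model can be run in generative mode to simulate motion trajectories, as the abstract describes.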