Learning to Read by Spelling: Towards Unsupervised Text Recognition
This work presents a method for visual text recognition without using any
paired supervisory data. We formulate the text recognition task as one of
aligning the conditional distribution of strings predicted from given text
images with lexically valid strings sampled from target corpora. This enables
fully automated, unsupervised learning from just line-level text images
and unpaired text-string samples, obviating the need for large aligned
datasets. We present a detailed analysis of various aspects of the proposed
method, namely: (1) the impact of the length of training sequences on convergence,
(2) relation between character frequencies and the order in which they are
learnt, (3) generalisation ability of our recognition network to inputs of
arbitrary lengths, and (4) impact of varying the text corpus on recognition
accuracy. Finally, we demonstrate excellent text recognition accuracy on both
synthetically generated text images and scanned images of real printed books,
using no labelled training examples.
Comparative Analysis of Word Embeddings for Capturing Word Similarities
Distributed language representation has become the most widely used technique
for language representation in various natural language processing tasks. Most
of the natural language processing models that are based on deep learning
techniques use already pre-trained distributed word representations, commonly
called word embeddings. Determining the highest-quality word embeddings is of
crucial importance for such models. However, selecting the appropriate word
embeddings is a perplexing task since the projected embedding space is not
intuitive to humans. In this paper, we explore different approaches for
creating distributed word representations. We perform an intrinsic evaluation
of several state-of-the-art word embedding methods. Their performance on
capturing word similarities is analysed with existing benchmark datasets for
word-pair similarities. The research in this paper conducts a correlation
analysis between ground truth word similarities and similarities obtained by
different word embedding methods.
Comment: Part of the 6th International Conference on Natural Language
Processing (NATP 2020)
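The correlation analysis described in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the embedding similarity and Spearman rank correlation as the statistic; the abstract names neither, so both are assumptions. The toy vectors, the word pairs, the ratings, and the function names (`cosine`, `ranks`, `spearman`, `evaluate`) are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def evaluate(embeddings, benchmark):
    """Correlate model similarities with human ratings, skipping unknown words."""
    model_scores, human_scores = [], []
    for word1, word2, rating in benchmark:
        if word1 in embeddings and word2 in embeddings:
            model_scores.append(cosine(embeddings[word1], embeddings[word2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores)

# Hypothetical toy data, for illustration only.
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.8],
}
toy_benchmark = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 1.0),
    ("dog", "truck", 2.0),
]
score = evaluate(toy_embeddings, toy_benchmark)
```

A higher correlation means the embedding space orders word pairs more like the human annotators do. Real benchmarks contain hundreds of pairs, so out-of-vocabulary handling (here, silently skipping the pair) can noticeably affect the reported score.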
Evaluation of Croatian Word Embeddings
Croatian is a poorly resourced and highly inflected language from the Slavic
language family. Nowadays, research focuses mostly on English. We created a
new word analogy corpus based on the original English Word2vec word analogy
corpus and added some linguistic aspects specific to the Croatian
language. Next, we created Croatian WordSim353 and RG65 corpora for a basic
evaluation of word similarities. We compared the created corpora on two popular
word representation models, based on the Word2Vec and fastText tools. The models
were trained on a 1.37B-token training corpus and tested on a new
robust Croatian word analogy corpus. Results show that the models are able to
create meaningful word representations. This research has shown that the free word
order and the higher morphological complexity of the Croatian language influence
the quality of the resulting word embeddings.
Comment: In review process on LREC 2018 conference
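A word analogy corpus of this kind is typically scored by vector arithmetic. The sketch below assumes the common additive method (often called 3CosAdd) associated with the original Word2vec analogy evaluation; the abstract does not state the scoring procedure, so this is an assumption. The two-dimensional toy vectors, the English example words, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' by maximising cos(x, b - a + c).

    The three query words are excluded from the candidates, as is usual
    for this evaluation.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

def analogy_accuracy(embeddings, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(
        1 for a, b, c, expected in questions
        if solve_analogy(embeddings, a, b, c) == expected
    )
    return correct / len(questions)

# Hypothetical toy vectors: dimension 0 ~ "royalty", dimension 1 ~ "gender".
toy_embeddings = {
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.0],
}
toy_questions = [
    ("man", "king", "woman", "queen"),
    ("woman", "queen", "man", "king"),
]
accuracy = analogy_accuracy(toy_embeddings, toy_questions)
```

For a highly inflected language, each analogy question may have several acceptable surface forms, so exact-match accuracy as computed here can understate model quality.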
Predictive uncertainty in auditory sequence processing
Copyright © 2014 Hansen and Pearce. This is an open-access article distributed under
the terms of the Creative Commons Attribution License (CC BY). The use, distribution
or reproduction in other forums is permitted, provided the original author(s) or
licensor are credited and that the original publication in this journal is cited,
in accordance with accepted academic practice. No use, distribution or reproduction
is permitted which does not comply with these terms.