Learning to Read by Spelling: Towards Unsupervised Text Recognition

Author: Gupta Ankush
Vedaldi Andrea
Zisserman Andrew
Publication date: 10/11/2018

This work presents a method for visual text recognition without using any paired supervisory data. We formulate the text recognition task as one of aligning the conditional distribution of strings predicted from given text images with lexically valid strings sampled from target corpora. This enables fully automated, unsupervised learning from just line-level text images and unpaired text-string samples, obviating the need for large aligned datasets. We present a detailed analysis of various aspects of the proposed method, namely: (1) the impact of the length of training sequences on convergence, (2) the relation between character frequencies and the order in which they are learnt, (3) the generalisation ability of our recognition network to inputs of arbitrary lengths, and (4) the impact of varying the text corpus on recognition accuracy. Finally, we demonstrate excellent text recognition accuracy on both synthetically generated text images and scanned images of real printed books, using no labelled training examples.

arXiv.org e-Print Archive

Comparative Analysis of Word Embeddings for Capturing Word Similarities

Author: Kalajdjieski Jovan
Stojanovska Frosina
Toshevska Martina
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 07/05/2020

Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most natural language processing models based on deep learning techniques use pre-trained distributed word representations, commonly called word embeddings. Determining the highest-quality word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task, since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance in capturing word similarities is analysed with existing benchmark datasets for word-pair similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods. Comment: Part of the 6th International Conference on Natural Language Processing (NATP 2020)

arXiv.org e-Print Archive

Crossref

Evaluation of Croatian Word Embeddings

Author: Beliga Slobodan
Svoboda Lukas
Publication date: 07/11/2017

Croatian is a poorly resourced and highly inflected language from the Slavic language family. Nowadays, research focuses mostly on English. We created a new word analogy corpus based on the original English Word2vec word analogy corpus and added some of the specific linguistic aspects of the Croatian language. Next, we created Croatian WordSim353 and RG65 corpora for a basic evaluation of word similarities. We compared the created corpora on two popular word representation models, based on the Word2Vec tool and the fastText tool. The models have been trained on a 1.37B-token training corpus and tested on a new robust Croatian word analogy corpus. Results show that the models are able to create meaningful word representations. This research has shown that free word order and the higher morphological complexity of the Croatian language influence the quality of the resulting word embeddings. Comment: In review process for the LREC 2018 conference

arXiv.org e-Print Archive

Repository of the University of Rijeka

Predictive uncertainty in auditory sequence processing

Author: Aarden
Bar
Bharucha
Bigand
Brown
Bubic
Bunton
Carhart-Harris
Conklin
Conover
Conway
Creel
Cristià
Cuddy
DeLong
Desain
Dienes
Duane
Eerola
Egner
Farrow
Finn
Fiser
Friston
Fujioka
Gross
Hale
Hiller
Hirsh
Hunt
Huron
Jonaitis
Jones
Kim
Kirkham
Koelsch
Krumhansl
Loui
MacKay
Madsen
Manning
Margulis
Mathews
Meyer
Müllensiefen
Narmour
Näätänen
Omigie
Oram
Pearce
Perruchet
Platt
Rohrmeier
Rüsseler
Saffran
Sawilowsky
Schaffrath
Schellenberg
Schmuckler
Shannon
Siromoney
Steiger
Stiles
Sun
Swait
Taupin
Temperley
Tillmann
Toiviainen
Toro
von Helmholtz
Vuust
Willingham
Wolpert
Youngblood
Zanten
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2014

Copyright © 2014 Hansen and Pearce. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Crossref

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

Queen Mary Research Online