4 research outputs found

Learning to Read by Spelling: Towards Unsupervised Text Recognition

Authors: Gupta Ankush, Vedaldi Andrea, Zisserman Andrew
Publication date: 23/09/2018

This work presents a method for visual text recognition without using any paired supervisory data. We formulate the text recognition task as one of aligning the conditional distribution of strings predicted from given text images with lexically valid strings sampled from target corpora. This enables fully automated, unsupervised learning from just line-level text images and unpaired text-string samples, obviating the need for large aligned datasets. We present detailed analysis of various aspects of the proposed method, namely: (1) impact of the length of training sequences on convergence, (2) relation between character frequencies and the order in which they are learnt, (3) generalisation ability of our recognition network to inputs of arbitrary lengths, and (4) impact of varying the text corpus on recognition accuracy. Finally, we demonstrate excellent text recognition accuracy on both synthetically generated text images and scanned images of real printed books, using no labelled training examples.

arXiv.org e-Print Archive

Oxford University Research Archive

Decipherment Complexity in 1:1 Substitution Ciphers

Authors: Hermann Ney, Malte Nuhn
Publication date: 01/01/2013

In this paper we show that even for the case of 1:1 substitution ciphers—which encipher plaintext symbols by exchanging them with a unique substitute—finding the optimal decipherment with respect to a bigram language model is NP-hard. We show that in this case the decipherment problem is equivalent to the quadratic assignment problem (QAP). To the best of our knowledge, this connection between the QAP and the decipherment problem has not been known in the literature before.

CiteSeerX

Publikationsserver der RWTH Aachen University

Itzulpen automatiko gainbegiratu gabea (Unsupervised machine translation)

Author: Artexe Zurutuza Mikel
Publication date: 29/07/2020

192 p. Modern machine translation relies on strong supervision in the form of parallel corpora. Such a requirement greatly departs from the way in which humans acquire language, and poses a major practical problem for low-resource language pairs. In this thesis, we develop a new paradigm that removes the dependency on parallel data altogether, relying on nothing but monolingual corpora to train unsupervised machine translation systems. For that purpose, our approach first aligns separately trained word representations in different languages based on their structural similarity, and uses them to initialize either a neural or a statistical machine translation system, which is further trained through iterative back-translation. While previous attempts at learning machine translation systems from monolingual corpora had strong limitations, our work (along with other contemporaneous developments) is the first to report positive results in standard, large-scale settings, establishing the foundations of unsupervised machine translation and opening exciting opportunities for future research.

Archivo Digital para la Docencia y la Investigación