A Unified Multilingual Handwriting Recognition System using multigrams sub-lexical units
We address the design of a unified multilingual system for handwriting recognition. Most multilingual systems rest on specialized models, each trained on a single language, one of which is selected at test time. While some recognition systems are based on a unified optical model, a unified language model remains a major issue, as traditional language models are generally trained on corpora with large per-language word lexicons. Here, we bring a solution by considering language models based on sub-lexical units, called multigrams. Using multigrams strongly reduces the lexicon size and thus decreases the language model complexity. This makes possible the design of an end-to-end unified multilingual recognition system in which both a single optical model and a single language model are trained on all the languages. We discuss the impact of the language unification on each model and show that our system matches the performance of state-of-the-art methods with a strong reduction in complexity.
Comment: preprint
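The lexicon-size reduction from sub-lexical units can be illustrated with a toy segmenter. This is only a hedged sketch, not the paper's multigram model: the "multigrams" here are a hypothetical hand-picked inventory of variable-length character chunks, and the segmentation is greedy rather than probabilistic.

```python
# Toy illustration of sub-lexical ("multigram"-style) units: words are
# segmented greedily into a small inventory of variable-length character
# chunks, so the unit inventory can replace a full word lexicon.
# The unit inventory below is invented for the example.
def segment(word, units, max_len=4):
    """Greedy left-to-right segmentation into known units,
    falling back to single characters."""
    out, i = [], 0
    while i < len(word):
        for length in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + length]
            if length == 1 or piece in units:
                out.append(piece)
                i += length
                break
    return out

units = {"re", "cog", "ni", "tion"}
print(segment("recognition", units))  # ['re', 'cog', 'ni', 'tion']
```

A small shared inventory of such units can cover the vocabulary of several languages at once, which is the property the unified language model exploits.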
Six degree of freedom manual controls study report
The feasibility of using six degree of freedom manual controls in an on-orbit space environment was determined. Several six degree of freedom controls were tested in a laboratory environment, and replica controls were used to control robot arms. The selection of six degrees of freedom as a design goal was based on the fact that six degrees are sufficient to define the location and orientation of a rigid body in space.
The Determinacy of Context-Free Games
We prove that the determinacy of Gale-Stewart games whose winning sets are
accepted by real-time 1-counter B\"uchi automata is equivalent to the
determinacy of (effective) analytic Gale-Stewart games which is known to be a
large cardinal assumption. We show also that the determinacy of Wadge games
between two players in charge of omega-languages accepted by 1-counter B\"uchi
automata is equivalent to the (effective) analytic Wadge determinacy. Using
some results of set theory we prove that one can effectively construct a
1-counter B\"uchi automaton A and a B\"uchi automaton B such that: (1) There
exists a model of ZFC in which Player 2 has a winning strategy in the Wadge
game W(L(A), L(B)); (2) There exists a model of ZFC in which the Wadge game
W(L(A), L(B)) is not determined. Moreover these are the only two possibilities,
i.e. there are no models of ZFC in which Player 1 has a winning strategy in the
Wadge game W(L(A), L(B)).
Comment: To appear in the Proceedings of the 29th International Symposium on Theoretical Aspects of Computer Science, STACS 201
Re-pairing brackets
Consider the following one-player game. Take a well-formed sequence of opening and closing brackets (a Dyck word). As a move, the player can pair any opening bracket with any closing bracket to its right, erasing them. The goal is to re-pair (erase) the entire sequence, and the cost of a strategy is measured by its width: the maximum number of nonempty segments of symbols (separated by blank space) seen during the play.
For various initial sequences, we prove upper and lower bounds on the minimum width sufficient for re-pairing. (In particular, the sequence associated with the complete binary tree of height n admits a strategy of width sub-exponential in log n.) Our two key contributions are (1) lower bounds on the width and (2) their application in automata theory: quasi-polynomial lower bounds on the translation from one-counter automata to Parikh-equivalent nondeterministic finite automata. The latter result answers a question by Atig et al. (2016).
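The game and its width measure can be simulated directly. The sketch below is an illustration of the rules as stated above, not of the paper's bounds: a play is a list of (opening, closing) index pairs, and the width is the maximum number of maximal nonempty segments seen after any move.

```python
# Minimal simulation of the re-pairing game: each move erases an opening
# bracket together with a closing bracket to its right, and the width of
# a play is the maximum number of maximal nonempty segments observed
# after any move. Input word and strategies are toy examples.
def width_of_play(word, moves):
    cells = list(word)
    width = 0
    for i, j in moves:
        assert i < j and cells[i] == "(" and cells[j] == ")"
        cells[i] = cells[j] = None  # erase the pair
        # count maximal runs of surviving symbols
        segments, in_seg = 0, False
        for c in cells:
            if c is not None and not in_seg:
                segments += 1
            in_seg = c is not None
        width = max(width, segments)
    return width

# Order matters: erasing the outer pair of "(())" first keeps one segment,
# while erasing the inner pair first splits the word into two segments.
print(width_of_play("(())", [(0, 3), (1, 2)]))  # 1
print(width_of_play("(())", [(1, 2), (0, 3)]))  # 2
```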
Recovering capitalization and punctuation marks for automatic speech recognition: case study for Portuguese broadcast news
The following material presents a study on recovering punctuation marks and capitalization information from European Portuguese broadcast news speech transcriptions. Different approaches were tested for capitalization, both generative and discriminative, using finite state transducers automatically built from language models, and maximum entropy models. Several resources were used, including lexica, written newspaper corpora, and speech transcriptions. Finite state transducers produced the best results for written newspaper corpora, but the maximum entropy approach also proved to be a good choice, suitable for the capitalization of speech transcriptions and allowing straightforward on-the-fly capitalization. Evaluation results are presented both for written newspaper corpora and for broadcast news speech transcriptions.

The frequency of each punctuation mark in broadcast news speech transcriptions was analyzed for three different languages: English, Spanish, and Portuguese. The punctuation task was performed using a maximum entropy modeling approach, which combines different types of information, both lexical and acoustic. The contribution of each feature was analyzed individually, and separate results are given for each focus condition, making it possible to analyze the performance differences between planned and spontaneous speech. All results were evaluated on speech transcriptions of a Portuguese broadcast news corpus. The benefits of enriching speech recognition with punctuation and capitalization are shown in an example illustrating the effect of the described experiments on spoken texts.
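The capitalization task can be made concrete with a much-simplified, unigram analogue of the lexicon-based approach: memorize each word's most frequent cased form in a training corpus, then restore case in a lowercased transcription. This is only a sketch (the study itself uses finite state transducers and maximum entropy models with richer context); the training sentence is invented.

```python
# Toy capitalization recovery: learn the most frequent surface form of
# each word from a cased corpus, then re-case a lowercased transcription.
# This ignores context entirely, unlike the WFST/maxent models above.
from collections import Counter, defaultdict

def train_caser(cased_corpus):
    forms = defaultdict(Counter)
    for token in cased_corpus.split():
        forms[token.lower()][token] += 1
    # map each lowercase word to its most frequent cased form
    return {w: c.most_common(1)[0][0] for w, c in forms.items()}

def recase(lowercased, caser):
    # unseen words are left as-is
    return " ".join(caser.get(t, t) for t in lowercased.split())

caser = train_caser("Lisbon is the capital of Portugal . Portugal joined the EU .")
print(recase("the capital of portugal is lisbon", caser))
# → the capital of Portugal is Lisbon
```

Because the model is a plain dictionary lookup, it naturally supports the on-the-fly capitalization mentioned above, one token at a time.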
N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination
In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual units related to word-to-word alignment and statistical modeling of the bilingual context following a maximum entropy framework. We provide a step-by-step comparison of the systems and report results in terms of automatic evaluation metrics and required computational resources for a smaller Arabic-to-English translation task (1.5M tokens in the training corpus). Human error analysis clarifies advantages and disadvantages of the systems under consideration. Finally, we combine the output of both systems to yield significant improvements in translation quality.
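The "bilingual units related to word-to-word alignment" of N-gram-based SMT can be sketched as follows: cut a word-aligned sentence pair at every point no alignment link crosses, yielding minimal monotone bilingual tuples. This is a hedged simplification; the actual UPC-TALP extraction handles unaligned words and reordering more carefully, and the example sentences and alignment are invented.

```python
# Simplified extraction of bilingual tuples from a word alignment:
# a cut after source prefix 0..i is valid when no later source word
# links into the target prefix already covered; each segment between
# consecutive valid cuts is one bilingual unit.
def extract_tuples(src, tgt, links):
    units = []
    s0 = t0 = 0
    t_max = -1  # rightmost target position aligned so far
    for i in range(len(src)):
        t_max = max([t_max] + [t for s, t in links if s == i])
        if all(t > t_max for s, t in links if s > i):
            units.append((src[s0:i + 1], tgt[t0:t_max + 1]))
            s0, t0 = i + 1, t_max + 1
    return units

src = ["we", "want", "a", "fair", "deal"]
tgt = ["queremos", "un", "acuerdo", "justo"]
links = [(0, 0), (1, 0), (2, 1), (3, 3), (4, 2)]
for unit in extract_tuples(src, tgt, links):
    print(unit)
# (['we', 'want'], ['queremos'])
# (['a'], ['un'])
# (['fair', 'deal'], ['acuerdo', 'justo'])
```

Note how the crossing links for "fair deal" force the two words into a single tuple: reordering is captured inside units, while the unit sequence itself stays monotone, which is what lets an n-gram model be trained over it.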