A Unified Multilingual Handwriting Recognition System using multigrams sub-lexical units
We address the design of a unified multilingual system for handwriting recognition. Most multilingual systems rest on specialized models, each trained on a single language, one of which is selected at test time. While some recognition systems are based on a unified optical model, a unified language model remains a major issue, as traditional language models are generally trained on corpora with large per-language word lexicons. Here, we bring a solution by considering language models based on sub-lexical units, called multigrams. Using multigrams strongly reduces the lexicon size and thus decreases the language model complexity. This makes possible the design of an end-to-end unified multilingual recognition system in which both a single optical model and a single language model are trained on all the languages. We discuss the impact of the language unification on each model and show that our system matches the performance of state-of-the-art methods with a strong reduction in complexity.
Comment: preprint
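The lexicon-size reduction from sub-lexical units can be illustrated with a toy segmenter. This is only a hedged sketch, not the paper's multigram model: the "multigrams" here are a hypothetical hand-picked inventory of variable-length character chunks, and the segmentation is greedy rather than probabilistic.

```python
# Toy illustration of sub-lexical ("multigram"-style) units: words are
# segmented greedily into a small inventory of variable-length character
# chunks, so the unit inventory can replace a full word lexicon.
# The unit inventory below is invented for the example.
def segment(word, units, max_len=4):
    """Greedy left-to-right segmentation into known units,
    falling back to single characters."""
    out, i = [], 0
    while i < len(word):
        for length in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + length]
            if length == 1 or piece in units:
                out.append(piece)
                i += length
                break
    return out

units = {"re", "cog", "ni", "tion"}
print(segment("recognition", units))  # ['re', 'cog', 'ni', 'tion']
```

A small shared inventory of such units can cover the vocabulary of several languages at once, which is the property the unified language model exploits.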
Six degree of freedom manual controls study report
The feasibility of using six degree of freedom manual controls in an on-orbit space environment was determined. Several six degree of freedom controls were tested in a laboratory environment, and replica controls were used to control robot arms. The selection of six degrees of freedom as a design goal was based on the fact that six degrees are sufficient to define the location and orientation of a rigid body in space.
The Determinacy of Context-Free Games
We prove that the determinacy of Gale-Stewart games whose winning sets are
accepted by real-time 1-counter B\"uchi automata is equivalent to the
determinacy of (effective) analytic Gale-Stewart games which is known to be a
large cardinal assumption. We show also that the determinacy of Wadge games
between two players in charge of omega-languages accepted by 1-counter B\"uchi
automata is equivalent to the (effective) analytic Wadge determinacy. Using
some results of set theory we prove that one can effectively construct a
1-counter B\"uchi automaton A and a B\"uchi automaton B such that: (1) There
exists a model of ZFC in which Player 2 has a winning strategy in the Wadge
game W(L(A), L(B)); (2) There exists a model of ZFC in which the Wadge game
W(L(A), L(B)) is not determined. Moreover these are the only two possibilities,
i.e. there are no models of ZFC in which Player 1 has a winning strategy in the
Wadge game W(L(A), L(B)).
Comment: To appear in the Proceedings of the 29th International Symposium on Theoretical Aspects of Computer Science, STACS 201
Re-pairing brackets
Consider the following one-player game. Take a well-formed sequence of opening and closing brackets (a Dyck word). As a move, the player can pair any opening bracket with any closing bracket to its right, erasing them. The goal is to re-pair (erase) the entire sequence, and the cost of a strategy is measured by its width: the maximum number of nonempty segments of symbols (separated by blank space) seen during the play.
For various initial sequences, we prove upper and lower bounds on the minimum width sufficient for re-pairing. (In particular, the sequence associated with the complete binary tree of height n admits a strategy of width sub-exponential in log n.) Our two key contributions are (1) lower bounds on the width and (2) their application in automata theory: quasi-polynomial lower bounds on the translation from one-counter automata to Parikh-equivalent nondeterministic finite automata. The latter result answers a question by Atig et al. (2016).
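The game and its width measure can be simulated directly. The sketch below is an illustration of the rules as stated above, not of the paper's bounds: a play is a list of (opening, closing) index pairs, and the width is the maximum number of maximal nonempty segments seen after any move.

```python
# Minimal simulation of the re-pairing game: each move erases an opening
# bracket together with a closing bracket to its right, and the width of
# a play is the maximum number of maximal nonempty segments observed
# after any move. Input word and strategies are toy examples.
def width_of_play(word, moves):
    cells = list(word)
    width = 0
    for i, j in moves:
        assert i < j and cells[i] == "(" and cells[j] == ")"
        cells[i] = cells[j] = None  # erase the pair
        # count maximal runs of surviving symbols
        segments, in_seg = 0, False
        for c in cells:
            if c is not None and not in_seg:
                segments += 1
            in_seg = c is not None
        width = max(width, segments)
    return width

# Order matters: erasing the outer pair of "(())" first keeps one segment,
# while erasing the inner pair first splits the word into two segments.
print(width_of_play("(())", [(0, 3), (1, 2)]))  # 1
print(width_of_play("(())", [(1, 2), (0, 3)]))  # 2
```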
Recovering capitalization and punctuation marks for automatic speech recognition: case study for Portuguese broadcast news
The following material presents a study on recovering punctuation marks and capitalization information from European Portuguese broadcast news speech transcriptions. Different approaches were tested for capitalization, both generative and discriminative, using finite state transducers automatically built from language models, and maximum entropy models. Several resources were used, including lexica, written newspaper corpora, and speech transcriptions. Finite state transducers produced the best results for written newspaper corpora, but the maximum entropy approach also proved to be a good choice, suitable for the capitalization of speech transcriptions and allowing straightforward on-the-fly capitalization. Evaluation results are presented both for written newspaper corpora and for broadcast news speech transcriptions.

The frequency of each punctuation mark in broadcast news speech transcriptions was analyzed for three different languages: English, Spanish, and Portuguese. The punctuation task was performed using a maximum entropy modeling approach, which combines different types of information, both lexical and acoustic. The contribution of each feature was analyzed individually, and separate results are given for each focus condition, making it possible to analyze the performance differences between planned and spontaneous speech. All results were evaluated on speech transcriptions of a Portuguese broadcast news corpus. The benefits of enriching speech recognition with punctuation and capitalization are shown in an example illustrating the effect of the described experiments on spoken texts.
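The capitalization task can be made concrete with a much-simplified, unigram analogue of the lexicon-based approach: memorize each word's most frequent cased form in a training corpus, then restore case in a lowercased transcription. This is only a sketch (the study itself uses finite state transducers and maximum entropy models with richer context); the training sentence is invented.

```python
# Toy capitalization recovery: learn the most frequent surface form of
# each word from a cased corpus, then re-case a lowercased transcription.
# This ignores context entirely, unlike the WFST/maxent models above.
from collections import Counter, defaultdict

def train_caser(cased_corpus):
    forms = defaultdict(Counter)
    for token in cased_corpus.split():
        forms[token.lower()][token] += 1
    # map each lowercase word to its most frequent cased form
    return {w: c.most_common(1)[0][0] for w, c in forms.items()}

def recase(lowercased, caser):
    # unseen words are left as-is
    return " ".join(caser.get(t, t) for t in lowercased.split())

caser = train_caser("Lisbon is the capital of Portugal . Portugal joined the EU .")
print(recase("the capital of portugal is lisbon", caser))
# → the capital of Portugal is Lisbon
```

Because the model is a plain dictionary lookup, it naturally supports the on-the-fly capitalization mentioned above, one token at a time.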
N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination
In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual units related to word-to-word alignment and statistical modeling of the bilingual context following a maximum entropy framework. We provide a step-by-step comparison of the systems and report results in terms of automatic evaluation metrics and required computational resources for a smaller Arabic-to-English translation task (1.5M tokens in the training corpus). Human error analysis clarifies advantages and disadvantages of the systems under consideration. Finally, we combine the output of both systems to yield significant improvements in translation quality.
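The "bilingual units related to word-to-word alignment" of N-gram-based SMT can be sketched as follows: cut a word-aligned sentence pair at every point no alignment link crosses, yielding minimal monotone bilingual tuples. This is a hedged simplification; the actual UPC-TALP extraction handles unaligned words and reordering more carefully, and the example sentences and alignment are invented.

```python
# Simplified extraction of bilingual tuples from a word alignment:
# a cut after source prefix 0..i is valid when no later source word
# links into the target prefix already covered; each segment between
# consecutive valid cuts is one bilingual unit.
def extract_tuples(src, tgt, links):
    units = []
    s0 = t0 = 0
    t_max = -1  # rightmost target position aligned so far
    for i in range(len(src)):
        t_max = max([t_max] + [t for s, t in links if s == i])
        if all(t > t_max for s, t in links if s > i):
            units.append((src[s0:i + 1], tgt[t0:t_max + 1]))
            s0, t0 = i + 1, t_max + 1
    return units

src = ["we", "want", "a", "fair", "deal"]
tgt = ["queremos", "un", "acuerdo", "justo"]
links = [(0, 0), (1, 0), (2, 1), (3, 3), (4, 2)]
for unit in extract_tuples(src, tgt, links):
    print(unit)
# (['we', 'want'], ['queremos'])
# (['a'], ['un'])
# (['fair', 'deal'], ['acuerdo', 'justo'])
```

Note how the crossing links for "fair deal" force the two words into a single tuple: reordering is captured inside units, while the unit sequence itself stays monotone, which is what lets an n-gram model be trained over it.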