Example-based machine translation of the Basque language
Basque is both a minority and a highly inflected language with free order of sentence constituents. Machine Translation of Basque is thus both a real need and a test bed for MT techniques. In this paper, we present a modular Data-Driven MT system which includes different chunkers as well as chunk aligners which can deal with the free order of sentence constituents of Basque. We conducted Basque to English translation experiments, evaluated on a large corpus (270,000 sentence pairs). The experimental results show that our system significantly outperforms state-of-the-art approaches according to several common automatic evaluation metrics.
Stochastic Language Generation in Dialogue using Recurrent Neural Networks with Convolutional Sentence Reranking
The natural language generation (NLG) component of a spoken dialogue system
(SDS) usually needs a substantial amount of handcrafting or a well-labeled
dataset to be trained on. These limitations add significantly to development
costs and make cross-domain, multi-lingual dialogue systems intractable.
Moreover, human languages are context-aware. The most natural response should
be directly learned from data rather than depending on predefined syntaxes or
rules. This paper presents a statistical language generator based on a joint
recurrent and convolutional neural network structure which can be trained on
dialogue act-utterance pairs without any semantic alignments or predefined
grammar trees. Objective metrics suggest that this new model outperforms
previous methods under the same experimental conditions. Results of an
evaluation by human judges indicate that it produces utterances that are not
only high quality but also linguistically varied, and that are preferred over
those of n-gram and rule-based systems. Comment: To appear in SigDial 201
EUSMT: incorporating linguistic information to SMT for a morphologically rich language. Its use in SMT-RBMT-EBMT hybridation
148 p.: graphs. This thesis is defined in the framework of machine translation for Basque. Having developed a Rule-Based Machine Translation (RBMT) system for Basque in the IXA group (Mayor, 2007), we decided to tackle the Statistical Machine Translation (SMT) approach and experiment on how we could adapt it to the peculiarities of the Basque language.
First, we analyzed the impact of the agglutinative nature of Basque and the best way to deal with it. In order to deal with the problems presented above, we have split up Basque words into the lemma and some tags which represent the morphological information expressed by the inflection. By dividing each Basque word in this way, we aim to reduce the sparseness produced by the agglutinative nature of Basque and the small amount of training data.
Similarly, we also studied the differences in word order between Spanish and Basque, examining different techniques for dealing with them. We confirm the weakness of basic SMT in dealing with large word order differences between the source and target languages. Distance-based reordering, which is the technique used by the baseline system, does not have enough information to properly handle large word order differences, so every technique tested in this work (whether based on statistics or on manually generated rules) outperforms the baseline.
Once we had obtained a more accurate SMT system, we made the first attempts to combine different MT systems into a hybrid one that would allow us to get the best of the different paradigms. The hybridization attempts carried out in this PhD dissertation are preliminary, but even so, this work can help us to determine the next steps.
Carried out with a Basque Government researcher training grant (BFI05.326).
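The abstract above notes that distance-based reordering lacks the information to handle large word order differences. As a rough illustration, here is a minimal sketch of the distance-based reordering penalty as it is commonly formulated in phrase-based SMT. It is not taken from the thesis: the function name, the span representation, and the decay value `alpha=0.6` are assumptions made for this example.

```python
def distance_based_reordering_cost(phrase_spans, alpha=0.6):
    """Return (total jump distance, multiplicative penalty alpha ** distance).

    phrase_spans: (start, end) source-word positions, 0-indexed and inclusive,
    listed in the order in which the phrases are translated.
    alpha: hypothetical decay parameter in (0, 1); not a value from the thesis.
    """
    prev_end = -1  # position just before the first source word
    total_distance = 0
    for start, end in phrase_spans:
        # A phrase that starts right after the previous one costs nothing.
        total_distance += abs(start - prev_end - 1)
        prev_end = end
    return total_distance, alpha ** total_distance


# Monotone translation: the phrases are taken in source order.
monotone = distance_based_reordering_cost([(0, 1), (2, 2), (3, 4)])

# Long-range jump: the final source word (for instance, a clause-final verb)
# is translated second, and the skipped words are translated afterwards.
long_jump = distance_based_reordering_cost([(0, 0), (4, 4), (1, 3)])

print(monotone, long_jump)
```

The penalty depends only on how far the decoder jumps, not on which words are moved. This is one way to read the abstract's remark that the baseline does not have enough information for large word order differences.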
Component processes of early reading, spelling, and narrative writing skills in Turkish: a longitudinal study
The study examined: (a) the role of phonological, grammatical, and rapid automatized naming (RAN) skills in reading and spelling development; and (b) the component processes of early narrative writing skills. Fifty-seven Turkish-speaking children were followed from Grade 1 to Grade 2. RAN was the most powerful longitudinal predictor of reading speed and its effect was evident even when previous reading skills were taken into account. Broadly, the phonological and grammatical skills made reliable contributions to spelling performance but their effects were completely mediated by previous spelling skills. Different aspects of the narrative writing skills were related to different processing skills. While handwriting speed predicted writing fluency, spelling accuracy predicted spelling error rate. Vocabulary and working memory were the only reliable longitudinal predictors of the quality of composition content. The overall model, however, failed to explain any reliable variance in the structural quality of the composition
Multilingual audio information management system based on semantic knowledge in complex environments
This paper proposes a multilingual audio information management system based on semantic knowledge in complex environments. The complex environment is defined by limited resources (financial, material, human, and audio); the poor quality of the audio signal, which is taken from an internet radio channel; the multilingual context (Spanish, French, and Basque, the last of which is under-resourced in some areas); and the regular appearance of cross-lingual elements between the three languages. In addition, the system is constrained by the requirements of the local multilingual industrial sector. We present the first evolutionary system based on a scalable architecture that is able to fulfill these specifications with automatic adaptation based on automatic semantic speech recognition, folksonomies, automatic configuration selection, machine learning, neural computing methodologies, and collaborative networks. The initial goals have been accomplished, and the usability of the final application has been tested successfully, even with inexperienced users. This work is being funded by grants TEC201677791-C4 from Plan Nacional de I + D + i, Ministry of Economic Affairs and Competitiveness of Spain, and from the DomusVi Foundation Kms para recorder, the Basque Government (ELKARTEK KK-2018/00114, GEJ IT1189-19), the Government of Gipuzkoa (DG18/14 DG17/16), UPV/EHU (GIU19/090), COST ACTION (CA18106, CA15225).
English compound and non-compound processing in bilingual and multilingual speakers: effects of dominance and sequential multilingualism
This article reports on a study investigating the relative influence of the first and dominant language on L2 and L3 morpho-lexical processing. A lexical decision task compared the responses to English NV-er compounds (e.g., taxi driver) and non-compounds provided by a group of native speakers and three groups of learners at various levels of English proficiency: L1 Spanish-L2 English sequential bilinguals and two groups of early Spanish-Basque bilinguals with English as their L3. Crucially, the two trilingual groups differed in their first and dominant language (i.e., L1 Spanish-L2 Basque vs. L1 Basque-L2 Spanish). Our materials exploit an (a)symmetry between these languages: while Basque and English pattern together in the basic structure of (productive) NV-er compounds, Spanish presents a construction that differs in directionality as well as inflection of the verbal element (V[3SG] + N). Results show between and within group differences in accuracy and response times that may be ascribable to two factors besides proficiency: the number of languages spoken by a given participant and their dominant language. An examination of response bias reveals an influence of the participants' first and dominant language on the processing of NV-er compounds. Our data suggest that morphological information in the nonnative lexicon may extend beyond morphemic structure and that, similarly to bilingualism, there are costs to sequential multilingualism in lexical retrieval
Inquiries into the lexicon-syntax relations in Basque
Index:
- Foreword. B. Oyharçabal.
- Morphosyntactic disambiguation and shallow parsing in computational processing in Basque. I. Aduriz, A. Díaz de Ilarraza.
- The transitivity of borrowed verbs in Basque: an outline. X. Alberdi.
- Patrixa: a unification-based parser for Basque and its application to the automatic analysis of verbs. I. Aldezabal, M. J. Aranzabe, A. Atutxa, K. Gojenola, K. Sarasola.
- Learning argument/adjunct distinction for Basque. I. Aldezabal, M. J. Aranzabe, K. Gojenola, K. Sarasola, A. Atutxa.
- Analyzing verbal subcategorization aimed at its computational application. I. Aldezabal, P. Goenaga.
- Automatic extraction of verb patterns from “hauta-lanerako euskal hiztegia”. J. M. Arriola, X. Artola, A. Soroa.
- The case of an enlightening, provoking and admirable Basque derivational suffix with implications for the theory of argument structure. X. Artiagoitia.
- Verb-deriving processes in Basque. J. C. Odriozola.
- Lexical causatives and causative alternation in Basque. B. Oyharçabal.
- Causation and semantic control; diagnosis of incorrect use in minorized languages. I. Zabala.
- Subject index.
- Contributions
- …