4,764 research outputs found
The UPC Text-to-Speech System for Spanish and Catalan
This paper summarizes the text-to-speech system developed in the Speech Group of the Universitat Politècnica de Catalunya (UPC). The system is composed of a core and several interfaces, so that it can be used for research, for telephone applications (either CTI boards or standard ISDN PC cards supporting CAPI), and for Windows applications developed using Microsoft SAPI. The paper reviews the system, emphasizing the parts that are language dependent and that allow the reading of bilingual text (Spanish and Catalan). The paper also presents new approaches to prosodic modeling (segmental duration modeling) and to the generation of the database of speech segments, which were introduced last year. Peer Reviewed. Postprint (published version)
A bilingual Spanish-Catalan database of units for concatenative synthesis
Different databases of phonetic units are required in multilingual Text-to-Speech systems based on concatenative synthesis. We are currently developing a TTS system able to convert text in either Catalan or Spanish, with some of the modules shared by the two languages while others are specific to each language. In order to reduce the total number of units, a bilingual database containing all possible units for both languages has been obtained from two monolingual databases recorded by the same speaker. Common units have been selected according to their phonetic representation. The bilingual database has 1099 units, including diphones and some long units, while the two monolingual databases would result in 1545 units. An analysis of Catalan unit frequencies has been done to select which units should be included in the database. The experiments carried out showed that the synthetic speech has a strong Catalan accent, probably due to the speaker's accent. Some common units, even if they are represented with the same symbol, should be considered separately in a bilingual database in order to cope with acoustically different allophones. Peer Reviewed. Postprint (published version)
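The unit-sharing idea in this abstract can be illustrated with a minimal sketch. It assumes each monolingual inventory is simply a set of phonetic unit labels and that two units are "common" when their labels are identical. The labels and the `merge_inventories` helper are hypothetical, not taken from the paper.

```python
def merge_inventories(spanish_units, catalan_units):
    """Merge two monolingual unit inventories, storing shared units once."""
    shared = spanish_units & catalan_units   # units with the same phonetic label
    merged = spanish_units | catalan_units   # bilingual inventory
    return merged, shared

# Hypothetical diphone labels, for illustration only.
spanish = {"a-s", "s-a", "k-a", "a-k", "o-s"}
catalan = {"a-s", "s-a", "k-a", "ə-s", "s-ə"}

merged, shared = merge_inventories(spanish, catalan)
separate_total = len(spanish) + len(catalan)
toy_saving = 1 - len(merged) / separate_total

# The same ratio applied to the figures reported in the abstract:
# 1099 bilingual units versus 1545 units in two separate databases.
reported_saving = 1 - 1099 / 1545
print(f"toy saving: {toy_saving:.1%}, reported saving: {reported_saving:.1%}")
```

With the reported figures, the bilingual database is roughly 29% smaller than two separate ones. As the abstract notes, merging purely by symbol can hide acoustically different allophones, so a practical merge would probably need a finer notion of unit identity than label equality.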
Bilingual newsgroups in Catalonia: a challenge for machine translation
This paper presents a linguistic analysis of a corpus of messages written in Catalan and Spanish, which come from several informal newsgroups on the Universitat Oberta de Catalunya (Open University of Catalonia; henceforth, UOC) Virtual Campus. The surrounding environment is one of extensive bilingualism and contact between Spanish and Catalan. The study was carried out as part of the INTERLINGUA project conducted by the UOC's Internet Interdisciplinary Institute (IN3). Its main goal is to ascertain the linguistic characteristics of the e-mail register in the newsgroups in order to assess their implications for the creation of an online machine translation environment. The results shed empirical light on the relevance of characteristics of the e-mail register, the impact of language contact and interference, and their implications for the use of machine translation for CMC data in order to facilitate cross-linguistic communication on the Internet.
An Italian to Catalan RBMT system reusing data from existing language pairs
This paper presents an Italian→Catalan RBMT system automatically built by combining the linguistic data of the existing pairs Spanish–Catalan and Spanish–Italian. A lightweight manual postprocessing is carried out in order to fix inconsistencies in the automatically derived dictionaries and to add very frequent words that are missing according to a corpus analysis. The system is evaluated on the KDE4 corpus and outperforms Google Translate by approximately ten absolute points in terms of both TER and GTM.
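One way to picture how a dictionary for a new pair can be derived from two existing pairs is to compose them through the shared language (Spanish here). The sketch below is only an illustration under that assumption. The `compose_via_pivot` helper and the toy entries are hypothetical, and the procedure actually used in the paper is not described in this abstract.

```python
from collections import defaultdict

def compose_via_pivot(pivot_to_src, pivot_to_tgt):
    """Build a source->target dictionary from two dictionaries sharing a pivot language."""
    src_to_tgt = defaultdict(set)
    for pivot_word, src_words in pivot_to_src.items():
        # Every target translation of the pivot word becomes a candidate
        # translation of every source word linked to that pivot word.
        for tgt in pivot_to_tgt.get(pivot_word, ()):
            for src in src_words:
                src_to_tgt[src].add(tgt)
    return dict(src_to_tgt)

# Toy Spanish->Italian and Spanish->Catalan entries (illustrative only).
es_it = {"casa": {"casa"}, "perro": {"cane"}, "banco": {"banca", "panchina"}}
es_ca = {"casa": {"casa"}, "perro": {"gos"}, "banco": {"banc"}}

it_ca = compose_via_pivot(es_it, es_ca)
print(it_ca)
```

Composition of this kind can produce spurious or missing pairs whenever the pivot word is ambiguous, which is consistent with the abstract's mention of manual postprocessing to fix inconsistencies in the derived dictionaries.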
Why Catalan-Spanish Neural Machine Translation? Analysis, comparison and combination with standard Rule and Phrase-based technologies
Catalan and Spanish are two related languages, both derived from Latin. They share similarities at several linguistic levels, including morphology, syntax and semantics. This makes them particularly interesting for the MT task. Given the recent appearance and popularity of neural MT, this paper analyzes the performance of this new approach compared to the well-established rule-based and phrase-based MT systems. Experiments are reported on a large database of 180 million words. Results, in terms of standard automatic measures, show that neural MT clearly outperforms the rule-based and phrase-based MT systems on the in-domain test set, but is worse on the out-of-domain test set. A naive system combination works especially well for the latter. In-domain manual analysis shows that neural MT tends to improve both adequacy and fluency, for example, by generating more natural translations instead of literal ones, choosing the adequate target word when the source word has several translations, and improving gender agreement. However, out-of-domain manual analysis shows that neural MT is more affected by unknown words or contexts. Postprint (published version)
Perception of FA by non-native listeners in a study abroad context
The present study aims at exploring the under-investigated interface between study abroad (SA) and L2 phonological development by assessing the impact of a 3-month SA programme on the pronunciation of a group of 23 Catalan/Spanish learners of English (NNSs) by means of phonetic measures and perceived foreign accent (FA) measures. Six native speakers (NSs) in an exchange programme in Spain provided baseline data for comparison purposes. The participants were recorded performing a reading aloud task before (pre-test) and immediately after (post-test) the SA. Another group of 37 proficient non-native listeners, also bilingual in Catalan/Spanish and trained in English phonetics, assessed the NNSs' speech samples for degree of FA. Phonetic measures consisted of pronunciation accuracy scores computed by counting pronunciation errors (phonemic deletions, insertions and substitutions, and stress misplacement). Measures of perceived FA were obtained with two experiments. In experiment 1, the listeners heard a random presentation of the sentences produced by the NSs and by the NNSs at pre-test and post-test and rated them on a 7-point Likert scale for degree of FA (1 = “native”, 7 = “heavy foreign accent”). In experiment 2, they heard paired pre-test/post-test sentences (i.e. produced by the same NNS at pre-test and post-test) and indicated which of the two sounded more native-like. Then, they stated their judgment confidence level on a 7-point scale (1 = “unsure”, 7 = “sure”). Results indicated a slight, non-significant improvement in perceived FA after SA. However, a significant decrease was found in pronunciation accuracy scores after SA. Measures of pronunciation accuracy and FA ratings were also found to be strongly correlated. These findings are discussed in light of the often reported mixed results as regards pronunciation improvement during short-term immersion.
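Counting phonemic deletions, insertions and substitutions, as described above, can be sketched as an edit-distance alignment between a target phoneme sequence and the produced one. This is an illustrative assumption about how such counts could be automated. The abstract does not say how the errors were scored in the study, the `count_errors` function is hypothetical, and stress misplacement is not covered.

```python
def count_errors(target, produced):
    """Align two phoneme sequences; return (substitutions, deletions, insertions)."""
    n, m = len(target), len(produced)
    # cost[i][j] = minimum number of edits turning target[:i] into produced[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if target[i - 1] == produced[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + diff,  # match or substitution
                             cost[i - 1][j] + 1,         # deletion
                             cost[i][j - 1] + 1)         # insertion
    # Walk back through the table to classify each edit.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diff = 0 if target[i - 1] == produced[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + diff:
                subs += diff
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins

# Hypothetical example: a vowel inserted before an initial /s/ cluster.
print(count_errors(list("strit"), list("estrit")))
```

When several minimal alignments exist, the split between error types depends on the tie-breaking order in the walk back, so the counts should be read as one valid analysis rather than the only one.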
Report on first selection of resources
The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them. Peer Reviewed. Preprint
A prototype machine translation system between Turkmen and Turkish
In this work, we present a prototype system for translation of Turkmen texts into Turkish. Although machine translation (MT) is a very hard task, it is easier to implement an MT system between very close language pairs which have similar syntactic structure and word order. We implement a direct translation system between Turkmen and Turkish which performs a word-to-word transfer. We also use a Turkish language model to find the most probable Turkish sentence among all possible candidate translations generated by our system.
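The combination of word-to-word transfer and language-model selection described above can be sketched as follows. The sketch assumes a bigram model stored as a table of log-probabilities and enumerates every candidate sentence exhaustively. The lexicon entries and probabilities are toy values, not taken from the paper and not checked against a real Turkmen–Turkish dictionary, and the `translate` function is hypothetical.

```python
import itertools
import math

def translate(sentence, lexicon, bigram_logprob, unseen=math.log(1e-6)):
    """Word-to-word transfer, then pick the candidate the bigram model scores highest."""
    # Each source word maps to one or more target candidates; unknown words pass through.
    candidates_per_word = [lexicon.get(w, [w]) for w in sentence.split()]

    def score(words):
        padded = ["<s>", *words, "</s>"]
        # Sum of bigram log-probabilities; unseen bigrams get a small floor value.
        return sum(bigram_logprob.get(pair, unseen)
                   for pair in zip(padded, padded[1:]))

    best = max(itertools.product(*candidates_per_word), key=score)
    return " ".join(best)

# Toy lexicon with one ambiguous source word (illustrative only).
lexicon = {"men": ["ben"], "at": ["at", "ad"], "gördüm": ["gördüm"]}

# Toy Turkish bigram log-probabilities (made-up numbers).
lm = {
    ("<s>", "ben"): math.log(0.5),
    ("ben", "at"): math.log(0.1),
    ("ben", "ad"): math.log(0.05),
    ("at", "gördüm"): math.log(0.2),
    ("ad", "gördüm"): math.log(0.001),
    ("gördüm", "</s>"): math.log(0.5),
}

print(translate("men at gördüm", lexicon, lm))
```

Exhaustive enumeration grows exponentially with sentence length, so a practical system would more likely search the candidate lattice with dynamic programming such as Viterbi decoding.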
"El nen s'ha menjat una aranya": The development of narratives in Catalan speaking children
The production of a well-constructed narrative is the culmination of several years of language acquisition and is an important milestone in children's development. There is no current description of narrative development for Catalan-speaking children. This study collected elicited narratives in Catalan from 118 children aged 4;0-10;11. Narratives were scored for macrostructure and microstructure. Narrative scores improved with age, with maximum performance for macrostructure reached by 9 years. Children's ability to use microstructural components of Catalan is variable, with some developments continuing beyond 9 years. The results are discussed in relation to theoretical arguments about universal and specific features of narrative development. We conclude by highlighting the usefulness of the new test for future language assessment of children acquiring Catalan.
Narrative comprehension and production in children with SLI: An eye movement study
This study investigates narrative comprehension and production in children with specific language impairment (SLI). Twelve children with SLI (mean age 5;8 years) and 12 typically developing children (mean age 5;6 years) participated in an eye-tracking experiment designed to investigate online narrative comprehension and production in Catalan- and Spanish-speaking children with SLI. The comprehension task involved the recording of eye movements during the visual exploration of successive scenes in a story, while listening to the associated narrative. With regard to production, the children were asked to retell the story, while once again looking at the scenes, as their eye movements were monitored. During narrative production, children with SLI looked at the most semantically relevant areas of the scenes fewer times than their age-matched controls, but no differences were found in narrative comprehension. Moreover, the analyses of speech productions revealed that children with SLI retained less information and made more semantic and syntactic errors during retelling. Implications for theories that characterize SLI are discussed.