4,764 research outputs found

The UPC Text-to-Speech System for Spanish and Catalan

Author: Bonafonte Cávez Antonio
Esquerra Llucià Ignasi
Febrer A
Rodríguez Fonollosa José Adrián
Vallverdú Bayés Sisco
Publication venue: 'The International Fiscal Association of Korea'
Publication date: 01/01/1998

This paper summarizes the text-to-speech system that has been developed in the Speech Group of the Universitat Politècnica de Catalunya (UPC). The system is composed of a core and different interfaces so that it can be used for research, for telephone applications (either CTI boards or standard ISDN PC cards supporting CAPI), and for Windows applications developed using Microsoft SAPI. The paper reviews the system with emphasis on the parts that are language dependent and that allow the reading of bilingual text (Spanish and Catalan). The paper also presents new approaches to prosodic modeling (segmental duration modeling) and to the generation of the database of speech segments, which were introduced in the last year. Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

A bilingual Spanish-Catalan database of units for concatenative synthesis

Author: Bonafonte Cávez Antonio
Esquerra Llucià Ignasi
Febrer Godayol Albert
Vallverdú Bayés Sisco
Publication date: 01/01/1998

Different databases of phonetic units are required in multilingual Text-to-Speech systems based on concatenative synthesis. We are currently developing a TTS system able to convert text in either Catalan or Spanish, with some of the modules shared by the two languages while others are specific to each language. In order to reduce the total number of units, a bilingual database containing all possible units for both languages has been obtained from two monolingual databases recorded by the same speaker. Common units have been selected according to their phonetic representation. The bilingual database has 1099 units, including diphones and some long units, while the two monolingual databases would result in 1545 units. An analysis of Catalan unit frequencies has been done to select which units should be included in the database. The experiments carried out showed that the synthetic speech has a strong Catalan accent, probably due to the speaker's accent. Some common units, even if they are represented with the same symbol, should be considered separately in a bilingual database in order to cope with acoustically different allophones. Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Bilingual newsgroups in Catalonia: a challenge for machine translation

Author: Climent Roca Salvador
Moré López Joaquim
Oliver González Antoni
Salvatierra Mallarach Míriam
Sánchez Sáiz Imma
Taulé Delor Mariona
Vallmanya Cucurull Lluïsa
Publication venue: 'Wiley'
Publication date: 01/01/2003

This paper presents a linguistic analysis of a corpus of messages written in Catalan and Spanish, which come from several informal newsgroups on the Universitat Oberta de Catalunya (Open University of Catalonia; henceforth, UOC) Virtual Campus. The surrounding environment is one of extensive bilingualism and contact between Spanish and Catalan. The study was carried out as part of the INTERLINGUA project conducted by the UOC's Internet Interdisciplinary Institute (IN3). Its main goal is to ascertain the linguistic characteristics of the e-mail register in the newsgroups in order to assess their implications for the creation of an online machine translation environment. The results shed empirical light on the relevance of characteristics of the e-mail register, the impact of language contact and interference, and their implications for the use of machine translation for CMC data in order to facilitate cross-linguistic communication on the Internet

The Oberta in open access

An Italian to Catalan RBMT system reusing data from existing language pairs

Author: Ginestí-Rosell Mireia
Toral Antonio
Tyers Francis
Publication date: 01/01/2011

This paper presents an Italian→Catalan RBMT system automatically built by combining the linguistic data of the existing pairs Spanish–Catalan and Spanish–Italian. Lightweight manual postprocessing is carried out in order to fix inconsistencies in the automatically derived dictionaries and to add very frequent words that are missing according to a corpus analysis. The system is evaluated on the KDE4 corpus and outperforms Google Translate by approximately ten absolute points in terms of both TER and GTM.

DCU Online Research Access Service

Why Catalan-Spanish Neural Machine Translation? Analysis, comparison and combination with standard Rule and Phrase-based technologies

Author: Ruiz Costa-Jussà Marta
Publication date: 01/01/2017

Catalan and Spanish are two related languages given that both derive from Latin. They share similarities at several linguistic levels including morphology, syntax and semantics. This makes them particularly interesting for the MT task. Given the recent appearance and popularity of neural MT, this paper analyzes the performance of this new approach compared to the well-established rule-based and phrase-based MT systems. Experiments are reported on a large database of 180 million words. Results, in terms of standard automatic measures, show that neural MT clearly outperforms the rule-based and phrase-based MT systems on the in-domain test set, but it is worse on the out-of-domain test set. A naive system combination works especially well for the latter. In-domain manual analysis shows that neural MT tends to improve both adequacy and fluency, for example, by being able to generate more natural translations instead of literal ones, choosing the adequate target word when the source word has several translations, and improving gender agreement. However, out-of-domain manual analysis shows that neural MT is more affected by unknown words or contexts. Postprint (published version).

Crossref

UPCommons. Portal del coneixement obert de la UPC

Perception of FA by non-native listeners in a study abroad context

Author: Avello Pilar
Mora Joan Carles
Pérez-Vidal Carmen
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/10/2012

The present study aims at exploring the under-investigated interface between study abroad (SA) and L2 phonological development by assessing the impact of a 3-month SA programme on the pronunciation of a group of 23 Catalan/Spanish learners of English (NNSs) by means of phonetic measures and perceived foreign accent (FA) measures. 6 native speakers (NSs) in an exchange programme in Spain provided baseline data for comparison purposes. The participants were recorded performing a reading aloud task before (pre-test) and immediately after (post-test) the SA. Another group of 37 proficient non-native listeners, also bilingual in Catalan/Spanish and trained in English phonetics, assessed the NNSs' speech samples for degree of FA. Phonetic measures consisted of pronunciation accuracy scores computed by counting pronunciation errors (phonemic deletions, insertions and substitutions, and stress misplacement). Measures of perceived FA were obtained with two experiments. In experiment 1, the listeners heard a random presentation of the sentences produced by the NSs and by the NNSs at pre-test and post-test and rated them on a 7-point Likert scale for degree of FA (1 = “native”, 7 = “heavy foreign accent”). In experiment 2, they heard paired pre-test/post-test sentences (i.e. produced by the same NNS at pre-test and post-test) and indicated which of the two sounded more native-like. Then, they stated their judgment confidence level on a 7-point scale (1 = “unsure”, 7 = “sure”). Results indicated a slight, non-significant improvement in perceived FA after SA. However, a significant decrease was found in pronunciation accuracy scores after SA. Measures of pronunciation accuracy and FA ratings were also found to be strongly correlated. These findings are discussed in light of the often reported mixed results as regards pronunciation improvement during short-term immersion.

Crossref

Biblioteka Nauki - repozytorium artykułów

Repozytorium Uniwersytetu Łódzkiego (University of Lodz Repository)

Report on first selection of resources

Author: Ananiadou Sophia
Bel Núria
Branco Antonio
Cristea Dan
McNaught John
Meinedo Hugo
Mendes Amalia
Moreno Bilbao M. Asunción
Revilla Espí Eva
Rosner Mike
Thompson Paul
Trancoso Isabel
Trandabăț Diana
Tufis Dan
Vivaldi Jorge
Publication date: 01/01/2011

The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them. Peer Reviewed. Preprint.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A prototype machine translation system between Turkmen and Turkish

Author: Adalı Eşref
Oflazer Kemal
Tantuğ A. Cüneyd
Publication date: 01/06/2006

In this work, we present a prototype system for translation of Turkmen texts into Turkish. Although machine translation (MT) is a very hard task, it is easier to implement an MT system between very close language pairs which have similar syntactic structure and word order. We implement a direct translation system between Turkmen and Turkish which performs a word-to-word transfer. We also use a Turkish language model to find the most probable Turkish sentence among all possible candidate translations generated by our system.

Sabanci University Research Database

"El nen s'ha menjat una aranya": The development of narratives in Catalan speaking children

Author: Anna JONES
Gary MORGAN
Marta CABALLERO
Melina APARICI
Mònica SANZ-TORRENT
Ros HERMAN
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2020

The production of a well-constructed narrative is the culmination of several years of language acquisition and is an important milestone in children's development. There is no current description of narrative development for Catalan-speaking children. This study collected elicited narratives in Catalan from 118 children aged 4;0-10;11. Narratives were scored for macrostructure and microstructure. Narrative scores improved with age, with maximum performance for macrostructure reached by 9 years. Children's ability to use microstructural components of Catalan was variable, with some developments continuing beyond 9 years. The results are discussed in relation to theoretical arguments about universal and specific features of narrative development. We conclude by highlighting the usefulness of the new test for future language assessment of children acquiring Catalan.

City Research Online

Crossref

Western Sydney ResearchDirect

Narrative comprehension and production in children with SLI: An eye movement study

Author: Andreu Barrachina Llorenç
Guàrdia Olmos Joan
MacWhinney Brian
Sanz Torrent Mònica
Publication venue: 'Informa UK Limited'
Publication date: 01/09/2011

This study investigates narrative comprehension and production in children with specific language impairment (SLI). Twelve children with SLI (mean age 5;8 years) and 12 typically developing children (mean age 5;6 years) participated in an eye-tracking experiment designed to investigate online narrative comprehension and production in Catalan- and Spanish-speaking children with SLI. The comprehension task involved the recording of eye movements during the visual exploration of successive scenes in a story, while listening to the associated narrative. With regard to production, the children were asked to retell the story, while once again looking at the scenes, as their eye movements were monitored. During narrative production, children with SLI looked at the most semantically relevant areas of the scenes fewer times than their age-matched controls, but no differences were found in narrative comprehension. Moreover, the analyses of speech productions revealed that children with SLI retained less information and made more semantic and syntactic errors during retelling. Implications for theories that characterize SLI are discussed.

The Oberta in open access