14 research outputs found

    The TALP on-line Spanish-Catalan machine-translation system

    This paper describes the statistical machine translation (SMT) system between Catalan and Spanish developed at the TALP Research Center (UPC), together with its web demonstration. Postprint (published version).

    Search engine for multilingual audiovisual contents

    This paper describes the BUCEADOR search engine, a web server for retrieving multimedia documents (text, audio, video) in different languages. All documents are translated into the user's language and presented either as text (for instance, subtitles in video documents) or as dubbed audio. The user query consists of a sequence of keywords and can be typed or spoken. Multiple Spoken Language Technologies (SLT) servers have been implemented, such as speech recognition, speech machine translation and text-to-speech conversion. The platform can be used in the four official languages of Spain (Spanish, Basque, Catalan and Galician) and in English. Peer reviewed. Postprint (published version).

    BUCEADOR, a multi-language search engine for digital libraries

    This paper presents a web-based multimedia search engine built within the Buceador (www.buceador.org) research project. A proof-of-concept tool has been implemented that retrieves information from a digital library of multimedia documents in the four official languages of Spain (Spanish, Basque, Catalan and Galician). The retrieved documents are presented in the user's language (one of those four languages or English) after translation and dubbing. The paper presents the tool's functionality, its architecture and the digital library, and provides some information about the technology involved in the fields of automatic speech recognition, statistical machine translation, text-to-speech synthesis and information retrieval. Each technology has been adapted to the purposes of the tool and to interact with the other technologies involved. Peer reviewed. Postprint (published version).

    The TALP & I2R SMT Systems for IWSLT 2008

    This paper describes the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT'08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the architecture of the 2008 systems and outlines the translation schemes we have used, focusing mainly on the new techniques intended to improve speech-to-speech translation quality. The novelties we have introduced are an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation tasks. Postprint (published version).

    Técnicas de Cobertura aplicadas al Sistema de Traducción Automática Neuronal basado en Caracteres

    In recent years, Neural Machine Translation (NMT) has achieved state-of-the-art performance in translating from a source language to a target language. However, many of the proposed methods use word embeddings to represent a sentence in the source or target language, which causes vocabulary-coverage problems. Character embeddings have been suggested as a better way to represent the words in a sentence, since they reduce the number of out-of-vocabulary words. Moreover, most recent NMT models use an attention mechanism, in which the most relevant words in the source sentence are used to generate each target word. The problem with this approach is that some words are translated multiple times while others are not translated at all. To address this problem, a coverage model has been integrated into NMT to keep track of already-translated words and focus on the untranslated ones. In this research, we present a new architecture that uses character embeddings to represent the source and target languages, together with a coverage model to ensure that all words are translated. Experiments compare our model with the coverage model and the character model, and the results show that our model performs better than the other two. This work is supported by Ministerio de Economía y Competitividad and Fondo Europeo de Desarrollo Regional, through contract TEC2015-69266-P (MINECO/FEDER, UE) and the postdoctoral senior grant Ramón y Cajal.
