471 research outputs found

    BUCEADOR, a multi-language search engine for digital libraries

    Get PDF
    This paper presents a web-based multimedia search engine built within the Buceador (www.buceador.org) research project. A proof-of-concept tool has been implemented which is able to retrieve information from a digital library made of multimedia documents in the 4 official languages in Spain (Spanish, Basque, Catalan and Galician). The retrieved documents are presented in the user language after translation and dubbing (the four previous languages + English). The paper presents the tool functionality, the architecture, the digital library and provide some information about the technology involved in the fields of automatic speech recognition, statistical machine translation, text-to-speech synthesis and information retrieval. Each technology has been adapted to the purposes of the presented tool as well as to interact with the rest of the technologies involved.Peer ReviewedPostprint (published version

    Accesibilidad y multilingüismo: un estudio exploratorio sobre la traducción automática de descripciones de audio

    Get PDF
    This article presents the results of an exploratory study which assesses the machine translation of audio descriptions as offering a possible solution to increase accessibility in multilingual environments. Accessibility is understood to encompass two different categories: sensorial accessibility (in this specific case, for the blind and visually impaired, who cannot access the visual content of audiovisual productions), and linguistic accessibility (for those who want to access this content in their own language). The article presents some thoughts on translation as a means of promoting multilingualism, on the feasibility of translating audio descriptions, and on machine translation as applied to this audiovisual translation mode, before summarising the findings of the present study and, most importantly, opening up new potential avenues for research.Este artículo presenta los resultados de un estudio exploratorio que evalúa la traducción automática de audiodescripciones como una posible solución para aumentar la accesibilidad en entornos multilingües. Se entiende que la accesibilidad abarca dos categorías diferentes: accesibilidad sensorial (en este caso específico, para los ciegos y discapacitados visuales, que no pueden acceder al contenido visual de las producciones audiovisuales) y accesibilidad lingüística (para aquellos que quieren acceder a este contenido en su propio idioma). El artículo presenta algunas reflexiones sobre la traducción como medio para promover el multilingüismo, sobre la viabilidad de traducir descripciones de audio y sobre la traducción automática tal y como se aplica a este tipo de traducción audiovisual, antes de sintetizar los hallazgos del presente estudio y, lo que es más importante, abrir la posibilidad de nuevas vías a la investigación

    Multilingual audio information management system based on semantic knowledge in complex environments

    Get PDF
    This paper proposes a multilingual audio information management system based on semantic knowledge in complex environments. The complex environment is defined by the limited resources (financial, material, human, and audio resources); the poor quality of the audio signal taken from an internet radio channel; the multilingual context (Spanish, French, and Basque that is in under-resourced situation in some areas); and the regular appearance of cross-lingual elements between the three languages. In addition to this, the system is also constrained by the requirements of the local multilingual industrial sector. We present the first evolutionary system based on a scalable architecture that is able to fulfill these specifications with automatic adaptation based on automatic semantic speech recognition, folksonomies, automatic configuration selection, machine learning, neural computing methodologies, and collaborative networks. As a result, it can be said that the initial goals have been accomplished and the usability of the final application has been tested successfully, even with non-experienced users.This work is being funded by Grants: TEC201677791-C4 from Plan Nacional de I + D + i, Ministry of Economic Affairs and Competitiveness of Spain and from the DomusVi Foundation Kms para recorder, the Basque Government (ELKARTEK KK-2018/00114, GEJ IT1189-19, the Government of Gipuzkoa (DG18/14 DG17/16), UPV/EHU (GIU19/090), COST ACTION (CA18106, CA15225)

    Multilingual sentiment analysis in social media.

    Get PDF
    252 p.This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language a tool providing only analysis for Basque would not be enough for a real world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish.The thesis addresses the following challenges to build such a system:- Analysing methods for creating Sentiment lexicons, suitable for less resourced languages.- Analysis of social media (specifically Twitter): Tweets pose several challenges in order to understand and extract opinions from such messages. Language identification and microtext normalization are addressed.- Research the state of the art in polarity classification, and develop a supervised classifier that is tested against well known social media benchmarks.- Develop a social media monitor capable of analysing sentiment with respect to specific events, products or organizations

    Multilingual sentiment analysis in social media.

    Get PDF
    252 p.This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language a tool providing only analysis for Basque would not be enough for a real world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish.The thesis addresses the following challenges to build such a system:- Analysing methods for creating Sentiment lexicons, suitable for less resourced languages.- Analysis of social media (specifically Twitter): Tweets pose several challenges in order to understand and extract opinions from such messages. Language identification and microtext normalization are addressed.- Research the state of the art in polarity classification, and develop a supervised classifier that is tested against well known social media benchmarks.- Develop a social media monitor capable of analysing sentiment with respect to specific events, products or organizations

    Sub-Sync: automatic synchronization of subtitles in the broadcasting of true live programs in spanish

    Get PDF
    Individuals With Sensory Impairment (Hearing Or Visual) Encounter Serious Communication Barriers Within Society And The World Around Them. These Barriers Hinder The Communication Process And Make Access To Information An Obstacle They Must Overcome On A Daily Basis. In This Context, One Of The Most Common Complaints Made By The Television (Tv) Users With Sensory Impairment Is The Lack Of Synchronism Between Audio And Subtitles In Some Types Of Programs. In Addition, Synchronization Remains One Of The Most Significant Factors In Audience Perception Of Quality In Live-Originated Tv Subtitles For The Deaf And Hard Of Hearing. This Paper Introduces The Sub-Sync Framework Intended For Use In Automatic Synchronization Of Audio-Visual Contents And Subtitles, Taking Advantage Of Current Well-Known Techniques Used In Symbol Sequences Alignment. In This Particular Case, These Symbol Sequences Are The Subtitles Produced By The Broadcaster Subtitling System And The Word Flow Generated By An Automatic Speech Recognizing The Procedure. The Goal Of Sub-Sync Is To Address The Lack Of Synchronism That Occurs In The Subtitles When Produced During The Broadcast Of Live Tv Programs Or Other Programs That Have Some Improvised Parts. Furthermore, It Also Aims To Resolve The Problematic Interphase Of Synchronized And Unsynchronized Parts Of Mixed Type Programs. In Addition, The Framework Is Able To Synchronize The Subtitles Even When They Do Not Correspond Literally To The Original Audio And/Or The Audio Cannot Be Completely Transcribed By An Automatic Process. Sub-Sync Has Been Successfully Tested In Different Live Broadcasts, Including Mixed Programs, In Which The Synchronized Parts (Recorded, Scripted) Are Interspersed With Desynchronized (Improvised) Ones

    Language report for Catalan (English version)

    Get PDF
    The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them.Peer ReviewedPreprin

    On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism

    Full text link
    Barrón Cedeño, LA. (2012). On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16012Palanci
    • …
    corecore