
    The strategic impact of META-NET on the regional, national and international level

    This article provides an overview of the dissemination work carried out in META-NET from 2010 until 2015. We describe its impact at the regional, national and international levels, mainly with regard to politics and the funding situation for language technology (LT) topics. The article documents the initiative's work throughout Europe to boost progress and innovation in our field.

    Evaluation of innovative computer-assisted transcription and translation strategies for video lecture repositories

    Nowadays, the technology-enhanced learning area has experienced strong growth, with many new learning approaches such as blended learning, flipped teaching, massive open online courses, and open educational resources complementing face-to-face lectures. Video lectures in particular are fast becoming an everyday educational resource in higher education for all of these new learning approaches, and they are being incorporated into existing university curricula around the world. Transcriptions and translations can improve the utility of these audiovisual assets, but they are rarely present due to a lack of cost-effective solutions for producing them. Lecture searchability, accessibility for people with impairments, translatability for foreign students, plagiarism detection, content recommendation, note-taking, and discovery of content-related videos are examples of the benefits transcriptions bring. For this reason, the aim of this thesis is to test, in real-life case studies, ways to obtain multilingual captions for video lectures cost-effectively by using state-of-the-art automatic speech recognition and machine translation techniques. We also explore interaction protocols for reviewing these automatic transcriptions and translations, because automatic subtitles are unfortunately not error-free. In addition, we take a step further into multilingualism by extending our findings and evaluation to several languages. Finally, the outcomes of this thesis have been applied to thousands of video lectures in European universities and institutions.
    Valor Miró, JD. (2017). Evaluation of innovative computer-assisted transcription and translation strategies for video lecture repositories [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90496
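    Reviewing automatic subtitles presupposes a way to measure how wrong they are; the standard metric for transcription quality is word error rate (WER). Below is a minimal sketch using the textbook dynamic-programming edit distance over words (a generic illustration of the metric, not code from the thesis):

    ```python
    def wer(reference: str, hypothesis: str) -> float:
        """Word error rate: (substitutions + deletions + insertions) / reference length."""
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edit distance between ref[:i] and hyp[:j]
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                               dp[i][j - 1] + 1,         # insertion
                               dp[i - 1][j - 1] + cost)  # substitution or match
        return dp[len(ref)][len(hyp)] / max(len(ref), 1)

    print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion / six reference words
    ```

    A lower WER after an interactive review pass is precisely the kind of improvement the thesis's interaction protocols aim to quantify.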

    Privacy in text documents

    Sensitive data preservation is currently a manual or semi-automatic procedure, and it suffers from several problems that affect the handling of confidential, sensitive and personal information: identifying sensitive data in documents requires human intervention, which is costly and error-prone, and identifying sensitive data in large-scale document collections rules out approaches that depend solely on human expertise. DataSense will be highly exportable software that enables organizations to identify and understand the sensitive data they hold in unstructured textual information (digital documents) for legal, compliance and security purposes. The goal is to identify and classify sensitive data (personal data) present in large-scale structured and unstructured information in a way that allows entities and/or organizations to understand it without raising security or confidentiality issues. The DataSense project will be based on European Portuguese text documents and will combine NLP (Natural Language Processing) technologies with advances in machine learning, such as Named Entity Recognition, disambiguation, co-referencing (ARE), and automatic learning with human feedback. It will also help organizations comply with regulations such as the GDPR (General Data Protection Regulation), which governs data protection in the European Union.
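    To make the task concrete, here is a toy rule-based sketch of spotting sensitive strings in text. The regex patterns and the `find_sensitive` helper are hypothetical illustrations only; the project itself relies on NER and machine-learning models, not regexes alone:

    ```python
    import re

    # Hypothetical example patterns; real systems need far more robust detectors.
    PATTERNS = {
        "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
        "phone": re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b"),  # e.g. 9-digit PT-style numbers
    }

    def find_sensitive(text: str) -> list[tuple[str, str]]:
        """Return (label, matched_string) pairs for every pattern hit in the text."""
        hits = []
        for label, pattern in PATTERNS.items():
            hits.extend((label, m.group()) for m in pattern.finditer(text))
        return hits

    print(find_sensitive("Contact ana@example.pt or 912 345 678."))
    # [('email', 'ana@example.pt'), ('phone', '912 345 678')]
    ```

    The limits of such rules (no context, no disambiguation, no handling of names) are exactly why the project turns to NER and human-feedback learning.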

    Hybrid rule-based - example-based MT: feeding apertium with sub-sentential translation units

    This paper describes a hybrid machine translation (MT) approach that consists of integrating bilingual chunks (sub-sentential translation units) obtained from parallel corpora into an MT system built using the Apertium free/open-source rule-based machine translation platform, which uses a shallow-transfer translation approach. In the integration of bilingual chunks, special care has been taken not to break the application of the existing Apertium structural transfer rules, since this would increase the number of ungrammatical translations. The method consists of (i) the application of a dynamic-programming algorithm to compute the best translation coverage of the input sentence given the collection of bilingual chunks available; (ii) the translation of the input sentence as usual by Apertium; and (iii) the application of a language model to choose one of the possible translations for each of the bilingual chunks detected. Results are reported for translation from English to Spanish, and vice versa, when marker-based bilingual chunks automatically obtained from parallel corpora are used.
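    Step (i) above can be sketched as a classic dynamic program over word positions: for each prefix of the sentence, keep the segmentation that covers the most words with known chunks, leaving uncovered words to be translated word-by-word. This is a simplified stand-in (simple word-count scoring, toy chunk inventory), not Apertium's actual implementation:

    ```python
    def best_coverage(words: list[str], chunks: set[tuple[str, ...]]) -> list[tuple[str, ...]]:
        """Segment `words` to maximize the number of words covered by known chunks.

        Singleton tuples in the result mark words left uncovered.
        """
        n = len(words)
        # best[i] = (covered_word_count, segmentation) for the prefix words[:i]
        best = [(0, [])] * (n + 1)
        for i in range(1, n + 1):
            # default: leave word i-1 uncovered
            best[i] = (best[i - 1][0], best[i - 1][1] + [(words[i - 1],)])
            for j in range(i):
                span = tuple(words[j:i])
                if span in chunks:
                    covered = best[j][0] + (i - j)
                    if covered > best[i][0]:
                        best[i] = (covered, best[j][1] + [span])
        return best[n][1]

    chunks = {("kick", "the", "bucket"), ("the", "bucket")}
    print(best_coverage("he will kick the bucket".split(), chunks))
    # [('he',), ('will',), ('kick', 'the', 'bucket')]
    ```

    Note that the longer chunk wins over the overlapping shorter one because it covers more words, which mirrors the "best coverage" criterion described in the abstract.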

    Report on first selection of resources

    The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them.

    BioVisualSpeech: Deployment of an Interactive Platform for Speech Therapy Sessions With Children

    Sigmatism is a speech sound disorder (SSD) that prevents people from correctly pronouncing sibilant consonant sounds ([Z], [z], [S] and [s]). If left untreated, it can negatively impact children's ability to communicate and socialize. Parents are advised to seek speech therapy for their children whenever they are not reaching the milestones expected for their age, and while the exercises employed in speech therapy sessions are vital for the treatment of these disorders, they can also become repetitive. BioVisualSpeech is a research project that explores ways to provide biofeedback in speech therapy sessions through the use of serious games. An example is the BioVisualSpeech Therapy Support Platform, an interactive tool that gathers many types of games in one place and that children can play in therapy sessions and at home, using the computer's microphone to capture their voices. However, because the platform was developed in an academic context, it was important to adapt this system to a real-life context in collaboration with speech-language pathologists (SLPs). To achieve this, we set the goal of deploying the platform to SLPs' computers. We first reengineered the system into an in-session-focused application, instead of a system where children practice both with SLPs and at home. In addition, we integrated Windows Speech Recognition into the platform, made the system easier to install, and made it capable of collecting data from players, such as voice productions that could later be used to train better classification models, along with other objective parameters concerning game performance. Our deployment with SLPs was accompanied by the questionnaires, documentation and data collection protocol needed to proceed with, firstly, the further validation of the platform along with two of its games and, secondly, the design of a user study focused on gathering voice productions from children. In the end, not only did we obtain promising results regarding the validation of the platform, but SLPs also got to own a system that can continue to be used, and distributed by future researchers, even after the termination of this project.
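    Sibilants like [s] and [z] are noisy, high-frequency sounds, so one classic acoustic cue a voice-driven game could use is the zero-crossing rate of an audio frame: fricative frames cross zero far more often than vowel frames. The sketch below is a hypothetical illustration of that cue (using a 6 kHz tone as a stand-in for sibilant hiss), not the project's actual classifier:

    ```python
    import math

    def zero_crossing_rate(frame: list[float]) -> float:
        """Fraction of adjacent sample pairs whose signs differ."""
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        return crossings / max(len(frame) - 1, 1)

    def looks_sibilant(frame: list[float], threshold: float = 0.3) -> bool:
        """Crude sibilance test: high zero-crossing rate suggests fricative noise."""
        return zero_crossing_rate(frame) > threshold

    rate = 16000  # samples per second
    vowel = [math.sin(2 * math.pi * 200 * t / rate) for t in range(400)]   # 200 Hz, vowel-like
    hiss = [math.sin(2 * math.pi * 6000 * t / rate) for t in range(400)]   # 6 kHz, sibilant-like
    print(looks_sibilant(vowel), looks_sibilant(hiss))  # prints: False True
    ```

    A real classifier would of course use richer spectral features and trained models, which is exactly why the platform collects children's voice productions for future training data.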

    TectoMT – a deep-linguistic core of the combined Chimera MT system

    Chimera is a machine translation system that combines the TectoMT deep-linguistic core with the phrase-based MT system Moses. For the English–Czech pair it also uses the Depfix post-correction system. All components run on Unix/Linux platforms and are open source (available from the Perl repository CPAN and the LINDAT/CLARIN repository). The main website is https://ufal.mff.cuni.cz/tectomt. Development is currently supported by the QTLeap 7th FP project (http://qtleap.eu).