The strategic impact of META-NET on the regional, national and international level
This article provides an overview of the dissemination work carried out in META-NET from 2010 until 2015; we describe its impact on the regional, national and international levels, mainly with regard to politics and the funding situation for language technology (LT) topics. The article documents the initiative's work throughout Europe to boost progress and innovation in our field.
Evaluation of innovative computer-assisted transcription and translation strategies for video lecture repositories
Nowadays, the technology-enhanced learning area has experienced strong growth, with many new learning approaches such as blended learning, flipped teaching, massive open online courses, and open educational resources complementing face-to-face lectures. Specifically, video lectures are fast becoming an everyday educational resource in higher education for all of these new learning approaches, and they are being incorporated into existing university curricula around the world.
Transcriptions and translations can improve the utility of these audiovisual assets, but they are rarely present due to a lack of cost-effective solutions for producing them. Lecture searchability, accessibility for people with impairments, translatability for foreign students, plagiarism detection, content recommendation, note-taking, and discovery of content-related videos are examples of the advantages transcriptions provide.
For this reason, the aim of this thesis is to test, in real-life case studies, ways to obtain multilingual captions for video lectures cost-effectively by using state-of-the-art automatic speech recognition and machine translation techniques. We also explore interaction protocols to review these automatic transcriptions and translations, because automatic subtitles are unfortunately not error-free. In addition, we take a step further into multilingualism by extending our findings and evaluation to several languages. Finally, the outcomes of this thesis have been applied to thousands of video lectures in European universities and institutions.
Valor Miró, JD. (2017). Evaluation of innovative computer-assisted transcription and translation strategies for video lecture repositories [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90496
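The interaction protocols mentioned above route only unreliable automatic subtitles to a human reviewer. A minimal sketch of that idea follows; the threshold value and the (text, confidence) data layout are illustrative assumptions, not the thesis's actual protocol.

```python
def select_for_review(segments, threshold=0.7):
    """Split ASR segments into auto-accepted and human-review queues.

    `segments` is a list of (text, confidence) pairs, where confidence
    is the recognizer's score in [0, 1]. Segments scoring below
    `threshold` are queued for a reviewer; the rest are published as-is.
    """
    accepted, review = [], []
    for text, confidence in segments:
        (accepted if confidence >= threshold else review).append(text)
    return accepted, review
```

The point of such a protocol is that reviewer effort scales with recognizer uncertainty rather than with lecture length, which is what makes human supervision cost-effective.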
Privacy in text documents
Preserving sensitive data is currently a manual or semi-automatic procedure that suffers from several problems affecting the handling of confidential, sensitive and personal information: identifying sensitive data in documents requires human intervention, which is costly and error-prone, and identifying sensitive data in large-scale document collections does not allow an approach that depends on human expertise for identification and relationship extraction. DataSense will be highly exportable software that enables organizations to identify and understand the sensitive data they hold in unstructured textual information (digital documents) for legal, compliance and security purposes. The goal is to identify and classify sensitive data (personal data) present in large-scale structured and unstructured information in a way that allows entities and/or organizations to understand it without raising security or confidentiality issues. The DataSense project will be based on European Portuguese text documents and will combine different natural language processing (NLP) technologies with advances in machine learning, such as named entity recognition, disambiguation, co-referencing (ARE) and automatic learning with human feedback. It will also be characterized by the ability to assist organizations in complying with standards such as the GDPR (General Data Protection Regulation), which regulates data protection in the European Union.
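The detection-and-classification step described above can be sketched with a toy pattern-based detector. The patterns and labels below are illustrative assumptions only; a real system such as DataSense would rely on trained NER models, disambiguation and co-reference resolution rather than regular expressions.

```python
import re

# Illustrative patterns only, not the DataSense implementation.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b"),
}

def find_sensitive_spans(text):
    """Return (label, start, end, match) tuples for every pattern hit."""
    spans = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((label, m.start(), m.end(), m.group()))
    return sorted(spans, key=lambda s: s[1])

def redact(text):
    """Replace each detected span with its label, e.g. '[EMAIL]'."""
    # Work right-to-left so earlier offsets stay valid after replacement.
    for label, start, end, _ in reversed(find_sensitive_spans(text)):
        text = text[:start] + f"[{label}]" + text[end:]
    return text
```

Redaction is only one possible downstream action; the same span list could equally feed classification, reporting, or a human-feedback loop.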
Hybrid rule-based - example-based MT: feeding apertium with sub-sentential translation units
This paper describes a hybrid machine translation (MT) approach that consists of integrating bilingual chunks (sub-sentential translation units) obtained from parallel corpora into an MT system built using the Apertium free/open-source rule-based machine translation platform, which uses a shallow-transfer translation approach. In integrating the bilingual chunks, special care has been taken not to break the application of the existing Apertium structural transfer rules, since this would increase the number of ungrammatical translations. The method consists of (i) the application of a dynamic-programming algorithm to compute the best translation coverage of the input sentence given the collection of bilingual chunks available; (ii) the translation of the input sentence as usual by Apertium; and (iii) the application of a language model to choose one of the possible translations for each of the bilingual chunks detected. Results are reported for translation from English to Spanish, and vice versa, when marker-based bilingual chunks automatically obtained from parallel corpora are used.
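Step (i), computing the best chunk coverage of an input sentence, can be sketched as a simple dynamic program. The scoring used here (maximize the number of words covered by known chunks, then prefer fewer segments) is an assumption for illustration, not necessarily the paper's exact criterion.

```python
def best_coverage(words, chunks):
    """Segment `words` to maximize the words covered by known chunks.

    `chunks` maps source phrases (tuples of words) to candidate
    translations. Returns a list of segments (tuples of words); the
    segments found in `chunks` are the detected bilingual chunks.
    """
    n = len(words)
    # best[i] = (covered_words, -num_segments, segmentation) for words[:i]
    best = [(0, 0, [])] + [None] * n
    for i in range(1, n + 1):
        # Fallback: leave words[i-1] uncovered as a single-word segment.
        covered, negseg, segs = best[i - 1]
        best[i] = (covered, negseg - 1, segs + [(words[i - 1],)])
        # Try ending a known chunk at position i.
        for j in range(i):
            chunk = tuple(words[j:i])
            if chunk in chunks:
                covered, negseg, segs = best[j]
                cand = (covered + i - j, negseg - 1, segs + [chunk])
                if cand[:2] > best[i][:2]:
                    best[i] = cand
    return best[n][2]
```

In the full method, step (iii)'s language model would then choose among the candidate translations stored in `chunks[segment]` for each detected chunk, while uncovered words go through Apertium's usual pipeline.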
Report on first selection of resources
The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes language resources and services available, encompassing both datasets and software tools for speech and language processing, and that supports a new generation of exchange facilities for them.
BioVisualSpeech: Deployment of an Interactive Platform for Speech Therapy Sessions With Children
Sigmatism is a speech sound disorder (SSD) that prevents people from correctly pronouncing sibilant consonant sounds ([Z], [z], [S] and [s]). If left untreated, it can negatively impact children's ability to communicate and socialize. Parents are advised to seek speech therapy for their children whenever they are not reaching the milestones expected at their age, and while the exercises employed in speech therapy sessions are vital for the treatment of these disorders, they can also become repetitive.
BioVisualSpeech is a research project that explores ways to provide biofeedback in speech therapy sessions through the use of serious games. An example of this is the BioVisualSpeech Therapy Support Platform, an interactive tool that gathers many types of games in one place and that children can play in therapy sessions and at home, using the computer's microphone to capture their voices. However, because the platform was developed in an academic context, it was important for us to adapt this system to a real-life context in collaboration with speech-language pathologists (SLPs).
To achieve this, we set the goal of deploying the platform to SLPs' computers. For that, we first reengineered the system to turn it into an in-session focused application, instead of a system where children practice both with SLPs and at home. In addition, we integrated Windows Speech Recognition into the platform and made the system easier to install and capable of collecting data from players, such as voice productions that could be used in the future to train better classification models, as well as other objective parameters concerning game performance. Our deployment with SLPs was accompanied by the questionnaires, documentation and data collection protocol needed to proceed with, firstly, the further validation of the platform along with two of its games and, secondly, the design of a user study focused on gathering voice productions from children.
In the end, not only did we obtain promising results regarding the validation of the platform, but SLPs also got the opportunity to own a system that can continue to be used, and distributed by future researchers, even after the termination of this project.
TectoMT – a deep-linguistic core of the combined Chimera MT system
Chimera is a machine translation system that combines the TectoMT deep-linguistic core with the phrase-based MT system Moses. For the English–Czech pair it also uses the Depfix post-correction system. All the components run on Unix/Linux platforms and are open source (available from the Perl repository CPAN and the LINDAT/CLARIN repository). The main website is https://ufal.mff.cuni.cz/tectomt. The development is currently supported by the QTLeap FP7 project (http://qtleap.eu).