Search CORE

274 research outputs found

Romanian Language Technology — a view from an academic perspective

Author: Tufiș Dan
Publication venue: 'Agora University of Oradea'
Publication date: 05/01/2022
Field of study

The article reports on research and developments pursued by the Research Institute for Artificial Intelligence "Mihai Draganescu" of the Romanian Academy in order to narrow the gaps identified by the deep analysis on the European languages made by Meta-Net white papers and published by Springer in 2012. Except English, all the European languages needed significant research and development in order to reach an adequate technological level, in line with the expectations and requirements of the knowledge society

Agora University Editing House: Journals

Automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu

Author: Nik Nurhidayat Nik Him
Publication venue
Publication date: 01/01/2015
Field of study

Automatic speech recognition (ASR) is potentially helpful for children who suffer from dyslexia. Highly phonetically similar errors of dyslexic children‟s reading affect the accuracy of ASR. Thus, this study aims to evaluate acceptable accuracy of ASR using automatic transcription and phonetic labelling of dyslexic children‟s reading in BM. For that, three objectives have been set: first to produce manual transcription and phonetic labelling; second to construct automatic transcription and phonetic labelling using forced alignment; and third to compare between accuracy using automatic transcription and phonetic labelling and manual transcription and phonetic labelling. Therefore, to accomplish these goals methods have been used including manual speech labelling and segmentation, forced alignment, Hidden Markov Model (HMM) and Artificial Neural Network (ANN) for training, and for measure accuracy of ASR, Word Error Rate (WER) and False Alarm Rate (FAR) were used. A number of 585 speech files are used for manual transcription, forced alignment and training experiment. The recognition ASR engine using automatic transcription and phonetic labelling obtained optimum results is 76.04% with WER as low as 23.96% and FAR is 17.9%. These results are almost similar with ASR engine using manual transcription namely 76.26%, WER as low as 23.97% and FAR a 17.9%. As conclusion, the accuracy of automatic transcription and phonetic labelling is acceptable to use it for help dyslexic children learning using ASR in Bahasa Melayu (BM

Universiti Utara Malaysia: UUM eTheses

Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference (LREC2022), 20-25 June 2022, Marseille, France

Author: Chiarcos Christian
Declerck Thierry
Ionov Maxim
McCrae John Philip
Montiel Elena
Publication venue
Publication date: 20/04/2023
Field of study

OPUS Augsburg

Lightweight Adapter Tuning for Multilingual Speech Translation

Author: Besacier Laurent
Gu Jiatao
Le Hang
Pino Juan
Schwab Didier
Wang Changhan
Publication venue
Publication date: 12/07/2021
Field of study

Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP. Adapter tuning consists in freezing pretrained parameters of a model and injecting lightweight modules between layers, resulting in the addition of only a small number of task-specific trainable parameters. While adapter tuning was investigated for multilingual neural machine translation, this paper proposes a comprehensive analysis of adapters for multilingual speech translation (ST). Starting from different pre-trained models (a multilingual ST trained on parallel data or a multilingual BART (mBART) trained on non-parallel multilingual data), we show that adapters can be used to: (a) efficiently specialize ST to specific language pairs with a low extra cost in terms of parameters, and (b) transfer from an automatic speech recognition (ASR) task and an mBART pre-trained model to a multilingual ST task. Experiments show that adapter tuning offer competitive results to full fine-tuning, while being much more parameter-efficient.Comment: Accepted at ACL-IJCNLP 202

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Pre-training for Speech Translation: CTC Meets Optimal Transport

Author: Gong Hongyu
Le Phuong-Hang
Lecouteux Benjamin
Pino Juan
Schwab Didier
Wang Changhan
Publication venue
Publication date: 27/01/2023
Field of study

The gap between speech and text modalities is a major challenge in speech-to-text translation (ST). Different methods have been proposed to reduce this gap, but most of them require architectural changes in ST training. In this work, we propose to mitigate this issue at the pre-training stage, requiring no change in the ST model. First, we show that the connectionist temporal classification (CTC) loss can reduce the modality gap by design. We provide a quantitative comparison with the more common cross-entropy loss, showing that pre-training with CTC consistently achieves better final ST accuracy. Nevertheless, CTC is only a partial solution and thus, in our second contribution, we propose a novel pre-training method combining CTC and optimal transport to further reduce this gap. Our method pre-trains a Siamese-like model composed of two encoders, one for acoustic inputs and the other for textual inputs, such that they produce representations that are close to each other in the Wasserstein space. Extensive experiments on the standard CoVoST-2 and MuST-C datasets show that our pre-training method applied to the vanilla encoder-decoder Transformer achieves state-of-the-art performance under the no-external-data setting, and performs on par with recent strong multi-task learning systems trained with external data. Finally, our method can also be applied on top of these multi-task systems, leading to further improvements for these models. Code and pre-trained models are available at https://github.com/formiel/fairseq.Comment: ICML 2023 (oral presentation). This version fixed URLs, updated affiliations & acknowledgements, and improved formattin

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Speech Recognition

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

Directory of Open Access Books (DOAB)

The European Language Resources and Technologies Forum: Shaping the Future of the Multilingual Digital Europe

Author: Baroni Paola
Bel N?ria
Budin Gerhard
Calzolari Nicoletta
Choukri Khalid
Goggi Sara
Mariani Joseph
Monachini Monica
Odijk Jan
Piperidis Stelios
Quochi Valeria
Soria Claudia
Toral Antonio
Publication venue: Istituto di Linguistica Computazionale del CNR - Pisa, ITALY
Publication date
Field of study

Proceedings of the 1st FLaReNet Forum on the European Language Resources and Technologies, held in Vienna, at the Austrian Academy of Science, on 12-13 February 2009

PUblication MAnagement

Final FLaReNet deliverable: Language Resources for the Future - The Future of Language Resources

Author: Bel N.
Calzolari N.
Choukri Khalid
LS OZ Taal en spraaktechnologie
Mariani J.
Monachini M.
Odijk J.E.J.M.
Piperidis S
Quochi V.
Soria C.
UiL OTS LLI
Publication venue
Publication date: 01/01/2011
Field of study

Language Technologies (LT), together with their backbone, Language Resources (LR), provide an essential support to the challenge of Multilingualism and ICT of the future. The main task of language technologies is to bridge language barriers and to help creating a new environment where information flows smoothly across frontiers and languages, no matter the country, and the language, of origin. To achieve this goal, all players involved need to act as a community able to join forces on a set of shared priorities. However, until now the field of Language Resources and Technology has long suffered from an excess of individuality and fragmentation, with a lack of coherence concerning the priorities for the field, the direction to move, not to mention a common timeframe. The context encountered by the FLaReNet project was thus represented by an active field needing a coherence that can only be given by sharing common priorities and endeavours. FLaReNet has contributed to the creation of this coherence by gathering a wide community of experts and making them participate in the definition of an exhaustive set of recommendations

PUblication MAnagement

Utrecht University Repository

Computer Vision and Architectural History at Eye Level:Mixed Methods for Linking Research in the Humanities and in Information Technology

Author: De Boer Victor
Hein Carola
Khademi Seyran
Löffler Beate
Mager Tino
Siebes Ronald
van Gemert Jan
Publication venue: Bielefeld University
Publication date: 01/01/2023
Field of study

Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Thus valuableinsights are gained into the past and the present as they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and well-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet lack of reliable metadata often hinders the use of materials. The automated recognition contributes to making a variety of image sources usable forresearch.<br/

Proceedings - University of Groningen

TU Delft Repository

ARTS repository - University of Groningen

Dissertations of the University of Groningen