274 research outputs found
Romanian Language Technology — a view from an academic perspective
The article reports on research and developments pursued by the Research Institute for Artificial Intelligence "Mihai Draganescu" of the Romanian Academy in order to narrow the gaps identified by the deep analysis on the European languages made by Meta-Net white papers and published by Springer in 2012. Except English, all the European languages needed significant research and development in order to reach an adequate technological level, in line with the expectations and requirements of the knowledge society
Automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu
Automatic speech recognition (ASR) is potentially helpful for children who suffer
from dyslexia. Highly phonetically similar errors of dyslexic children‟s reading affect the accuracy of ASR. Thus, this study aims to evaluate acceptable accuracy of ASR using automatic transcription and phonetic labelling of dyslexic children‟s reading in BM. For that, three objectives have been set: first to produce manual transcription and phonetic labelling; second to construct automatic transcription and phonetic labelling using forced alignment; and third to compare between accuracy using automatic transcription and phonetic labelling and manual transcription and
phonetic labelling. Therefore, to accomplish these goals methods have been used including manual speech labelling and segmentation, forced alignment, Hidden Markov Model (HMM) and Artificial Neural Network (ANN) for training, and for measure accuracy of ASR, Word Error Rate (WER) and False Alarm Rate (FAR) were used. A number of 585 speech files are used for manual transcription, forced alignment and training experiment. The recognition ASR engine using automatic transcription and phonetic labelling obtained optimum results is 76.04% with WER as low as 23.96% and FAR is 17.9%. These results are almost similar with ASR
engine using manual transcription namely 76.26%, WER as low as 23.97% and FAR a 17.9%. As conclusion, the accuracy of automatic transcription and phonetic labelling is acceptable to use it for help dyslexic children learning using ASR in Bahasa Melayu (BM
Lightweight Adapter Tuning for Multilingual Speech Translation
Adapter modules were recently introduced as an efficient alternative to
fine-tuning in NLP. Adapter tuning consists in freezing pretrained parameters
of a model and injecting lightweight modules between layers, resulting in the
addition of only a small number of task-specific trainable parameters. While
adapter tuning was investigated for multilingual neural machine translation,
this paper proposes a comprehensive analysis of adapters for multilingual
speech translation (ST). Starting from different pre-trained models (a
multilingual ST trained on parallel data or a multilingual BART (mBART) trained
on non-parallel multilingual data), we show that adapters can be used to: (a)
efficiently specialize ST to specific language pairs with a low extra cost in
terms of parameters, and (b) transfer from an automatic speech recognition
(ASR) task and an mBART pre-trained model to a multilingual ST task.
Experiments show that adapter tuning offer competitive results to full
fine-tuning, while being much more parameter-efficient.Comment: Accepted at ACL-IJCNLP 202
Pre-training for Speech Translation: CTC Meets Optimal Transport
The gap between speech and text modalities is a major challenge in
speech-to-text translation (ST). Different methods have been proposed to reduce
this gap, but most of them require architectural changes in ST training. In
this work, we propose to mitigate this issue at the pre-training stage,
requiring no change in the ST model. First, we show that the connectionist
temporal classification (CTC) loss can reduce the modality gap by design. We
provide a quantitative comparison with the more common cross-entropy loss,
showing that pre-training with CTC consistently achieves better final ST
accuracy. Nevertheless, CTC is only a partial solution and thus, in our second
contribution, we propose a novel pre-training method combining CTC and optimal
transport to further reduce this gap. Our method pre-trains a Siamese-like
model composed of two encoders, one for acoustic inputs and the other for
textual inputs, such that they produce representations that are close to each
other in the Wasserstein space. Extensive experiments on the standard CoVoST-2
and MuST-C datasets show that our pre-training method applied to the vanilla
encoder-decoder Transformer achieves state-of-the-art performance under the
no-external-data setting, and performs on par with recent strong multi-task
learning systems trained with external data. Finally, our method can also be
applied on top of these multi-task systems, leading to further improvements for
these models. Code and pre-trained models are available at
https://github.com/formiel/fairseq.Comment: ICML 2023 (oral presentation). This version fixed URLs, updated
affiliations & acknowledgements, and improved formattin
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes
The European Language Resources and Technologies Forum: Shaping the Future of the Multilingual Digital Europe
Proceedings of the 1st FLaReNet Forum on the European Language Resources and Technologies, held in Vienna, at the Austrian Academy of Science, on 12-13 February 2009
Final FLaReNet deliverable: Language Resources for the Future - The Future of Language Resources
Language Technologies (LT), together with their backbone, Language Resources (LR), provide an essential support to the challenge of Multilingualism and ICT of the future. The main task of language technologies is to bridge language barriers and to help creating a new environment where information flows smoothly across frontiers and languages, no matter the country, and the language, of origin. To achieve this goal, all players involved need to act as a community able to join forces on a set of shared priorities. However, until now the field of Language Resources and Technology has long suffered from an excess of individuality and fragmentation, with a lack of coherence concerning the priorities for the field, the direction to move, not to mention a common timeframe. The context encountered by the FLaReNet project was thus represented by an active field needing a coherence that can only be given by sharing common priorities and endeavours. FLaReNet has contributed to the creation of this coherence by gathering a wide community of experts and making them participate in the definition of an exhaustive set of recommendations
Computer Vision and Architectural History at Eye Level:Mixed Methods for Linking Research in the Humanities and in Information Technology
Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Thus valuableinsights are gained into the past and the present as they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and well-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet lack of reliable metadata often hinders the use of materials. The automated recognition contributes to making a variety of image sources usable forresearch.<br/
- …