
    Yet Another Model for Arabic Dialect Identification

    In this paper, we describe a spoken Arabic dialect identification (ADI) model that consistently outperforms previously published results on two benchmark datasets: ADI-5 and ADI-17. We explore two architectural variations, ResNet and ECAPA-TDNN, coupled with two types of acoustic features, MFCCs and features extracted from the pre-trained self-supervised model UniSpeech-SAT Large, as well as a fusion of all four variants. We find that, individually, the ECAPA-TDNN network outperforms ResNet, and models with UniSpeech-SAT features outperform models with MFCCs by a large margin. Furthermore, a fusion of all four variants consistently outperforms individual models. Our best models outperform previously reported results on both datasets, with accuracies of 84.7% and 96.9% on ADI-5 and ADI-17, respectively.
    Comment: Accepted at ArabicNLP 202
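    The abstract does not specify how the four variants are combined; a common approach is score-level fusion, averaging each model's per-dialect posteriors. A minimal sketch of that idea (the model names, weights, and probability values below are hypothetical stand-ins, not the paper's actual outputs):

```python
import numpy as np

def fuse_posteriors(posterior_list, weights=None):
    """Weighted average of per-model dialect posteriors, renormalized."""
    stacked = np.stack(posterior_list)                 # (n_models, n_dialects)
    if weights is None:                                # default: uniform fusion
        weights = np.full(len(posterior_list), 1.0 / len(posterior_list))
    fused = np.average(stacked, axis=0, weights=weights)
    return fused / fused.sum()

# Hypothetical posteriors from the four variants over 5 dialects (ADI-5 setting)
p_resnet_mfcc = np.array([0.50, 0.20, 0.10, 0.10, 0.10])
p_resnet_ssl  = np.array([0.60, 0.15, 0.10, 0.10, 0.05])
p_ecapa_mfcc  = np.array([0.40, 0.30, 0.10, 0.10, 0.10])
p_ecapa_ssl   = np.array([0.70, 0.10, 0.08, 0.07, 0.05])

fused = fuse_posteriors([p_resnet_mfcc, p_resnet_ssl, p_ecapa_mfcc, p_ecapa_ssl])
print(fused.argmax())  # index of the predicted dialect
```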

    A Topological Invariant for Modular Fusion Categories

    The modular data of a modular category C, consisting of the S-matrix and the T-matrix, is known to be an incomplete invariant of C. More generally, the invariants of framed links and knots defined by a modular category as part of a topological quantum field theory can be viewed as numerical invariants of the category. Among these invariants, we study the invariant defined by the Borromean link colored by three objects, obtaining a tensor that we call B. We derive a formula for the Borromean tensor for the twisted Drinfeld doubles of finite groups. Along with T, it distinguishes the p non-equivalent modular categories of the form Z(Vec^ω_G), for G the non-abelian group of order pq, which are not distinguished by the modular data.
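    For context, the modular data referred to above is standard: the S-matrix is the invariant of the Hopf link colored by pairs of simple objects, and the T-matrix records the twists. One common convention (normalizations and dual conventions vary across references):

```latex
% Modular data of a modular category C with simple objects X_0, ..., X_{n-1},
% braiding c and ribbon twists \theta_i (one common normalization):
S_{ij} = \operatorname{tr}\bigl(c_{X_j, X_i} \circ c_{X_i, X_j}\bigr),
\qquad
T_{ij} = \delta_{ij}\, \theta_i .
```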

    Training Convolutional Neural Networks with the Forward-Forward algorithm

    The recent successes in analyzing images with deep neural networks are almost exclusively achieved with Convolutional Neural Networks (CNNs). The training of these CNNs, and in fact of all deep neural network architectures, uses the backpropagation algorithm, where the output of the network is compared with the desired result and the difference is then used to tune the weights of the network towards the desired outcome. In a 2022 preprint, Geoffrey Hinton suggested an alternative way of training, which passes the desired results together with the images at the input of the network. This so-called Forward-Forward (FF) algorithm has up to now only been used in fully connected networks. In this paper, we show how the FF paradigm can be extended to CNNs. Our FF-trained CNN, featuring a novel spatially-extended labeling technique, achieves a classification accuracy of 99.16% on the MNIST hand-written digits dataset. We show how different hyperparameters affect the performance of the proposed algorithm and compare the results with a CNN trained with the standard backpropagation approach. Furthermore, we use Class Activation Maps to investigate which types of features are learned by the FF algorithm.
    Comment: 19 pages, 9 figures
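    The layer-local training idea behind FF can be illustrated with a single fully connected layer. This is a minimal sketch of the "goodness"-based update from Hinton's preprint, not the paper's CNN extension; the toy data, threshold, and learning rate are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.03):
    """One local update for a single FF layer (no cross-layer backprop).

    Goodness is the sum of squared ReLU activations; the layer is pushed to
    give positive samples goodness above theta and negative samples below it."""
    for x, is_pos in ((x_pos, True), (x_neg, False)):
        h = np.maximum(x @ W, 0.0)                  # ReLU activations
        g = (h ** 2).sum(axis=1)                    # goodness per sample
        p = 1.0 / (1.0 + np.exp(-(g - theta)))      # P(judged positive)
        dl_dg = -(1.0 - p) if is_pos else p         # d(logistic loss)/dg
        dl_dh = dl_dg[:, None] * 2.0 * h            # chain rule through g
        W -= lr * x.T @ dl_dh / len(x)              # local gradient step
    return W

def goodness(x, W):
    return (np.maximum(x @ W, 0.0) ** 2).sum(axis=1)

# Toy stand-ins for "image + correct label" (positive) and
# "image + wrong label" (negative) inputs.
x_pos = rng.normal(0.5, 1.0, size=(64, 10))
x_neg = rng.normal(-0.5, 1.0, size=(64, 10))
W = rng.normal(0.0, 0.1, size=(10, 16))
for _ in range(200):
    W = ff_layer_step(W, x_pos, x_neg)

print(goodness(x_pos, W).mean() > goodness(x_neg, W).mean())
```

    In the full algorithm, each layer is trained with this local objective in turn, with the (normalized) activations of one layer serving as input to the next.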

    ArTST: Arabic Text and Speech Transformer

    We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. The model architecture follows the unified-modal framework SpeechT5, which was recently released for English, and is focused on Modern Standard Arabic (MSA), with plans to extend the model to dialectal and code-switched Arabic in future editions. We pre-trained the model from scratch on MSA speech and text data, and fine-tuned it for the following tasks: Automatic Speech Recognition (ASR), Text-To-Speech synthesis (TTS), and spoken dialect identification. In our experiments comparing ArTST with SpeechT5, as well as with previously reported results on these tasks, ArTST performs on par with or exceeds the current state of the art in all three tasks. Moreover, we find that our pre-training is conducive to generalization, which is particularly evident in the low-resource TTS task. The pre-trained model as well as the fine-tuned ASR and TTS models are released for research use.
    Comment: 11 pages, 1 figure, SIGARAB ArabicNLP 202

    Efficacy and safety of ferric carboxymaltose in Indian pregnant women with iron deficiency anemia

    Background: Iron deficiency anemia (IDA) is a significant problem worldwide, particularly in women. The aim of the study was to evaluate the effectiveness of intravenous ferric carboxymaltose (FCM) in Indian pregnant women with anemia.
    Methods: This was a single-centre, prospective, observational, open-label clinical study in a real-life setting with 4 weeks of follow-up. Fifty pregnant women with IDA visiting the Radhakrishna multispecialty hospital, Bangalore, for antenatal care were enrolled in the study. IV FCM was given as per the standard protocol. Changes in laboratory parameters such as hemoglobin, mean corpuscular volume (MCV), mean corpuscular hemoglobin concentration (MCHC), and packed cell volume (PCV) at baseline and 4 weeks after completion of parenteral iron therapy were recorded, and fatigue score was assessed. The pregnant women were monitored for adverse events.
    Results: All pregnant women received a single IV infusion of FCM 1000 mg. At 4 weeks, a significant increase in hemoglobin of 2.37±0.51 g/dl (p<0.001) was noted, along with an MCV rise of 19.89±21.94 (p<0.001), an MCHC rise of 2.56±5.65, and a PCV rise of 4.45±2.67 (p<0.011). Significant improvement in fatigue score was observed at 4 weeks after the single FCM infusion. No adverse effects were observed in any pregnant woman throughout the duration of the study.
    Conclusions: This real-life observational study highlights that IV FCM is effective in the management of IDA in pregnant women and well tolerated.
    Trial registration number: CTRI/2021/02/030874

    Toward a Common Earth Data Publication Framework

    Data publication is an essential activity for all data archives. Each of NASA's twelve Distributed Active Archive Centers (DAACs) has established publication workflows that account for the heterogeneous suite of missions, instruments, data providers, and datasets managed within the Earth Observation System Data and Information System (EOSDIS) program. Some aspects of data publication vary across DAACs: workflows range from manual to automatic, terms used to describe publication elements differ, and systems used to publish and manage data vary. Despite these differences, the DAAC data publication processes are generally the same: obtain the data and related information from data providers, describe the data with metadata and documentation, and release the data for access by the user community. To improve consistency and reduce the time required to publish data, we have developed a cross-DAAC initiative called the Common Earthdata Publication Framework (Earthdata Pub). Earthdata Pub seeks to: standardize communications and interactions with data providers; identify and standardize common workflows and steps in the data publication process; and design and implement a front-end system with features that include a common web interface, email and status tracking, and common application programming interfaces (APIs) to communicate with various DAAC-specific software components (services and applications) on the back-end. We will present the latest updates on this effort's progress and future plans.

    Analysis of expressivity transfer in non-autoregressive end-to-end multispeaker TTS systems

    The main objective of this work is to study expressivity transfer in a speaker's voice for which no expressive speech data is available in non-autoregressive end-to-end TTS systems. We investigated the expressivity transfer capability of probability density estimation based on deep generative models, namely Generative Flow (Glow) and diffusion probabilistic models (DPM). The usage of deep generative models provides better log-likelihood estimates and tractability of the system, subsequently providing high-quality speech synthesis with faster inference speed. Furthermore, we propose the usage of various expressivity encoders, which assist in expressivity transfer in the text-to-speech (TTS) system. More precisely, we used self-attention statistical pooling and multi-scale expressivity encoder architectures for creating a meaningful representation of expressivity. In addition to traditional subjective metrics used for speech synthesis evaluation, we incorporated cosine similarity to measure the strength of attributes associated with speaker and expressivity. The non-autoregressive TTS system with a multi-scale expressivity encoder showed better expressivity transfer on Glow- and DPM-based decoders, illustrating the ability of the multi-scale architecture to apprehend the underlying attributes of expressivity from multiple acoustic features.
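    The cosine-similarity metric mentioned above compares embedding directions independently of magnitude. A minimal sketch (the embedding values below are hypothetical stand-ins for speaker or expressivity vectors, not the paper's actual representations):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1 = same direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: reference utterance vs. synthesized utterance
ref = np.array([0.20, 0.90, -0.40, 0.10])
syn = np.array([0.25, 0.80, -0.35, 0.05])
print(cosine_similarity(ref, syn))  # close to 1 when attributes are preserved
```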

    Transfer learning of the expressivity using flow metric learning in multispeaker text-to-speech synthesis

    In this paper, we present a novel deep metric learning architecture along with variational inference incorporated in a parametric multispeaker expressive text-to-speech (TTS) system. We propose inverse autoregressive flow (IAF) as a way to perform the variational inference, thus providing a flexible approximate posterior distribution. The proposed approach conditions the text-to-speech system on speaker embeddings so that the latent space represents emotion as a semantic characteristic. To represent the speaker, we extracted speaker embeddings from an x-vector based speaker recognition model trained on speech data from many speakers. To predict the vocoder features, we used the acoustic model conditioned on the textual features as well as on the speaker embedding. We transferred expressivity by using the mean of the latent variables for each emotion to generate expressive speech in the voices of different speakers for which no expressive speech data is available. We compared the results obtained using flow-based variational inference with a variational autoencoder as a baseline model. Performance measured by mean opinion score (MOS), speaker MOS, and expressive MOS shows that N-pair loss based deep metric learning along with the IAF model improves the transfer of expressivity to the desired speaker's voice in synthesized speech.
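    The per-emotion latent mean used for transfer can be sketched as a simple centroid computation over encoded training utterances. The latents, labels, and dimensionality below are illustrative stand-ins, not the paper's actual representations:

```python
import numpy as np

def emotion_centroids(latents, labels):
    """Mean latent vector per emotion label, used as the transfer code."""
    return {e: np.stack([z for z, l in zip(latents, labels) if l == e]).mean(axis=0)
            for e in set(labels)}

# Hypothetical latents inferred from an expressive corpus (2-D for illustration)
latents = [np.array([1.0, 0.0]), np.array([3.0, 0.0]),
           np.array([0.0, 2.0]), np.array([0.0, 4.0])]
labels = ["happy", "happy", "sad", "sad"]

centroids = emotion_centroids(latents, labels)
# At synthesis time, condition the decoder on centroids["happy"] together with
# the target speaker's embedding to render that emotion in the target voice.
print(centroids["happy"])
```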