83 research outputs found

    Masked Autoencoders Are Articulatory Learners

    Articulatory recordings track the positions and motion of different articulators along the vocal tract and are widely used to study speech production and to develop speech technologies such as articulatory-based speech synthesizers and speech inversion systems. The University of Wisconsin X-Ray microbeam (XRMB) dataset is one of various datasets that provide articulatory recordings synced with audio recordings. The XRMB articulatory recordings employ pellets placed on a number of articulators which can be tracked by the microbeam. However, a significant portion of the articulatory recordings are mistracked and have so far been unusable. In this work, we present a deep learning based approach using Masked Autoencoders to accurately reconstruct the mistracked articulatory recordings for 41 out of 47 speakers of the XRMB dataset. Our model is able to reconstruct articulatory trajectories that closely match ground truth, even when three out of eight articulators are mistracked, and to retrieve 3.28 out of 3.4 hours of previously unusable recordings.

    Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables

    The performance of deep learning models depends significantly on their capacity to encode input features efficiently and decode them into meaningful outputs. Better input and output representation has the potential to boost models' performance and generalization. In the context of acoustic-to-articulatory speech inversion (SI) systems, we study the impact of utilizing speech representations acquired via self-supervised learning (SSL) models, such as HuBERT, compared to conventional acoustic features. Additionally, we investigate the incorporation of novel tract variables (TVs) through an improved geometric transformation model. By combining these two approaches, we improve the Pearson product-moment correlation (PPMC) scores, which evaluate the accuracy of TV estimation of the SI system, from 0.7452 to 0.8141, an absolute increase of about 0.069 (6.9 points). Our findings underscore the profound influence of rich feature representations from SSL models and improved geometric transformations with target TVs on the enhanced functionality of SI systems.
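
    As a rough illustration of the metric quoted in this abstract, the sketch below computes a Pearson product-moment correlation between an estimated and a reference trajectory, then compares the two reported scores. The function name and the toy trajectories are assumptions made for illustration; they are not taken from the paper.

```python
import math


def pearson(estimated, actual):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(estimated)
    if n != len(actual) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_e = sum(estimated) / n
    mean_a = sum(actual) / n
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(estimated, actual))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_e == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_e * var_a)


# Hypothetical toy trajectories for one tract variable (not real data).
reference_tv = [0.10, 0.35, 0.60, 0.40, 0.15]
estimated_tv = [0.12, 0.30, 0.55, 0.45, 0.20]
score = pearson(estimated_tv, reference_tv)

# The two average PPMC scores quoted in the abstract.
baseline, improved = 0.7452, 0.8141
absolute_gain = improved - baseline       # difference in correlation
relative_gain = absolute_gain / baseline  # gain as a fraction of the baseline
```

    Under this reading, the quoted 6.9% matches the absolute difference between the two scores; the gain relative to the baseline is somewhat larger.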

    Enhancing Speech Articulation Analysis using a Geometric Transformation of the X-ray Microbeam Dataset

    Accurate analysis of speech articulation is crucial for speech analysis. However, X-Y coordinates of articulators strongly depend on the anatomy of the speakers and the variability of pellet placements, and existing methods for mapping anatomical landmarks in the X-ray Microbeam Dataset (XRMB) fail to capture the entire anatomy of the vocal tract. In this paper, we propose a new geometric transformation that improves the accuracy of these measurements. Our transformation maps anatomical landmarks' X-Y coordinates along the midsagittal plane onto six relative measures: Lip Aperture (LA), Lip Protrusion (LP), Tongue Body Constriction Location (TBCL) and Degree (TBCD), and Tongue Tip Constriction Location (TTCL) and Degree (TTCD). Our novel contribution is the extension of the palate trace towards the inferred anterior pharyngeal line, which improves measurements of tongue body constriction.

    Audio Data Augmentation for Acoustic-to-articulatory Speech Inversion using Bidirectional Gated RNNs

    Data augmentation has proven to be a promising way to improve the performance of deep learning models by adding variability to training data. In previous work on developing a noise-robust acoustic-to-articulatory speech inversion system, we have shown the importance of noise augmentation for improving the performance of speech inversion on noisy speech. In this work, we compare and contrast different ways of doing data augmentation and show how this technique improves the performance of articulatory speech inversion not only on noisy speech, but also on clean speech data. We also propose a Bidirectional Gated Recurrent Neural Network as the speech inversion system instead of the previously used feedforward neural network. The inversion system uses mel-frequency cepstral coefficients (MFCCs) as the input acoustic features and six vocal tract variables (TVs) as the output articulatory features. The performance of the system was measured by computing the correlation between estimated and actual TVs on the U. Wisc. X-ray Microbeam database. The proposed speech inversion system shows a 5% relative improvement in correlation over the baseline noise-robust system for clean speech data. The pre-trained model, when adapted to each unseen speaker in the test set, improves the average correlation by another 6%. Comment: EUSIPCO 202

    Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

    Recent advancements in Automatic Speech Recognition (ASR) systems, exemplified by Whisper, have demonstrated the potential of these systems to approach human-level performance given sufficient data. However, this progress doesn't readily extend to ASR for children due to the limited availability of suitable child-specific databases and the distinct characteristics of children's speech. A recent study investigated leveraging the My Science Tutor (MyST) children's speech corpus to enhance Whisper's performance in recognizing children's speech. They were able to demonstrate some improvement on a limited test set. This paper builds on these findings by enhancing the utility of the MyST dataset through more efficient data preprocessing. We reduce the Word Error Rate (WER) on the MyST test set from 13.93% to 9.11% with Whisper-Small and from 13.23% to 8.61% with Whisper-Medium, and show that this improvement can be generalized to unseen datasets. We also highlight important challenges towards improving children's ASR performance. The results showcase the viable and efficient integration of Whisper for effective children's speech recognition.
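
    To make the Word Error Rate figures in this abstract concrete, the following minimal sketch computes WER as word-level edit distance divided by reference length, and derives the relative reductions implied by the reported numbers. The function name and example sentences are hypothetical, and text normalization (casing, punctuation), which real evaluations typically apply, is assumed away here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only one previous row in memory.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterance pair: one substitution and one deletion over six words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# WER figures quoted in the abstract, in percent (before, after).
reported = {"Whisper-Small": (13.93, 9.11), "Whisper-Medium": (13.23, 8.61)}
relative_reduction = {
    model: (before - after) / before for model, (before, after) in reported.items()
}
```

    On these figures, both models show a relative WER reduction of roughly one third.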

    Phenotypic and Genotypic Identification of Vancomycin Resistant Enterococci from Different Sources

    Enterococci are reservoirs for transmission of the most clinically important antimicrobial resistances such as vancomycin resistance. Therefore, this work aimed to determine the occurrence of enterococci and their respective vancomycin resistance genes (vanA and vanB) from different sources. Two hundred and twenty-four samples from chickens, turkey, fish and human urine, as well as two types of human food including milk (raw and milk from mastitic animals) and sausage, were tested for isolation of Enterococcus species. The isolates were identified morphologically and biochemically using catalase test, sodium chloride tolerance and growth at pH 9.6 and 10–45 °C. The vancomycin resistance profile of the isolates was verified by both disc diffusion and agar dilution methods. The genotypic enterococcal identification at both genus and species levels and their vancomycin resistance genes were also ascertained using PCR amplification of the respective genes for 28 isolates. Enterococci isolation rate was 70% of the examined samples with a higher percentage of vancomycin resistance (53.5%) and the minimum inhibitory concentrations (MICs) ranged from 16 to 512 µg/mL. Molecular identification of 28 enterococcal isolates revealed the dominance of E. faecalis (42.8%) and clarified a higher proportion of vanA (78.5%) and vanB (67.8%) genes. In conclusion, administration of the antimicrobials, mainly vancomycin, may be considered as a pronounced stress factor in the veterinary and human practices. In addition, VRE can act as a reservoir for vancomycin resistance.

    Two Levels of Palmitic Acid-Enriched Fat Supplement Affect Lactational Performance of Holstein Cows and Feed Utilization of Barki Sheep

    The effect of feeding palmitic acid-enriched protected fat (PPF) supplement at two levels to increase energy density of diets was tested. In experiment 1, 21 multiparous lactating Holstein cows were fed on a basal diet without PPF supplementation (Control) or supplemented with 250 g (MG250) or 500 g PPF (MG500) for 13 weeks. In experiment 2, 12 adult Barki sheep were fed a basal diet without PPF supplementation (Control), or supplemented with 25 g (ME25), or 50 g of PPF (ME50 treatment) for 1 month. In experiment 1, MG250 treatment increased (

    Case Study in Refractory Non-Hodgkin's Lymphoma: Successful Treatment with Plerixafor

    The present case study describes our experience in treating a young woman diagnosed with a relapsing case of diffuse large cell lymphoma, who was heavily pre-treated with chemotherapy and radiotherapy. Our only chance to improve her survival was by using high-dose chemotherapy, followed by peripheral stem cell rescue. Unfortunately, in this patient, collecting sufficient stem cells for bone marrow transplantation proved to be very difficult since she had already been heavily treated with chemotherapy and radiotherapy. Currently, granulocyte colony-stimulating factor (G-CSF) alone or G-CSF plus chemotherapy are the most commonly used treatments for stem cell mobilization. However, 5–30% of patients do not respond to these agents. Plerixafor is a new hematopoietic stem cell-mobilizing drug that antagonizes the binding of chemokine stromal cell-derived factor-1α to CXC chemokine receptor 4. It is indicated in combination with G-CSF to mobilize hematopoietic stem cells to the peripheral blood for collection and subsequent autologous transplantation in patients with non-Hodgkin's lymphoma and multiple myeloma [Kessans et al.: Pharmacotherapy 2010;30:485–492; Jantunen: Expert Opin Biol Ther 2011;11:1241–1248]. Based on our findings, we consider plerixafor to be a very efficient and practical solution to mobilize and collect stem cells among all patients in such a situation, enabling us to proceed to autologous bone marrow transplantation and peripheral stem cell rescue in order to improve the patients’ overall survival

    Thymoquinone inhibits growth of human medulloblastoma cells by inducing oxidative stress and caspase-dependent apoptosis while suppressing NF-κB signaling and IL-8 expression

    Medulloblastoma (MB) is the most common malignant brain tumor of childhood. The transcription factor NF-κB is overexpressed in human MB and is a critical factor for MB tumor growth. NF-κB is known to regulate the expression of interleukin-8 (IL-8), the chemokine that enhances cancer cell growth and resistance to chemotherapy. We have recently shown that thymoquinone (TQ) suppresses growth of hepatocellular carcinoma cells in part by inhibiting NF-κB signaling. Here we sought to extend these studies in MB cells and show that TQ suppresses growth of MB cells in a dose- and time-dependent manner, causes G2/M cell cycle arrest, and induces apoptosis. TQ significantly increased generation of reactive oxygen species (ROS), while pretreatment of MB cells with the ROS scavenger N-acetylcysteine (NAC) abrogated TQ-induced cell death and apoptosis, suggesting that TQ-induced cell death and apoptosis are oxidative stress-mediated. TQ inhibitory effects were associated with inhibition of NF-κB and altered expression of its downstream effectors IL-8 and its receptors, the anti-apoptotic Bcl-2, Bcl-xL, X-IAP, and FLIP, as well as the pro-apoptotic TRAIL-R1, caspase-8, caspase-9, Bcl-xS, and cytochrome c. TQ-triggered apoptosis was substantiated by up-regulation of the executioner caspase-3 and caspase-7, as well as cleavage of the death substrate poly(ADP-ribose)polymerase. Interestingly, pretreatment of MB cells with NAC or the pan-caspase inhibitor zVAD-fmk abrogated TQ-induced apoptosis, loss of cyclin B1 and NF-κB activity, suggesting that these TQ-mediated effects are oxidative stress- and caspase-dependent. These findings reveal that TQ induces both extrinsic and intrinsic pathways of apoptosis in MB cells, and suggest its potential usefulness in the treatment of MB.