17 research outputs found

    Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording

    In this paper we present our work on Task 1 Acoustic Scene Classification and Task 3 Sound Event Detection in Real Life Recordings. Among our experiments we have low-level and high-level features, classifier optimization and other heuristics specific to each task. Our performance for both tasks improved the baseline from DCASE: for Task 1 we achieved an overall accuracy of 78.9% compared to the baseline of 72.6% and for Task 3 we achieved a Segment-Based Error Rate of 0.76 compared to the baseline of 0.91.

    Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages

    In this paper, we describe the TTS models developed by NVIDIA for the MMITS-VC (Multi-speaker, Multi-lingual Indic TTS with Voice Cloning) 2024 Challenge. In Tracks 1 and 2, we utilize RAD-MMM to perform few-shot TTS by training additionally on 5 minutes of target speaker data. In Track 3, we utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as well as external datasets. We use HiFi-GAN vocoders for all submissions. RAD-MMM performs competitively on Tracks 1 and 2, while P-Flow ranks first on Track 3, with a mean opinion score (MOS) of 4.4 and a speaker similarity score (SMOS) of 3.62. Comment: Presentation accepted at ICASSP 202

    Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

    Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs. In this paper, we propose Audio Flamingo, a novel audio language model with 1) strong audio understanding abilities, 2) the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and 3) strong multi-turn dialogue abilities. We introduce a series of training techniques, architecture design, and data strategies to enhance our model with these abilities. Extensive evaluations across various audio understanding tasks confirm the efficacy of our method, setting new state-of-the-art benchmarks. Our demo website is https://audioflamingo.github.io/ and the code is open-sourced at https://github.com/NVIDIA/audio-flamingo. Comment: ICML 202

    Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment

    Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers. However, LLM-based TTS models are not robust as the generated output can contain repeating words, missing words and mis-aligned speech (referred to as hallucinations or attention errors), especially when the text contains multiple occurrences of the same token. We examine these challenges in an encoder-decoder transformer model and find that certain cross-attention heads in such models implicitly learn the text and speech alignment when trained for predicting speech tokens for a given text. To make the alignment more robust, we propose techniques utilizing CTC loss and attention priors that encourage monotonic cross-attention over the text tokens. Our guided attention training technique does not introduce any new learnable parameters and significantly improves robustness of LLM-based TTS models. Comment: Published as a conference paper at INTERSPEECH 202

    Experiments on the DCASE Challenge 2016: Acoustic scene classification and sound event detection in real life recording

    In this paper we present our work on Task 1 Acoustic Scene Classification and Task 3 Sound Event Detection in Real Life Recordings. Among our experiments we have low-level and high-level features, classifier optimization and other heuristics specific to each task. Our performance for both tasks improved the baseline from DCASE: for Task 1 we achieved an overall accuracy of 78.9% compared to the baseline of 72.6% and for Task 3 we achieved a Segment-Based Error Rate of 0.48 compared to the baseline of 0.91.

    Data Storage in DNA


    PERT era, race‐based healthcare disparities in a large urban safety net hospital

    Pulmonary embolism (PE) is the third leading cause of cardiovascular death in the United States. Black Americans have higher incidence, greater clot severity, and worse outcomes than White Americans. This disparity is not fully understood, especially in the context of the advent of PE response teams (PERT), which aim to standardize PE‐related care. This retrospective single‐center cohort study compared 294 Black and 131 White patients from our institution's PERT database. Primary objectives included severity and in‐hospital management. Secondary outcomes included length of stay, 30‐day readmission, 30‐day mortality, and outpatient follow‐up. Clot severity (p = 0.42), acute treatment (p = 0.28), 30‐day mortality (p = 0.77), 30‐day readmission (p = 0.50), and outpatient follow‐up (p = 0.98) were similar between races. Black patients had a lower mean household income ($35,383, SD $20,596) than White patients ($63,396, SD $32,987) (p < 0.0001). More Black patients (78.8%) had exclusively government insurance (Medicare/Medicaid) compared to White patients (61.8%) (p = 0.006). Interestingly, government insurance patients had less follow‐up (58.3%) than private insurance patients (79.7%) (p = 0.001). Notably, patients with follow‐up had fewer 30‐day readmissions. Specifically, 12.2% of patients with follow‐up were readmitted compared to 22.2% of patients without follow‐up (p = 0.008). There were no significant differences in PE severity, in‐hospital treatment, mortality, or readmissions between Black and White patients. However, patients with government insurance had less follow‐up and more readmissions, indicating a socioeconomic disparity. Access barriers such as health literacy, treatment cost, and transportation may contribute to this inequity. Improving access to follow‐up care may reduce the disparity in PE outcomes.