    MAST: Multiscale Audio Spectrogram Transformers

    We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings the concept of multiscale feature hierarchies to the Audio Spectrogram Transformer (AST). Given an input audio spectrogram, we first patchify and project it into an initial temporal resolution and embedding dimension, after which the multiple stages in MAST progressively expand the embedding dimension while reducing the temporal resolution of the input. We use a pyramid structure that allows early layers of MAST, operating at a high temporal resolution but in a low-dimensional embedding space, to model simple low-level acoustic information, and deeper, temporally coarse layers to model high-level acoustic information with high-dimensional embeddings. We also extend our approach to present a new Self-Supervised Learning (SSL) method called SS-MAST, which calculates a symmetric contrastive loss between latent representations from a student and a teacher encoder. In practice, MAST significantly outperforms AST by an average accuracy of 3.4% across 8 speech and non-speech tasks from the LAPE Benchmark. Moreover, SS-MAST achieves an absolute average improvement of 2.6% over SSAST for both AST and MAST encoders. We make all our code available on GitHub at the time of publication. Comment: Submitted ICASSP 202
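
    The pyramid structure described in the MAST abstract can be illustrated with a small schedule calculation. This is a minimal sketch only: the function name `pyramid_schedule`, the factor of 2 per stage, the number of stages, and the starting values are all assumptions chosen for illustration, since the abstract does not state them.

```python
def pyramid_schedule(time_steps, embed_dim, num_stages, factor=2):
    """Return one (time_steps, embed_dim) pair per stage.

    Each stage reduces the temporal resolution and expands the embedding
    dimension by `factor`. All values here are hypothetical; the abstract
    only states the direction of change, not the actual sizes.
    """
    schedule = []
    for _ in range(num_stages):
        schedule.append((time_steps, embed_dim))
        time_steps = max(1, time_steps // factor)  # coarser in time
        embed_dim = embed_dim * factor             # richer embedding
    return schedule


if __name__ == "__main__":
    for stage, (t, d) in enumerate(pyramid_schedule(96, 96, 4), start=1):
        print(f"stage {stage}: time_steps={t}, embed_dim={d}")
```

    Early stages keep many time steps with small embeddings, and later stages trade temporal resolution for embedding width, which is the hierarchy the abstract describes.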

    UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation

    In this paper, we introduce UnFuSeD, a novel approach to leverage self-supervised learning and reduce the need for large amounts of labeled data for audio classification. Unlike prior works, which directly fine-tune a self-supervised pre-trained encoder on a target dataset, we use the encoder to generate pseudo-labels for unsupervised fine-tuning before the actual fine-tuning step. We first train an encoder using a novel self-supervised learning (SSL) algorithm on an unlabeled audio dataset. Then, we use that encoder to generate pseudo-labels on our target task dataset by clustering the extracted representations. These pseudo-labels are then used to guide self-distillation on a randomly initialized model, which we call unsupervised fine-tuning. Finally, the resultant encoder is fine-tuned on our target task dataset. Through UnFuSeD, we propose the first system that moves away from the generic SSL paradigms in the literature, which pre-train and fine-tune the same encoder, and present a novel self-distillation-based system to leverage SSL pre-training for low-resource audio classification. In practice, UnFuSeD achieves state-of-the-art results on the LAPE Benchmark, significantly outperforming all our baselines. Additionally, UnFuSeD allows us to achieve this with a 40% reduction in the number of parameters over the previous state-of-the-art system. We make all our code publicly available. Comment: Under review at ICASSP 2023 SASB Worksho

    SLICER: Learning universal audio representations using low-resource self-supervised pre-training

    We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio data that reduces the need for large amounts of labeled data for audio and speech classification. Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks in a low-resource unlabeled audio pre-training setting. Inspired by the recent success of clustering and contrasting learning paradigms for SSL-based speech representation learning, we propose SLICER (Symmetrical Learning of Instance and Cluster-level Efficient Representations), which brings together the best of both clustering and contrasting learning paradigms. We use a symmetric loss between latent representations from student and teacher encoders and simultaneously solve instance and cluster-level contrastive learning tasks. We obtain cluster representations online by just projecting the input spectrogram into an output subspace with dimensions equal to the number of clusters. In addition, we propose a novel mel-spectrogram augmentation procedure, k-mix, based on mixup, which does not require labels and aids unsupervised representation learning for audio. Overall, SLICER achieves state-of-the-art results on the LAPE Benchmark \cite{9868132}, significantly outperforming DeLoRes-M and other prior approaches, which are pre-trained on 10× more unsupervised data. We will make all our code available on GitHub. Comment: Submitted to ICASSP 202

    CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

    A fundamental characteristic of audio is its compositional nature. Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP) that learns a shared representation between audio and language modalities have improved performance in many downstream applications, including zero-shot audio classification, audio retrieval, etc. However, the ability of these models to effectively perform compositional reasoning remains largely unexplored and necessitates additional research. In this paper, we propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples, to evaluate compositional reasoning in ALMs. Our proposed CompA-order evaluates how well an ALM understands the order or occurrence of acoustic events in audio, and CompA-attribute evaluates attribute binding of acoustic events. An instance from either benchmark consists of two audio-caption pairs, where both audios have the same acoustic events but with different compositions. An ALM is evaluated on how well it matches the right audio to the right caption. Using this benchmark, we first show that current ALMs perform only marginally better than random chance, thereby struggling with compositional reasoning. Next, we propose CompA-CLAP, where we fine-tune CLAP using a novel learning method to improve its compositional reasoning abilities. To train CompA-CLAP, we first propose improvements to contrastive training with composition-aware hard negatives, allowing for more focused training. Next, we propose a novel modular contrastive loss that helps the model learn fine-grained compositional understanding and overcomes the acute scarcity of openly available compositional audios. CompA-CLAP significantly improves over all our baseline models on the CompA benchmark, indicating its superior compositional reasoning capabilities. Comment: Pre-print under revie
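
    The matching task in the CompA abstract, two audio-caption pairs where each audio must be matched to its own caption, can be sketched as a scoring rule over a 2×2 similarity matrix. The rule below is an assumption made for illustration: the abstract does not define the exact metric, and the names `instance_is_correct` and `matching_accuracy` are hypothetical.

```python
def instance_is_correct(sim):
    """Score one benchmark instance.

    `sim[i][j]` is the model's similarity between audio i and caption j
    for the two audio-caption pairs of the instance. Under the assumed
    rule, the instance counts as correct only if each audio is more
    similar to its own caption than to the other caption.
    """
    return sim[0][0] > sim[0][1] and sim[1][1] > sim[1][0]


def matching_accuracy(instances):
    """Fraction of instances scored as correct under the assumed rule."""
    return sum(instance_is_correct(s) for s in instances) / len(instances)


if __name__ == "__main__":
    # Made-up similarity scores, not results from the paper.
    instances = [
        [[0.9, 0.1], [0.2, 0.8]],  # both audios matched to their captions
        [[0.4, 0.6], [0.3, 0.7]],  # first audio prefers the wrong caption
    ]
    print(matching_accuracy(instances))
```

    A model that cannot tell the two compositions apart would score near chance on such a rule, which is the kind of behaviour the abstract reports for current ALMs.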

    Intravesical rAd-IFNα/Syn3 for Patients With High-Grade, Bacillus Calmette-Guerin-Refractory or Relapsed Non-Muscle-Invasive Bladder Cancer: A Phase II Randomized Study.

    Purpose: Many patients with high-risk non-muscle-invasive bladder cancer (NMIBC) are either refractory to bacillus Calmette-Guerin (BCG) treatment or may experience disease relapse. We assessed the efficacy and safety of recombinant adenovirus interferon alfa with Syn3 (rAd-IFNα/Syn3), a replication-deficient recombinant adenovirus gene transfer vector, for patients with high-grade (HG) BCG-refractory or relapsed NMIBC. Methods: In this open-label, multicenter (n = 13), parallel-arm, phase II study (ClinicalTrials.gov identifier: NCT01687244), 43 patients with HG BCG-refractory or relapsed NMIBC received intravesical rAd-IFNα/Syn3 (randomly assigned 1:1 to 1 × 10^11 viral particles (vp)/mL or 3 × 10^11 vp/mL). Patients who responded at months 3, 6, and 9 were retreated at months 4, 7, and 10. The primary end point was 12-month HG recurrence-free survival (RFS). All patients who received at least one dose were included in efficacy and safety analyses. Results: Forty patients received rAd-IFNα/Syn3 (1 × 10^11 vp/mL, n = 21; 3 × 10^11 vp/mL, n = 19) between November 5, 2012, and April 8, 2015. Fourteen patients (35.0%; 90% CI, 22.6% to 49.2%) remained free of HG recurrence 12 months after initial treatment. Comparable 12-month HG RFS was noted for both doses. Of these 14 patients, two experienced recurrence at 21 and 28 months, respectively, after treatment initiation, and one died as a result of an upper tract tumor at 17 months without a recurrence. rAd-IFNα/Syn3 was well tolerated; no grade four or five adverse events (AEs) occurred, and no patient discontinued treatment because of an adverse event. The most frequently reported drug-related AEs were micturition urgency (n = 16; 40%), dysuria (n = 16; 40%), fatigue (n = 13; 32.5%), pollakiuria (n = 11; 28%), and hematuria and nocturia (n = 10 each; 25%). Conclusion: rAd-IFNα/Syn3 was well tolerated. It demonstrated promising efficacy for patients with HG NMIBC after BCG therapy who were unable or unwilling to undergo radical cystectomy.
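
    The abstract reports 14 of 40 patients (35.0%) free of HG recurrence with a 90% confidence interval. A confidence interval for such a proportion can be computed as sketched below. The abstract does not say which interval method the authors used; the exact (Clopper-Pearson) method shown here is an assumption, and the helper names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target, increasing, tol=1e-10):
    """Find p in [0, 1] with f(p) == target by bisection (f monotone)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, conf=0.90):
    """Exact two-sided interval for a binomial proportion x / n."""
    alpha = 1 - conf
    # Lower bound: P(X >= x | p) = alpha / 2, which increases with p.
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: P(X <= x | p) = alpha / 2, which decreases with p.
    upper = 1.0 if x == n else _solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


if __name__ == "__main__":
    low, high = clopper_pearson(14, 40, conf=0.90)
    print(f"14/40 = {14 / 40:.1%}, 90% CI: {low:.1%} to {high:.1%}")
```

    The printed interval can be compared against the reported 22.6% to 49.2%; a different interval method (for example Wilson or Wald) would give somewhat different bounds.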

    T Cell Responses to Human Endogenous Retroviruses in HIV-1 Infection

    Human endogenous retroviruses (HERVs) are remnants of ancient infectious agents that have integrated into the human genome. Under normal circumstances, HERVs are functionally defective or controlled by host factors. In HIV-1-infected individuals, intracellular defense mechanisms are compromised. We hypothesized that HIV-1 infection would remove or alter controls on HERV activity. Expression of HERV could potentially stimulate a T cell response to HERV antigens, and in regions of HIV-1/HERV similarity, these T cells could be cross-reactive. We determined that the levels of HERV production in HIV-1-positive individuals exceed those of HIV-1-negative controls. To investigate the impact of HERV activity on specific immunity, we examined T cell responses to HERV peptides in 29 HIV-1-positive and 13 HIV-1-negative study participants. We report T cell responses to peptides derived from regions of HERV detected by ELISPOT analysis in the HIV-1-positive study participants. We show an inverse correlation between anti-HERV T cell responses and HIV-1 plasma viral load. In HIV-1-positive individuals, we demonstrate that HERV-specific T cells are capable of killing cells presenting their cognate peptide. These data indicate that HIV-1 infection leads to HERV expression and stimulation of a HERV-specific CD8+ T cell response. HERV-specific CD8+ T cells have characteristics consistent with an important role in the response to HIV-1 infection: a phenotype similar to that of T cells responding to an effectively controlled virus (cytomegalovirus), an inverse correlation with HIV-1 plasma viral load, and the ability to lyse cells presenting their target peptide. These characteristics suggest that elicitation of anti-HERV-specific immune responses is a novel approach to immunotherapeutic vaccination. As endogenous retroviral sequences are fixed in the human genome, they provide a stable target, and HERV-specific T cells could recognize a cell infected by any HIV-1 viral variant. 
    HERV-specific immunity is an important new avenue for investigation in HIV-1 pathogenesis and vaccine design.

    Observed relationships between extreme sub-daily precipitation, surface temperature, and relative humidity

    Expected changes to future extreme precipitation remain a key uncertainty associated with anthropogenic climate change. Recently, extreme precipitation has been proposed to scale with the precipitable water content in the atmosphere, which, assuming relative humidity stays constant, will increase at a rate of ∼6.8%/°C as indicated by the Clausius-Clapeyron (C-C) relationship. We examine this scaling empirically using data from 137 long-record pluviograph and temperature gauges across Australia. We find that scaling rates are consistent with the C-C relationship for surface temperatures up to between 20°C and 26°C and for precipitation durations up to 30 minutes, implying that such scaling applies only for individual storm systems. At greater temperatures, negative scaling is observed. Consideration of relative humidity data shows a pronounced decrease in the maximum relative humidity for land surface temperatures greater than 26°C, indicating that moisture availability becomes the dominant driver of how extreme precipitation scales at higher temperatures. Rhys Hardwick Jones, Seth Westra and Ashish Sharm
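
    The ∼6.8%/°C Clausius-Clapeyron rate quoted in the abstract can be turned into a small worked calculation. This is a sketch under stated assumptions: the rate is applied as a compounding multiplier per degree, the function names are hypothetical, and the abstract itself finds that this scaling holds only up to roughly 20°C to 26°C and for durations up to 30 minutes.

```python
import math

CC_RATE = 0.068  # fractional increase per degree C, as quoted in the abstract


def scale_intensity(base_intensity, delta_t, rate=CC_RATE):
    """Scale a precipitation intensity for a temperature change delta_t (°C).

    Assumes the rate compounds per degree: I(T + dT) = I(T) * (1 + rate)^dT.
    """
    return base_intensity * (1 + rate) ** delta_t


def doubling_interval(rate=CC_RATE):
    """Temperature increase (°C) over which intensity doubles at this rate."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    # Illustrative input value only; not data from the study.
    print(f"{scale_intensity(10.0, 3):.2f} mm/h after +3 °C from 10 mm/h")
    print(f"doubling every {doubling_interval():.1f} °C")
```

    The calculation does not apply in the regime above about 26°C, where the abstract reports negative scaling driven by reduced moisture availability.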

    Epidemiology of Bladder Cancer in 2023: A Systematic Review of Risk Factors

    CONTEXT Bladder cancer (BC) is common worldwide and poses a significant public health challenge. External risk factors and the wider exposome (totality of exposure from external and internal factors) contribute significantly to the development of BC. Therefore, establishing a clear understanding of these risk factors is the key to prevention. OBJECTIVE To perform an up-to-date systematic review of BC's epidemiology and external risk factors. EVIDENCE ACQUISITION Two reviewers (I.J. and S.O.) performed a systematic review using PubMed and Embase in January 2022 and updated it in September 2022. The search was restricted to 4 yr since our previous review in 2018. EVIDENCE SYNTHESIS Our search identified 5177 articles and a total of 349 full-text manuscripts. GLOBOCAN data from 2020 revealed an incidence of 573 000 new BC cases and 213 000 deaths worldwide in 2020. The 5-yr prevalence worldwide in 2020 was 1 721 000. Tobacco smoking and occupational exposures (aromatic amines and polycyclic aromatic hydrocarbons) are the most substantial risk factors. In addition, correlative evidence exists for several risk factors, including specific dietary factors, imbalanced microbiome, gene-environment risk factor interactions, diesel exhaust emission exposure, and pelvic radiotherapy. CONCLUSIONS We present a contemporary overview of the epidemiology of BC and the current evidence for BC risk factors. Smoking and specific occupational exposures are the most established risk factors. There is emerging evidence for specific dietary factors, imbalanced microbiome, gene-external risk factor interactions, diesel exhaust emission exposure, and pelvic radiotherapy. Further high-quality evidence is required to confirm initial findings and further understand cancer prevention. PATIENT SUMMARY Bladder cancer is common, and the most substantial risk factors are smoking and workplace exposure to suspected carcinogens. 
    Ongoing research to identify avoidable risk factors could reduce the number of people who get bladder cancer.

    Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

    Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using target language rhythm, and text-to-speech synthesis followed by isochronous lipsyncing to the original video. This task becomes challenging when the source and target languages belong to different language families, resulting in differences in generated audio duration. This is further compounded by the original speaker's rhythm, especially for extempore speech. This paper describes the challenges in regenerating English lecture videos in Indian languages semi-automatically. A prototype is developed for dubbing lectures into 9 Indian languages. A mean-opinion-score (MOS) is obtained for two languages, Hindi and Tamil, on two different courses. The output video is compared with the original video in terms of MOS (1-5) and lip synchronisation, with scores of 4.09 and 3.74, respectively. Human effort is also reduced by 75%.