UnFuSeD: UNsupervised Finetuning Using SElf supervised Distillation
In this paper, we introduce UnFuSeD, a novel approach that leverages
self-supervised learning to reduce the need for large amounts of labeled data
for audio classification. Unlike prior works, which directly fine-tune a
self-supervised pre-trained encoder on a target dataset, we use the encoder to
generate pseudo-labels for unsupervised fine-tuning before the actual
fine-tuning step. We first train an encoder using a novel self-supervised
learning (SSL) algorithm on an unlabeled audio dataset. Then, we use that
encoder to generate pseudo-labels on our target task dataset by clustering the
extracted representations. These pseudo-labels are then used to guide
self-distillation on a randomly initialized model, a step we call unsupervised
fine-tuning. Finally, the resulting encoder is fine-tuned on our target
task dataset. Through UnFuSeD, we propose the first system that moves away from
the generic SSL paradigm in the literature, which pre-trains and fine-tunes the same
encoder, and present a novel self-distillation-based system that leverages SSL
pre-training for low-resource audio classification. In practice, UnFuSeD
achieves state-of-the-art results on the LAPE Benchmark, significantly
outperforming all our baselines. Additionally, UnFuSeD achieves this with 40%
fewer parameters than the previous state-of-the-art system. We make all our
code publicly available.
Comment: Under review at the ICASSP 2023 SASB Workshop
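As a rough illustration of the pipeline the abstract describes, the sketch below clusters frozen pre-trained-encoder embeddings of the target-task audio into pseudo-labels and then trains a randomly initialized student on them before the final supervised fine-tuning. The encoder/student classes, cluster count, and optimizer settings are placeholders, not the actual UnFuSeD recipe.

```python
# Minimal sketch of the three-stage idea: pseudo-labels from a frozen SSL
# encoder, "unsupervised fine-tuning" of a fresh student on those labels,
# then (not shown) supervised fine-tuning on the target task.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

@torch.no_grad()
def make_pseudo_labels(encoder: nn.Module, spectrograms: torch.Tensor, n_clusters: int = 50):
    """Step 1: cluster frozen-encoder embeddings of the target-task audio."""
    encoder.eval()
    feats = encoder(spectrograms)                      # assumed (N, D) utterance-level embeddings
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(feats.cpu().numpy())
    return torch.as_tensor(km.labels_, dtype=torch.long)

def unsupervised_finetune(student: nn.Module, spectrograms: torch.Tensor,
                          pseudo_labels: torch.Tensor, n_clusters: int = 50, epochs: int = 10):
    """Step 2: train a randomly initialized student to predict the pseudo-labels."""
    head = nn.Linear(student(spectrograms[:1]).shape[-1], n_clusters)
    opt = torch.optim.Adam(list(student.parameters()) + list(head.parameters()), lr=1e-4)
    for _ in range(epochs):
        logits = head(student(spectrograms))
        loss = nn.functional.cross_entropy(logits, pseudo_labels)
        opt.zero_grad(); loss.backward(); opt.step()
    return student  # Step 3: this encoder is then fine-tuned on the labeled target task.
```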
MAST: Multiscale Audio Spectrogram Transformers
We present the Multiscale Audio Spectrogram Transformer (MAST) for audio
classification, which brings the concept of multiscale feature hierarchies to
the Audio Spectrogram Transformer (AST). Given an input audio spectrogram, we
first patchify and project it into an initial temporal resolution and embedding
dimension, after which the multiple stages in MAST progressively expand the
embedding dimension while reducing the temporal resolution of the input. We use
a pyramid structure that allows the early layers of MAST, which operate at a high
temporal resolution but in a low-dimensional embedding space, to model simple
low-level acoustic information, and the deeper, temporally coarse layers to model
high-level acoustic information with high-dimensional embeddings. We also extend
our approach to present a new Self-Supervised Learning (SSL) method called SS-MAST,
which calculates a symmetric contrastive loss between latent representations from a
student and a teacher encoder. In practice, MAST significantly outperforms AST,
with an average accuracy gain of 3.4% across 8 speech and non-speech tasks from the
LAPE Benchmark. Moreover, SS-MAST achieves an absolute average improvement of
2.6% over SSAST for both AST and MAST encoders. We make all our code available
on GitHub at the time of publication.
Comment: Submitted to ICASSP 202
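To make the pyramid idea concrete, here is a toy sketch in which each stage halves the temporal resolution and expands the embedding dimension before a transformer block. The stage widths, patchify step, head size, and number of stages are illustrative assumptions, not the published MAST configuration.

```python
# Toy multiscale spectrogram transformer: time shrinks, embedding dim grows.
import torch
import torch.nn as nn

class PyramidStage(nn.Module):
    def __init__(self, dim_in, dim_out):
        super().__init__()
        # Strided projection: halve the number of time steps, expand channels.
        self.pool = nn.Conv1d(dim_in, dim_out, kernel_size=2, stride=2)
        self.block = nn.TransformerEncoderLayer(d_model=dim_out, nhead=4, batch_first=True)

    def forward(self, x):                  # x: (batch, time, dim_in)
        x = self.pool(x.transpose(1, 2)).transpose(1, 2)
        return self.block(x)

class ToyMultiscaleAST(nn.Module):
    def __init__(self, n_mels=128, dims=(96, 192, 384, 768), n_classes=527):
        super().__init__()
        self.patchify = nn.Linear(n_mels, dims[0])     # initial temporal resolution / embedding
        self.stages = nn.ModuleList(
            PyramidStage(dims[i], dims[i + 1]) for i in range(len(dims) - 1)
        )
        self.head = nn.Linear(dims[-1], n_classes)

    def forward(self, spec):               # spec: (batch, time, n_mels)
        x = self.patchify(spec)
        for stage in self.stages:
            x = stage(x)                   # early stages: long sequence, narrow embedding
        return self.head(x.mean(dim=1))    # pooled clip-level logits

# logits = ToyMultiscaleAST()(torch.randn(2, 1024, 128))   # -> (2, 527)
```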
SLICER: Learning universal audio representations using low-resource self-supervised pre-training
We present a new Self-Supervised Learning (SSL) approach to pre-train
encoders on unlabeled audio data that reduces the need for large amounts of
labeled data for audio and speech classification. Our primary aim is to learn
audio representations that can generalize across a large variety of speech and
non-speech tasks in a low-resource un-labeled audio pre-training setting.
Inspired by the recent success of clustering and contrasting learning paradigms
for SSL-based speech representation learning, we propose SLICER (Symmetrical
Learning of Instance and Cluster-level Efficient Representations), which brings
together the best of both clustering and contrasting learning paradigms. We use
a symmetric loss between latent representations from student and teacher
encoders and simultaneously solve instance and cluster-level contrastive
learning tasks. We obtain cluster representations online by just projecting the
input spectrogram into an output subspace with dimensions equal to the number
of clusters. In addition, we propose a novel mel-spectrogram augmentation
procedure, k-mix, based on mixup, which does not require labels and aids
unsupervised representation learning for audio. Overall, SLICER achieves
state-of-the-art results on the LAPE Benchmark \cite{9868132}, significantly
outperforming DeLoRes-M and other prior approaches, which are pre-trained on
larger of unsupervised data. We will make all our codes available on
GitHub.Comment: Submitted to ICASSP 202
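The sketch below illustrates the two ingredients the abstract names: a symmetric instance-level contrastive loss between student and teacher embeddings, a cluster-level view obtained by projecting into a space whose dimension equals the number of clusters, and a label-free mixup standing in for k-mix. The exact k-mix procedure and loss weighting are defined in the paper, not here.

```python
# Hedged sketch of SLICER-style components; all hyperparameters are placeholders.
import torch
import torch.nn.functional as F

def symmetric_info_nce(student_z, teacher_z, temperature=0.1):
    """Instance-level loss: each student embedding should match its own teacher
    embedding against all others in the batch, and vice versa."""
    s = F.normalize(student_z, dim=-1)
    t = F.normalize(teacher_z, dim=-1)
    logits = s @ t.T / temperature                      # (B, B) similarity matrix
    targets = torch.arange(len(s), device=s.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

def cluster_assignments(embeddings, cluster_head):
    """Online cluster-level view: project into a space whose dimensionality
    equals the number of clusters and softmax over clusters."""
    return F.softmax(cluster_head(embeddings), dim=-1)

def labelfree_mixup(spec_batch, alpha=0.4):
    """Mixup without labels: convexly combine each mel-spectrogram with a
    randomly permuted partner, usable as an unsupervised augmentation."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(spec_batch.size(0))
    return lam * spec_batch + (1 - lam) * spec_batch[perm]
```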
Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition
Continued self-supervised (SSL) pre-training for adapting existing SSL models
to the target domain has been shown to be extremely effective for low-resource
Automatic Speech Recognition (ASR). This paper proposes Stable Distillation, a
simple and novel approach for SSL-based continued pre-training that boosts ASR
performance in target domains where both labeled and unlabeled data are
limited. Stable Distillation employs self-distillation as regularization for
continued pre-training, alleviating over-fitting, a common problem continued
pre-training faces when the source and target domains differ.
Specifically, we first perform vanilla continued pre-training of an initial
SSL pre-trained model on the target-domain ASR dataset and call the result the
teacher. Next, we take the same initial pre-trained model as a student and perform
continued pre-training while enforcing its hidden representations to be close
to those of the teacher (via an MSE loss). This student is then used for downstream
ASR fine-tuning on the target dataset. In practice, Stable Distillation
outperforms all our baselines by 0.8 to 7 WER when evaluated in various
experimental settings.
Comment: Accepted to ICASSP 2024. Code:
https://github.com/cs20s030/stable_distillatio
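A minimal sketch of the regularizer described above: during the student's continued pre-training, an MSE term keeps its hidden representations close to those of a frozen teacher obtained by vanilla continued pre-training on the same target-domain audio. The `ssl_loss` callable, model interfaces, and loss weight are assumptions; the exact recipe is in the paper and the linked code.

```python
# One training step of MSE-regularized continued pre-training (sketch).
import torch
import torch.nn.functional as F

def stable_distillation_step(student, teacher, batch, ssl_loss, optimizer, mse_weight=1.0):
    teacher.eval()
    with torch.no_grad():
        teacher_hidden = teacher(batch)          # assumed (B, T, D) hidden representations
    student_hidden = student(batch)
    # Continued pre-training objective plus the self-distillation regularizer.
    loss = ssl_loss(student_hidden, batch) + mse_weight * F.mse_loss(student_hidden, teacher_hidden)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Sketch of the setup: teacher and student both start from the same initial SSL
# checkpoint; the teacher is first adapted with vanilla continued pre-training
# and then frozen, while the student trains with the step above before ASR
# fine-tuning on the target dataset.
```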
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
A fundamental characteristic of audio is its compositional nature.
Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP)
that learns a shared representation between audio and language modalities have
improved performance in many downstream applications, including zero-shot audio
classification, audio retrieval, etc. However, the ability of these models to
effectively perform compositional reasoning remains largely unexplored and
necessitates additional research. In this paper, we propose CompA, a collection
of two expert-annotated benchmarks with a majority of real-world audio samples,
to evaluate compositional reasoning in ALMs. Our proposed CompA-order evaluates
how well an ALM understands the order or occurrence of acoustic events in
audio, and CompA-attribute evaluates attribute binding of acoustic events. An
instance from either benchmark consists of two audio-caption pairs, where both
audios have the same acoustic events but with different compositions. An ALM is
evaluated on how well it matches the right audio to the right caption. Using
this benchmark, we first show that current ALMs perform only marginally better
than random chance, and thus struggle with compositional reasoning. Next, we
propose CompA-CLAP, where we fine-tune CLAP using a novel learning method to
improve its compositional reasoning abilities. To train CompA-CLAP, we first
propose improvements to contrastive training with composition-aware hard
negatives, allowing for more focused training. Next, we propose a novel modular
contrastive loss that helps the model learn fine-grained compositional
understanding and overcomes the acute scarcity of openly available
compositional audio. CompA-CLAP significantly improves over all our baseline
models on the CompA benchmark, indicating its superior compositional reasoning
capabilities.
Comment: Pre-print under review
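To illustrate the hard-negative idea in the abstract, the sketch below extends a CLAP-style contrastive loss so that each audio must score its correct caption above both the other in-batch captions and extra "composition-aware" negative captions (same events, different order or attribute binding). The encoders, the negative-generation step, and the loss form are placeholders, not the CompA-CLAP training objective.

```python
# Contrastive loss with per-sample hard-negative captions (sketch).
import torch
import torch.nn.functional as F

def contrastive_with_hard_negatives(audio_emb, caption_emb, hard_neg_emb, temperature=0.07):
    """
    audio_emb:    (B, D)    audio embeddings
    caption_emb:  (B, D)    matching caption embeddings
    hard_neg_emb: (B, K, D) composition-aware hard-negative caption embeddings
    """
    a = F.normalize(audio_emb, dim=-1)
    c = F.normalize(caption_emb, dim=-1)
    n = F.normalize(hard_neg_emb, dim=-1)

    in_batch = a @ c.T                                  # (B, B) standard in-batch logits
    hard = torch.einsum("bd,bkd->bk", a, n)             # (B, K) audio vs. its own hard negatives
    logits = torch.cat([in_batch, hard], dim=1) / temperature
    targets = torch.arange(len(a), device=a.device)     # correct caption is column i for row i
    return F.cross_entropy(logits, targets)
```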
Intravesical rAd-IFNα/Syn3 for Patients With High-Grade, Bacillus Calmette-Guerin-Refractory or Relapsed Non-Muscle-Invasive Bladder Cancer: A Phase II Randomized Study.
Purpose
Many patients with high-risk non-muscle-invasive bladder cancer (NMIBC) are either refractory to bacillus Calmette-Guerin (BCG) treatment or may experience disease relapse. We assessed the efficacy and safety of recombinant adenovirus interferon alfa with Syn3 (rAd-IFNα/Syn3), a replication-deficient recombinant adenovirus gene transfer vector, for patients with high-grade (HG) BCG-refractory or relapsed NMIBC.
Methods
In this open-label, multicenter (n = 13), parallel-arm, phase II study (ClinicalTrials.gov identifier: NCT01687244), 43 patients with HG BCG-refractory or relapsed NMIBC received intravesical rAd-IFNα/Syn3 (randomly assigned 1:1 to 1 × 10¹¹ viral particles (vp)/mL or 3 × 10¹¹ vp/mL). Patients who responded at months 3, 6, and 9 were retreated at months 4, 7, and 10. The primary end point was 12-month HG recurrence-free survival (RFS). All patients who received at least one dose were included in efficacy and safety analyses.
Results
Forty patients received rAd-IFNα/Syn3 (1 × 10¹¹ vp/mL, n = 21; 3 × 10¹¹ vp/mL, n = 19) between November 5, 2012, and April 8, 2015. Fourteen patients (35.0%; 90% CI, 22.6% to 49.2%) remained free of HG recurrence 12 months after initial treatment. Comparable 12-month HG RFS was noted for both doses. Of these 14 patients, two experienced recurrence at 21 and 28 months, respectively, after treatment initiation, and one died as a result of an upper tract tumor at 17 months without a recurrence. rAd-IFNα/Syn3 was well tolerated; no grade 4 or 5 adverse events (AEs) occurred, and no patient discontinued treatment because of an adverse event. The most frequently reported drug-related AEs were micturition urgency (n = 16; 40%), dysuria (n = 16; 40%), fatigue (n = 13; 32.5%), pollakiuria (n = 11; 28%), and hematuria and nocturia (n = 10 each; 25%).
Conclusion
rAd-IFNα/Syn3 was well tolerated. It demonstrated promising efficacy for patients with HG NMIBC after BCG therapy who were unable or unwilling to undergo radical cystectomy.
T Cell Responses to Human Endogenous Retroviruses in HIV-1 Infection
Human endogenous retroviruses (HERVs) are remnants of ancient infectious agents that have integrated into the human genome. Under normal circumstances, HERVs are functionally defective or controlled by host factors. In HIV-1-infected individuals, intracellular defense mechanisms are compromised. We hypothesized that HIV-1 infection would remove or alter controls on HERV activity. Expression of HERV could potentially stimulate a T cell response to HERV antigens, and in regions of HIV-1/HERV similarity, these T cells could be cross-reactive. We determined that the levels of HERV production in HIV-1-positive individuals exceed those of HIV-1-negative controls. To investigate the impact of HERV activity on specific immunity, we examined T cell responses to HERV peptides in 29 HIV-1-positive and 13 HIV-1-negative study participants. Using ELISPOT analysis, we detected T cell responses to peptides derived from HERV regions in the HIV-1-positive study participants. We show an inverse correlation between anti-HERV T cell responses and HIV-1 plasma viral load. In HIV-1-positive individuals, we demonstrate that HERV-specific T cells are capable of killing cells presenting their cognate peptide. These data indicate that HIV-1 infection leads to HERV expression and stimulation of a HERV-specific CD8+ T cell response. HERV-specific CD8+ T cells have characteristics consistent with an important role in the response to HIV-1 infection: a phenotype similar to that of T cells responding to an effectively controlled virus (cytomegalovirus), an inverse correlation with HIV-1 plasma viral load, and the ability to lyse cells presenting their target peptide. These characteristics suggest that elicitation of anti-HERV-specific immune responses is a novel approach to immunotherapeutic vaccination. As endogenous retroviral sequences are fixed in the human genome, they provide a stable target, and HERV-specific T cells could recognize a cell infected by any HIV-1 viral variant. HERV-specific immunity is an important new avenue for investigation in HIV-1 pathogenesis and vaccine design.
Epidemiology of Bladder Cancer in 2023: A Systematic Review of Risk Factors
CONTEXT
Bladder cancer (BC) is common worldwide and poses a significant public health challenge. External risk factors and the wider exposome (totality of exposure from external and internal factors) contribute significantly to the development of BC. Therefore, establishing a clear understanding of these risk factors is the key to prevention.
OBJECTIVE
To perform an up-to-date systematic review of BC's epidemiology and external risk factors.
EVIDENCE ACQUISITION
Two reviewers (I.J. and S.O.) performed a systematic review using PubMed and Embase in January 2022 and updated it in September 2022. The search was restricted to 4 yr since our previous review in 2018.
EVIDENCE SYNTHESIS
Our search identified 5177 articles and a total of 349 full-text manuscripts. GLOBOCAN data from 2020 revealed an incidence of 573 000 new BC cases and 213 000 deaths worldwide in 2020. The 5-yr prevalence worldwide in 2020 was 1 721 000. Tobacco smoking and occupational exposures (aromatic amines and polycyclic aromatic hydrocarbons) are the most substantial risk factors. In addition, correlative evidence exists for several risk factors, including specific dietary factors, imbalanced microbiome, gene-environment risk factor interactions, diesel exhaust emission exposure, and pelvic radiotherapy.
CONCLUSIONS
We present a contemporary overview of the epidemiology of BC and the current evidence for BC risk factors. Smoking and specific occupational exposures are the most established risk factors. There is emerging evidence for specific dietary factors, imbalanced microbiome, gene-external risk factor interactions, diesel exhaust emission exposure, and pelvic radiotherapy. Further high-quality evidence is required to confirm initial findings and further understand cancer prevention.
PATIENT SUMMARY
Bladder cancer is common, and the most substantial risk factors are smoking and workplace exposure to suspected carcinogens. Ongoing research to identify avoidable risk factors could reduce the number of people who get bladder cancer.
Observed relationships between extreme sub-daily precipitation, surface temperature, and relative humidity
Expected changes to future extreme precipitation remain a key uncertainty associated with anthropogenic climate change. Recently, extreme precipitation has been proposed to scale with the precipitable water content of the atmosphere, which, assuming relative humidity stays constant, will increase at a rate of ∼6.8%/°C, as indicated by the Clausius-Clapeyron (C-C) relationship. We examine this scaling empirically using data from 137 long-record pluviograph and temperature gauges across Australia. We find that scaling rates are consistent with the C-C relationship for surface temperatures up to between 20°C and 26°C and for precipitation durations up to 30 minutes, implying that such scaling applies only to individual storm systems. At greater temperatures, negative scaling is observed. Consideration of relative humidity data shows a pronounced decrease in the maximum relative humidity for land surface temperatures greater than 26°C, indicating that moisture availability becomes the dominant driver of how extreme precipitation scales at higher temperatures.
Rhys Hardwick Jones, Seth Westra and Ashish Sharma
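As a quick numerical check of the ∼6.8%/°C figure quoted above, the Clausius-Clapeyron relation gives d(ln e_s)/dT ≈ L_v / (R_v T²) for saturation vapour pressure e_s; the snippet below evaluates it with standard approximate constants (not values from the paper).

```python
# Clausius-Clapeyron scaling rate of saturation vapour pressure with temperature.
L_v = 2.5e6      # latent heat of vaporisation, J/kg (approximate)
R_v = 461.5      # specific gas constant for water vapour, J/(kg K)

for T_celsius in (10, 20, 30):
    T = T_celsius + 273.15
    rate = L_v / (R_v * T ** 2)          # fractional increase in e_s per kelvin
    print(f"{T_celsius:>2d} °C: {100 * rate:.1f} %/°C")
# ~6.8 %/°C near 10 °C, decreasing to ~5.9 %/°C near 30 °C.
```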