Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification
Pre-trained Transformer-based speech models have shown striking performance
when fine-tuned on various downstream tasks such as automatic speech
recognition and spoken language identification (SLID). However, the problem of
domain mismatch remains a challenge in this area, where the domain of the
pre-training data might differ from that of the downstream labeled data used
for fine-tuning. In multilingual tasks such as SLID, the pre-trained speech
model may not support all the languages in the downstream task. To address this
challenge, we propose self-supervised adaptive pre-training (SAPT) to adapt the
pre-trained model to the target domain and languages of the downstream task. We
apply SAPT to the XLSR-128 model and investigate the effectiveness of this
approach for the SLID task. First, we demonstrate that SAPT improves XLSR
performance on the FLEURS benchmark with substantial gains up to 40.1% for
under-represented languages. Second, we apply SAPT on four different datasets
in a few-shot learning setting, showing that our approach improves the sample
efficiency of XLSR during fine-tuning. Our experiments provide strong empirical
evidence that continual adaptation via self-supervision improves downstream
performance for multilingual speech models.
Comment: Submitted to ICASSP 202
An Information-Theoretic Analysis of Self-supervised Discrete Representations of Speech
Self-supervised representation learning for speech often involves a
quantization step that transforms the acoustic input into discrete units.
However, it remains unclear how to characterize the relationship between these
discrete units and abstract phonetic categories such as phonemes. In this
paper, we develop an information-theoretic framework whereby we represent each
phonetic category as a distribution over discrete units. We then apply our
framework to two different self-supervised models (namely wav2vec 2.0 and XLSR)
and use American English speech as a case study. Our study demonstrates that
the entropy of phonetic distributions reflects the variability of the
underlying speech sounds, with phonetically similar sounds exhibiting similar
distributions. While our study confirms the lack of direct, one-to-one
correspondence, we find an intriguing, indirect relationship between phonetic
categories and discrete units.
Comment: Accepted in Interspeech 202
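The framework described in this abstract, in which each phonetic category is a distribution over discrete units, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the frame-level (phone, unit) pairs, and the unit IDs are hypothetical toy values, not data or code from the paper. Jensen-Shannon divergence is used here as one common way to compare two distributions; the abstract does not state which similarity measure the authors use.

```python
import math
from collections import Counter, defaultdict


def phone_unit_distributions(pairs):
    """Estimate P(unit | phone) from frame-level (phone, unit) pairs."""
    counts = defaultdict(Counter)
    for phone, unit in pairs:
        counts[phone][unit] += 1
    return {
        phone: {u: c / sum(units.values()) for u, c in units.items()}
        for phone, units in counts.items()
    }


def entropy_bits(dist):
    """Shannon entropy (in bits) of a distribution over discrete units."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def js_divergence_bits(p, q):
    """Jensen-Shannon divergence (in bits) between two unit distributions."""
    units = set(p) | set(q)
    mix = {u: 0.5 * (p.get(u, 0.0) + q.get(u, 0.0)) for u in units}

    def kl_to_mix(a):
        return sum(a[u] * math.log2(a[u] / mix[u]) for u in a if a[u] > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


# Hypothetical toy alignment: phone label paired with a quantizer unit ID.
toy_pairs = [
    ("s", 3), ("s", 3), ("s", 7), ("s", 7),  # "s" spreads over two units
    ("t", 5), ("t", 5), ("t", 5), ("t", 5),  # "t" always maps to one unit
]

dists = phone_unit_distributions(toy_pairs)
entropy_s = entropy_bits(dists["s"])
entropy_t = entropy_bits(dists["t"])
divergence_st = js_divergence_bits(dists["s"], dists["t"])
```

In this toy case the phone spread evenly over two units has higher entropy than the phone that always maps to a single unit, which mirrors the abstract's claim that entropy reflects the variability of the underlying sound.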
Evaluation of prognostic risk models for postoperative pulmonary complications in adult patients undergoing major abdominal surgery: a systematic review and international external validation cohort study
Background: Stratifying risk of postoperative pulmonary complications after major abdominal surgery allows clinicians to modify risk through targeted interventions and enhanced monitoring. In this study, we aimed to identify and validate prognostic models against a new consensus definition of postoperative pulmonary complications.
Methods: We did a systematic review and international external validation cohort study. The systematic review was done in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. We searched MEDLINE and Embase on March 1, 2020, for articles published in English that reported on risk prediction models for postoperative pulmonary complications following abdominal surgery. External validation of existing models was done within a prospective international cohort study of adult patients (≥18 years) undergoing major abdominal surgery. Data were collected between Jan 1, 2019, and April 30, 2019, in the UK, Ireland, and Australia. Discriminative ability and prognostic accuracy summary statistics were compared between models for the 30-day postoperative pulmonary complication rate as defined by the Standardised Endpoints in Perioperative Medicine Core Outcome Measures in Perioperative and Anaesthetic Care (StEP-COMPAC). Model performance was compared using the area under the receiver operating characteristic curve (AUROCC).
Findings: In total, we identified 2903 records from our literature search; of which, 2514 (86·6%) unique records were screened, 121 (4·8%) of 2514 full texts were assessed for eligibility, and 29 unique prognostic models were identified. Nine (31·0%) of 29 models had score development reported only, 19 (65·5%) had undergone internal validation, and only four (13·8%) had been externally validated. Data to validate six eligible models were collected in the international external validation cohort study. Data from 11 591 patients were available, with an overall postoperative pulmonary complication rate of 7·8% (n=903). None of the six models showed good discrimination (defined as AUROCC ≥0·70) for identifying postoperative pulmonary complications, with the Assess Respiratory Risk in Surgical Patients in Catalonia score showing the best discrimination (AUROCC 0·700 [95% CI 0·683–0·717]).
Interpretation: In the pre-COVID-19 pandemic data, variability in the risk of pulmonary complications (StEP-COMPAC definition) following major abdominal surgery was poorly described by existing prognostication tools. To improve surgical safety during the COVID-19 pandemic recovery and beyond, novel risk stratification tools are required.
Funding: British Journal of Surgery Society.
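The discrimination measure named in this abstract, the area under the receiver operating characteristic curve, can be sketched as the probability that a patient who develops the complication receives a higher risk score than one who does not. The sketch below is a minimal illustration under stated assumptions: the `auroc` helper, the toy scores, and the toy outcomes are hypothetical and are not the study's code or data. Only the two counts used for the incidence check (903 events among 11 591 patients) come from the abstract.

```python
def auroc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a patient with the event (outcome 1) has a higher
    score than a patient without it (outcome 0); ties count as one half.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and observed outcomes for six patients.
toy_scores = [10, 20, 30, 40, 50, 60]
toy_outcomes = [0, 0, 1, 0, 1, 1]
toy_auc = auroc(toy_scores, toy_outcomes)

# Incidence check using the counts reported in the abstract.
complication_rate_percent = round(100 * 903 / 11591, 1)
```

The pairwise form is quadratic in the number of patients, so it suits small illustrations; for a cohort of this size a rank-based implementation such as scikit-learn's `roc_auc_score` would be the more practical choice.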