3 research outputs found

    Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification

    Pre-trained Transformer-based speech models have shown striking performance when fine-tuned on various downstream tasks such as automatic speech recognition and spoken language identification (SLID). However, the problem of domain mismatch remains a challenge in this area, where the domain of the pre-training data might differ from that of the downstream labeled data used for fine-tuning. In multilingual tasks such as SLID, the pre-trained speech model may not support all the languages in the downstream task. To address this challenge, we propose self-supervised adaptive pre-training (SAPT) to adapt the pre-trained model to the target domain and languages of the downstream task. We apply SAPT to the XLSR-128 model and investigate the effectiveness of this approach for the SLID task. First, we demonstrate that SAPT improves XLSR performance on the FLEURS benchmark, with substantial gains of up to 40.1% for under-represented languages. Second, we apply SAPT on four different datasets in a few-shot learning setting, showing that our approach improves the sample efficiency of XLSR during fine-tuning. Our experiments provide strong empirical evidence that continual adaptation via self-supervision improves downstream performance for multilingual speech models.
    Comment: Submitted to ICASSP 202

    An Information-Theoretic Analysis of Self-supervised Discrete Representations of Speech

    Self-supervised representation learning for speech often involves a quantization step that transforms the acoustic input into discrete units. However, it remains unclear how to characterize the relationship between these discrete units and abstract phonetic categories such as phonemes. In this paper, we develop an information-theoretic framework whereby we represent each phonetic category as a distribution over discrete units. We then apply our framework to two different self-supervised models (namely wav2vec 2.0 and XLSR) and use American English speech as a case study. Our study demonstrates that the entropy of phonetic distributions reflects the variability of the underlying speech sounds, with phonetically similar sounds exhibiting similar distributions. While our study confirms the lack of direct, one-to-one correspondence, we find an intriguing, indirect relationship between phonetic categories and discrete units.
    Comment: Accepted in Interspeech 202
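The idea of representing a phonetic category as a distribution over discrete units can be illustrated with a minimal sketch. It assumes frame-level alignments given as (phone label, unit id) pairs; the function names, the toy data, and the use of Jensen-Shannon divergence as the similarity measure are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def phone_unit_distributions(pairs):
    """Estimate P(unit | phone) from (phone_label, unit_id) pairs (hypothetical input format)."""
    counts = defaultdict(Counter)
    for phone, unit in pairs:
        counts[phone][unit] += 1
    dists = {}
    for phone, unit_counts in counts.items():
        total = sum(unit_counts.values())
        dists[phone] = {u: n / total for u, n in unit_counts.items()}
    return dists


def entropy_bits(dist):
    """Shannon entropy (in bits) of a distribution given as {unit: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def js_divergence_bits(p, q):
    """Jensen-Shannon divergence in bits; an assumed choice of similarity measure."""
    units = set(p) | set(q)
    m = {u: 0.5 * (p.get(u, 0.0) + q.get(u, 0.0)) for u in units}

    def kl(a, b):
        return sum(a[u] * math.log2(a[u] / b[u]) for u in a if a[u] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy, made-up alignments: "s" is spread over two units, "aa" always maps to one unit.
toy_pairs = [("s", 1), ("s", 1), ("s", 2), ("s", 2),
             ("aa", 7), ("aa", 7), ("aa", 7), ("aa", 7)]
dists = phone_unit_distributions(toy_pairs)
```

In this toy setup a phone spread evenly over two units has higher entropy than a phone that always maps to a single unit, and two phones with disjoint unit sets reach the maximum Jensen-Shannon divergence of 1 bit.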

    Evaluation of prognostic risk models for postoperative pulmonary complications in adult patients undergoing major abdominal surgery: a systematic review and international external validation cohort study

    Background: Stratifying risk of postoperative pulmonary complications after major abdominal surgery allows clinicians to modify risk through targeted interventions and enhanced monitoring. In this study, we aimed to identify and validate prognostic models against a new consensus definition of postoperative pulmonary complications.
    Methods: We did a systematic review and international external validation cohort study. The systematic review was done in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. We searched MEDLINE and Embase on March 1, 2020, for articles published in English that reported on risk prediction models for postoperative pulmonary complications following abdominal surgery. External validation of existing models was done within a prospective international cohort study of adult patients (≥18 years) undergoing major abdominal surgery. Data were collected between Jan 1, 2019, and April 30, 2019, in the UK, Ireland, and Australia. Discriminative ability and prognostic accuracy summary statistics were compared between models for the 30-day postoperative pulmonary complication rate as defined by the Standardised Endpoints in Perioperative Medicine Core Outcome Measures in Perioperative and Anaesthetic Care (StEP-COMPAC). Model performance was compared using the area under the receiver operating characteristic curve (AUROCC).
    Findings: In total, we identified 2903 records from our literature search, of which 2514 (86·6%) unique records were screened, 121 (4·8%) of 2514 full texts were assessed for eligibility, and 29 unique prognostic models were identified. Nine (31·0%) of 29 models had score development reported only, 19 (65·5%) had undergone internal validation, and only four (13·8%) had been externally validated. Data to validate six eligible models were collected in the international external validation cohort study. Data from 11 591 patients were available, with an overall postoperative pulmonary complication rate of 7·8% (n=903). None of the six models showed good discrimination (defined as AUROCC ≥0·70) for identifying postoperative pulmonary complications, with the Assess Respiratory Risk in Surgical Patients in Catalonia score showing the best discrimination (AUROCC 0·700 [95% CI 0·683–0·717]).
    Interpretation: In the pre-COVID-19 pandemic data, variability in the risk of pulmonary complications (StEP-COMPAC definition) following major abdominal surgery was poorly described by existing prognostication tools. To improve surgical safety during the COVID-19 pandemic recovery and beyond, novel risk stratification tools are required.
    Funding: British Journal of Surgery Society
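For readers unfamiliar with the discrimination metric used in this abstract, the following is a minimal sketch of how an AUROCC can be computed from risk scores and observed outcomes. It uses the pairwise (Mann-Whitney) formulation; the function names and the toy data are hypothetical and do not come from the study, and only the 0·70 threshold for "good discrimination" is taken from the abstract.

```python
def auroc(scores, labels):
    """Probability that a randomly chosen positive case scores higher than a
    randomly chosen negative case, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def has_good_discrimination(auc, threshold=0.70):
    """Threshold of 0.70 follows the definition quoted in the abstract."""
    return auc >= threshold


# Made-up risk scores and outcomes (1 = complication), for illustration only.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auroc(toy_scores, toy_labels)
```

The quadratic pairwise loop is chosen for readability; for cohorts of the size reported here, a rank-based implementation would be the more practical choice.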