
    DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation

    In recent years, audio-driven 3D facial animation has gained significant attention, particularly in applications such as virtual reality, gaming, and video conferencing. However, accurately modeling the intricate and subtle dynamics of facial expressions remains a challenge. Most existing studies treat facial animation as a single regression problem; they often fail to capture the intrinsic inter-modal relationship between speech signals and 3D facial animation and overlook their inherent consistency. Moreover, because 3D audio-visual datasets are scarce, approaches trained on small samples generalize poorly, which degrades performance. To address these issues, we propose a cross-modal dual-learning framework, termed DualTalker, that aims to improve data-usage efficiency and to capture cross-modal dependencies. The framework is trained jointly on the primary task (audio-driven facial animation) and its dual task (lip reading), and the two tasks share common audio and motion encoder components. This joint training uses data more efficiently by leveraging information from both tasks, and it explicitly exploits the complementary relationship between facial motion and audio to improve performance. Furthermore, we introduce an auxiliary cross-modal consistency loss to mitigate potential over-smoothing in the cross-modal complementary representations, improving the mapping of subtle facial expression dynamics. Through extensive experiments and a perceptual user study on the VOCA and BIWI datasets, we demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively. Our code and video demonstrations are available at https://github.com/sabrina-su/iadf.git
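    The abstract describes joint training on a primary task, a dual task, and an auxiliary consistency loss, without giving their exact forms. A minimal sketch of how such a combined objective can be assembled, assuming (not taken from the paper) an MSE loss on flattened vertex offsets, a cross-entropy loss on lip-reading classes, a consistency term defined as 1 minus the cosine similarity of shared audio and motion embeddings, and hypothetical weights `w_dual` and `w_cons`:

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Assumed primary-task loss: mean squared error over flattened vertex offsets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Assumed dual-task loss: negative log-likelihood of the correct lip-reading class."""
    return -math.log(probs[label])


def consistency(audio_emb: Sequence[float], motion_emb: Sequence[float]) -> float:
    """Hypothetical consistency term: 1 - cosine similarity (0 when embeddings align)."""
    dot = sum(a * m for a, m in zip(audio_emb, motion_emb))
    norm_a = math.sqrt(sum(a * a for a in audio_emb))
    norm_m = math.sqrt(sum(m * m for m in motion_emb))
    return 1.0 - dot / (norm_a * norm_m)


def dual_objective(anim_pred, anim_gt, lip_probs, lip_label,
                   audio_emb, motion_emb, w_dual=1.0, w_cons=0.1):
    """Weighted sum of primary, dual and consistency losses (weights are assumptions)."""
    return (mse(anim_pred, anim_gt)
            + w_dual * cross_entropy(lip_probs, lip_label)
            + w_cons * consistency(audio_emb, motion_emb))


if __name__ == "__main__":
    # Perfect animation and aligned embeddings: only the lip-reading term remains.
    loss = dual_objective([0.1, 0.2], [0.1, 0.2], [0.5, 0.5], 0, [1.0, 0.0], [1.0, 0.0])
    print(f"{loss:.4f}")
```

    In this toy call the animation and consistency terms vanish, so the total equals the lip-reading cross-entropy, ln 2. In a real system these would be tensor losses computed by an autodiff framework over shared encoders.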

    Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

    Automatic recognition of disordered speech remains a highly challenging task to date. The underlying neuro-motor conditions, often compounded by co-occurring physical disabilities, make it difficult to collect the large quantities of impaired speech required for ASR system development. This paper presents novel personalized disordered speech augmentation approaches based on variational auto-encoder generative adversarial networks (VAE-GAN) that simultaneously learn to encode, generate and discriminate synthesized impaired speech. Separate latent features are derived to learn dysarthric speech characteristics and phoneme context representations. Self-supervised pre-trained Wav2vec 2.0 embedding features are also incorporated. Experiments conducted on the UASpeech corpus suggest that the proposed adversarial data augmentation approach consistently outperformed the baseline speed perturbation and non-VAE GAN augmentation methods for both hybrid TDNN and end-to-end Conformer systems. After LHUC speaker adaptation, the best system using VAE-GAN based augmentation produced an overall WER of 27.78% on the UASpeech test set of 16 dysarthric speakers, and the lowest published WER of 57.31% on the subset of speakers with "Very Low" intelligibility. Comment: Submitted to ICASSP 202
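    The VAE part of a VAE-GAN encodes each input into a Gaussian latent, samples it with the reparameterization trick, and regularizes it with a KL term. A minimal sketch of those two standard pieces, plus a hypothetical helper that concatenates two separately encoded codes, loosely mirroring the abstract's "separate latent features" for dysarthric characteristics and phoneme context. The helper's name and the concatenation scheme are assumptions, not the paper's design; the adversarial discriminator is omitted:

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)), the usual VAE prior term."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def sample_disentangled_latent(spk_mu, spk_logvar, ctx_mu, ctx_logvar,
                               rng: random.Random) -> List[float]:
    """Hypothetical: sample a 'dysarthric characteristics' code and a
    'phoneme context' code separately, then concatenate them for the generator."""
    return reparameterize(spk_mu, spk_logvar, rng) + reparameterize(ctx_mu, ctx_logvar, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    z = sample_disentangled_latent([0.0] * 3, [0.0] * 3, [0.0] * 5, [0.0] * 5, rng)
    print(len(z), gaussian_kl([1.0], [0.0]))
```

    Keeping the two codes separate is what would allow, for example, pairing one speaker's dysarthric characteristics with different phoneme contexts when synthesizing augmentation data.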

    Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

    Speaker adaptation techniques provide a powerful solution for customising automatic speech recognition (ASR) systems to individual users. Practical application of unsupervised model-based speaker adaptation to data-intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and by performance sensitivity to transcription errors. To address these issues, a set of compact and data-efficient speaker-dependent (SD) parameter representations is used to facilitate both speaker adaptive training and test-time unsupervised speaker adaptation of state-of-the-art Conformer ASR systems. Sensitivity to supervision quality is reduced by using confidence scores to select the less erroneous subset of speaker-level adaptation data. Two lightweight confidence score estimation modules are proposed to produce more reliable confidence scores. The data sparsity issue, which data selection exacerbates, is addressed by modelling SD parameter uncertainty using Bayesian learning. Experiments on the benchmark 300-hour Switchboard and 233-hour AMI datasets suggest that the proposed confidence score-based adaptation schemes consistently outperformed both the baseline speaker-independent (SI) Conformer model and conventional non-Bayesian, point estimate-based adaptation without speaker-level data selection. Consistent performance improvements were retained after external Transformer and LSTM language model rescoring. In particular, on the 300-hour Switchboard corpus, statistically significant WER reductions of 1.0%, 1.3%, and 1.4% absolute (9.5%, 10.9%, and 11.3% relative) were obtained over the baseline SI Conformer on the NIST Hub5'00, RT02, and RT03 evaluation sets, respectively. Similar WER reductions of 2.7% and 3.3% absolute (8.9% and 10.2% relative) were also obtained on the AMI development and evaluation sets. Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processing
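    The selection step ranks a speaker's first-pass hypotheses by confidence and keeps only the most reliable subset as adaptation supervision. A minimal sketch, assuming utterance-level scores in [0, 1] are already available (the paper's two estimation modules are not reproduced) and using a hypothetical `keep_ratio` rather than any threshold reported in the paper:

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    utt_id: str
    hypothesis: str    # first-pass decoding output used as adaptation supervision
    confidence: float  # utterance-level confidence score, assumed to lie in [0, 1]


def select_adaptation_data(utts: List[Utterance], keep_ratio: float = 0.5) -> List[Utterance]:
    """Keep the most confident fraction of one speaker's utterances (keep_ratio is hypothetical)."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    ranked = sorted(utts, key=lambda u: u.confidence, reverse=True)
    n_keep = math.ceil(len(ranked) * keep_ratio)  # round up so small sets keep at least one item
    return ranked[:n_keep]


if __name__ == "__main__":
    speaker_utts = [
        Utterance("u1", "hello there", 0.9),
        Utterance("u2", "yeah uh", 0.2),
        Utterance("u3", "thank you", 0.7),
        Utterance("u4", "right okay", 0.5),
    ]
    print([u.utt_id for u in select_adaptation_data(speaker_utts, keep_ratio=0.5)])
```

    A smaller `keep_ratio` gives cleaner supervision but even less per-speaker data, which is the sparsity trade-off the abstract addresses with Bayesian modelling of the SD parameters.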

    Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

    Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date. Motivated by the invariance of the visual modality to acoustic signal corruption, this paper proposes an audio-visual multi-channel speech separation, dereverberation and recognition approach that fully incorporates visual information into all system components. The efficacy of the video input is consistently demonstrated in mask-based MVDR speech separation, in DNN-WPE or spectral mapping (SpecM) based speech dereverberation front-ends, and in the Conformer ASR back-end. Audio-visual integrated front-end architectures that perform speech separation and dereverberation in a pipelined or joint fashion via mask-based WPD are investigated. The error cost mismatch between the speech enhancement front-end and the ASR back-end is minimized by end-to-end joint fine-tuning using either the ASR cost function alone or its interpolation with the speech enhancement loss. Experiments were conducted on overlapped and reverberant speech mixtures constructed by simulation or replay of the Oxford LRS2 dataset. The proposed audio-visual multi-channel speech separation, dereverberation and recognition systems consistently outperformed the comparable audio-only baseline, with word error rate (WER) reductions of 9.1% and 6.2% absolute (41.7% and 36.0% relative). Consistent speech enhancement improvements were also obtained in PESQ, STOI and SRMR scores. Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processing
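    Two pieces of arithmetic in this abstract can be made concrete. First, joint fine-tuning uses either the ASR cost alone or its interpolation with the enhancement loss; the sketch below assumes a convex interpolation with a hypothetical weight `alpha` (the paper's exact form and weight are not stated here). Second, absolute and relative WER reductions are linked through the baseline WER, so a reported pair implies an approximate baseline (approximate because the relative figures are rounded):

```python
def interpolated_loss(asr_loss: float, enh_loss: float, alpha: float = 0.0) -> float:
    """Assumed convex interpolation: alpha=0 gives the ASR cost alone."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * asr_loss + alpha * enh_loss


def relative_reduction(absolute: float, baseline: float) -> float:
    """Relative WER reduction in %, given an absolute reduction and baseline WER (both in %)."""
    return 100.0 * absolute / baseline


def implied_baseline(absolute: float, relative: float) -> float:
    """Baseline WER (%) implied by a reported absolute/relative reduction pair."""
    return 100.0 * absolute / relative


if __name__ == "__main__":
    print(interpolated_loss(2.0, 4.0, alpha=0.5))
    # Reported pairs from the abstract: 9.1% abs / 41.7% rel and 6.2% abs / 36.0% rel.
    for abs_red, rel_red in [(9.1, 41.7), (6.2, 36.0)]:
        print(f"implied audio-only baseline WER ~ {implied_baseline(abs_red, rel_red):.1f}%")
```

    With `alpha=0` the function reduces to the ASR-only fine-tuning case, so one helper covers both options the abstract mentions.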

    Anti-tumour therapeutic efficacy of OX40L in murine tumour model

    OX40 ligand (OX40L), a member of the TNF superfamily, is a co-stimulatory molecule involved in T cell activation. Systemic administration of mOX40L fusion protein significantly inhibited the growth of experimental lung metastasis and of subcutaneous (s.c.) established colon (CT26) and breast (4T1) carcinomas. Vaccination with OX40L was significantly enhanced by combination treatment with intra-tumour injection of a disabled infectious single-cycle herpes simplex virus (DISC-HSV) vector encoding murine granulocyte-macrophage colony-stimulating factor (mGM-CSF). Tumour rejection in response to OX40L therapy required functional CD4+ and CD8+ T cells and correlated with splenocyte cytotoxic T lymphocyte (CTL) activity against the AH-1 gp70 peptide of the tumour-associated antigen expressed by CT26 cells. These results demonstrate the potential role of OX40L in cancer immunotherapy.

    Intense exercise for survival among men with metastatic castrate-resistant prostate cancer (INTERVAL-GAP4): A multicentre, randomized, controlled phase III study protocol

    Introduction: Preliminary evidence supports a beneficial role of physical activity in prostate cancer outcomes. This phase III randomised controlled trial (RCT) is designed to determine whether supervised high-intensity aerobic and resistance exercise increases overall survival (OS) in patients with metastatic castrate-resistant prostate cancer (mCRPC). Methods and analysis: Participants (n=866) must have histologically documented metastatic prostate cancer with evidence of progressive disease on androgen deprivation therapy (defined as mCRPC). Patients can be treatment-naive for mCRPC or on first-line androgen receptor-targeted therapy for mCRPC (i.e., abiraterone or enzalutamide) without evidence of progression at enrolment, and must have had no prior chemotherapy for mCRPC. Patients will receive psychosocial support and will be randomly assigned (1:1) to either supervised exercise (high-intensity aerobic and resistance training) or self-directed exercise (provision of guidelines), stratified by treatment status and site. Exercise prescriptions will be tailored to each participant’s fitness and morbidities. The primary endpoint is OS. Secondary endpoints include time to disease progression, occurrence of a skeletal-related event or progression of pain, degree of pain, opiate use, physical and emotional quality of life, and changes in metabolic biomarkers. Whether immune function, inflammation, dysregulation of insulin and energy metabolism, and androgen biomarkers are associated with OS will be assessed, and whether they mediate the primary association between exercise and OS will also be investigated. This study will also establish a biobank for future biomarker discovery or validation. Ethics and dissemination: Validation of exercise as medicine and its mechanisms of action will create evidence to change clinical practice. Accordingly, the outcomes of this RCT will be published in international, peer-reviewed journals and presented at national and international conferences. Ethics approval was first obtained at Edith Cowan University (ID: 13236 NEWTON), and a further 10 investigator sites have since received ethics approval prior to activation. Trial registration number: NCT02730338
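    The protocol specifies 1:1 allocation stratified by treatment status and site but does not say how the sequence is generated. One common way to achieve stratified 1:1 allocation is permuted-block randomisation within each stratum; the sketch below assumes that scheme, a hypothetical block size of 4, and hypothetical stratum labels, and should not be read as the trial's actual procedure:

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

ARMS = ("supervised", "self-directed")


class StratifiedBlockRandomizer:
    """Assumed design: 1:1 permuted blocks kept separately for each (status, site) stratum."""

    def __init__(self, block_size: int = 4, seed: Optional[int] = None):
        if block_size <= 0 or block_size % len(ARMS) != 0:
            raise ValueError("block_size must be a positive multiple of the number of arms")
        self.block_size = block_size
        self.rng = random.Random(seed)
        # Remaining allocations in the current block of each stratum.
        self.queues: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def _new_block(self) -> List[str]:
        # Equal numbers of each arm, shuffled so the order within a block is unpredictable.
        block = list(ARMS) * (self.block_size // len(ARMS))
        self.rng.shuffle(block)
        return block

    def assign(self, treatment_status: str, site: str) -> str:
        stratum = (treatment_status, site)
        if not self.queues[stratum]:
            self.queues[stratum] = self._new_block()
        return self.queues[stratum].pop()


if __name__ == "__main__":
    randomizer = StratifiedBlockRandomizer(block_size=4, seed=1)
    # Hypothetical stratum labels for illustration only.
    print([randomizer.assign("treatment-naive", "site-01") for _ in range(4)])
```

    Because every completed block contains two allocations to each arm, the arms stay balanced within each stratum, which is the purpose of stratifying by treatment status and site.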

    Stellar Coronal and Wind Models: Impact on Exoplanets

    Surface magnetism is believed to be the main driver of coronal heating and stellar wind acceleration. Coronae are thought to be formed by plasma confined in closed magnetic coronal loops, whereas winds originate mainly in regions of open magnetic field lines. In this Chapter, we review some basic properties of stellar coronae and winds and present some existing models. In the last part of this Chapter, we discuss the effects of coronal winds on exoplanets. Comment: Chapter published in the "Handbook of Exoplanets", Editors in Chief: Juan Antonio Belmonte and Hans Deeg, Section Editor: Nuccio Lanza. Springer Reference Work

    Methods for conducting international Delphi surveys to optimise global participation in core outcome set development: a case study in gastric cancer informed by a comprehensive literature review

    Copyright © 2021, The Author(s). Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/); the Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Background: Core outcome sets (COS) should be relevant to key stakeholders and widely applicable and usable. Ideally, they are developed for international use to allow optimal data synthesis from trials. Electronic Delphi surveys are commonly used to facilitate global participation; however, this approach has limitations. These surveys are often conducted in a single language, potentially excluding those not fluent in it. The aim of this study is to summarise current approaches for optimising international participation in Delphi studies and to make recommendations for future practice. Methods: A comprehensive literature review of current approaches to translating Delphi surveys for COS development was undertaken. A standardised methodology was developed, adapted from international guidance derived from 12 major sets of translation guidelines in the field of outcome reporting. As a case study, this methodology was applied to a COS project for surgical trials in gastric cancer to translate a Delphi survey into 7 target languages from regions active in gastric cancer research. Results: Three hundred thirty-two abstracts were screened, and four studies addressing COS development in rheumatoid arthritis and osteoarthritis, vascular malformations and polypharmacy were eligible for inclusion. There was wide variation in methodological approaches to translation, including the number of forward translations, the inclusion of back translation, the use of cognitive debriefing and how discrepancies and disagreements were handled. Important considerations identified during development of the gastric cancer survey included establishing translation groups, timelines, understanding financial implications, strategies to maximise recruitment and regulatory approvals. The methodological approach to translating the Delphi surveys was easily reproducible by local collaborators and recruited an additional 637 participants beyond the 315 recruited to complete the source-language survey. Ninety-nine per cent of patients and 97% of healthcare professionals from non-English-speaking regions used translated surveys. Conclusion: Consideration of the issues described will improve planning by other COS developers and can be used to widen international participation from both patients and healthcare professionals. This study is funded by the National Institute for Health Research (NIHR) Doctoral Research Fellowship Grant (DRF-2015-08-023). JMB is partially funded by the NIHR Bristol Biomedical Research Centre and the MRC ConDUCT-II Hub for Trials Methodology Research. PRW was funded by the MRC North West Hub for Trials Methodology Research (Grant ref: MR/K025635/01).