221 research outputs found
DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation
In recent years, audio-driven 3D facial animation has gained significant
attention, particularly in applications such as virtual reality, gaming, and
video conferencing. However, accurately modeling the intricate and subtle
dynamics of facial expressions remains a challenge. Most existing studies
approach the facial animation task as a single regression problem, which often
fails to capture the intrinsic inter-modal relationship between speech signals
and 3D facial animation and overlook their inherent consistency. Moreover, due
to the limited availability of 3D-audio-visual datasets, approaches trained
on small samples generalize poorly, which degrades
performance. To address these issues, in this study, we propose a cross-modal
dual-learning framework, termed DualTalker, aiming at improving data usage
efficiency as well as relating cross-modal dependencies. The framework is
trained jointly with the primary task (audio-driven facial animation) and its
dual task (lip reading) and shares common audio/motion encoder components. Our
joint training framework facilitates more efficient data usage by leveraging
information from both tasks and explicitly capitalizing on the complementary
relationship between facial motion and audio to improve performance.
Furthermore, we introduce an auxiliary cross-modal consistency loss to mitigate
the potential over-smoothing underlying the cross-modal complementary
representations, enhancing the mapping of subtle facial expression dynamics.
Through extensive experiments and a perceptual user study conducted on the VOCA
and BIWI datasets, we demonstrate that our approach outperforms current
state-of-the-art methods both qualitatively and quantitatively. We have made
our code and video demonstrations available at
https://github.com/sabrina-su/iadf.git
Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition
Automatic recognition of disordered speech remains a highly challenging task
to date. The underlying neuro-motor conditions, often compounded with
co-occurring physical disabilities, lead to the difficulty in collecting large
quantities of impaired speech required for ASR system development. This paper
presents novel variational auto-encoder generative adversarial network
(VAE-GAN) based personalized disordered speech augmentation approaches that
simultaneously learn to encode, generate and discriminate synthesized impaired
speech. Separate latent features are derived to learn dysarthric speech
characteristics and phoneme context representations. Self-supervised
pre-trained Wav2vec 2.0 embedding features are also incorporated. Experiments
conducted on the UASpeech corpus suggest the proposed adversarial data
augmentation approach consistently outperformed the baseline speed perturbation
and non-VAE GAN augmentation methods with trained hybrid TDNN and end-to-end
Conformer systems. After LHUC speaker adaptation, the best system using VAE-GAN
based augmentation produced an overall WER of 27.78% on the UASpeech test set
of 16 dysarthric speakers, and the lowest published WER of 57.31% on the subset
of speakers with "Very Low" intelligibility.
Comment: Submitted to ICASSP 202
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems
Speaker adaptation techniques provide a powerful solution to customise
automatic speech recognition (ASR) systems for individual users. Practical
application of unsupervised model-based speaker adaptation techniques to data
intensive end-to-end ASR systems is hindered by the scarcity of speaker-level
data and performance sensitivity to transcription errors. To address these
issues, a set of compact and data efficient speaker-dependent (SD) parameter
representations are used to facilitate both speaker adaptive training and
test-time unsupervised speaker adaptation of state-of-the-art Conformer ASR
systems. The sensitivity to supervision quality is reduced using a confidence
score-based selection of the less erroneous subset of speaker-level adaptation
data. Two lightweight confidence score estimation modules are proposed to
produce more reliable confidence scores. The data sparsity issue, which is
exacerbated by data selection, is addressed by modelling the SD parameter
uncertainty using Bayesian learning. Experiments on the benchmark 300-hour
Switchboard and the 233-hour AMI datasets suggest that the proposed confidence
score-based adaptation schemes consistently outperformed the baseline
speaker-independent (SI) Conformer model and conventional non-Bayesian, point
estimate-based adaptation using no speaker data selection. Similar consistent
performance improvements were retained after external Transformer and LSTM
language model rescoring. In particular, on the 300-hour Switchboard corpus,
statistically significant WER reductions of 1.0%, 1.3%, and 1.4% absolute
(9.5%, 10.9%, and 11.3% relative) were obtained over the baseline SI Conformer
on the NIST Hub5'00, RT02, and RT03 evaluation sets respectively. Similar WER
reductions of 2.7% and 3.3% absolute (8.9% and 10.2% relative) were also
obtained on the AMI development and evaluation sets.
Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processin
Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition
Accurate recognition of cocktail party speech containing overlapping
speakers, noise and reverberation remains a highly challenging task to date.
Motivated by the invariance of visual modality to acoustic signal corruption,
an audio-visual multi-channel speech separation, dereverberation and
recognition approach featuring a full incorporation of visual information into
all system components is proposed in this paper. The efficacy of the video
input is consistently demonstrated in mask-based MVDR speech separation,
DNN-WPE or spectral mapping (SpecM) based speech dereverberation front-end and
Conformer ASR back-end. Audio-visual integrated front-end architectures
performing speech separation and dereverberation in a pipelined or joint
fashion via mask-based WPD are investigated. The error cost mismatch between
the speech enhancement front-end and ASR back-end components is minimized by
joint end-to-end fine-tuning using either the ASR cost function alone, or its
interpolation with the speech enhancement loss. Experiments were conducted on
the mixture overlapped and reverberant speech data constructed using simulation
or replay of the Oxford LRS2 dataset. The proposed audio-visual multi-channel
speech separation, dereverberation and recognition systems consistently
outperformed the comparable audio-only baseline by 9.1% and 6.2% absolute
(41.7% and 36.0% relative) word error rate (WER) reductions. Consistent speech
enhancement improvements were also obtained on PESQ, STOI and SRMR scores.
Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processin
Anti-tumour therapeutic efficacy of OX40L in murine tumour model
OX40 ligand (OX40L), a member of the TNF superfamily, is a co-stimulatory molecule involved in T cell activation. Systemic administration of mOX40L fusion protein significantly inhibited the growth of experimental lung metastasis and established subcutaneous (s.c.) colon (CT26) and breast (4T1) carcinomas. Vaccination with OX40L was significantly enhanced by combination treatment with intra-tumour injection of a disabled infectious single cycle-herpes simplex virus (DISC-HSV) vector encoding murine granulocyte macrophage-colony stimulating factor (mGM-CSF). Tumour rejection in response to OX40L therapy required functional CD4+ and CD8+ T cells and correlated with splenocyte cytotoxic T lymphocyte (CTL) activity against the AH-1 gp70 peptide of the tumour associated antigen expressed by CT26 cells. These results demonstrate the potential role of OX40L in cancer immunotherapy.
Intense exercise for survival among men with metastatic castrate-resistant prostate cancer (INTERVAL-GAP4): A multicentre, randomized, controlled phase III study protocol
Introduction: Preliminary evidence supports the beneficial role of physical activity on prostate cancer outcomes. This phase III randomised controlled trial (RCT) is designed to determine if supervised high-intensity aerobic and resistance exercise increases overall survival (OS) in patients with metastatic castrate-resistant prostate cancer (mCRPC).
Methods and analysis: Participants (n=866) must have histologically documented metastatic prostate cancer with evidence of progressive disease on androgen deprivation therapy (defined as mCRPC). Patients can be treatment-naive for mCRPC or on first-line androgen receptor-targeted therapy for mCRPC (ie, abiraterone or enzalutamide) without evidence of progression at enrolment, and with no prior chemotherapy for mCRPC. Patients will receive psychosocial support and will be randomly assigned (1:1) to either supervised exercise (high-intensity aerobic and resistance training) or self-directed exercise (provision of guidelines), stratified by treatment status and site. Exercise prescriptions will be tailored to each participant’s fitness and morbidities. The primary endpoint is OS. Secondary endpoints include time to disease progression, occurrence of a skeletal-related event or progression of pain, and degree of pain, opiate use, physical and emotional quality of life, and changes in metabolic biomarkers. An assessment of whether immune function, inflammation, dysregulation of insulin and energy metabolism, and androgen biomarkers are associated with OS will be performed, and whether they mediate the primary association between exercise and OS will also be investigated. This study will also establish a biobank for future biomarker discovery or validation.
Ethics and dissemination: Validation of exercise as medicine and its mechanisms of action will create evidence to change clinical practice. Accordingly, outcomes of this RCT will be published in international, peer-reviewed journals, and presented at national and international conferences. Ethics approval was first obtained at Edith Cowan University (ID: 13236 NEWTON), with a further 10 investigator sites since receiving ethics approval, prior to activation.
Trial registration number: NCT02730338
Stellar Coronal and Wind Models: Impact on Exoplanets
Surface magnetism is believed to be the main driver of coronal heating and
stellar wind acceleration. Coronae are believed to be formed by plasma confined
in closed magnetic coronal loops of the stars, with winds mainly originating in
open magnetic field line regions. In this Chapter, we review some basic
properties of stellar coronae and winds and present some existing models. In
the last part of this Chapter, we discuss the effects of coronal winds on
exoplanets.
Comment: Chapter published in the "Handbook of Exoplanets", Editors in Chief:
Juan Antonio Belmonte and Hans Deeg, Section Editor: Nuccio Lanza. Springer
Reference Work
Methods for conducting international Delphi surveys to optimise global participation in core outcome set development: a case study in gastric cancer informed by a comprehensive literature review
Copyright © 2021, The Author(s). Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License.
Background: Core outcome sets (COS) should be relevant to key stakeholders and widely applicable and usable. Ideally, they are developed for international use to allow optimal data synthesis from trials. Electronic Delphi surveys are commonly used to facilitate global participation; however, this has limitations. It is common for these surveys to be conducted in a single language potentially excluding those not fluent in that tongue. The aim of this study is to summarise current approaches for optimising international participation in Delphi studies and make recommendations for future practice.
Methods: A comprehensive literature review of current approaches to translating Delphi surveys for COS development was undertaken. A standardised methodology adapted from international guidance derived from 12 major sets of translation guidelines in the field of outcome reporting was developed. As a case study, this was applied to a COS project for surgical trials in gastric cancer to translate a Delphi survey into 7 target languages from regions active in gastric cancer research.
Results: Three hundred thirty-two abstracts were screened and four studies addressing COS development in rheumatoid and osteoarthritis, vascular malformations and polypharmacy were eligible for inclusion. There was wide variation in methodological approaches to translation, including the number of forward translations, the inclusion of back translation, the employment of cognitive debriefing and how discrepancies and disagreements were handled. Important considerations were identified during the development of the gastric cancer survey including establishing translation groups, timelines, understanding financial implications, strategies to maximise recruitment and regulatory approvals. The methodological approach to translating the Delphi surveys was easily reproducible by local collaborators and resulted in an additional 637 participants to the 315 recruited to complete the source language survey. Ninety-nine per cent of patients and 97% of healthcare professionals from non-English-speaking regions used translated surveys.
Conclusion: Consideration of the issues described will improve planning by other COS developers and can be used to widen international participation from both patients and healthcare professionals.
This study is funded by the National Institute for Health Research (NIHR) Doctoral Research Fellowship Grant (DRF-2015-08-023). JMB is partially funded by the NIHR Bristol Biomedical Research Centre and the MRC ConDUCT-II Hub for Trials Methodology Research. PRW was funded by the MRC North West Hub for Trials Methodology Research (Grant ref: MR/K025635/01).