
138 research outputs found

Visual units and confusion modelling for automatic lip-reading

Author: Cox Stephen
Howell Dominic
Theobald Barry
Publication venue: 'Elsevier BV'
Publication date: 01/07/2016

Automatic lip-reading (ALR) is a challenging task because the visual speech signal is known to be missing some important information, such as voicing. We propose an approach to ALR that acknowledges that this information is missing but assumes that it is substituted or deleted in a systematic way that can be modelled. We describe a system that learns such a model and then incorporates it into decoding, which is realised as a cascade of weighted finite-state transducers. Our results show a small but statistically significant improvement in recognition accuracy. We also investigate the issue of suitable visual units for ALR, and show that visemes are sub-optimal, not because they introduce lexical ambiguity, but because the reduction in modelling units entailed by their use reduces accuracy.

Crossref

University of East Anglia digital repository

The Effect of Speaking Rate on Audio and Visual Speech

Author: Matthews Iain
Taylor Sarah
Theobald Barry-John
Publication date: 29/07/2014

The speed at which an utterance is spoken affects both the duration of the speech and the position of the articulators. Consequently, the sounds that are produced are modified, as are the position and appearance of the lips, teeth, tongue and other visible articulators. We describe an experiment designed to measure the effect of variable speaking rate on audio and visual speech by comparing sequences of phonemes and dynamic visemes appearing in the same sentences spoken at different speeds. We find that both audio and visual speech production are affected by varying the rate of speech; however, the effect is significantly more prominent in visual speech.

Crossref

University of East Anglia digital repository

A Mouth Full of Words: Visually Consistent Acoustic Redubbing

Author: Matthews Iain
Taylor Sarah
Theobald Barry-John
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2015

This paper introduces a method for automatic redubbing of video that exploits the many-to-many mapping of phoneme sequences to lip movements modelled as dynamic visemes [1]. For a given utterance, the corresponding dynamic viseme sequence is sampled to construct a graph of possible phoneme sequences that synchronize with the video. When composed with a pronunciation dictionary and language model, this produces a vast number of word sequences that are in sync with the original video, literally putting plausible words into the mouth of the speaker. We demonstrate that traditional, one-to-many, static visemes lack flexibility for this application as they produce significantly fewer word sequences. This work explores the natural ambiguity in visual speech and offers insight for automatic speech recognition and the importance of language modeling.

Crossref

University of East Anglia digital repository

Indiana's Reward-for-Effort School Funding Formula: Issues and Options

Author: Bull Barry
Theobald Neil D.
Vesper Nick
Publication venue: 'New Prairie Press'
Publication date: 01/09/1997

Indiana is in the fourth year of a scheduled six-year phase-in of its guaranteed-yield Reward-for-Effort School Funding Formula.

Crossref

Kansas State University

Mirroring to Build Trust in Digital Assistants

Author: Apostoloff Nicholas
Jonsson Ing-Marie
Lee Robert
Metcalf Katherine
Theobald Barry-John
Webb Russ
Weinberg Garrett
Publication date: 02/04/2019

We describe experiments towards building a conversational digital assistant that considers the preferred conversational style of the user. In particular, these experiments are designed to measure whether users prefer and trust an assistant whose conversational style matches their own. To this end, we conducted a user study in which subjects interacted with a digital assistant that responded in a way that either matched their conversational style or did not. Using self-reported personality attributes and subjects' feedback on the interactions, we built models that can reliably predict a user's preferred conversational style.

Comment: Preprint

arXiv.org e-Print Archive

Crossref

Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards

Author: Mackraz Natalie
Metcalf Katherine
Sarabia Miguel
Theobald Barry-John
Publication date: 27/02/2024

Preference-based reinforcement learning (PbRL) aligns a robot's behavior with human preferences via a reward function learned from binary feedback over agent behaviors. We show that dynamics-aware reward functions improve the sample efficiency of PbRL by an order of magnitude. In our experiments we iterate between: (1) learning a dynamics-aware state-action representation z^{sa} via a self-supervised temporal consistency task, and (2) bootstrapping the preference-based reward function from z^{sa}, which results in faster policy learning and better final policy performance. For example, on quadruped-walk, walker-walk, and cheetah-run, with 50 preference labels we achieve the same performance as existing approaches with 500 preference labels, and we recover 83% and 66% of ground truth reward policy performance versus only 38% and 21%. The performance gains demonstrate the benefits of explicitly learning a dynamics-aware reward model. Repo: https://github.com/apple/ml-reed

Comment: CoRL 2023. arXiv admin note: substantial text overlap with arXiv:2211.0652

arXiv.org e-Print Archive

REALM: Robust Entropy Adaptive Loss Minimization for Improved Single-Sample Test-Time Adaptation

Author: Busbridge Dan
Danieli Federico
Jaitly Navdeep
Seto Skyler
Theobald Barry-John
Publication date: 07/09/2023

Fully-test-time adaptation (F-TTA) can mitigate performance loss due to distribution shifts between train and test data (1) without access to the training data, and (2) without knowledge of the model training procedure. In online F-TTA, a pre-trained model is adapted using a stream of test samples by minimizing a self-supervised objective, such as entropy minimization. However, models adapted online using entropy minimization are unstable, especially in single-sample settings, leading to degenerate solutions and limiting the adoption of TTA inference strategies. Prior works identify noisy, or unreliable, samples as a cause of failure in online F-TTA. One solution is to ignore these samples, which can lead to bias in the update procedure, slow adaptation, and poor generalization. In this work, we present a general framework for improving robustness of F-TTA to these noisy samples, inspired by self-paced learning and robust loss functions. Our proposed approach, Robust Entropy Adaptive Loss Minimization (REALM), achieves better adaptation accuracy than previous approaches throughout the adaptation process on corruptions of CIFAR-10 and ImageNet-1K, demonstrating its effectiveness.

Comment: Accepted at WACV 2024, 17 pages, 7 figures, 11 tables

arXiv.org e-Print Archive