Study of aerodynamic technology for single-cruise-engine V/STOL fighter/attack aircraft
A viable, single engine, supersonic V/STOL fighter/attack aircraft concept was defined. This vectored thrust, canard wing configuration utilizes an advanced technology separated flow engine with fan stream burning. The aerodynamic characteristics of this configuration were estimated and performance evaluated. Significant aerodynamic and aerodynamic propulsion interaction uncertainties requiring additional investigation were identified. A wind tunnel model concept and test program to resolve these uncertainties and validate the aerodynamic prediction methods were defined.
Decoding visemes: improving machine lip-reading
Abstract
This thesis is about improving machine lip-reading, that is, the classification of
speech from only visual cues of a speaker. Machine lip-reading is a niche research
problem in both speech processing and computer vision.
Current challenges for machine lip-reading fall into two groups: the content of the
video, such as the rate at which a person is speaking; or the parameters of the video
recording, for example, the video resolution. We begin our work with a literature
review to understand the restrictions current technology places on machine lip-reading
recognition and conduct an experiment into resolution effects. We show that high
definition video is not needed to successfully lip-read with a computer.
The term "viseme" is used in machine lip-reading to represent a visual cue or
gesture which corresponds to a subgroup of phonemes where the phonemes are
indistinguishable in the visual speech signal. Whilst a viseme is yet to be formally
defined, we use the common working definition 'a viseme is a group of phonemes
with identical appearance on the lips'. A phoneme is the smallest acoustic unit a
human can utter. Because there are more phonemes than visemes, mapping between
the units creates a many-to-one relationship. Many mappings have been presented,
and we conduct an experiment to determine which mapping produces the most
accurate classification. Our results show Lee's [82] is best. Lee's classification also
outperforms machine lip-reading systems which use the popular Fisher [48]
phoneme-to-viseme map.
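The many-to-one relationship between phonemes and visemes can be pictured as a lookup table. The following is a minimal sketch only: the viseme labels and phoneme groupings are hypothetical illustrations chosen for this example, and they are not Lee's [82] or Fisher's [48] published maps. The function names are likewise assumptions.

```python
# Illustrative phoneme-to-viseme map.
# ASSUMPTION: these groupings and labels are made up for illustration;
# they are NOT Lee's [82] or Fisher's [48] maps.
VISEME_TO_PHONEMES = {
    "V_bilabial": ["p", "b", "m"],
    "V_labiodental": ["f", "v"],
    "V_rounded": ["w", "uw"],
}

# Invert the table to get the many-to-one phoneme -> viseme lookup.
PHONEME_TO_VISEME = {
    phoneme: viseme
    for viseme, phonemes in VISEME_TO_PHONEMES.items()
    for phoneme in phonemes
}


def to_visemes(phonemes):
    """Map a phoneme sequence to its viseme sequence.

    Raises KeyError for a phoneme that is not in the illustrative map.
    """
    return [PHONEME_TO_VISEME[p] for p in phonemes]


def candidate_phonemes(viseme):
    """Return every phoneme that shares the given viseme label."""
    return VISEME_TO_PHONEMES[viseme]
```

Under this toy map, `to_visemes(["b", "uw", "m"])` and `to_visemes(["m", "uw", "p"])` return the same viseme sequence, which is why decoding from visemes back to phonemes and words needs additional information such as a language model.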
Further to this, we propose three methods of deriving speaker-dependent
phoneme-to-viseme maps and compare our new approaches to Lee's. Our results show the
sensitivity of phoneme clustering and we use our new knowledge for our first suggested
augmentation to the conventional lip-reading system.
Speaker independence in machine lip-reading classification is another unsolved
obstacle. It has been observed, in the visual domain, that classifiers need training
on the test subject to achieve the best classification. Thus machine lip-reading is
highly dependent upon the speaker. Speaker independence is the opposite of this,
or in other words, is the classification of a speaker not present in the classifier's
training data. We investigate the dependence of phoneme-to-viseme maps between
speakers. Our results show there is not a high variability of visual cues, but there is
high variability in trajectory between visual cues of an individual speaker with the
same ground truth. This implies a dependency upon the number of visemes within
each set for each individual.
Finally, we investigate the optimum number of visemes within a set.
We show the phoneme-to-viseme maps in literature rarely have enough visemes
and the optimal number, which varies by speaker, ranges from 11 to 35. The last
difficulty we address is decoding from visemes back to phonemes and into words.
Traditionally this is completed using a language model. The language model unit is
either the same as the classifier, e.g. visemes or phonemes, or the language model
unit is words. In a novel approach we use these optimum range viseme sets within
hierarchical training of phoneme labelled classifiers. This new method of classifier
training demonstrates a significant increase in classification with a word language
network.
Nonlinear diffusion model for Rayleigh-Taylor mixing
The complex evolution of turbulent mixing in Rayleigh-Taylor convection is
studied in terms of eddy diffusivity models for the mean temperature profile. It
is found that a non-linear model, derived within the general framework of
Prandtl mixing theory, reproduces accurately the evolution of turbulent
profiles obtained from numerical simulations. Our model gives very
precise predictions for the turbulent heat flux and for the Nusselt number in
the ultimate state regime of thermal convection.
Comment: 4 pages, 4 figures, PRL in press