Visual Speech Recognition
Lip reading is used to understand or interpret speech without hearing it, a
technique especially mastered by people with hearing difficulties. The ability
to lip read enables a person with a hearing impairment to communicate with
others and to engage in social activities, which otherwise would be difficult.
Recent advances in the fields of computer vision, pattern recognition, and
signal processing have led to a growing interest in automating this challenging
task of lip reading. Indeed, automating the human ability to lip read, a
process referred to as visual speech recognition (VSR) (or sometimes speech
reading), could open the door for other novel related applications. VSR has
received a great deal of attention in the last decade for its potential use in
applications such as human-computer interaction (HCI), audio-visual speech
recognition (AVSR), speaker recognition, talking heads, sign language
recognition and video surveillance. Its main aim is to recognise spoken word(s)
by using only the visual signal that is produced during speech. Hence, VSR
deals with the visual domain of speech and involves image processing,
artificial intelligence, object detection, pattern recognition, statistical
modelling, etc.
Comment: Speech and Language Technologies (Book), Prof. Ivo Ipsic (Ed.), ISBN:
978-953-307-322-4, InTech (2011)
Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems
A thesis submitted to the Faculty of Engineering and the Built Environment,
University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for
the degree of Doctor of Philosophy.
Johannesburg, September 2016
Having survived the ordeal of a laryngectomy, the patient must come to terms with
the resulting loss of speech. With recent advances in portable computing power,
automatic lip-reading (ALR) may become a viable approach to voice restoration. This
thesis addresses the image processing aspect of ALR, and makes three contributions
to colour-based lip segmentation.
The first contribution concerns the colour transform to enhance the contrast
between the lips and skin. This thesis presents the most comprehensive study to
date by measuring the overlap between lip and skin histograms for 33 different
colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%,
and results show that selecting the correct transform can increase the segmentation
accuracy by up to three times.
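The overlap measure behind this comparison can be illustrated with a minimal sketch. The abstract does not define how overlap is computed, so the code below assumes the common histogram-intersection definition (sum of bin-wise minima of two normalised histograms). The function names, the bin count, and the [0, 1] value range are hypothetical choices for illustration, not the thesis's implementation.

```python
def normalised_histogram(values, bins=32, lo=0.0, hi=1.0):
    """Return a histogram of `values` over [lo, hi] whose bins sum to 1."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the histogram range")
    return [c / total for c in counts]


def histogram_overlap(lip_values, skin_values, bins=32, lo=0.0, hi=1.0):
    """Assumed overlap measure: sum of bin-wise minima (histogram intersection).

    0.0 means the two distributions share no bins; 1.0 means identical histograms.
    """
    lip_hist = normalised_histogram(lip_values, bins, lo, hi)
    skin_hist = normalised_histogram(skin_values, bins, lo, hi)
    return sum(min(a, b) for a, b in zip(lip_hist, skin_hist))


if __name__ == "__main__":
    # Made-up transform values for lip and skin pixels, for illustration only.
    lip = [0.10, 0.10, 0.50, 0.50]
    skin = [0.50, 0.50, 0.90, 0.90]
    print(f"overlap: {100 * histogram_overlap(lip, skin):.2f}%")
```

Under this definition, a lower overlap means the transform separates lip and skin pixels more cleanly, so a single threshold on the transformed value misclassifies fewer pixels.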
The second contribution is the development of a new lip segmentation algorithm
that utilises the best colour transforms from the comparative study. The algorithm
is tested on 895 images and achieves percentage overlap (OL) of 92.23% and segmentation
error (SE) of 7.39%.
The third contribution focuses on the impact of the histogram threshold on the
segmentation accuracy, and introduces a novel technique called Adaptive Threshold
Optimisation (ATO) to select a better threshold value. The first stage of ATO
incorporates -SVR to train the lip shape model. ATO then uses feedback of shape
information to validate and optimise the threshold. After applying ATO, the SE
decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp
or relative improvement of 15.1%. While this thesis concerns lip segmentation in
particular, ATO is a threshold selection technique that can be used in various
segmentation applications.
Newborn insula gray matter volume is prospectively associated with early life adiposity gain
The importance of energy homeostasis brain circuitry in the context of obesity is well established; however, the developmental ontogeny of this circuitry in humans is currently unknown. Here, we investigate the prospective association between newborn gray matter (GM) volume in the insula, a key brain region underlying energy homeostasis, and change in percent body fat accrual over the first six months of postnatal life, an outcome that is among the most reliable infant predictors of childhood obesity risk.
An exploration of the rhythm of Malay
In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and rhythmic differences across languages. Various metrics have been proposed as means for measuring rhythm on the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but the debate is ongoing on the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing.
The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English.
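As a rough illustration of how these four metrics are derived from interval durations, the sketch below follows the standard published definitions: %V is the percentage of utterance duration that is vocalic and ∆C is the standard deviation of consonantal interval durations (Ramus et al., 1999), while rPVI and nPVI are the raw and normalised pairwise variability indices (Grabe & Low, 2002). The function names and the sample durations are hypothetical, and the use of the population rather than the sample standard deviation for ∆C is an assumption. It is not the analysis procedure used in this study.

```python
from statistics import pstdev


def percent_v(vocalic, consonantal):
    """%V: percentage of total utterance duration made up of vocalic intervals."""
    return 100.0 * sum(vocalic) / (sum(vocalic) + sum(consonantal))


def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations.

    Assumption: population standard deviation is used here.
    """
    return pstdev(consonantal)


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive interval durations."""
    pairs = list(zip(durations, durations[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def npvi(durations):
    """Normalised PVI: each difference is divided by the mean of the pair, times 100."""
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up interval durations in milliseconds, for illustration only.
    vocalic = [90.0, 110.0, 95.0, 105.0]
    consonantal = [60.0, 140.0, 70.0, 120.0]
    print("%V   =", round(percent_v(vocalic, consonantal), 1))
    print("dC   =", round(delta_c(consonantal), 1))
    print("rPVI =", round(rpvi(consonantal), 1))  # usually computed on consonantal intervals
    print("nPVI =", round(npvi(vocalic), 1))      # usually computed on vocalic intervals
```

Because all four values depend only on segment durations, any factor that lengthens or shortens individual intervals (speech rate, phrase-final lengthening, segmentation decisions) shifts them, which is one source of the speaker and sentence variability noted above.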
Further analysis has been carried out in light of Fletcher’s (in press) argument that measurements based on duration do not wholly reflect speech rhythm, as many other factors can influence the values of consonantal and vocalic intervals, and Arvaniti’s (2009) suggestion that other features of speech should also be considered in descriptions of rhythm to discover what contributes to listeners’ perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima.
This poster presents the results of these investigations and points to connections between the features which seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on descriptions of rhythm.