Search CORE

3,155 research outputs found

Lip segmentation using adaptive color space training

Author: Erdogan Hakan
Erdoğan Hakan
Karabalkan Harun
Ozgur Erol
Unel Mustafa
Yılmaz Mustafa Berkay
Yilmaz Mustafa Berkay
Özgür Erol
Ünel Mustafa
Publication venue: 'The International Fiscal Association of Korea'
Publication date: 01/09/2008
Field of study

In audio-visual speech recognition (AVSR), it is beneficial to use lip boundary information in addition to texture-dependent features. In this paper, we propose an automatic lip segmentation method that can be used in AVSR systems. The algorithm consists of the following steps: face detection, lip corners extraction, adaptive color space training for lip and non-lip regions using Gaussian mixture models (GMMs), and curve evolution using level-set formulation based on region and image gradients fields. Region-based fields are obtained using adapted GMM likelihoods. We have tested the proposed algorithm on a database (SU-TAV) of 100 facial images and obtained objective performance results by comparing automatic lip segmentations with hand-marked ground truth segmentations. Experimental results are promising and much work has to be done to improve the robustness of the proposed method

Sabanci University Research Database

Neural Models of Motion Integration, Segmentation, and Probablistic Decision-Making

Author: Grossberg Stephen
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/09/2007
Field of study

When brain mechanism carry out motion integration and segmentation processes that compute unambiguous global motion percepts from ambiguous local motion signals? Consider, for example, a deer running at variable speeds behind forest cover. The forest cover is an occluder that creates apertures through which fragments of the deer's motion signals are intermittently experienced. The brain coherently groups these fragments into a trackable percept of the deer in its trajectory. Form and motion processes are needed to accomplish this using feedforward and feedback interactions both within and across cortical processing streams. All the cortical areas V1, V2, MT, and MST are involved in these interactions. Figure-ground processes in the form stream through V2, such as the seperation of occluding boundaries of the forest cover from the boundaries of the deer, select the motion signals which determine global object motion percepts in the motion stream through MT. Sparse, but unambiguous, feauture tracking signals are amplified before they propogate across position and are intergrated with far more numerous ambiguous motion signals. Figure-ground and integration processes together determine the global percept. A neural model predicts the processing stages that embody these form and motion interactions. Model concepts and data are summarized about motion grouping across apertures in response to a wide variety of displays, and probabilistic decision making in parietal cortex in response to random dot displays.National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624

Boston University Institutional Repository (OpenBU)

Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems

Author: Gritzman Ashley Daniel
Publication venue
Publication date: 01/01/2016
Field of study

A thesis submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, in ful lment of the requirements for the degree of Doctor of Philosophy. Johannesburg, September 2016Having survived the ordeal of a laryngectomy, the patient must come to terms with the resulting loss of speech. With recent advances in portable computing power, automatic lip-reading (ALR) may become a viable approach to voice restoration. This thesis addresses the image processing aspect of ALR, and focuses three contributions to colour-based lip segmentation. The rst contribution concerns the colour transform to enhance the contrast between the lips and skin. This thesis presents the most comprehensive study to date by measuring the overlap between lip and skin histograms for 33 di erent colour transforms. The hue component of HSV obtains the lowest overlap of 6:15%, and results show that selecting the correct transform can increase the segmentation accuracy by up to three times. The second contribution is the development of a new lip segmentation algorithm that utilises the best colour transforms from the comparative study. The algorithm is tested on 895 images and achieves percentage overlap (OL) of 92:23% and segmentation error (SE) of 7:39 %. The third contribution focuses on the impact of the histogram threshold on the segmentation accuracy, and introduces a novel technique called Adaptive Threshold Optimisation (ATO) to select a better threshold value. The rst stage of ATO incorporates -SVR to train the lip shape model. ATO then uses feedback of shape information to validate and optimise the threshold. After applying ATO, the SE decreases from 7:65% to 6:50%, corresponding to an absolute improvement of 1:15 pp or relative improvement of 15:1%. While this thesis concerns lip segmentation in particular, ATO is a threshold selection technique that can be used in various segmentation applications.MT201

Wits Institutional Repository on DSPACE

Pixelated Semantic Colorization

Author: Han Jungong
Shao Ling
Snoek Cees G. M.
Zhao Jiaojiao
Publication venue
Publication date: 07/02/2019
Field of study

While many image colorization algorithms have recently shown the capability of producing plausible color versions from gray-scale photographs, they still suffer from limited semantic understanding. To address this shortcoming, we propose to exploit pixelated object semantics to guide image colorization. The rationale is that human beings perceive and distinguish colors based on the semantic categories of objects. Starting from an autoregressive model, we generate image color distributions, from which diverse colored results are sampled. We propose two ways to incorporate object semantics into the colorization model: through a pixelated semantic embedding and a pixelated semantic generator. Specifically, the proposed convolutional neural network includes two branches. One branch learns what the object is, while the other branch learns the object colors. The network jointly optimizes a color embedding loss, a semantic segmentation loss and a color generation loss, in an end-to-end fashion. Experiments on PASCAL VOC2012 and COCO-stuff reveal that our network, when trained with semantic segmentation labels, produces more realistic and finer results compared to the colorization state-of-the-art

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository

Probabilistic facial feature extraction using joint distribution of location and texture information

Author: Erdogan Hakan
Erdoğan Hakan
Unel Mustafa
Yılmaz Mustafa Berkay
Yilmaz Mustafa Berkay
Ünel Mustafa
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2009
Field of study

In this work, we propose a method which can extract critical points on a face using both location and texture information. This new approach can automatically learn feature information from training data. It finds the best facial feature locations by maximizing the joint distribution of location and texture parameters. We first introduce an independence assumption. Then, we improve upon this model by assuming dependence of location parameters but independence of texture parameters.We model combined location parameters with a multivariate Gaussian for computational reasons. The texture parameters are modeled with a Gaussian mixture model. It is shown that the new method outperforms active appearance models for the same experimental setup

CiteSeerX

Crossref

Sabanci University Research Database