
    Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems

    A thesis submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, September 2016. Having survived the ordeal of a laryngectomy, the patient must come to terms with the resulting loss of speech. With recent advances in portable computing power, automatic lip-reading (ALR) may become a viable approach to voice restoration. This thesis addresses the image processing aspect of ALR, and makes three contributions to colour-based lip segmentation. The first contribution concerns the colour transform used to enhance the contrast between the lips and skin. This thesis presents the most comprehensive study to date, measuring the overlap between lip and skin histograms for 33 different colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%, and results show that selecting the correct transform can increase the segmentation accuracy by up to three times. The second contribution is the development of a new lip segmentation algorithm that utilises the best colour transforms from the comparative study. The algorithm is tested on 895 images and achieves a percentage overlap (OL) of 92.23% and a segmentation error (SE) of 7.39%. The third contribution focuses on the impact of the histogram threshold on the segmentation accuracy, and introduces a novel technique called Adaptive Threshold Optimisation (ATO) to select a better threshold value. The first stage of ATO incorporates support vector regression (SVR) to train the lip shape model. ATO then uses feedback of shape information to validate and optimise the threshold. After applying ATO, the SE decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp or a relative improvement of 15.1%. While this thesis concerns lip segmentation in particular, ATO is a threshold selection technique that can be used in various segmentation applications.
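    The histogram-overlap measure described in this abstract can be sketched as follows. This is a minimal illustration, assuming overlap is the shared probability mass of the two normalised class histograms (0% for perfectly separable classes, 100% for identical ones); the thesis's exact bin count and definition may differ.

```python
# Sketch: overlap between lip and skin histograms in one colour channel.
# Bin count (32) and the [0, 1) channel range are illustrative assumptions.
def histogram(values, bins=32, lo=0.0, hi=1.0):
    """Normalised histogram of channel values in [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = float(len(values))
    return [c / total for c in counts]

def overlap(lip_values, skin_values, bins=32):
    """Probability mass shared by the two class histograms:
    sum of the per-bin minima. 0.0 = fully separable, 1.0 = identical."""
    h_lip = histogram(lip_values, bins)
    h_skin = histogram(skin_values, bins)
    return sum(min(a, b) for a, b in zip(h_lip, h_skin))

# Toy example: well-separated lip vs. skin channel values.
lip = [0.95, 0.96, 0.97, 0.98]
skin = [0.10, 0.12, 0.14, 0.16]
print(overlap(lip, skin))  # → 0.0 for fully disjoint distributions
```

    Under this measure, a transform that pushes lip and skin pixels into disjoint bins (like hue in the study) minimises the overlap and so maximises the separability a threshold can exploit.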

    Dictionary-based lip reading classification

    Visual lip reading recognition is an essential stage in many multimedia systems such as “Audio Visual Speech Recognition” [6], “Mobile Phone Visual System for deaf people”, “Sign Language Recognition System”, etc. The use of lip visual features to help audio or hand recognition is appropriate because this information is robust to acoustic noise. In this paper, we describe our work towards developing a robust technique for lip reading classification that extracts the lips in a colour image by using EMPCA feature extraction and k-nearest-neighbour classification. In order to reduce the dimensionality of the feature space, the lip motion is characterized by three templates that are modelled based on different mouth shapes: closed template, semi-closed template, and wide-open template. Our goal is to classify each image sequence based on the distribution of the three templates and group the words into different clusters. The words that form the database were grouped into three different clusters as follows: group1 (‘I’, ‘high’, ‘lie’, ‘hard’, ‘card’, ‘bye’), group2 (‘you’, ‘owe’, ‘word’), group3 (‘bird’).
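    The template-distribution idea above can be sketched as follows. The scalar mouth-openness feature is a hypothetical stand-in: the actual paper matches EMPCA coefficients of lip images with a k-nearest-neighbour classifier rather than a single number.

```python
# Sketch: assign each frame to the nearest mouth-shape template, then
# describe a word clip by its template distribution.
# The scalar "openness" feature and template values are illustrative.
from collections import Counter

TEMPLATES = {"closed": 0.0, "semi-closed": 0.5, "wide-open": 1.0}

def nearest_template(openness):
    """1-NN assignment of a frame to one of the three mouth templates."""
    return min(TEMPLATES, key=lambda t: abs(TEMPLATES[t] - openness))

def template_distribution(frame_features):
    """Fraction of frames assigned to each template for one word clip."""
    counts = Counter(nearest_template(f) for f in frame_features)
    n = len(frame_features)
    return {t: counts.get(t, 0) / n for t in TEMPLATES}

# A clip whose mouth opens wide and then closes again:
clip = [0.1, 0.4, 0.9, 1.0, 0.6, 0.2]
print(template_distribution(clip))
```

    Words with similar distributions over the three templates then fall into the same cluster, which is how the vocabulary above separates into its three groups.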

    Using Prior Knowledge for Verification and Elimination of Stationary and Variable Objects in Real-time Images

    With the evolving technologies in the autonomous vehicle industry, it has now become possible for automobile passengers to sit relaxed instead of driving the car. Technologies like object detection, object identification, and image segmentation have enabled an autonomous car to identify and detect objects on the road in order to drive safely. While an autonomous car drives by itself on the road, the types of objects surrounding the car can be dynamic (e.g., cars and pedestrians), stationary (e.g., buildings and benches), and variable (e.g., trees), depending on whether the location or shape of an object changes. Different from the existing image-based approaches to detect and recognize objects in the scene, in this research a 3D virtual world is employed to verify and eliminate stationary and variable objects, allowing the autonomous car to focus on dynamic objects that may endanger its driving. This methodology takes advantage of prior knowledge of stationary and variable objects present in a virtual city and verifies their existence in a real-time scene by matching keypoints between the virtual and real objects. In the case of a stationary or variable object that does not exist in the virtual world due to incomplete pre-existing information, this method uses machine learning for object detection. Verified objects are then removed from the real-time image with a combined algorithm using contour detection and class activation maps (CAM), which helps to enhance the efficiency and accuracy when recognizing moving objects.
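    The keypoint-verification step can be sketched as below. This is a hedged toy version: descriptors are 8-bit integers compared by Hamming distance, whereas a real pipeline would use binary descriptors such as ORB with a proper matcher, and the thresholds here are illustrative assumptions.

```python
# Sketch: a stationary object from the virtual city is confirmed in the
# real frame when enough of its keypoint descriptors find a close match.
# Toy 8-bit descriptors and both thresholds are assumptions.
def hamming(a, b):
    """Hamming distance between two integer-valued binary descriptors."""
    return bin(a ^ b).count("1")

def verify_object(virtual_desc, real_desc, max_dist=2, min_matches=3):
    """True if at least min_matches virtual keypoints match a real one."""
    matches = 0
    for d in virtual_desc:
        if any(hamming(d, r) <= max_dist for r in real_desc):
            matches += 1
    return matches >= min_matches

# A building present in both the virtual model and the live frame:
virtual = [0b10110010, 0b01011100, 0b11100001, 0b00011110]
real    = [0b10110011, 0b01011000, 0b11100001, 0b11111111]
print(verify_object(virtual, real))  # → True under these thresholds
```

    Objects that pass verification are known to be stationary or variable and can be masked out, leaving only unverified (potentially dynamic) regions for the downstream detector.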

    EXPERIMENTAL STUDY ON LIP AND SMILE DETECTION

    This paper presents a lip and smile detection method based on the normalized RGB chromaticity diagram. The method employs the popular Viola-Jones detection method to detect the face. To avoid false positives, an eye detector is introduced in the detection stage: only face candidates with detected eyes are considered to be faces. Once the face is detected, the lip region is localized using a simple geometric rule. Further, red color thresholding based on the normalized RGB chromaticity diagram is proposed to extract the lip. A projection technique is employed for detecting the smile state. From the experimental results, the proposed method achieves a lip detection rate of 97% and a smile detection rate of 94%.
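    The red-chromaticity thresholding step can be sketched as follows. The cutoff value used here is an illustrative assumption; the paper determines its own threshold on the normalized RGB diagram.

```python
# Sketch: classify pixels as lip by their normalized red chromaticity
# r = R / (R + G + B). The 0.45 cutoff is an assumed, illustrative value.
def is_lip_pixel(r, g, b, r_threshold=0.45):
    """True when the normalized red component exceeds the threshold."""
    total = r + g + b
    if total == 0:
        return False  # pure black carries no chromaticity information
    return r / total > r_threshold

def lip_mask(pixels):
    """Binary mask over an iterable of (R, G, B) pixel tuples."""
    return [is_lip_pixel(*p) for p in pixels]

# A lip-red pixel versus a more neutral skin tone:
print(lip_mask([(180, 60, 70), (120, 110, 100)]))  # → [True, False]
```

    Because the chromaticity coordinates are normalized by overall intensity, this test is largely invariant to brightness changes, which is why the normalized diagram is preferred over raw RGB for lip extraction.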