    Top-down segmentation of non-rigid visual objects using derivative-based search on sparse manifolds

    Top-down segmentation of non-rigid visual objects using machine learning techniques is generally regarded as too complex to solve in full generality, given the large dimensionality of the search space of the explicit representation of the segmentation contour. To reduce this complexity, the problem is usually divided into two stages: rigid detection and non-rigid segmentation. The rationale is that rigid detection can be run in a lower-dimensional space (i.e., less complex and faster) than the original contour space, and its result is then used to constrain the non-rigid segmentation. In this paper, we propose the use of sparse manifolds to reduce the dimensionality of the rigid detection search space of current state-of-the-art top-down segmentation methodologies. The main goals of this lower-dimensional search space are to decrease the running-time complexity of the search and to reduce the training complexity of the rigid detector. These goals are attainable because both the search and training complexities are functions of the dimensionality of the rigid search space. We test our approach on the segmentation of the left ventricle from ultrasound images and of lips from frontal face images. Compared with the performance of a state-of-the-art non-rigid segmentation system, our experiments show that the use of sparse manifolds for rigid detection achieves the two goals mentioned above. © 2013 IEEE.
    Jacinto C. Nascimento, Gustavo Carneiro
    http://www.pamitc.org/cvpr13

    Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems

    A thesis submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, September 2016.
    Having survived the ordeal of a laryngectomy, the patient must come to terms with the resulting loss of speech. With recent advances in portable computing power, automatic lip-reading (ALR) may become a viable approach to voice restoration. This thesis addresses the image processing aspect of ALR, and focuses on three contributions to colour-based lip segmentation. The first contribution concerns the colour transform used to enhance the contrast between the lips and skin. This thesis presents the most comprehensive study to date by measuring the overlap between lip and skin histograms for 33 different colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%, and results show that selecting the correct transform can increase the segmentation accuracy by up to three times. The second contribution is the development of a new lip segmentation algorithm that utilises the best colour transforms from the comparative study. The algorithm is tested on 895 images and achieves percentage overlap (OL) of 92.23% and segmentation error (SE) of 7.39%. The third contribution focuses on the impact of the histogram threshold on the segmentation accuracy, and introduces a novel technique called Adaptive Threshold Optimisation (ATO) to select a better threshold value. The first stage of ATO incorporates -SVR to train the lip shape model. ATO then uses feedback of shape information to validate and optimise the threshold. After applying ATO, the SE decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp or relative improvement of 15.1%. While this thesis concerns lip segmentation in particular, ATO is a threshold selection technique that can be used in various segmentation applications. MT201
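
The histogram-overlap comparison described in this abstract can be illustrated with a small sketch. The abstract does not state how overlap is computed, so the definition used here (the histogram intersection of two normalised histograms) is an assumption, as are the function name `histogram_overlap`, the bin count, and the value range.

```python
import numpy as np


def histogram_overlap(lip_values, skin_values, bins=64, value_range=(0.0, 1.0)):
    """Return the overlap of two normalised histograms as a fraction in [0, 1].

    Assumed definition (not taken from the thesis): the sum over bins of the
    smaller of the two normalised bin heights (histogram intersection).
    """
    lip_hist, _ = np.histogram(lip_values, bins=bins, range=value_range)
    skin_hist, _ = np.histogram(skin_values, bins=bins, range=value_range)
    if lip_hist.sum() == 0 or skin_hist.sum() == 0:
        raise ValueError("each sample set needs at least one value inside value_range")
    # Normalise so that each histogram sums to 1 regardless of sample count.
    lip_p = lip_hist / lip_hist.sum()
    skin_p = skin_hist / skin_hist.sum()
    return float(np.minimum(lip_p, skin_p).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for one colour channel of lip and skin pixels.
    lips = np.clip(rng.normal(0.95, 0.02, 5000), 0.0, 1.0)
    skin = np.clip(rng.normal(0.08, 0.03, 5000), 0.0, 1.0)
    print(f"overlap: {100 * histogram_overlap(lips, skin):.2f}%")
```

Under this definition a lower value means the colour channel separates lips from skin more cleanly, which is the sense in which a transform with low overlap would be preferred.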

    Cued Speech Gesture Recognition: A First Prototype Based on Early Reduction

    International audience
    Cued Speech is a specific linguistic code for hearing-impaired people. It is based on both lip reading and manual gestures. In the context of THIMP (Telephony for the Hearing-IMpaired Project), we work on automatic cued speech translation. In this paper, we only address the problem of automatic cued speech manual gesture recognition. Such a gesture recognition issue is really common from a theoretical point of view, but we approach it with respect to its particularities in order to derive an original method. This method is essentially built around a bioinspired method called early reduction. Prior to a complete analysis of each image of a sequence, the early reduction process automatically extracts a restricted number of key images which summarize the whole sequence. Only the key images are studied from a temporal point of view with lighter computation than the complete sequenc

    3D Segmentation & Measurement of Macular Holes

    Macular holes are blinding conditions in which a hole develops in the central part of the retina, resulting in reduced central vision. The prognosis and treatment options are related to a number of variables, including the macular hole size and shape. In this work we introduce a method to segment and measure macular holes in three-dimensional (3D) data. High-resolution spectral domain optical coherence tomography (SD-OCT) allows precise imaging of the macular hole geometry in three dimensions, but measurement by human observers is time consuming and prone to high inter- and intra-observer variability, and is characteristically performed in 2D rather than 3D. This work introduces several novel techniques to automatically retrieve accurate 3D measurements of the macular hole, including surface area, base area, base diameter, top area, top diameter, height, and minimum diameter. Specifically, it introduces a multi-scale 3D level set segmentation approach based on a state-of-the-art level set method, together with novel curvature-based cutting and 3D measurement procedures. The algorithm is fully automatic, and we validate the extracted measurements both qualitatively and quantitatively; the results show the method to be robust across a variety of scenarios. A segmentation software package is presented for medical and biological applications, with a high level of visual feedback and several usability enhancements over existing packages. Specifically, it provides a substantially faster graphics processing unit (GPU) implementation of the local Gaussian distribution fitting (LGDF) energy model, which can segment inhomogeneous objects with poorly defined boundaries, as often encountered in biomedical images. It also provides interactive brushes to guide the segmentation process in a semi-automated framework. The speed of the implementation allows us to visualise the active surface in real time with a built-in ray tracer, and users may halt the evolution at any timestep to correct an implausible segmentation by painting new blocking regions or new seeds. Quantitative and qualitative validation is presented, demonstrating the practical efficacy of the interactive elements on a variety of real-world datasets. The size of a macular hole is known to be one of the strongest predictors of surgical success, both anatomically and functionally. Furthermore, it is used to guide the choice of treatment and the optimum surgical approach, and to predict outcome. Our automated 3D image segmentation algorithm extracts 3D shape-based macular hole measurements and describes the dimensions and morphology. Our approach is able to measure macular hole dimensions robustly and accurately. This thesis is considered a significant contribution to clinical applications, particularly in the field of macular hole segmentation and shape analysis

    Classification using geometric level sets

    A variational level set method is developed for the supervised classification problem. Nonlinear classifier decision boundaries are obtained by minimizing an energy functional that is composed of an empirical risk term with a margin-based loss and a geometric regularization term new to machine learning: the surface area of the decision boundary. This geometric level set classifier is analyzed in terms of consistency and complexity through the calculation of its ε-entropy. For multicategory classification, an efficient scheme is developed using a logarithmic number of decision functions in the number of classes rather than the typical linear number of decision functions. Geometric level set classification yields performance results on benchmark data sets that are competitive with well-established methods.
    National Science Foundation (U.S.) (Graduate Research Fellowship)
    United States. Army Research Office (MURI grant W911NF-06-1-0076)
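
The claim that a logarithmic number of decision functions suffices for multicategory classification can be made concrete with a plain binary code: with M classes, each class is assigned a distinct sign pattern over ceil(log2 M) decision functions. The abstract does not say how the paper assigns codes, so the encoding below and the helper names `num_decision_functions`, `encode_class`, and `decode_class` are illustrative assumptions only.

```python
def num_decision_functions(num_classes):
    """Smallest number of binary decision functions that can label num_classes classes."""
    if num_classes < 1:
        raise ValueError("num_classes must be positive")
    # (M - 1).bit_length() equals ceil(log2(M)) for M >= 2, without float rounding.
    return max(1, (num_classes - 1).bit_length())


def encode_class(label, num_classes):
    """Map a class index to target signs (+1 or -1), one per decision function."""
    bits = num_decision_functions(num_classes)
    return [1 if (label >> b) & 1 else -1 for b in range(bits)]


def decode_class(decision_values, num_classes):
    """Recover the class index from the signs of the decision function values.

    Returns None when the sign pattern is an unused code (possible whenever
    num_classes is not a power of two).
    """
    label = sum(1 << b for b, value in enumerate(decision_values) if value > 0)
    return label if label < num_classes else None


if __name__ == "__main__":
    for m in (2, 3, 10, 100):
        print(m, "classes need", num_decision_functions(m), "decision functions")
```

A one-vs-rest scheme would need M decision functions for the same task, which is the linear count the abstract contrasts against.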

    Nilpotent Approximations of Sub-Riemannian Distances for Fast Perceptual Grouping of Blood Vessels in 2D and 3D

    We propose an efficient approach for the grouping of local orientations (points on vessels) via nilpotent approximations of sub-Riemannian distances in the 2D and 3D roto-translation groups $SE(2)$ and $SE(3)$. In our distance approximations we consider homogeneous norms on nilpotent groups that locally approximate $SE(n)$, and which are obtained via the exponential and logarithmic map on $SE(n)$. In a qualitative validation we show that the norms provide accurate approximations of the true sub-Riemannian distances, and we discuss their relations to the fundamental solution of the sub-Laplacian on $SE(n)$. The quantitative experiments further confirm the accuracy of the approximations. Quantitative results are obtained by evaluating perceptual grouping performance of retinal blood vessels in 2D images and curves in challenging 3D synthetic volumes. The results show that 1) sub-Riemannian geometry is essential in achieving top performance and 2) grouping via the fast analytic approximations performs almost as well as, or better than, data-adaptive fast marching approaches on $\mathbb{R}^n$ and $SE(n)$.
    Comment: 18 pages, 9 figures, 3 tables, in review at JMI

    A Variational Model for Object Segmentation Using Boundary Information and Shape Prior Driven by the Mumford-Shah Functional

    In this paper, we propose a new variational model to segment an object belonging to a given shape space using the active contour method, a geometric shape prior and the Mumford-Shah functional. The core of our model is an energy functional composed of three complementary terms. The first is based on a shape model that constrains the active contour to take a shape of interest. The second term detects object boundaries from image gradients. The third term globally drives the shape prior and the active contour towards a homogeneous intensity region. The segmentation of the object of interest is given by the minimum of our energy functional. This minimum is computed with the calculus of variations and the gradient descent method, which provide a system of evolution equations solved with the well-known level set method. We also prove the existence of this minimum in the space of functions with bounded variation. Applications of the proposed model are presented on synthetic and medical image
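
The three-term structure described in this abstract can be written schematically as follows. This is only a sketch: the abstract does not give the functional itself, so the weighted-sum form, the weights $\beta_s, \beta_b, \beta_r$, and the term names are assumptions made for illustration, and the dependence on the shape prior parameters is suppressed.

```latex
% Assumed schematic form: a weighted sum of shape, boundary, and region terms
E(C) = \beta_s\, E_{\mathrm{shape}}(C)
     + \beta_b\, E_{\mathrm{boundary}}(C)
     + \beta_r\, E_{\mathrm{region}}(C),
\qquad \beta_s, \beta_b, \beta_r \ge 0

% Gradient descent on E gives the contour evolution, with artificial time t
\frac{\partial C}{\partial t} = -\frac{\partial E}{\partial C}
```

Here $C$ denotes the active contour, and the evolution equation is the generic gradient-descent flow that the abstract says is then solved with the level set method.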

    Multi-Sensory Emotion Recognition with Speech and Facial Expression

    Emotion plays an important role in human beings’ daily lives. Understanding emotions and recognizing how to react to others’ feelings are fundamental to engaging in successful social interactions. Currently, emotion recognition is not only significant in human beings’ daily lives, but also a hot topic in academic research, as new techniques such as emotion recognition from speech context inspire us as to how emotions are related to the content we are uttering. The demand for and importance of emotion recognition have increased greatly in recent years in many applications, such as video games, human-computer interaction, cognitive computing, and affective computing. Emotion recognition can be done from many sources, including text, speech, hand and body gesture, and facial expression. Presently, most emotion recognition methods use only one of these sources. Human emotion changes every second, and relying on a single source may not reflect the emotion correctly. This research is motivated by the desire to understand and evaluate human emotion from multiple sources, such as speech and facial expressions. In this dissertation, multi-sensory emotion recognition is explored. The proposed framework can recognize emotion from speech, from facial expression, or from both. There are three important parts in the design of the system: the facial emotion recognizer, the speech emotion recognizer, and the information fusion. The information fusion part takes the results from the speech emotion recognition and the facial emotion recognition. A novel weighted method is then used to integrate the results, and a final decision on the emotion is given after the fusion. The experiments show that with the weighted fusion method, the accuracy can be improved by an average of 3.66% compared to fusion without weights. The improvement in the recognition rate can reach 18.27% and 5.66% compared to speech emotion recognition and facial expression recognition alone, respectively. By improving the emotion recognition accuracy, the proposed multi-sensory emotion recognition system can help to improve the naturalness of human-computer interaction
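
The fusion step described in this abstract can be sketched as a weighted combination of per-class scores from the two recognizers. The abstract does not specify the weighting scheme, so the weighted sum below, the function name `fuse_predictions`, the example weights, and the three emotion labels are all assumptions for illustration and not the dissertation's method.

```python
import numpy as np


def fuse_predictions(speech_probs, face_probs, speech_weight=0.5, face_weight=0.5):
    """Combine two per-class probability vectors with a weighted sum.

    Returns the index of the winning class and the renormalised fused scores.
    """
    speech = np.asarray(speech_probs, dtype=float)
    face = np.asarray(face_probs, dtype=float)
    if speech.shape != face.shape:
        raise ValueError("both recognizers must score the same set of classes")
    fused = speech_weight * speech + face_weight * face
    fused = fused / fused.sum()  # renormalise so the fused scores sum to 1
    return int(np.argmax(fused)), fused


if __name__ == "__main__":
    labels = ["happy", "sad", "angry"]   # hypothetical label set
    speech = [0.6, 0.3, 0.1]             # hypothetical speech recognizer output
    face = [0.2, 0.7, 0.1]               # hypothetical facial recognizer output
    # Equal weights correspond to fusion without weighting.
    print(labels[fuse_predictions(speech, face)[0]])
    # Trusting the speech recognizer more can change the final decision.
    print(labels[fuse_predictions(speech, face, 0.9, 0.1)[0]])
```

In practice the weights would be chosen from validation data, for example in proportion to each recognizer's measured accuracy; that choice is likewise an assumption here.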
