5 research outputs found

    Recognition of Isolated Marathi words from Side Pose for multi-pose Audio Visual Speech Recognition

    Abstract: This paper presents a new multi-pose audio-visual speech recognition system based on the fusion of side-pose visual features and acoustic signals. The proposed method improves on the robustness of conventional multimodal speech recognition systems. The work was implemented on the ‘vVISWA’ (Visual Vocabulary of Independent Standard Words) dataset, which comprises full frontal, 45-degree and side-pose visual streams. Visual features for the side pose were extracted using the 2D Stationary Wavelet Transform (2D-SWT), and acoustic features were extracted using Linear Predictive Coding (LPC); the two feature sets were fused and classified using the KNN algorithm, resulting in 90% accuracy. This work facilitates automatic recognition of isolated words from the side pose in the multi-pose audio-visual speech recognition domain, where only partial visual features of the face are available.
    Keywords: Side pose face detection, stationary wavelet transform, linear predictive analysis, Feature level fusion, KNN classifier
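The feature-level fusion and KNN classification described in this abstract can be illustrated with a minimal sketch. It assumes the visual (2D-SWT) and acoustic (LPC) features have already been extracted; the feature values, the labels `word_a` and `word_b`, and the helper names `fuse` and `knn_predict` are all made up for illustration and are not taken from the paper.

```python
import math
from collections import Counter

def fuse(visual, acoustic):
    """Feature-level fusion: concatenate the two feature vectors into one."""
    return list(visual) + list(acoustic)

def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy, made-up feature values standing in for 2D-SWT and LPC outputs.
train = [
    (fuse([0.9, 0.1], [0.2, 0.8]), "word_a"),
    (fuse([0.8, 0.2], [0.3, 0.7]), "word_a"),
    (fuse([0.1, 0.9], [0.7, 0.2]), "word_b"),
    (fuse([0.2, 0.8], [0.8, 0.3]), "word_b"),
]

query = fuse([0.85, 0.15], [0.25, 0.75])
prediction = knn_predict(train, query, k=3)
print(prediction)
```

Concatenation is the simplest form of feature-level fusion. In practice the two streams would likely need scaling to comparable ranges before the distance computation, which this sketch omits.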

    Topological Feature Selection: A Graph-Based Filter Feature Selection Approach

    In this paper, we introduce a novel unsupervised, graph-based filter feature selection technique which exploits the power of topologically constrained network representations. We model dependency structures among features using a family of chordal graphs (the Triangulated Maximally Filtered Graph), and we maximise the likelihood of features' relevance by studying their relative position inside the network. Such an approach presents three aspects that are particularly satisfactory compared to its alternatives: (i) it is highly tunable and easily adaptable to the nature of input data; (ii) it is fully explainable, maintaining, at the same time, a remarkable level of simplicity; (iii) it is computationally cheaper compared to its alternatives. We test our algorithm on 16 benchmark datasets from different applicative domains showing that it outperforms or matches the current state-of-the-art under heterogeneous evaluation conditions.
    Comment: 23 pages, 2 figures, 13 tables

    Viseme-based Lip-Reading using Deep Learning

    Research in Automated Lip Reading is a rich discipline with many facets that have been the subject of investigation, including audio-visual data, feature extraction, classification networks and classification schemas. The most advanced and up-to-date lip-reading systems can predict entire sentences with thousands of different words, and the majority of them use ASCII characters as the classification schema. The classification performance of such systems, however, has been insufficient, and the need to cover an ever-expanding range of vocabulary using as few classes as possible is a challenge. The work in this thesis contributes to the area concerning classification schemas by proposing an automated lip-reading model that predicts sentences using visemes as a classification schema. This is an alternative schema to using ASCII characters, which is the conventional class system used to predict sentences. This thesis provides a review of the current trends in deep learning-based automated lip reading and addresses a gap in that research by contributing to work on classification schemas. A new line of research is opened up whereby an alternative way to do lip reading is explored, and in doing so, lip-reading performance results for predicting sentences from a benchmark dataset are attained which improve upon the current state-of-the-art. In this thesis, a neural network-based lip-reading system is proposed. The system is lexicon-free and uses purely visual cues. With only a limited number of visemes as classes to recognise, the system is designed to lip read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training. The lip-reading system predicts sentences as a two-stage procedure, with visemes being recognised in the first stage and words being classified in the second stage. The second stage therefore has to overcome both the one-to-many mapping problem posed in lip reading, where one set of visemes can map to several words, and the problem of visemes being confused or misclassified to begin with. To develop the proposed lip-reading system, a number of tasks have been performed in this thesis. These include the classification of continuous sequences of visemes, and the proposal of viseme-to-word conversion models that are both effective in predicting words and robust to the possibility of viseme confusion or misclassification. The initial system reported has been tested on the challenging BBC Lip Reading Sentences 2 (LRS2) benchmark dataset, attaining a word accuracy rate of 64.6%. Compared with the state-of-the-art works in lip reading sentences reported at the time, the system achieved a significantly improved performance. The lip-reading system is further improved upon by using a language model that has been demonstrated to be effective at discriminating between homopheme words and robust to incorrectly classified visemes. This yields an improved performance in predicting spoken sentences from the LRS2 dataset, with a word accuracy rate of 79.6%. This exceeds the 77.4% word accuracy rate attained by another lip-reading system trained and evaluated on the same dataset, which is, to the best of our knowledge, the next best observed result attained on LRS2.
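The one-to-many mapping problem mentioned in this abstract can be made concrete with a small sketch. Everything below is an assumption for illustration: the letter-to-viseme table is a heavily simplified stand-in for a real phoneme-to-viseme mapping, the word counts are made up, and a unigram frequency lookup replaces the context-aware language model the thesis describes.

```python
# Hypothetical, heavily simplified letter-to-viseme table (illustration only).
# Bilabials p/b/m share one mouth shape; t/d/n are grouped in the same spirit.
LETTER_TO_VISEME = {
    "p": "B", "b": "B", "m": "B",
    "t": "T", "d": "T", "n": "T",
    "a": "A",
}

# Made-up unigram counts standing in for a language model.
WORD_COUNTS = {"pat": 5, "bat": 20, "mat": 12, "pad": 7, "bad": 30, "man": 40, "tab": 3}

def to_visemes(word):
    """Map a word to its viseme sequence, one viseme per letter."""
    return tuple(LETTER_TO_VISEME[ch] for ch in word)

def build_lexicon(words):
    """Group words by viseme sequence; each key may map to several words."""
    lexicon = {}
    for word in words:
        lexicon.setdefault(to_visemes(word), []).append(word)
    return lexicon

def decode(visemes, lexicon):
    """Pick the most frequent word among those sharing the viseme sequence."""
    candidates = lexicon.get(tuple(visemes), [])
    return max(candidates, key=WORD_COUNTS.get) if candidates else None

lexicon = build_lexicon(WORD_COUNTS)
candidates = lexicon[("B", "A", "T")]
best = decode(["B", "A", "T"], lexicon)
print(candidates, best)
```

In this toy table six different words collapse onto the same viseme sequence, so the visual stage alone cannot separate them. A second stage needs extra information, here a frequency prior, to choose among the candidates.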

    Geometric Variational Models for Inverse Problems in Imaging

    This dissertation develops geometric variational models for different inverse problems in imaging that are ill-posed, designing at the same time efficient numerical algorithms to compute their solutions. Variational methods solve inverse problems by the following two steps: formulation of a variational model as a minimization problem, and design of a minimization algorithm to solve it. This dissertation is organized in the same manner. It first formulates minimization problems associated with geometric models for different inverse problems in imaging, and it then designs efficient minimization algorithms to compute their solutions. The minimization problem summarizes both the data available from the measurements and the prior knowledge about the solution in its objective functional; this naturally leads to the combination of a measurement or data term and a prior term. Geometry can play a role in any of these terms, depending on the properties of the data acquisition system or the object being imaged. In this context, each chapter of this dissertation formulates a variational model that includes geometry in a different manner in the objective functional, depending on the inverse problem at hand. In the context of compressed sensing, the first chapter exploits the geometric properties of images to include an alignment term in the sparsity prior of compressed sensing; this additional prior term aligns the normal vectors of the level curves of the image with the reconstructed signal, and it improves the quality of reconstruction. A two-step recovery method is designed for that purpose: first, it estimates the normal vectors to the level curves of the image; second, it reconstructs an image matching the compressed sensing measurements, the geometric alignment of normals, and the sparsity constraint of compressed sensing. The proposed method is extended to non-local operators in graphs for the recovery of textures. 
The harmonic active contours of Chapter 2 make use of differential geometry to interpret the segmentation of an image as a minimal surface manifold. In this case, geometry is exploited in both the measurement term, by coupling the different image channels in a robust edge detector, and in the prior term, by imposing smoothness in the segmentation. The proposed technique generalizes existing active contours to higher dimensional spaces and non-flat images; in the plane, it improves the segmentation of images with inhomogeneities and weak edges. Shape-from-shading is investigated in Chapter 3 for the reconstruction of a silicon wafer from images of printed circuits taken with a scanning electron microscope. In this case, geometry plays a role in the image acquisition system, that is, in the measurement term of the objective functional. The prior term involves a smoothness constraint on the surface and a shape prior on the expected pattern in the circuit. The proposed reconstruction method also estimates a deformation field between the ideal pattern design and the reconstructed surface, substituting the model of shape variability necessary in shape priors with an elastic deformation field that quantifies deviations in the manufacturing process. Finally, the techniques used for the design of efficient numerical algorithms are explained with an example problem based on the level set method. To this purpose, Chapter 4 develops an efficient algorithm for the level set method when the level set function is constrained to remain a signed distance function. The distance function is preserved by introducing an explicit constraint in the minimization problem, and the minimization algorithm is made efficient through variable-splitting and augmented Lagrangian techniques. 
These techniques introduce additional variables, constraints, and Lagrange multipliers in the original minimization problem, and they decompose it into sub-optimization problems that are simple and can be efficiently solved. As a result, the proposed algorithm is five to six times faster than the original algorithm for the level set method.
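The two-step recipe in this abstract (formulate an objective with a data term and a prior term, then design an algorithm to minimize it) can be sketched on a toy 1D denoising problem. This is not one of the dissertation's geometric models: it swaps in a plain quadratic smoothness prior and ordinary gradient descent, and the signal `f`, the weight `lam`, and the step size are assumptions chosen for illustration.

```python
def energy(u, f, lam):
    """Objective: data term 0.5*sum((u-f)^2) plus prior 0.5*lam*sum(diff(u)^2)."""
    data = 0.5 * sum((ui - fi) ** 2 for ui, fi in zip(u, f))
    prior = 0.5 * lam * sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1))
    return data + prior

def gradient(u, f, lam):
    """Gradient of the objective with respect to each sample of u."""
    n = len(u)
    grad = []
    for i in range(n):
        gi = u[i] - f[i]                       # data term
        if i > 0:
            gi += lam * (u[i] - u[i - 1])      # prior term, left neighbour
        if i < n - 1:
            gi -= lam * (u[i + 1] - u[i])      # prior term, right neighbour
        grad.append(gi)
    return grad

def minimize(f, lam=2.0, step=0.1, iters=200):
    """Plain gradient descent, started from the measurements themselves."""
    u = list(f)
    for _ in range(iters):
        grad = gradient(u, f, lam)
        u = [ui - step * gi for ui, gi in zip(u, grad)]
    return u

f = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # made-up oscillating measurements
lam = 2.0
u = minimize(f, lam=lam)
print(energy(f, f, lam), energy(u, f, lam))
```

The step size is kept below 2 / (1 + 4 * lam) so that each iteration lowers the objective. The variable-splitting and augmented Lagrangian techniques named in the abstract target harder, constrained problems and are not shown here.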