3,392 research outputs found

    Graph kernels between point clouds

    Point clouds are sets of points in two or three dimensions. Most kernel methods for learning on sets of points have not yet dealt with the specific geometrical invariances and practical constraints associated with point clouds in computer vision and graphics. In this paper, we present extensions of graph kernels for point clouds, which allow kernel methods to be used for objects such as shapes, line drawings, or any three-dimensional point clouds. In order to design rich and numerically efficient kernels with as few free parameters as possible, we use kernels between covariance matrices and their factorizations on graphical models. We derive polynomial time dynamic programming recursions and present applications to recognition of handwritten digits and Chinese characters from few training examples.

    A Systematic Study and Empirical Analysis of Lip Reading Models using Traditional and Deep Learning Algorithms

    Although many applications already analyze and recreate audio through lip movement recognition, researchers remain interested in developing automatic lip-reading systems with better performance. Modelling of the framework has played a major role in advancing the yield of sequential frameworks. In recent years there has been a lot of interest in Deep Neural Networks (DNNs), which have produced breakthrough results in various domains including Image Classification, Speech Recognition and Natural Language Processing. DNNs are used to represent complex functions, and they play a vital role in Automatic Lip Reading (ALR) systems. This paper mainly focuses on traditional pixel, shape and mixed feature extraction and their improved technologies for lip reading recognition. It highlights the most important techniques and the progression of end-to-end deep learning architectures that evolved during the past decade. The investigation points out the voice-visual databases used for analyzing and training the system, along with the most common words, the number of speakers, the size and length of the language, and the time duration. In addition, the ALR systems developed are compared with their older counterparts. A statistical analysis is performed on the recognition of characters or numerals and words or sentences in English, and the performances are compared.

    A novel lip geometry approach for audio-visual speech recognition

    By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. Various methods have been studied by research groups around the world in recent years to incorporate lip movements into speech recognition; however, exactly how best to incorporate the additional visual information is still not known. This study aims to extend the knowledge of relationships between visual and speech information, specifically using lip geometry information because of its robustness to head rotation and the smaller number of features required to represent movement. A new method has been developed to extract lip geometry information, to perform classification and to integrate visual and speech modalities. This thesis makes several contributions. First, this work presents a new method to extract lip geometry features using the combination of a skin colour filter, a border following algorithm and a convex hull approach. The proposed method was found to improve lip shape extraction performance compared to existing approaches. Lip geometry features including height, width, ratio, area, perimeter and various combinations of these features were evaluated to determine which performs best when representing speech in the visual domain. Second, a novel template matching technique able to adapt to dynamic differences in the way words are uttered by speakers has been developed, which determines the best fit of an unseen feature signal to those stored in a database template. Third, following an evaluation of integration strategies, a novel method has been developed based on an alternative decision fusion strategy, in which the outcome from the visual and speech modality is chosen by measuring the quality of audio based on kurtosis and skewness analysis and driven by white noise confusion. Finally, the performance of the new methods introduced in this work is evaluated using the CUAVE and LUNA-V data corpora under a range of different signal to noise ratio conditions using the NOISEX-92 dataset.
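As a rough illustration of the geometry features named in this abstract (height, width, ratio, area, perimeter), the following minimal sketch computes them from the convex hull of a set of lip contour points. It is not the thesis's implementation: the skin colour filter and border following steps are omitted, the contour points are assumed to be given already, and the function names `convex_hull` and `lip_features` are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain (it repeats the start of the other)
    return lower[:-1] + upper[:-1]


def lip_features(contour_points):
    """Geometry features of the hull around assumed lip contour points."""
    hull = convex_hull(contour_points)
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # Consecutive hull edges, wrapping around to close the polygon
    edges = list(zip(hull, hull[1:] + hull[:1]))
    area = abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges)) / 2
    perimeter = sum(math.dist(a, b) for a, b in edges)
    return {
        "width": width,
        "height": height,
        "ratio": height / width,
        "area": area,
        "perimeter": perimeter,
    }


# Made-up contour: a 4 x 2 rectangle with one interior point
example = lip_features([(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)])
```

The interior point is discarded by the hull step, so the features describe only the outer lip shape.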

    Musicians have better memory than nonmusicians: A meta-analysis

    Background: Several studies have found that musicians perform better than nonmusicians in memory tasks, but this is not always the case, and the strength of this apparent advantage is unknown. Here, we conducted a meta-analysis with the aim of clarifying whether musicians perform better than nonmusicians in memory tasks. Methods: Education Source; PEP (WEB) - Psychoanalytic Electronic Publishing; Psychology and Behavioral Science (EBSCO); PsycINFO (Ovid); PubMed; ScienceDirect - AllBooks Content (Elsevier API); SCOPUS (Elsevier API); SocINDEX with Full Text (EBSCO) and Google Scholar were searched for eligible studies. The selected studies involved two groups of participants: young adult musicians and nonmusicians. All the studies included memory tasks (loading long-term, short-term or working memory) that contained tonal, verbal or visuospatial stimuli. Three meta-analyses were run separately for long-term memory, short-term memory and working memory. Results: We collected 29 studies, including 53 memory tasks. The results showed that musicians performed better than nonmusicians in terms of long-term memory, g = .29, 95% CI (.08–.51), short-term memory, g = .57, 95% CI (.41–.73), and working memory, g = .56, 95% CI (.33–.80). To further explore the data, we included a moderator (the type of stimulus presented, i.e., tonal, verbal or visuospatial), which was found to influence the effect size for short-term and working memory, but not for long-term memory. In terms of short-term and working memory, the musicians’ advantage was large with tonal stimuli, moderate with verbal stimuli, and small or null with visuospatial stimuli. Conclusions: The three meta-analyses revealed a small effect size for long-term memory, and a medium effect size for short-term and working memory, suggesting that musicians perform better than nonmusicians in memory tasks. Moreover, the effect of the moderator suggested that the type of stimuli influences this advantage.
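For readers unfamiliar with the effect size g reported above, the sketch below shows the standard Hedges' g for two independent groups: a standardized mean difference multiplied by a small-sample correction. The abstract does not say which estimator or software the authors used, so this is an assumption, and the function name `hedges_g` and all input numbers are made up for illustration.

```python
import math


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with the usual small-sample correction."""
    df = n1 + n2 - 2
    # Pooled standard deviation of the two groups
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (mean1 - mean2) / pooled_sd
    # Common approximation of the correction factor J = 1 - 3 / (4 * df - 1)
    correction = 1 - 3 / (4 * df - 1)
    return correction * d


# Hypothetical scores: musicians (mean 10, SD 2, n 20) vs nonmusicians (mean 9, SD 2, n 20)
g = hedges_g(10, 2, 20, 9, 2, 20)
```

A positive g means the first group scored higher, which is how the musician advantage in the abstract would be expressed if musicians are entered as the first group.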

    Geometrical-based lip-reading using template probabilistic multi-dimension dynamic time warping

    By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. In this paper, we present a geometrical-based automatic lip reading system that extracts the lip region from images using conventional techniques, but the contour itself is extracted using a novel application of a combination of border following and convex hull approaches. Classification is carried out using an enhanced dynamic time warping technique that has the ability to operate in multiple dimensions and a template probability technique that is able to compensate for differences in the way words are uttered in the training set. The performance of the new system has been assessed in recognition of the English digits 0 to 9 as available in the CUAVE database. The experimental results obtained from the new approach compare favorably with those of existing lip reading approaches, achieving a word recognition accuracy of up to 71% with the visual information being obtained from estimates of lip height, width and their ratio.
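To make the classification idea concrete, here is a minimal sketch of plain multi-dimensional dynamic time warping with nearest-template matching. It is not the paper's enhanced method: the template probability technique is not modelled, the local cost is assumed to be Euclidean distance between feature vectors, and the names `dtw_distance` and `classify` as well as the feature values are hypothetical.

```python
import math


def dtw_distance(a, b):
    """DTW cost between two sequences of equal-dimension feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # assumed Euclidean local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of a repeats against b[j - 1]
                cost[i][j - 1],      # frame of b repeats against a[i - 1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def classify(query, templates):
    """Return the label of the stored template with the lowest DTW cost."""
    return min(templates, key=lambda item: dtw_distance(query, item[1]))[0]


# Made-up (height, width) trajectories for two digit templates
templates = [
    ("zero", [(1.0, 2.0), (1.5, 2.0), (1.0, 2.0)]),
    ("one", [(0.5, 3.0), (0.6, 3.0), (0.5, 3.0)]),
]
# Same shape as "zero" but with a slower onset (first frame repeated)
query = [(1.0, 2.0), (1.0, 2.0), (1.5, 2.0), (1.0, 2.0)]
label = classify(query, templates)
```

Because warping absorbs the repeated frame, the slower utterance still aligns with its template at zero cost, which is the property that makes DTW attractive for words spoken at different speeds.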