9 research outputs found

    Unsupervised discovery of facial events


    On Single-Sequence and Multi-Sequence Factorizations

    Subspace-based factorization methods are commonly used for a variety of applications, such as 3D reconstruction, multi-body segmentation, and optical flow estimation. These are usually applied to a single video sequence. In this paper we present an analysis of the multi-sequence case and place it under a single framework with the single-sequence case. In particular, we start by analyzing the characteristics of subspace-based spatial and temporal segmentation. We show that in many cases objects moving with different 3D motions will be captured as a single object by multi-body (spatial) factorization approaches. Similarly, frames viewing different shapes might be grouped as displaying the same shape in the temporal factorization framework. Temporal factorization provides temporal grouping of frames by employing a subspace-based approach to capture non-rigid shape changes (Zelnik-Manor and Irani, 2004). We analyze what causes these degeneracies and show that in the case of multiple sequences they can be made useful, providing information for both temporal synchronization of sequences and spatial matching of points across sequences.
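    The multi-body (spatial) factorization idea this abstract builds on can be illustrated with a short sketch. This is not the paper's implementation; it follows the standard shape-interaction-matrix construction for independently moving objects, and the choice of rank and of spectral clustering below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def spatial_segmentation(W, rank, n_objects):
    """Cluster point trajectories (columns of W) into independently moving objects.

    W is the 2F x P measurement matrix: the image coordinates of P tracked
    points stacked over F frames. For independent rigid motions the shape-
    interaction matrix Q = V_r V_r^T is approximately block diagonal, so its
    magnitude can serve as an affinity between points.
    """
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    Vr = Vt[:rank].T                 # P x rank basis of the row space of W
    Q = np.abs(Vr @ Vr.T)            # shape-interaction / affinity matrix
    return SpectralClustering(n_clusters=n_objects,
                              affinity="precomputed").fit_predict(Q)
```

    The degeneracy discussed in the abstract shows up here as the interaction matrix failing to be block diagonal when the motions are not independent, in which case this spatial grouping merges the objects.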

    Optical flow segmentation for pedestrian detection

    This project studies the motion between consecutive frames of a video in order to segment meaningful regions. In particular, video recorded from a camera mounted on top of a car is used to analyse the pedestrians moving around it; this information could potentially be used to alert the driver to possible obstacles. Pedestrian detection has become an active area of research in recent years and is widely applied in applications such as surveillance systems, automotive safety, and robotics, among others. The current project aims to localize moving objects in sequences of images, focusing on pedestrian detection. First, the apparent motion in the scene will be computed. Then the motion vectors will be divided into moving objects and background, and finally the resulting segments will be analysed by a classifier to determine whether or not they are pedestrians.
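    A minimal sketch of the pipeline described above, assuming OpenCV: dense optical flow between consecutive frames, a magnitude threshold to split motion vectors into moving objects and background, and a classifier over the resulting segments. The thresholds and the use of OpenCV's default HOG people detector for the classification stage are illustrative choices, not the project's own models.

```python
import cv2
import numpy as np

def detect_pedestrians(prev_frame, frame, mag_thresh=2.0, min_area=500):
    """Optical flow -> motion mask -> candidate regions -> pedestrian classifier."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # 1. Apparent motion between consecutive frames (dense optical flow).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)

    # 2. Split motion vectors into moving objects vs. background by magnitude.
    mask = (magnitude > mag_thresh).astype(np.uint8) * 255
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((7, 7), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)

    # 3. Classify each moving segment (here: OpenCV's default HOG people detector).
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    pedestrians = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue
        x, y, w, h = cv2.boundingRect(c)
        roi = frame[y:y + h, x:x + w]
        # Crops smaller than the 64x128 HOG window simply yield no detections;
        # a real system would resize or pad the candidate regions.
        found, _ = hog.detectMultiScale(roi)
        if len(found) > 0:
            pedestrians.append((x, y, w, h))
    return pedestrians
```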

    Visual Speech Recognition

    In recent years, visual speech recognition has received more attention from researchers than in the past. The lack of visual-processing work on Arabic vocabulary recognition motivated this research. Audio speech recognition is concerned with the acoustic characteristics of the signal, but there are many situations in which the audio signal is weak or absent; this is discussed in Chapter 2. The visual recognition process focuses on features extracted from video of the speaker, which are classified using several techniques. The most important feature to extract is motion: by segmenting the motion of the speaker's lips, an algorithm can recognize the word being said. Motion segmentation is not the only problem facing the speech recognition process; segmenting the lips themselves is an early step, so a new approach for lip segmentation is proposed in this thesis. Because the motion feature sometimes needs another feature to support recognition of the spoken word, the thesis also proposes a new algorithm that performs motion segmentation using the Abstract Difference Image computed from an image series, supported by correlation for registering the images in the series, to recognize ten Arabic words, "one" through "ten". The algorithm uses the set of Hu invariant moments to describe the Abstract Difference Image and three different recognition methods to recognize the words, and applies CLAHE as a filtering technique to handle lighting problems. The algorithm, based on extracting the difference details from a series of images to recognize the word, achieved an overall recognition rate of 55.8%, an adequate result for integration into an audio-visual system.
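    The feature-extraction chain described above can be sketched with OpenCV. The use of phase correlation for the correlation-based registration step and the specific parameter values are assumptions made for illustration; they are not the thesis implementation.

```python
import cv2
import numpy as np

def difference_image_features(lip_frames):
    """Describe the motion in a sequence of grayscale lip images.

    Steps mirror the abstract: equalise lighting with CLAHE, register frames
    by correlation, accumulate an absolute-difference image over the series,
    and summarise the result with the seven Hu invariant moments.
    """
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    frames = [clahe.apply(f) for f in lip_frames]

    ref = frames[0].astype(np.float32)
    diff = np.zeros_like(ref)
    for f in frames[1:]:
        f = f.astype(np.float32)
        # Correlation-based registration: shift each frame onto the reference.
        (dx, dy), _ = cv2.phaseCorrelate(ref, f)
        M = np.float32([[1, 0, -dx], [0, 1, -dy]])
        aligned = cv2.warpAffine(f, M, (f.shape[1], f.shape[0]))
        # Accumulate the absolute difference image over the image series.
        diff += cv2.absdiff(ref, aligned)

    # Seven Hu invariant moments describing the accumulated difference image.
    return cv2.HuMoments(cv2.moments(diff)).flatten()
```

    The resulting seven-dimensional feature vectors would then be fed to the recognition stage (the thesis compares three recognition methods).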

    A Semi-Supervised Approach for Kernel-Based Temporal Clustering

    Temporal clustering refers to the partitioning of a time series into multiple non-overlapping segments that belong to k temporal clusters, in such a way that segments in the same cluster are more similar to each other than to those in other clusters. Temporal clustering is a fundamental task in many fields, such as computer animation, computer vision, health care, and robotics. The applications of temporal clustering in those areas are diverse, and include human-motion imitation and recognition, emotion analysis, human activity segmentation, automated rehabilitation exercise analysis, and human-computer interaction. However, temporal clustering using a completely unsupervised method may not produce satisfactory results. Like regular clustering, temporal clustering benefits from any expert knowledge that may be available. The type of approach that uses a small amount of knowledge to "guide" the clustering process is known as "semi-supervised clustering." Semi-supervised temporal clustering is a strategy in which extra knowledge, in the form of pairwise constraints, is incorporated into the temporal data to help with the partitioning problem. This thesis proposes a process to adapt and transform two kernel-based methods into semi-supervised temporal clustering methods. The proposed process is exclusive to kernel-based clustering methods and is based on two concepts. First, it uses instance-level constraints, in the form of must-link and cannot-link pairs, to supervise the clustering methods. Second, it uses a dynamic-programming method to search for the optimal temporal clusters. The proposed process is applied to two algorithms, aligned cluster analysis (ACA) and spectral clustering. To validate the advantages of the proposed semi-supervised temporal clustering methods, a comparative analysis was performed against the original versions of the algorithms and another semi-supervised temporal clustering method. This evaluation was conducted with both synthetic data and two real-world applications. The first application includes two naturalistic audio-visual human emotion datasets, and the second focuses on human-motion segmentation. Results show substantial improvements in accuracy, with minimal supervision, compared to unsupervised and other semi-supervised temporal approaches, without compromising time performance.
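    A minimal sketch of the constraint idea, assuming an RBF kernel over the frames and scikit-learn's spectral clustering. It only illustrates how instance-level must-link/cannot-link constraints can be folded into a precomputed kernel; the dynamic-programming search for contiguous temporal segments and the ACA adaptation described in the thesis are omitted.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def constrained_spectral_clustering(X, k, must_link=(), cannot_link=(), gamma=1.0):
    """Semi-supervised spectral clustering of the frames in X (T x d).

    Pairwise constraints are injected directly into the kernel: must-link pairs
    get maximal similarity, cannot-link pairs get none, and the modified kernel
    is handed to spectral clustering as a precomputed affinity.
    """
    K = rbf_kernel(X, gamma=gamma)
    for i, j in must_link:
        K[i, j] = K[j, i] = 1.0
    for i, j in cannot_link:
        K[i, j] = K[j, i] = 0.0
    return SpectralClustering(n_clusters=k,
                              affinity="precomputed").fit_predict(K)
```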

    Temporal Factorization vs. Spatial Factorization

    The traditional subspace-based approaches to segmentation (often referred to as multi-body factorization approaches) provide spatial clustering/segmentation by grouping together points moving with consistent motions. We are exploring a dual approach to factorization, i.e., obtaining temporal clustering/segmentation by grouping together frames capturing consistent shapes. Temporal cuts are thus detected at non-rigid changes in the shape of the scene/object. In addition, the approach provides a clustering of the frames with consistent shape (but not necessarily the same motion). For example, in a sequence showing a face that appears serious in some frames and is smiling in others, all the "serious expression" frames will be grouped together and separated from all the "smile" frames, which will be classified as a second group, even though the head may meanwhile undergo various random motions.
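    The dual grouping described here can be sketched in a few lines, assuming a measurement matrix W arranged with one row per frame (the x and y coordinates of all tracked points concatenated). The choice of rank and the spectral-clustering step are illustrative assumptions, not the authors' algorithm; it is simply the frame-wise dual of the point-wise grouping sketched earlier.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def temporal_segmentation(W, rank, n_shapes):
    """Cluster the frames of W by the (non-rigid) shape they observe.

    Assumes one row per frame: the x and y coordinates of all tracked points
    concatenated, so W is F x 2P. The affinity between frames is the dual of
    the spatial shape-interaction matrix, built from the left singular vectors.
    """
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    Ur = U[:, :rank]                 # F x rank basis of the column space of W
    Q = np.abs(Ur @ Ur.T)            # frame-to-frame interaction matrix
    return SpectralClustering(n_clusters=n_shapes,
                              affinity="precomputed").fit_predict(Q)
```

    In the face example above, the "serious" and "smile" frames would in principle land in separate clusters regardless of the intervening head motion.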