
    Lateral interaction in accumulative computation

    Techniques from image processing and computer vision are essential for analysing the motion of non-rigid objects. Lateral interaction in accumulative computation (LIAC) for extracting non-rigid blobs and shapes from an image sequence has recently been presented, as well as its application to segmentation from motion. In this paper we present a five-layer architecture based on spatial and temporal coherence in visual motion analysis, with application to visual surveillance. The LIAC method, used in the general task of spatio-temporal coherent shape building, consists of (a) spatial coherence for brightness-based image segmentation, (b) temporal coherence for motion-based pixel charge computation, (c) spatial coherence for charge-based pixel fusion, (d) spatial coherence for charge-based blob fusion, and (e) spatial coherence for charge-based shape fusion. In our case, temporal coherence (in accumulative computation) is understood as a measure of frame-to-frame motion persistency at a pixel, whilst spatial coherence (in lateral interaction) is a measure obtained by comparing a pixel's accumulated charge with that of its neighbouring pixels.
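    To make these two notions of coherence concrete, the following minimal sketch (in Python with NumPy) shows one plausible per-pixel charge update and lateral-interaction test. The parameter names (motion_thr, charge_max, decay, tol) and the exact charge/discharge rule are illustrative assumptions, not values taken from the paper.

        import numpy as np

        def update_charge(charge, frame, prev_frame, motion_thr=20,
                          charge_max=255, charge_min=0, decay=32):
            # Temporal coherence: pixels whose grey level changed between
            # consecutive frames are charged to the maximum; all other
            # pixels discharge gradually, so charge measures frame-to-frame
            # motion persistency at each pixel.
            moved = np.abs(frame.astype(int) - prev_frame.astype(int)) > motion_thr
            return np.where(moved, charge_max,
                            np.maximum(charge - decay, charge_min))

        def lateral_coherence(charge, tol=16):
            # Spatial coherence: a pixel is coherent with a 4-neighbour
            # when their accumulated charges differ by less than tol; this
            # comparison drives pixel, blob, and shape fusion. (np.roll
            # wraps at image borders; a real implementation would pad.)
            diffs = [np.abs(charge - np.roll(charge, s, axis=a))
                     for a in (0, 1) for s in (1, -1)]
            return np.any([d < tol for d in diffs], axis=0)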

    Video sequence motion tracking by fuzzification techniques

    In this paper, a method for segmenting and tracking moving objects from the so-called permanency matrix is introduced. Our motion-based algorithms obtain the shapes of moving objects in video sequences starting from those image pixels where a change in grey level between two consecutive frames is detected by means of the permanency values. In the segmentation phase, matching between objects along the image sequence is performed using fuzzy two-dimensional rectangular regions. The tracking phase performs the association between the various fuzzy regions in all the images through time. Finally, the analysis phase describes motion through a long video sequence. The segmentation, tracking, and analysis phases are enhanced through the use of fuzzy logic techniques, which make it possible to cope with the uncertainty in the permanency values caused by the image noise inherent to computer vision.
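    As an illustration of the fuzzy matching idea, the sketch below models a fuzzy two-dimensional rectangular region as the minimum of two trapezoidal memberships; the margin parameter and the centre-based matching rule are assumptions made for the example, not the paper's exact definitions.

        def trapezoid(v, lo, hi, margin):
            # 1-D trapezoidal membership: 1 inside [lo, hi], falling
            # linearly to 0 over `margin` pixels on either side.
            if lo <= v <= hi:
                return 1.0
            d = lo - v if v < lo else v - hi
            return max(0.0, 1.0 - d / margin)

        def region_membership(x, y, box, margin=10.0):
            # Membership of pixel (x, y) in a fuzzy rectangular region:
            # the t-norm (minimum) of the two 1-D memberships.
            x0, y0, x1, y1 = box
            return min(trapezoid(x, x0, x1, margin),
                       trapezoid(y, y0, y1, margin))

        def match_degree(box_now, box_prev, margin=10.0):
            # Degree to which a region detected in the current frame
            # matches a fuzzy region from the previous frame: here, the
            # membership of its centre in the previous fuzzy rectangle.
            cx = (box_now[0] + box_now[2]) / 2.0
            cy = (box_now[1] + box_now[3]) / 2.0
            return region_membership(cx, cy, box_prev, margin)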

    Multi-view Geometric Constraints for Human Action Recognition and Tracking

    Human actions are the essence of human life and a natural product of the human mind. The analysis of human activities by machine has attracted the attention of many researchers, as it is important in a variety of domains including surveillance, video retrieval, human-computer interaction, and athlete performance investigation. This dissertation makes three major contributions to the automatic analysis of human actions. First, we conjecture that the relationship between the body joints of two actors in the same posture can be described by a 3D rigid transformation, which simultaneously captures different poses and various sizes and proportions. As a consequence of this conjecture, we show that there exists a fundamental matrix between the imaged positions of the body joints of two actors if they are in the same posture. Second, we propose a novel projection model for cameras moving at a constant velocity in 3D space, termed Galilean cameras, derive the Galilean fundamental matrix, and apply it to human action recognition. Third, we propose a novel use of the invariant ratio of areas under an affine transformation, together with the epipolar geometry between two cameras, for 2D model-based tracking of human body joints.

    In the first part of the thesis, we propose an approach to matching human actions using semantic correspondences between human bodies. These correspondences provide geometric constraints between multiple anatomical landmarks (e.g., hands, shoulders, and feet) to match actions observed from different viewpoints and performed at different rates by actors of differing anthropometric proportions. The fact that the human body has approximately fixed anthropometric proportions allows an innovative use of the machinery of epipolar geometry to constrain the analysis of actions performed by people of different anthropometric sizes, while ensuring that changes in viewpoint do not affect matching. A novel measure, in terms of the rank of a matrix constructed only from image measurements of the locations of anatomical landmarks, is proposed to ensure that similar actions are accurately recognized. We also describe how dynamic time warping can be used in conjunction with the proposed measure to match actions in the presence of nonlinear time warps. We demonstrate the versatility of our algorithm in a number of challenging sequences and applications, including action synchronization, odd one out, following the leader, and periodicity analysis.

    Next, we extend the conventional model of image projection to video captured by a camera moving at constant velocity; we term such a moving camera a Galilean camera. To that end, we derive the space-time projection and develop the corresponding epipolar geometry between two Galilean cameras. Both perspective imaging and linear pushbroom imaging are specializations of the proposed model, and we show how six different fundamental matrices, including the classic fundamental matrix, the Linear Pushbroom (LP) fundamental matrix, and a fundamental matrix relating Epipolar Plane Images (EPIs), are related and can be directly recovered from a Galilean fundamental matrix. We provide linear algorithms for estimating the parameters of the mapping between videos in the case of planar scenes. To apply the fundamental matrix between Galilean cameras to human action recognition, we propose a measure with two important properties: the first makes it possible to recognize similar actions whose execution rates are linearly related, and the second allows actions to be recognized in video captured by Galilean cameras. The proposed algorithm thus guarantees that actions can be correctly matched despite changes in view, execution rate, and the anthropometric proportions of the actor, even if the camera moves with constant velocity.

    Finally, we propose a novel 2D model-based approach for tracking human body parts during articulated motion. The human body is modeled as a 2D stick figure of thirteen body joints, and an action is considered a sequence of these stick figures. Given the locations of these joints in every frame of a model video and in the first frame of a test video, the joint locations are automatically estimated throughout the test video using two geometric constraints: first, the invariance of the ratio of areas under an affine transformation gives an initial estimate of the joint locations in the test video; second, the epipolar geometry between the two cameras refines these estimates. Using these estimated joint locations, the tracking algorithm determines the exact location of each landmark in the test video from the foreground silhouettes. The novelty of the proposed approach lies in the geometric formulation of human action models, in the combination of two geometric constraints for predicting body joints, and in its handling of deviations in the anthropometry of individuals, viewpoint, execution rate, and style of performing the action. The proposed approach does not require extensive training and can easily adapt to a wide variety of articulated actions.
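    A hedged sketch of the rank-based idea follows: stacking one epipolar-constraint row per joint correspondence yields a matrix that loses rank whenever a fundamental matrix relating the two views (or actors) exists, so its smallest relative singular value can serve as a posture dissimilarity score. The normalization and the score used here are standard epipolar-geometry machinery, not necessarily the dissertation's exact formulation; at least nine joint correspondences (the thirteen-joint model suffices) are assumed.

        import numpy as np

        def _normalize(pts):
            # Hartley normalization: zero mean, mean distance sqrt(2), so
            # singular values are comparable across image scales.
            c = pts.mean(axis=0)
            d = np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
            return (pts - c) * (np.sqrt(2.0) / d)

        def posture_dissimilarity(joints_a, joints_b):
            # Stack one epipolar-constraint row per joint correspondence.
            # If the two actors are in the same posture, some fundamental
            # matrix F satisfies q^T F p = 0 for every pair, so the matrix
            # below drops rank and its smallest singular value vanishes.
            p = _normalize(np.asarray(joints_a, dtype=float))  # N x 2
            q = _normalize(np.asarray(joints_b, dtype=float))  # N x 2
            x, y = p[:, 0], p[:, 1]
            u, v = q[:, 0], q[:, 1]
            rows = np.stack([u * x, u * y, u, v * x, v * y, v,
                             x, y, np.ones(len(p))], axis=1)
            s = np.linalg.svd(rows, compute_uv=False)
            return s[-1] / s[0]  # near zero for matching postures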

    Automatic processing of French Sign Language (LSF) videos: modelling and exploiting the phonological constraints of movement

    Sign languages differ from vocal languages in many ways: several parameters are articulated simultaneously, facial expression plays an important role, iconic gestural units are used extensively, and the signing space itself is used to structure utterances. As a consequence, within natural language processing, new methods have to be developed and adapted to these languages. We first present a tracking method, based on a particle filter, that estimates at any time the position of the signer's head, hands, elbows, and shoulders in a single-view video. This method has been adapted to French Sign Language (LSF) to make it more robust to occlusions, to the signer's hands leaving the frame, and to inversions of the signer's hands. Then, through the analysis of motion-capture data, we arrive at a classification of the motion patterns frequently involved in the production of signs. We propose a parametric model for each pattern and use it for automatic sign retrieval in a video, starting from a filmed example of the sign. These motion models are finally reused in two applications: the first assists a user in creating sign pictures; the second is dedicated to computer-aided segmentation of a video into signs.
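    For readers unfamiliar with the tracking machinery, here is a generic bootstrap particle-filter step of the kind the thesis builds on; the random-walk motion model and the user-supplied likelihood function are placeholder assumptions and do not reflect the thesis's LSF-specific adaptations for occlusion and hand inversion.

        import numpy as np

        def particle_filter_step(particles, weights, likelihood,
                                 motion_sigma=5.0, rng=None):
            # One bootstrap-filter iteration for tracking a 2-D body part.
            rng = rng or np.random.default_rng()
            n = len(particles)
            # Resample particle indices in proportion to the old weights.
            idx = rng.choice(n, size=n, p=weights)
            particles = particles[idx]
            # Predict: random-walk diffusion (a placeholder for the real
            # articulated-motion model used in the thesis).
            particles = particles + rng.normal(0.0, motion_sigma,
                                               particles.shape)
            # Update: reweight each hypothesis by its image likelihood
            # (e.g. skin colour or silhouette support around the particle).
            weights = np.array([likelihood(p) for p in particles]) + 1e-12
            weights /= weights.sum()
            # Report the weighted mean as the current position estimate.
            estimate = (particles * weights[:, None]).sum(axis=0)
            return particles, weights, estimate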

    Extracting gestural motion trajectories

    This paper is concerned with the extraction of spatiotemporal patterns in video sequences, with a focus on the trajectories of gestural motions associated with American Sign Language. An algorithm is described to extract the motion trajectories of salient features, such as human palms, from an image sequence. First, a motion segmentation of the image sequence is generated, based on a multiscale segmentation of the frames and attributed graph matching of regions across frames; this produces region correspondences and their affine transformations. Second, the colors of the moving regions are used to determine skin regions. Third, the head and palm regions are identified based on the shape and size of the skin regions in motion. Finally, the affine transformations defining a region's motion between successive frames are concatenated to construct the region's motion trajectory. Experimental results showing the extracted motion trajectories are presented.
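    The final concatenation step can be sketched as composing per-frame affine transforms in homogeneous coordinates and applying the running composition to the region's initial centroid; the 2x3 matrix convention used here is an assumption made for the example.

        import numpy as np

        def motion_trajectory(centroid0, affines):
            # Compose per-frame 2x3 affine transforms (frame t -> t+1) in
            # homogeneous coordinates and apply the running composition to
            # the region's initial centroid, yielding its trajectory.
            T = np.eye(3)
            p0 = np.array([centroid0[0], centroid0[1], 1.0])
            trajectory = [p0[:2].copy()]
            for A in affines:
                A3 = np.vstack([A, [0.0, 0.0, 1.0]])  # lift 2x3 to 3x3
                T = A3 @ T                            # prepend newest motion
                trajectory.append((T @ p0)[:2])
            return np.array(trajectory)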