
    Activity Representation from Video Using Statistical Models on Shape Manifolds

    Activity recognition from video data is a key computer vision problem with applications in surveillance, elderly care, and related domains. This problem is associated with modeling a representative shape that contains significant information about the underlying activity. In this dissertation, we present several approaches for view-invariant activity recognition via modeling shapes on various shape spaces and Riemannian manifolds. The first two parts of this dissertation deal with activity modeling and recognition using tracks of landmark feature points. The motion trajectories of points extracted from objects involved in the activity are used to build deformation shape models for each activity, and these models are used for classification and detection of unusual activities. In the first part of the dissertation, these models are represented by the recovered 3D deformation basis shapes corresponding to the activity, using a non-rigid structure-from-motion formulation. We use a theory for estimating the amount of deformation for these models from the visual data. We study the special case of ground-plane activities in detail because of its importance in video surveillance applications. In the second part of the dissertation, we propose to model the activity by learning an affine-invariant deformation subspace representation that captures the space of possible body poses associated with the activity. These subspaces can be viewed as points on a Grassmann manifold. We propose several statistical classification models on the Grassmann manifold that capture the statistical variations of the shape data while following the intrinsic Riemannian geometry of these manifolds. The last part of this dissertation addresses the problem of recognizing human gestures from silhouette images. We represent a human gesture as a temporal sequence of human poses, each characterized by a contour of the associated human silhouette.
    The shape of a contour is viewed as a point on the shape space of closed curves; hence, each gesture is characterized and modeled as a trajectory on this shape space. We utilize the Riemannian geometry of this space to propose a template-based approach and a graphical-model-based approach for modeling these trajectories. The two models are designed to account for the different invariance requirements in gesture recognition, and also to capture the statistical variations associated with the contour data.
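    Since the abstract above treats deformation subspaces as points on a Grassmann manifold, a minimal sketch of the standard geodesic distance between two subspaces (via principal angles) may help make this concrete. This is a textbook construction in NumPy, not code from the dissertation; the function name and interface are illustrative.

```python
import numpy as np

def grassmann_distance(A, B):
    """Geodesic distance between the subspaces span(A) and span(B),
    computed from principal angles (a standard construction)."""
    # Orthonormalize the basis matrices (columns span each subspace)
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    # Arc length on the Grassmann manifold: norm of the angle vector
    return np.linalg.norm(theta)
```

    Statistical models on the manifold (means, distributions, classifiers) are then built with respect to this intrinsic distance rather than Euclidean distance between basis matrices.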

    Particle filtering on large dimensional state spaces and applications in computer vision

    Tracking of spatio-temporal events is a fundamental problem in computer vision and signal processing in general; examples include keeping track of motion activities from video sequences for abnormality detection, or spotting neuronal activity patterns inside the brain from fMRI data. To that end, our research has two main aspects with equal emphasis: first, the development of efficient Bayesian filtering frameworks for solving real-world tracking problems, and second, understanding the temporal evolution dynamics of physical systems and phenomena and building statistical models for them. These models provide prior information to the trackers and lead to intelligent signal processing for computer vision and image understanding. The first part of the dissertation deals with the key signal processing aspects of tracking and the challenges involved. In simple terms, tracking is the problem of estimating the hidden state of a system from noisy observed data (from sensors). As frequently encountered in real life, due to the non-linear and non-Gaussian nature of the state spaces involved, particle filters (PF) give an approximate Bayesian inference under such a problem setup. However, quite often we are faced with large-dimensional state spaces together with multimodal observation likelihoods due to occlusion and clutter. This makes existing particle filters very inefficient for practical purposes. In order to tackle these issues, we have developed and implemented efficient particle filters on large-dimensional state spaces, with applications to various visual tracking problems in computer vision. In the second part of the dissertation, we develop dynamical models for motion activities inspired by the human visual cognitive ability to characterize the temporal evolution pattern of shapes. We take a landmark-shape-based approach for the representation and tracking of motion activities.
    Specifically, we have developed statistical models for the shape change of a configuration of "landmark points" (key points of interest) over time, and we use these models for automatic landmark extraction and tracking, filtering, and change detection from video sequences. In this regard, we demonstrate the superior performance of our Non-Stationary Shape Activity (NSSA) model in comparison to other existing works. Also, owing to the large-dimensional state space of this problem, we have utilized efficient particle filters for motion activity tracking. In the third part of the dissertation, we develop a visual tracking algorithm that is able to track in the presence of illumination variations in the scene. To do so, we build and learn a dynamical model for 2D illumination patterns based on Legendre basis functions. Under our problem formulation, we pose the visual tracking task as a large-dimensional tracking problem in a joint motion-illumination space, and thus use an efficient PF algorithm called PF-MT (PF with Mode Tracker) for tracking. In addition, we demonstrate the use of a change/abnormality detection framework for tracking across drastic illumination changes. Experiments with real-life video sequences demonstrate the usefulness of the algorithm where many other existing approaches fail. The last part of the dissertation explores the upcoming field of compressive sensing and looks into the possibilities of leveraging particle filtering ideas to do better sequential reconstruction (i.e. tracking) of sparse signals from a small number of random linear measurements. Our preliminary results show several promising aspects of such an approach, and it is an interesting direction of future research with many potential computer vision applications.
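    As a concrete illustration of the particle filtering machinery the abstract relies on, here is one predict/update/resample step of a generic bootstrap (SIR) particle filter in NumPy. This is a minimal sketch under assumed Gaussian noise models, not the PF-MT or NSSA algorithms from the dissertation; the function name and interface are illustrative.

```python
import numpy as np

def particle_filter_step(particles, weights, y, f, h, q_std, r_std, rng):
    """One bootstrap-PF step: f is the state transition map, h the
    observation map, q_std/r_std the process/observation noise scales."""
    n = len(particles)
    # Predict: propagate each particle through the dynamics plus process noise
    particles = f(particles) + q_std * rng.standard_normal(particles.shape)
    # Update: reweight particles by the Gaussian observation likelihood of y
    resid = y - h(particles)
    weights = weights * np.exp(-0.5 * (resid / r_std) ** 2)
    weights = weights / weights.sum()
    # Resample when the effective sample size drops below half the particles
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights
```

    In large-dimensional state spaces this basic scheme degenerates (most weights collapse to zero), which is precisely the inefficiency that motivates the dissertation's specialized filters such as PF-MT.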

    Visual stress: origins and treatment

    The statistical characteristics of visual images that provoke discomfort generally differ from those of images found in nature. Computational models of the cortex suggest that uncomfortable images are processed inefficiently, a suggestion consistent with the large electrical and haemodynamic cortical responses such images induce. The response is greater in individuals who customarily experience visual discomfort, such as those with migraine. Text provides an unnatural image and can be uncomfortable when small and closely spaced. It can provoke illusions of color, shape, and motion, just as patterns of stripes do, and these illusions can disturb reading and reading acquisition. Changing the lighting chromaticity can sometimes reduce these illusions, particularly in patients with migraine aura, thereby facilitating reading.

    A Deep-structured Conditional Random Field Model for Object Silhouette Tracking

    In this work, we introduce a deep-structured conditional random field (DS-CRF) model for the purpose of state-based object silhouette tracking. The proposed DS-CRF model consists of a series of state layers, where each state layer spatially characterizes the object silhouette at a particular point in time. The interactions between adjacent state layers are established by inter-layer connectivity dynamically determined based on inter-frame optical flow. By incorporating both spatial and temporal context in a dynamic fashion within such a deep-structured probabilistic graphical model, the proposed DS-CRF model allows us to develop a framework that can accurately and efficiently track object silhouettes that change greatly over time, as well as under different situations such as occlusion and multiple targets within the scene. Experimental results using video surveillance datasets containing different scenarios, such as occlusion and multiple targets, showed that the proposed DS-CRF approach provides strong object silhouette tracking performance compared to baseline methods such as mean-shift tracking, as well as state-of-the-art methods such as context tracking and boosted particle filtering.
    Comment: 17 pages
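    The idea of inter-layer connectivity driven by inter-frame optical flow can be illustrated with a toy sketch: propagating a binary silhouette mask along a dense flow field. This is a simplified nearest-neighbour warp for intuition only, not the DS-CRF inference; the function name and the (H, W, 2) flow layout are assumptions.

```python
import numpy as np

def warp_mask(mask, flow):
    """Propagate a binary silhouette mask to the next frame along a
    dense flow field (flow[..., 0] = x displacement, flow[..., 1] = y)."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    ys, xs = np.nonzero(mask)
    # Move each foreground pixel along its flow vector, rounding to the
    # nearest grid position and clipping at the image border
    ny = np.clip(np.rint(ys + flow[ys, xs, 1]).astype(int), 0, h - 1)
    nx = np.clip(np.rint(xs + flow[ys, xs, 0]).astype(int), 0, w - 1)
    out[ny, nx] = 1
    return out
```

    In the DS-CRF setting, a warp of this kind would only suggest which nodes in adjacent state layers should be connected; the silhouette estimate itself comes from probabilistic inference over the graph.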

    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework that highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and, thus, the constraints imposed on the type of video that each technique is able to address. Making the hypotheses and constraints explicit makes the framework particularly useful for selecting a method given an application. Another advantage of the proposed organization is that it allows categorizing the newest approaches seamlessly alongside traditional ones, while providing an insightful perspective on the evolution of the action recognition task up to now. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area.
    Comment: Preprint submitted to CVIU; survey paper, 46 pages, 2 figures, 4 tables

    Estimation of Human Body Shape and Posture Under Clothing

    Estimating the body shape and posture of a dressed human subject in motion, represented as a sequence of (possibly incomplete) 3D meshes, is important for virtual change rooms and security. To solve this problem, statistical shape spaces encoding human body shape and posture variations are commonly used to constrain the search space for the shape estimate. In this work, we propose a novel method that uses a posture-invariant shape space to model body shape variation, combined with a skeleton-based deformation to model posture variation. Our method can estimate the body shape and posture of both static scans and motion sequences of dressed human body scans. In the case of motion sequences, our method takes advantage of motion cues to solve for a single body shape estimate along with a sequence of posture estimates. We apply our approach to both static scans and motion sequences and demonstrate that our method achieves higher fitting accuracy than a variant of the popular SCAPE model used as the statistical model.
    Comment: 23 pages, 11 figures
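    Skeleton-based deformation of the kind the abstract describes is commonly realized as linear blend skinning: each vertex moves under a weighted blend of per-bone transforms. The NumPy sketch below shows that standard construction as a plausible stand-in for intuition; it is not the authors' implementation, and the names are illustrative.

```python
import numpy as np

def linear_blend_skinning(verts, weights, transforms):
    """Deform mesh vertices by a weighted blend of bone transforms.
    verts: (n, 3) rest-pose positions; weights: (n, b) skinning weights
    summing to 1 per vertex; transforms: (b, 4, 4) bone matrices."""
    n = len(verts)
    # Homogeneous coordinates so 4x4 transforms apply directly
    hom = np.hstack([verts, np.ones((n, 1))])                 # (n, 4)
    # Apply every bone transform to every vertex
    per_bone = np.einsum('bij,nj->nbi', transforms, hom)      # (n, b, 4)
    # Blend the per-bone results with the skinning weights
    blended = np.einsum('nb,nbi->ni', weights, per_bone)      # (n, 4)
    return blended[:, :3]
```

    A posture-invariant shape space then only has to explain the residual body-shape variation left after this skeletal deformation is factored out.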

    GAGAN: Geometry-Aware Generative Adversarial Networks

    Deep generative models learned through adversarial training have become increasingly popular for their ability to generate naturalistic image textures. However, aside from texture, the visual appearance of objects is significantly influenced by their shape geometry, information which is not taken into account by existing generative models. This paper introduces Geometry-Aware Generative Adversarial Networks (GAGAN) for incorporating geometric information into the image generation process. Specifically, in GAGAN the generator samples latent variables from the probability space of a statistical shape model. By mapping the output of the generator to a canonical coordinate frame through a differentiable geometric transformation, we enforce the geometry of the objects and add an implicit connection from the prior to the generated object. Experimental results on face generation indicate that GAGAN can generate realistic images of faces with arbitrary facial attributes such as expression, pose, and morphology that are of better quality than those of current GAN-based methods. Our method can be used to augment any existing GAN architecture and improve the quality of the generated images.
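    Sampling latent variables from the probability space of a statistical shape model can be illustrated with a PCA shape model: fit principal components to landmark configurations, draw coefficients from the model's Gaussian, and map them back to landmark space. This NumPy sketch is illustrative only; GAGAN's actual shape model and sampling scheme may differ, and the function name is an assumption.

```python
import numpy as np

def sample_shape_latents(landmark_data, n_samples, n_components, rng):
    """Draw plausible landmark configurations from a PCA statistical
    shape model fit to landmark_data of shape (n_examples, dim)."""
    mean = landmark_data.mean(axis=0)
    X = landmark_data - mean
    # Principal directions of landmark variation
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Per-component standard deviations of the training coefficients
    std = S[:n_components] / np.sqrt(len(landmark_data) - 1)
    # Sample coefficients from the model's Gaussian and map back
    z = rng.standard_normal((n_samples, n_components)) * std
    return mean + z @ Vt[:n_components]
```

    Conditioning a generator on such samples is what ties the generated texture to a geometrically plausible shape.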