    Methods for Recognizing Pose and Action of Articulated Objects with Collection of Planes in Motion

    The invention comprises an improved system, method, and computer-readable instructions for recognizing the pose and action of articulated objects with a collection of planes in motion. The method starts with a video sequence and a database of reference sequences corresponding to different known actions. It identifies the reference sequence in which the subject performs the action closest to the one observed. Actions are compared by comparing pose transitions. The cross-homography invariant may be used for view-invariant recognition of human body pose transitions and actions.
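
The abstract does not say how an observed sequence is aligned with the reference sequences. A minimal sketch, assuming dynamic time warping (DTW) as the alignment and a caller-supplied per-transition dissimilarity (the hypothetical `transition_dissimilarity`; in the patent's setting this might be a score derived from the cross-homography invariant), picks the reference action with the lowest accumulated cost. DTW is our substitution for illustration, not necessarily the patented matching step.

```python
import numpy as np


def dtw_cost(cost: np.ndarray) -> float:
    """Accumulated DTW cost for an (m, n) matrix of dissimilarities between
    observed pose transitions (rows) and reference pose transitions (columns)."""
    m, n = cost.shape
    acc = np.full((m + 1, n + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Each cell extends the cheapest of the three admissible predecessors.
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[m, n])


def closest_action(observed, references, transition_dissimilarity):
    """Return (label, cost) of the reference sequence that best matches `observed`.

    `references` maps an action label to its list of pose transitions;
    `transition_dissimilarity` is a hypothetical, caller-supplied function.
    """
    best_label, best_cost = None, np.inf
    for label, ref in references.items():
        cost = np.array([[transition_dissimilarity(o, r) for r in ref] for o in observed])
        c = dtw_cost(cost)
        if c < best_cost:
            best_label, best_cost = label, c
    return best_label, best_cost
```

For example, with scalar stand-ins for pose transitions and `lambda a, b: abs(a - b)` as the dissimilarity, `closest_action([1.0, 2.0, 3.0], {"walk": [1.0, 2.0, 2.0, 3.0], "jump": [5.0, 6.0]}, ...)` selects `"walk"`; DTW tolerates the reference being performed at a different speed.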

    Study Of Human Activity In Video Data With An Emphasis On View-invariance

    The perception and understanding of human motion and action is an important area of research in computer vision that plays a crucial role in applications such as surveillance, HCI, and ergonomics. In this thesis, we focus on the recognition of actions under varying viewpoints and with different, unknown camera intrinsic parameters. The challenges to be addressed include perspective distortions, differences in viewpoint, anthropometric variations, and the large number of degrees of freedom of articulated bodies. In addition, we are interested in methods that require little or no training. Current solutions to action recognition usually assume that a large dataset of actions is available for training a classifier. However, this means that to define a new action, the user has to record a number of videos from different viewpoints with varying camera intrinsic parameters and then retrain the classifier, which is not very practical from a development point of view. We propose algorithms that overcome these challenges and require just a few instances of the action, from any viewpoint and with any intrinsic camera parameters. Our first algorithm is based on the rank constraint on the family of planar homographies associated with triplets of body points. We represent an action as a sequence of poses and decompose each pose into triplets. A pose transition is thereby broken down into a set of movements of body-point planes, which transforms the non-rigid motion of the body points into the rigid motion of body-point planes. We use the fact that the family of homographies associated with two identical poses has rank 4 to gauge the similarity of pose between two subjects observed by different perspective cameras from different viewpoints. This method requires only one instance of the action. We then show that the concept of triplets can be extended to line segments.
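
The rank-4 test can be sketched as follows, assuming the homographies between the two subjects' images (one per body-point triplet) have already been estimated, for instance from the triplet's three points plus the epipole. Homographies induced by planes between two views have the form H = A + e′vᵀ up to scale, so their vectorized forms span at most a 4-dimensional space. The per-homography normalization and the residual score below are our own illustrative choices, not necessarily those of the thesis.

```python
import numpy as np


def rank4_residual(homographies) -> float:
    """Fraction of spectral energy outside the best rank-4 fit.

    Each 3x3 homography is flattened to a 9-vector and scaled to unit norm
    (homographies are only defined up to scale); the vectors form the columns
    of a 9 x n matrix. A value near 0 suggests the triplet planes come from
    two views of the same pose. Needs at least 5 homographies to be informative.
    """
    cols = []
    for H in homographies:
        h = np.asarray(H, dtype=float).reshape(9)
        cols.append(h / np.linalg.norm(h))
    M = np.stack(cols, axis=1)
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sqrt(np.sum(s[4:] ** 2) / np.sum(s ** 2)))
```

Homographies synthesized as scaled copies of `A + np.outer(e, v)` with random `v` give a residual at machine precision, while unrelated random 3x3 matrices give a clearly nonzero one.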
In particular, we establish that tracking the movement of line segments instead of triplets provides more redundancy in the data and thus leads to better results. We demonstrate this concept on “fundamental ratios”: we decompose a human body pose into line segments instead of triplets and examine the set of line-segment movements. This method needs only three instances of the action. If a larger dataset is available, weights can also be applied to the line segments for better accuracy. The last method is based on the concept of “projective depth”: given a plane, we can find the depth of a point relative to that plane. We propose three ways of using projective depth: (i) Triplets: the three points of a triplet, together with the epipole, define the plane, and the movement of points relative to these body planes can be used to recognize actions; (ii) Ground plane: if the ground plane can be extracted, we can find the projective depth of the body points with respect to it, so that action recognition reduces to curve matching; and (iii) Mirror person: the mirror view of the person can be used to extract mirror-symmetric planes. This method also needs only one instance of the action. Extensive experiments are reported on view invariance, robustness to noisy localization and occlusion of body points, and action recognition. The experimental results are very promising and demonstrate the efficiency of the proposed invariants.
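
One standard way to compute a plane-relative projective depth is the plane-plus-parallax relation x′ ≅ Hx + ρe′, where H is the homography induced by the reference plane and e′ is the epipole in the second image. The sketch below assumes that formulation (the abstract does not spell out its equations) and solves for ρ by crossing both sides with x′. Points on the plane get ρ = 0; ρ depends on the fixed scales chosen for x, H, and e′, but not on the scale of x′.

```python
import numpy as np


def projective_depth(x, x_prime, H, e_prime) -> float:
    """Solve x' ~ H x + rho * e' for rho (assumed plane-plus-parallax model).

    x, x_prime: homogeneous image points, shape (3,)
    H: 3x3 homography induced by the reference plane
    e_prime: epipole in the second image, shape (3,)
    Crossing with x' removes the unknown scale: 0 = x' x (H x) + rho * (x' x e'),
    and rho is the least-squares solution of that 3-vector equation.
    """
    x = np.asarray(x, dtype=float)
    xp = np.asarray(x_prime, dtype=float)
    a = np.cross(xp, np.asarray(H, dtype=float) @ x)
    b = np.cross(xp, np.asarray(e_prime, dtype=float))
    return float(-(a @ b) / (b @ b))
```

Tracking ρ for each body point over time against a fixed plane (for example the ground plane) turns each point's motion into a 1-D curve, which is the curve-matching view mentioned in (ii).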

    Angular variation as a monocular cue for spatial perception

    Monocular cues are spatial sensory inputs picked up by one eye alone that aid the perception of distance or space. They are mostly static features that provide depth information and are used extensively in graphic art to create realistic representations of a scene. Since the spatial information contained in these cues is picked up from the retinal image, a link between it and the theory of direct perception can conveniently be assumed. According to this theory, the spatial information of an environment is contained directly in the optic array. This assumption therefore makes it possible to model visual perception processes through computational approaches. In this thesis, angular variation is considered a monocular cue, and the concept of direct perception is adopted through a computer vision approach that treats it as a suitable principle from which innovative techniques for calculating spatial information can be developed. The spatial information expected from this monocular cue is the position and orientation of an object with respect to the observer, which in computer vision is the well-known research field of 2D-3D pose estimation. The attempt to establish angular variation as a monocular cue, and thereby to achieve a computational approach to direct perception, is carried out through the development of a set of pose estimation methods. Starting from conventional strategies for solving the pose estimation problem, a first approach imposes constraint equations that relate object and image features. Along these lines, two algorithms based on a simple analysis of line rotation motion were developed. These algorithms successfully provide pose information; however, they depend strongly on the conditions of the scene data. To overcome this limitation, a second approach inspired by the biological processes of the human visual system was developed.
It is based on the content of the image itself and defines a computational approach to direct perception. The set of developed algorithms analyzes the visual properties provided by angular variations. The aim is to gather valuable data from which spatial information can be obtained and used to emulate a visual perception process by establishing a 2D-3D metric relation. Since this relation is considered fundamental to visuomotor coordination, and consequently essential for interacting with the environment, applying the developed computational approach in technology-mediated environments can produce a significant cognitive effect. This cognitive effect is demonstrated by an experimental study in which participants were asked to complete an action-perception task. The main purpose of the study was to analyze visually guided behavior in teleoperation and the cognitive effect of adding 3D information. The results showed that the 3D aid had a significant influence on skill improvement and enhanced the sense of presence.
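
The abstract does not give the geometric model behind its line-rotation algorithms, so the following is only a deliberately simplified illustration of how angular variation can carry orientation information; it is not the thesis's method. It assumes orthographic projection, a plane slanted by an angle θ about the image x-axis, and a line whose in-plane angles φ are known. Under those assumptions a point (cos φ, sin φ, 0) on the plane projects to (cos φ, sin φ cos θ), so the image angle φ′ satisfies tan φ′ = cos θ · tan φ, and observing φ′ for several φ recovers the slant.

```python
import numpy as np


def slant_from_angles(phi, phi_img) -> float:
    """Estimate plane slant |theta| (radians) from in-plane and image line angles.

    Assumed orthographic model: tan(phi_img) = cos(theta) * tan(phi).
    It is rewritten as cos(phi) sin(phi_img) = c * sin(phi) cos(phi_img) to avoid
    the poles of tan, and c = cos(theta) is found by linear least squares.
    cos is even, so the sign of the slant is not recoverable in this model.
    """
    phi = np.asarray(phi, dtype=float)
    phi_img = np.asarray(phi_img, dtype=float)
    lhs = np.cos(phi) * np.sin(phi_img)
    rhs = np.sin(phi) * np.cos(phi_img)
    c = float(lhs @ rhs / (rhs @ rhs))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))
```

Using several rotation angles rather than one makes the estimate robust to noise in individual angle measurements; the clip guards against noise pushing the cosine slightly outside [-1, 1].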

    Euclidean reconstruction and reprojection up to subgroups

    The necessary and sufficient conditions for being able to estimate scene structure, motion, and camera calibration from a sequence of images are very rarely satisfied in practice. What exactly can be estimated in sequences of practical importance when such conditions are not satisfied? In this paper we give a complete answer to this question. For every camera motion that fails to meet the conditions, we give explicit formulas for the ambiguities in the reconstructed scene, motion, and calibration. Such a characterization is crucial both for designing robust estimation algorithms (which do not try to recover parameters that cannot be recovered) and for generating novel views of the scene by controlling the vantage point. To this end, we explicitly characterize all the vantage points that give rise to a valid Euclidean reprojection regardless of the ambiguity in the reconstruction. We also characterize vantage points that generate views that are altogether invariant to the ambiguity. All results are presented in simple notation that involves neither tensors nor complex projective geometry and should be accessible to readers with a basic background in linear algebra.
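
The paper's formulas are not reproduced here. As an illustration of the simplest camera motion that fails the conditions, a camera that only rotates about its center, the sketch below checks numerically that two points on the same viewing ray at different depths project to identical pixels in both views, so depth cannot be recovered; adding a translation (a baseline) separates them. The calibration matrix `K`, the rotation, and the point coordinates are arbitrary illustrative values.

```python
import numpy as np


def project(K, R, t, X):
    """Pinhole projection x ~ K (R X + t), returned in pixel coordinates."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]


def rot_y(angle):
    """Rotation about the camera's y-axis (a pan)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = rot_y(0.1)
X = np.array([0.3, -0.2, 4.0])
near, far = X, 2.5 * X  # same ray from the first camera (at the origin), different depths

# Second view after a pure rotation: both depths land on the same pixel.
pure = [project(K, R, np.zeros(3), P) for P in (near, far)]
# Second view after rotation plus translation: parallax separates them.
moved = [project(K, R, np.array([0.5, 0.0, 0.0]), P) for P in (near, far)]
```

With pure rotation the two views are related by the depth-independent homography K R K⁻¹, which is why structure is unrecoverable in that case regardless of the algorithm used.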

    Geometric Invariance In The Analysis Of Human Motion In Video Data

    Human motion analysis is one of the major problems in computer vision research. It studies the motion of the human body in video data from different aspects, ranging from tracking body parts and reconstructing 3D body configurations to higher-level interpretation of human actions and activities in image sequences. When human motion is observed through a video camera, it is perspectively distorted and may appear totally different from different viewpoints. It is therefore highly challenging to establish correct relationships between human motions across video sequences with different camera settings. In this work, we investigate the geometric invariance in the motion of the human body, which is critical for accurately understanding human motion in video data regardless of variations in camera parameters and viewpoint. In human action analysis, the representation of human action is a very important issue, and it usually determines the nature of the solutions, including their limits in resolving the problem. Unlike existing research, which studies human motion as a whole 2D/3D object or as a sequence of postures, we study human motion as a sequence of body pose transitions. We further decompose a human body pose into a number of body point triplets and break down a pose transition into the transitions of a set of body point triplets. In this way, the study of the complex non-rigid motion of the human body is reduced to that of the motion of rigid body point triplets, i.e., a collection of planes in motion. As a result, projective geometry and linear algebra can be applied to explore the geometric invariance in human motion. Based on this formulation, we have discovered the fundamental ratio invariant and the eigenvalue equality invariant in human motion. We also propose solutions based on these geometric invariants to the problems of view-invariant recognition of human postures and actions, as well as analysis of human motion styles.
These invariants and their applicability have been validated by experimental results that support their effectiveness in understanding human motion under various camera parameters and viewpoints.
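
The decomposition of a pose transition into triplet transitions can be sketched as a small data-structure step. The body-point names and the choice to use every 3-combination of points are hypothetical; the abstract does not say which triplets are used. Each resulting (before, after) pair describes one plane in motion, which is the unit on which invariants such as the fundamental ratios would then be computed.

```python
from itertools import combinations
from typing import Dict, Tuple

Point = Tuple[float, float]
Triplet = Tuple[str, str, str]


def pose_to_triplets(pose: Dict[str, Point]) -> Dict[Triplet, Tuple[Point, Point, Point]]:
    """Split a pose (hypothetical named 2-D body points) into all body-point triplets."""
    names = sorted(pose)  # sorted so triplet keys are deterministic across frames
    return {tri: tuple(pose[n] for n in tri) for tri in combinations(names, 3)}


def transition_to_triplet_transitions(pose_a: Dict[str, Point], pose_b: Dict[str, Point]):
    """Break a pose transition into per-triplet (before, after) image-point pairs.

    Only triplets whose three points are visible in both poses are kept, which
    is one simple way to tolerate occluded body points.
    """
    ta, tb = pose_to_triplets(pose_a), pose_to_triplets(pose_b)
    return {tri: (ta[tri], tb[tri]) for tri in ta if tri in tb}
```

A pose with n body points yields n(n-1)(n-2)/6 triplets, so even a modest skeleton provides many planes and therefore considerable redundancy.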