
    Semantic Graph Convolutional Networks for 3D Human Pose Regression

    In this paper, we study the problem of learning Graph Convolutional Networks (GCNs) for regression. Current GCN architectures are limited by the small receptive field of their convolution filters and by a transformation matrix shared across all nodes. To address these limitations, we propose Semantic Graph Convolutional Networks (SemGCN), a novel neural network architecture for regression tasks with graph-structured data. SemGCN learns to capture semantic information such as local and global node relationships, which is not explicitly represented in the graph. These semantic relationships can be learned through end-to-end training from the ground truth without additional supervision or hand-crafted rules. We further investigate applying SemGCN to 3D human pose regression. Our formulation is intuitive and sufficient, since both 2D and 3D human poses can be represented as a structured graph encoding the relationships between joints in the skeleton of a human body. We carry out comprehensive studies to validate our method. The results show that SemGCN outperforms the state of the art while using 90% fewer parameters. Comment: In CVPR 2019 (13 pages including supplementary material). The code can be found at https://github.com/garyzhao/SemGC
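    The abstract does not spell out the layer structure, but the core idea (a graph convolution whose edge weights are learned rather than fixed by the skeleton) can be illustrated with a short PyTorch sketch. The class name `SemGraphConv`, the masked-softmax weighting, and the 16-joint toy setup are assumptions made for illustration, not the authors' released code (see their repository for the actual implementation).

```python
# Sketch: a graph convolution with learnable per-edge weights on a fixed
# skeleton adjacency, loosely in the spirit of the SemGCN idea above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        # Learnable score for every potential edge; entries outside the
        # skeleton adjacency are masked out before the softmax.
        self.edge_score = nn.Parameter(torch.zeros_like(adj))
        self.register_buffer("mask", (adj > 0).float())

    def forward(self, x):                      # x: (batch, joints, in_dim)
        scores = self.edge_score.masked_fill(self.mask == 0, float("-inf"))
        A = F.softmax(scores, dim=-1)          # learned, row-normalised adjacency
        return A @ self.W(x)                   # aggregate transformed neighbours

# Example: a 16-joint skeleton, regressing 3D coordinates from 2D inputs.
joints = 16
adj = torch.eye(joints)                        # self-loops only; real use adds bone edges
layer = SemGraphConv(2, 3, adj)
out = layer(torch.randn(4, joints, 2))         # -> (4, 16, 3)
```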

    A human motion feature based on semi-supervised learning of GMM

    Using motion capture to create natural-looking motion sequences for virtual character animation has become a standard procedure in the games and visual effects industry. With the fast growth of motion data, the task of automatically annotating new motions is gaining importance. In this paper, we present a novel statistical feature that represents each motion according to pre-labeled categories of key-poses. A probabilistic model is trained by semi-supervised learning of a Gaussian mixture model (GMM). Each pose in a given motion can then be described by a feature vector of probabilities given by the GMM. A motion feature descriptor is then built from the statistics of all pose features. Experimental results and comparisons with existing work show that our method performs more accurately and efficiently in motion retrieval and annotation.
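    A minimal sketch of the described pipeline, using scikit-learn's GaussianMixture: fit a GMM over pose vectors, describe each frame by its component posteriors, and summarise a clip by simple statistics of those posteriors. Initialising the component means from a few labelled key-poses stands in for the paper's semi-supervised training; function names and data shapes are illustrative assumptions.

```python
# Sketch of a GMM-based motion descriptor: per-frame component posteriors,
# aggregated into a fixed-length per-clip feature.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_pose_gmm(poses, keypose_means):
    # poses: (n_frames_total, pose_dim); keypose_means: (n_components, pose_dim)
    gmm = GaussianMixture(n_components=len(keypose_means),
                          means_init=keypose_means, covariance_type="diag")
    gmm.fit(poses)
    return gmm

def motion_descriptor(gmm, motion):
    # motion: (n_frames, pose_dim) -> fixed-length descriptor for the clip
    posteriors = gmm.predict_proba(motion)          # (n_frames, n_components)
    return np.concatenate([posteriors.mean(axis=0), posteriors.std(axis=0)])

# Toy usage with random data standing in for mocap pose vectors.
rng = np.random.default_rng(0)
all_poses = rng.normal(size=(500, 30))
key_poses = rng.normal(size=(5, 30))
gmm = fit_pose_gmm(all_poses, key_poses)
desc = motion_descriptor(gmm, rng.normal(size=(120, 30)))   # length-10 vector
```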

    Representations for Cognitive Vision: A Review of Appearance-Based, Spatio-Temporal, and Graph-Based Approaches

    The emerging discipline of cognitive vision requires a proper representation of visual information, including spatial and temporal relationships, scenes, events, semantics and context. This review article summarizes existing representational schemes in computer vision which might be useful for cognitive vision, and discusses promising future research directions. The various approaches are categorized according to appearance-based, spatio-temporal, and graph-based representations for cognitive vision. While the representation of objects has been covered extensively in computer vision research, both from a reconstruction as well as from a recognition point of view, cognitive vision will also require new ideas on how to represent scenes. We introduce new concepts for scene representations and discuss how these might be efficiently implemented in future cognitive vision systems.

    Linear Regression and Unsupervised Learning for Tracking and Embodied Robot Control

    Computer vision problems, such as tracking and robot navigation, tend to be solved using models of the objects of interest to the problem. These models are often either hard-coded or learned in a supervised manner. In either case, an engineer is required to identify the visual information that is important to the task, which is both time consuming and problematic. Issues with these engineered systems relate to the ungrounded nature of the knowledge imparted by the engineer, where the systems have no meaning attached to the representations. This leads to systems that are brittle and prone to failure when expected to act in environments not envisaged by the engineer. The work presented in this thesis removes the need for hard-coded or engineered models of either visual information representations or behaviour. This is achieved by developing novel approaches for learning from example, in both input (percept) and output (action) spaces. This approach leads to the development of novel feature tracking algorithms and methods for robot control. Applying this approach to feature tracking, unsupervised learning is employed, in real time, to build appearance models of the target that represent the input space structure, and this structure is exploited to partition banks of computationally efficient, linear-regression-based target displacement estimators. This thesis presents the first application of regression-based methods to the problem of simultaneously modeling and tracking a target object. The computationally efficient Linear Predictor (LP) tracker is investigated, along with methods for combining and weighting flocks of LPs. The tracking algorithms developed operate with accuracy comparable to other state-of-the-art online approaches and with a significant gain in computational efficiency. This is achieved as a result of two specific contributions. First, novel online approaches for the unsupervised learning of modes of target appearance that identify aspects of the target are introduced. Second, a general tracking framework is developed within which the identified aspects of the target are adaptively associated to subsets of a bank of LP trackers. This results in the partitioning of LPs and the online creation of aspect-specific LP flocks that facilitate tracking through significant appearance changes. Applying the approach to the percept-action domain, unsupervised learning is employed to discover the structure of the action space, and this structure is used in the formation of meaningful perceptual categories and to facilitate the use of localised input-output (percept-action) mappings. This approach provides a realisation of an embodied and embedded agent that organises its perceptual space, and hence its cognitive process, based on interactions with its environment. Central to the proposed approach is the technique of clustering an input-output exemplar set based on output similarity, and using the resultant input exemplar groupings to characterise a perceptual category. All input exemplars that are coupled to a certain class of outputs form a category: the category of a given affordance, action or function. In this sense the formed perceptual categories have meaning and are grounded in the embodiment of the agent. The approach is shown to identify the relative importance of perceptual features and is able to solve percept-action tasks, defined only by demonstration, in previously unseen situations.
Within this percept-action learning framework, two alternative approaches are developed. The first approach employs hierarchical output-space clustering of point-to-point mappings to achieve search efficiency and input- and output-space generalisation, as well as a mechanism for identifying the important variance and invariance in the input space. The exemplar hierarchy provides, in a single structure, a mechanism for classifying previously unseen inputs and generating appropriate outputs. The second approach integrates the regression mappings used in the feature tracking domain with the action-space clustering and imitation learning techniques developed in the percept-action domain. These components are utilised within a novel percept-action data mining methodology that is able to discover the visual entities that are important to a specific problem, and to map from these entities onto the action space. Applied to the robot control task, this approach allows for real-time generation of continuous action signals, without the use of any supervision or definition of representations or rules of behaviour.
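    The Linear Predictor idea referenced above (a linear map from intensity differences at support pixels to a target displacement, learned from synthetic perturbations of a reference image) can be sketched in a few lines of NumPy. The sampling pattern, training ranges, and function names below are assumptions; the thesis' flocking, weighting, and aspect-partitioning mechanisms are omitted.

```python
# Sketch of a single Linear Predictor: least-squares map from intensity
# differences at support pixels to a 2D displacement.
import numpy as np

def sample(img, pts):
    # Nearest-pixel intensity lookup at (x, y) support points.
    x = np.clip(pts[:, 0].round().astype(int), 0, img.shape[1] - 1)
    y = np.clip(pts[:, 1].round().astype(int), 0, img.shape[0] - 1)
    return img[y, x].astype(float)

def train_lp(img, centre, n_support=100, n_train=500, max_disp=10, seed=0):
    rng = np.random.default_rng(seed)
    support = centre + rng.uniform(-15, 15, size=(n_support, 2))
    ref = sample(img, support)
    diffs, disps = [], []
    for _ in range(n_train):
        d = rng.uniform(-max_disp, max_disp, size=2)     # synthetic displacement
        diffs.append(sample(img, support + d) - ref)     # induced intensity change
        disps.append(d)
    # Least-squares linear map from intensity differences to displacements.
    P, *_ = np.linalg.lstsq(np.asarray(diffs), np.asarray(disps), rcond=None)
    return support, ref, P

def estimate_displacement(img, support, ref, P):
    # Apply the learned linear predictor to the current frame.
    return (sample(img, support) - ref) @ P

# Toy usage on a random image; a real tracker would shift the support points
# by the estimated displacement every frame.
img = np.random.default_rng(1).uniform(size=(240, 320))
support, ref, P = train_lp(img, centre=np.array([160.0, 120.0]))
print(estimate_displacement(img, support, ref, P))   # zero on the training frame itself
```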

    Efficient Human Pose Estimation with Image-dependent Interactions

    Human pose estimation from 2D images is one of the most challenging and computationally demanding problems in computer vision. Standard models such as Pictorial Structures consider interactions between kinematically connected joints or limbs, leading to inference cost that is quadratic in the number of pixels. As a result, researchers and practitioners have restricted themselves to simple models which only measure the quality of limb-pair possibilities by their 2D geometric plausibility. In this talk, we propose novel methods which allow for efficient inference in richer models with data-dependent interactions. First, we introduce structured prediction cascades, a structured analog of binary cascaded classifiers, which learn to focus computational effort where it is needed, filtering out many states cheaply while ensuring the correct output is unfiltered. Second, we propose a way to decompose models of human pose with cyclic dependencies into a collection of tree models, and provide novel methods to impose model agreement. Finally, we develop a local linear approach that learns bases centered around modes in the training data, giving us image-dependent local models which are fast and accurate. These techniques allow for sparse and efficient inference on the order of minutes or seconds per image. As a result, we can afford to model pairwise interaction potentials much more richly with data-dependent features such as contour continuity, segmentation alignment, color consistency, optical flow and multiple modes. We show empirically that these richer models are worthwhile, obtaining significantly more accurate pose estimation on popular datasets.
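    As a rough illustration of the "local linear models centred around modes" idea mentioned above, the sketch below clusters training poses into modes and fits one ridge regressor from image features to pose per mode, selecting the nearest mode at test time. The clustering and mode-selection details are assumptions made for illustration, not the method described in the talk.

```python
# Sketch: mode-specific local linear regressors for pose estimation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

def fit_local_linear_models(features, poses, n_modes=8):
    # features: (n, d_feat) image descriptors; poses: (n, d_pose) joint coordinates.
    modes = KMeans(n_clusters=n_modes, n_init=10, random_state=0).fit(poses)
    models = []
    for k in range(n_modes):
        idx = modes.labels_ == k
        models.append(Ridge(alpha=1.0).fit(features[idx], poses[idx]))
    # Remember the mean feature of each mode for test-time model selection.
    centres = np.stack([features[modes.labels_ == k].mean(axis=0)
                        for k in range(n_modes)])
    return centres, models

def predict_pose(feature, centres, models):
    k = np.argmin(np.linalg.norm(centres - feature, axis=1))
    return models[k].predict(feature[None])[0]

# Toy usage with random stand-ins for image features and joint-coordinate vectors.
rng = np.random.default_rng(1)
feats, targets = rng.normal(size=(400, 64)), rng.normal(size=(400, 28))
centres, models = fit_local_linear_models(feats, targets)
est = predict_pose(rng.normal(size=64), centres, models)    # (28,) pose vector
```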

    Semantic Localization and Mapping in Robot Vision

    Integration of human semantics plays an increasing role in robotics tasks such as mapping, localization and detection. Increased use of semantics serves multiple purposes, including giving computers the ability to process and present data containing human-meaningful concepts, and allowing computers to employ human reasoning to accomplish tasks. This dissertation presents three solutions which incorporate semantics into visual data in order to address these problems. The first addresses the problem of constructing topological maps from sequences of images. The proposed solution includes a novel image similarity score which uses dynamic programming to match images using both appearance and relative positions of local features simultaneously. An MRF is constructed to model the probability of loop closures, and a locally optimal labeling is found using Loopy-BP. The recovered loop closures are then used to generate a topological map. Results are presented on four urban sequences and one indoor sequence. The second system uses video and annotated maps to solve localization. Data association is achieved through detection of object classes, annotated in prior maps, rather than through detection of visual features. To avoid the pitfalls of object recognition, a new representation of query images is introduced, consisting of a vector of detection scores for each object class. Using soft object detections, hypotheses about pose are refined through particle filtering. Experiments include both small office spaces and a large open urban rail station with semantically ambiguous places. This approach showcases a representation that is robust, can exploit the plethora of existing prior maps for GPS-denied environments, and avoids the data association problems encountered when matching point clouds or visual features. Finally, a purely vision-based approach constructs semantic maps given camera pose and simple object exemplar images. Object response heatmaps are combined with the known pose to back-project detection information onto the world. These detections update the world model, integrating information over time as the camera moves. The approach avoids making hard decisions on object recognition and aggregates evidence about objects in the world coordinate system. These solutions simultaneously showcase the contribution of semantics in robotics and provide state-of-the-art solutions to these fundamental problems.
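    A small sketch of the detection-score representation and particle re-weighting described above: each query image becomes a vector of per-class detector scores, and particle weights are updated by how well that vector matches the object classes annotated near each hypothesised pose. The class list, map format, and Gaussian similarity measure are assumptions made for illustration.

```python
# Sketch: per-class detection-score vectors used to re-weight localization particles.
import numpy as np

CLASSES = ["door", "sign", "bench", "stairs"]        # hypothetical class list

def detection_vector(detector_scores):
    # detector_scores: dict class -> max detection score in the query image.
    return np.array([detector_scores.get(c, 0.0) for c in CLASSES])

def expected_vector(annotated_map, pose, radius=5.0):
    # annotated_map: list of (class_name, x, y); soft visibility falls off with distance.
    v = np.zeros(len(CLASSES))
    for cls, x, y in annotated_map:
        d = np.hypot(x - pose[0], y - pose[1])
        v[CLASSES.index(cls)] = max(v[CLASSES.index(cls)], np.exp(-d / radius))
    return v

def reweight_particles(particles, weights, obs_vec, annotated_map, sigma=0.5):
    # particles: (n, 2) positions; weights: (n,) prior weights.
    new_w = np.array([np.exp(-np.sum((expected_vector(annotated_map, p) - obs_vec) ** 2)
                             / (2 * sigma ** 2)) for p in particles])
    new_w *= weights
    return new_w / new_w.sum()

# Toy usage: two particles, one near an annotated door.
amap = [("door", 2.0, 1.0), ("sign", 20.0, 5.0)]
particles = np.array([[2.0, 1.5], [30.0, 30.0]])
obs = detection_vector({"door": 0.9})
print(reweight_particles(particles, np.array([0.5, 0.5]), obs, amap))
```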

    Task-adaptable, Pervasive Perception for Robots Performing Everyday Manipulation

    Intelligent robotic agents that help us in our day-to-day chores have been an aspiration of robotics researchers for decades. More than fifty years since the creation of the first intelligent mobile robotic agent, robots are still struggling to perform seemingly simple tasks, such as setting or cleaning a table. One of the reasons for this is that the unstructured environments these robots are expected to work in impose demanding requirements on a robot's perception system. Depending on the manipulation task the robot is required to execute, different parts of the environment need to be examined, the objects in it found, and the functional parts of these identified. This is a challenging task, since the visual appearance of the objects and the variety of scenes they are found in are large. This thesis proposes to treat robotic visual perception for everyday manipulation tasks as an open question-answering problem. To this end RoboSherlock, a framework for creating task-adaptable, pervasive perception systems, is presented. Using the framework, robot perception is addressed from a system's perspective, and contributions to the state of the art are proposed that introduce several enhancements which scale robot perception toward the needs of human-level manipulation. The contributions of the thesis center around the task-adaptability and pervasiveness of perception systems. A perception task-language and a language interpreter that generates task-relevant perception plans are proposed. The task-language and task-interpreter leverage the power of knowledge representation and knowledge-based reasoning in order to enhance the question-answering capabilities of the system. Pervasiveness, a seamless integration of past, present and future percepts, is achieved through three main contributions: a novel way of recording, replaying and inspecting perceptual episodic memories; a new perception component that enables pervasive operation and maintains an object belief state; and a novel prospection component that enables robots to relive their past experiences and anticipate possible future scenarios. The contributions are validated through several real-world robotic experiments that demonstrate how the proposed system enhances robot perception.
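    To make the "perception task-language to perception plan" idea concrete, here is a toy interpreter that turns a query over desired object attributes into an ordered list of annotators. The query keys, annotator names, and single-level dependency table are hypothetical and not RoboSherlock's actual API.

```python
# Toy sketch: interpret a perception query into an ordered annotator pipeline.
from typing import Dict, List

# Which (hypothetical) annotator can estimate which attribute, and what it needs first.
ANNOTATORS = {
    "PlaneSegmenter":  {"provides": ["cluster"],  "requires": []},
    "ColorAnnotator":  {"provides": ["color"],    "requires": ["cluster"]},
    "ShapeAnnotator":  {"provides": ["shape"],    "requires": ["cluster"]},
    "SizeAnnotator":   {"provides": ["size"],     "requires": ["cluster"]},
}

def plan(query: Dict[str, str]) -> List[str]:
    # Collect annotators that provide the queried attributes, prepending their
    # unmet requirements (single-level dependencies for simplicity).
    needed, pipeline = set(query), []
    for name, spec in ANNOTATORS.items():
        if needed & set(spec["provides"]):
            for req in spec["requires"]:
                provider = next(n for n, s in ANNOTATORS.items()
                                if req in s["provides"])
                if provider not in pipeline:
                    pipeline.append(provider)
            pipeline.append(name)
    return pipeline

print(plan({"color": "red", "shape": "round"}))
# -> ['PlaneSegmenter', 'ColorAnnotator', 'ShapeAnnotator']
```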