75 research outputs found
Learning to Reconstruct Shapes from Unseen Classes
From a single image, humans are able to perceive the full 3D shape of an
object by exploiting learned shape priors from everyday life. Contemporary
single-image 3D reconstruction algorithms aim to solve this task in a similar
fashion, but often end up with priors that are highly biased by training
classes. Here we present an algorithm, Generalizable Reconstruction (GenRe),
designed to capture more generic, class-agnostic shape priors. We achieve this
with an inference network and training procedure that combine 2.5D
representations of visible surfaces (depth and silhouette), spherical shape
representations of both visible and non-visible surfaces, and 3D voxel-based
representations, in a principled manner that exploits the causal structure of
how 3D shapes give rise to 2D images. Experiments demonstrate that GenRe
performs well on single-view shape reconstruction, and generalizes to diverse
novel objects from categories not seen during training.Comment: NeurIPS 2018 (Oral). The first two authors contributed equally to
this paper. Project page: http://genre.csail.mit.edu
Cognitive coping strategies and motivational profiles of ultra-endurance athletes
Mental skills have been shown to be effective in helping endurance athletes cope with the challenges of their sport. The purpose of this research was to examine ultraendurance triathlete’s cognitive coping strategies, while also exploring their motivational profiles.
Using a phenomenological qualitative research methodology, guided by a grounded theoretical approach, 10 elite deca-ironmen, 8 male and 2 female, took part in interviews and completed two questionnaires, Motivations of Marathoners Scales ('Masters, Ogles & Jolton, 1993) and Athletes Skill and Coping Inventory-28, (Smith, Schultz, Smoll & Ptacek, 1995). The researcher was provided the profitable opportunity for gathenng extensive and informative material by living and working with the participants during the deca-ironman 2002 world championships, for a duration of seventeen days. Text units (sentences or phrases) were descriptively coded into meaningful themes and categories. Both constant comparison and deviant case analysis (Strauss & Corbin, 1998, Basics of Qualitative Research Techniques and Procedures for Developing Grounded Theory California Sage) guarded against narrow and premature interpretation of data.
Qualities required to complete the deca-ironman emerged from the data, these include a strong mental focus, intrinsic motivation, high levels of self-efficacy and the ability to withstand pam and boredom while overcoming fatigue. Deca-ironmen use cognitive coping strategies used in mainstream sporting disciplines, such as goal setting and self-deception to overcome adversities presented m ultra-endurance, along with certain strategies that are unique to this group, for example meditation. Unique motivations emerged both in the pilot and mam studies demonstrating some ultraendurance athletes are motivated by a unique motivational climate, where athletes persevere due to external pressures placed by dedications made to either chanties or individuals.
Some qualities are inherent in the individual but others have to be developed through participation in high-level endurance events. Training in these unique strategies will assist m better performance. It is hoped findings from this research will assist future ultra-endurance athletes to improve personal performance levels
SCENE AND ACTION UNDERSTANDING USING CONTEXT AND KNOWLEDGE SHARING
Complete scene understanding from video data involves spatio-temporal decision making over long sequences and utilization of world knowledge. We propose a method that captures edge connections between these spatio-temporal components or knowledge graphs through a graph convolution network (GCN). Our approach uses the GCN to fuse various information in the video like detected objects, human pose, scene information etc. for action segmentation. For certain functions like zero shot and few shot action recognition, we learn a classifier for unseen test classes through comparison with similar training classes. We provide information about similarity between two classes through an explicit relationship map i.e. the knowledge graph. We study different kinds of knowledge graphs based on action phrases, verbs or nouns and visual features to demonstrate how they perform with respect to each other. We build an integrated approach for zero-shot and few-shot learning. We also show further improvements through adaptive learning of the input knowledge graphs and using triplet loss along with the task specific loss while training. We add results for semi-supervised learning as well to understand improvements from our graph learning technique.
For complete scene understanding, we also study depth completion using deep depth prior based on the deep image prior (DIP) technique. DIP shows that structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images. Given color images and noisy or incomplete target depth maps, we optimize a randomly-initialized CNN model to reconstruct a depth map restored by virtue of using the CNN network structure as a prior combined with a view-constrained photo-consistency loss. This loss is computed using images from a geometrically calibrated camera from nearby viewpoints. It is based on test time optimization, so it is independent of training data distributions. We apply this deep depth prior for inpainting and refining incomplete and noisy depth maps within both binocular and multi-view stereo pipelines
Articulated human tracking and behavioural analysis in video sequences
Recently, there has been a dramatic growth of interest in the observation and tracking
of human subjects through video sequences. Arguably, the principal impetus has come
from the perceived demand for technological surveillance, however applications in entertainment,
intelligent domiciles and medicine are also increasing. This thesis examines
human articulated tracking and the classi cation of human movement, rst separately
and then as a sequential process.
First, this thesis considers the development and training of a 3D model of human body
structure and dynamics. To process video sequences, an observation model is also designed
with a multi-component likelihood based on edge, silhouette and colour. This is de ned on
the articulated limbs, and visible from a single or multiple cameras, each of which may be
calibrated from that sequence. Second, for behavioural analysis, we develop a methodology
in which actions and activities are described by semantic labels generated from a Movement
Cluster Model (MCM). Third, a Hierarchical Partitioned Particle Filter (HPPF) was
developed for human tracking that allows multi-level parameter search consistent with the
body structure. This tracker relies on the articulated motion prediction provided by the
MCM at pose or limb level. Fourth, tracking and movement analysis are integrated to
generate a probabilistic activity description with action labels.
The implemented algorithms for tracking and behavioural analysis are tested extensively
and independently against ground truth on human tracking and surveillance
datasets. Dynamic models are shown to predict and generate synthetic motion, while
MCM recovers both periodic and non-periodic activities, de ned either on the whole body
or at the limb level. Tracking results are comparable with the state of the art, however
the integrated behaviour analysis adds to the value of the approach.Overseas Research Students Awards Scheme (ORSAS
Synthetic image generation and the use of virtual environments for image enhancement tasks
Deep learning networks are often difficult to train if there are insufficient image samples. Gathering real-world images tailored for a specific job takes a lot of work to perform. This dissertation explores techniques for synthetic image generation and virtual environments for various image enhancement/ correction/restoration tasks, specifically distortion correction, dehazing, shadow removal, and intrinsic image decomposition. First, given various image formation equations, such as those used in distortion correction and dehazing, synthetic image samples can be produced, provided that the equation is well-posed. Second, using virtual environments to train various image models is applicable for simulating real-world effects that are otherwise difficult to gather or replicate, such as dehazing and shadow removal. Given synthetic images, one cannot train a network directly on it as there is a possible gap between the synthetic and real domains. We have devised several techniques for generating synthetic images and formulated domain adaptation methods where our trained deep-learning networks perform competitively in distortion correction, dehazing, and shadow removal. Additional studies and directions are provided for the intrinsic image decomposition problem and the exploration of procedural content generation, where a virtual Philippine city was created as an initial prototype.
Keywords: image generation, image correction, image dehazing, shadow removal, intrinsic image decomposition, computer graphics, rendering, machine learning, neural networks, domain adaptation, procedural content generation
- …