DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding
While deep neural networks have led to human-level performance on computer
vision tasks, they have yet to demonstrate similar gains for holistic scene
understanding. In particular, 3D context has been shown to be an extremely
important cue for scene understanding - yet very little research has been done
on integrating context information with deep models. This paper presents an
approach to embed 3D context into the topology of a neural network trained to
perform holistic scene understanding. Given a depth image depicting a 3D scene,
our network aligns the observed scene with a predefined 3D scene template, and
then reasons about the existence and location of each object within the scene
template. In doing so, our model recognizes multiple objects in a single
forward pass of a 3D convolutional neural network, capturing both global scene
and local object information simultaneously. To create training data for this
3D network, we generate partly hallucinated depth images which are rendered by
replacing real objects with a repository of CAD models of the same object
category. Extensive experiments demonstrate the effectiveness of our algorithm
compared to the state of the art. Source code and data are available at
http://deepcontext.cs.princeton.edu.
Comment: Accepted by ICCV201
Branching Neural Networks
A conditional deep learning model that learns specialized representations on a decision tree is described. Unlike similar methods that take a probabilistic mixture of experts (MoE) approach, a feature-augmentation-based method is used to jointly train all network and decision parameters using backpropagation. This allows for deterministic binary decisions at both training and test time, specializing subtrees exclusively to clusters of data. Feature augmentation involves combining intermediate representations with scores or confidences assigned to branches. Each representation is augmented with all of the scores assigned to the active branches on the computational path, so that it encodes the entire path information, which is essential for efficient training of decision functions. These networks are referred to as Branching Neural Networks (BNNs). Because this approach is orthogonal to many other neural network compression methods, such algorithms can be combined to achieve much higher compression rates and further speedups.
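The routing and feature augmentation described in this abstract can be illustrated with a minimal sketch. It covers inference only, uses plain linear decision functions, and assumes a hand-built tree; the names `Node`, `linear`, and `route`, the weights, and the "score >= 0 goes right" rule are all hypothetical and are not taken from the paper, which trains these parameters jointly with backpropagation.

```python
def linear(weights, bias, x):
    """Dot product of weights and x, plus bias."""
    return sum(w * v for w, v in zip(weights, x)) + bias


class Node:
    """Hypothetical tree node: internal nodes decide, leaves carry a label."""

    def __init__(self, weights=None, bias=0.0, left=None, right=None, label=None):
        self.weights = weights
        self.bias = bias
        self.left = left
        self.right = right
        self.label = label


def route(root, features):
    """Follow deterministic binary decisions from the root to a leaf.

    At every internal node the input is the original features augmented
    with all branch scores collected so far on the path (assumed scheme).
    """
    node = root
    path_scores = []
    while node.label is None:
        augmented = features + path_scores
        score = linear(node.weights, node.bias, augmented)
        path_scores.append(score)
        # Hard decision: exactly one subtree is active per input.
        node = node.right if score >= 0 else node.left
    return node.label, features + path_scores


# Depth-2 tree over 2-D features; second-level nodes see one extra score.
tree = Node(
    weights=[1.0, -1.0],
    left=Node(weights=[0.0, 1.0, 0.5], left=Node(label="A"), right=Node(label="B")),
    right=Node(weights=[1.0, 0.0, -1.0], left=Node(label="C"), right=Node(label="D")),
)

label, augmented = route(tree, [2.0, 1.0])
print(label, augmented)
```

Because only one root-to-leaf path is evaluated per input, the cost of a forward pass grows with the depth of the tree rather than with the total number of nodes.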
Learning to Navigate the Energy Landscape
In this paper, we present a novel and efficient architecture for addressing
computer vision problems that use 'Analysis by Synthesis'. Analysis by
synthesis involves the minimization of the reconstruction error which is
typically a non-convex function of the latent target variables.
State-of-the-art methods adopt a hybrid scheme where discriminatively trained
predictors like Random Forests or Convolutional Neural Networks are used to
initialize local search algorithms. While these methods have been shown to
produce promising results, they often get stuck in local optima. Our method
goes beyond the conventional hybrid architecture by not only proposing multiple
accurate initial solutions but by also defining a navigational structure over
the solution space that can be used for extremely efficient gradient-free local
search. We demonstrate the efficacy of our approach on the challenging problem
of RGB Camera Relocalization. To make the RGB camera relocalization problem
particularly challenging, we introduce a new dataset of 3D environments which
are significantly larger than those found in other publicly-available datasets.
Our experiments reveal that the proposed method is able to achieve
state-of-the-art camera relocalization results. We also demonstrate the
generalizability of our approach on Hand Pose Estimation and Image Retrieval
tasks.
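The idea of combining several initial solutions with a navigational structure for gradient-free local search can be sketched generically. The example below is a plain greedy descent over a neighbor graph, not the paper's architecture: the function `greedy_graph_search`, the toy energy table, and the chain-shaped neighbor graph are all assumptions made for illustration.

```python
def greedy_graph_search(energy, neighbors, starts):
    """Gradient-free local search over a fixed neighbor graph.

    energy:    callable mapping a candidate solution to a scalar cost
    neighbors: dict mapping each candidate to its adjacent candidates
    starts:    iterable of initial candidates (multiple initializations)
    Returns the (candidate, energy) pair with the lowest energy reached.
    """
    best = None
    for current in starts:
        current_e = energy(current)
        while True:
            candidates = neighbors.get(current, [])
            if not candidates:
                break
            nxt = min(candidates, key=energy)
            nxt_e = energy(nxt)
            if nxt_e >= current_e:
                break  # no neighbor improves: local optimum on the graph
            current, current_e = nxt, nxt_e
        if best is None or current_e < best[1]:
            best = (current, current_e)
    return best


# Toy non-convex energy over ten candidates arranged in a chain.
ENERGY = {0: 5, 1: 3, 2: 4, 3: 6, 4: 7, 5: 4, 6: 2, 7: 1, 8: 3, 9: 6}
NEIGHBORS = {i: [j for j in (i - 1, i + 1) if j in ENERGY] for i in ENERGY}

single = greedy_graph_search(ENERGY.get, NEIGHBORS, [0])
multi = greedy_graph_search(ENERGY.get, NEIGHBORS, [0, 4])
print(single, multi)
```

In this toy setting a single start at candidate 0 stops in the shallow optimum at candidate 1, while adding a second start at candidate 4 reaches the lower-energy candidate 7, which is the motivation for proposing multiple initial solutions.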
User Interface Device with Actuated Buttons
A user interface device with actuated buttons is described. In an embodiment, the user interface device comprises two or more buttons, and the motion of the buttons is controlled by actuators under software control such that their motion is inter-related. The position or motion of the buttons may provide a user with feedback about the current state of a software program they are using or provide them with enhanced user input functionality. In another embodiment, the ability to move the buttons is used to reconfigure the user interface buttons; this may be performed dynamically, based on the current state of the software program, or may depend on the software program being used. The user interface device may be a peripheral device, such as a mouse or keyboard, or may be integrated within a computing device such as a games device.
Quick and dirty: streamlined 3D scanning in archaeology
Capturing data is a key part of archaeological practice, whether for preserving records or to aid interpretation. But the technologies used are complex and expensive, resulting in time-consuming processes associated with their use. These processes force a separation between ongoing interpretive work and capture. Through two field studies we elicit more detail as to what is important about this interpretive work and what might be gained through a closer integration of capture technology with these practices. Drawing on these insights, we go on to present a novel, portable, wireless 3D modeling system that emphasizes "quick and dirty" capture. We discuss its design rationale in relation to our field observations and evaluate this rationale further by giving the system to archaeological experts to explore in a variety of settings. While our device compromises on the resolution of traditional 3D scanners, its support of interpretation through emphasis on real-time capture, review and manipulability suggests it could be a valuable tool for the future of archaeology.