156 research outputs found

    DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding

    Full text link
    While deep neural networks have led to human-level performance on computer vision tasks, they have yet to demonstrate similar gains for holistic scene understanding. In particular, 3D context has been shown to be an extremely important cue for scene understanding - yet very little research has been done on integrating context information with deep models. This paper presents an approach to embed 3D context into the topology of a neural network trained to perform holistic scene understanding. Given a depth image depicting a 3D scene, our network aligns the observed scene with a predefined 3D scene template, and then reasons about the existence and location of each object within the scene template. In doing so, our model recognizes multiple objects in a single forward pass of a 3D convolutional neural network, capturing both global scene and local object information simultaneously. To create training data for this 3D network, we generate partly hallucinated depth images which are rendered by replacing real objects with a repository of CAD models of the same object category. Extensive experiments demonstrate the effectiveness of our algorithm compared to the state-of-the-arts. Source code and data are available at http://deepcontext.cs.princeton.edu.Comment: Accepted by ICCV201

    BRANCHING NEURAL NETWORKS

    Get PDF
    A conditional deep learning model that learns specialized representations on a decision tree is described. Unlike similar methods taking a probabilistic mixture of experts (MoE) approach, a feature augmentation based method is used to jointly train all network and decision parameters using back–propagation, which allows for deterministic binary decisions at both training and test time, specializing subtrees exclusively to clusters of data. Feature augmentation involves combining intermediate representations with scores or confidences assigned to branches. Each representation is augmented with all of the scores assigned to the active branch on the computational path to encode the entire path information, which is essential for efficient training of decision functions. These networks are referred to as Branching Neural Networks (BNNs). As this is an approach that is orthogonal to many other neural network compression methods, such algorithms can be combined to achieve much higher compression rates and further speedups

    Learning to Navigate the Energy Landscape

    Full text link
    In this paper, we present a novel and efficient architecture for addressing computer vision problems that use `Analysis by Synthesis'. Analysis by synthesis involves the minimization of the reconstruction error which is typically a non-convex function of the latent target variables. State-of-the-art methods adopt a hybrid scheme where discriminatively trained predictors like Random Forests or Convolutional Neural Networks are used to initialize local search algorithms. While these methods have been shown to produce promising results, they often get stuck in local optima. Our method goes beyond the conventional hybrid architecture by not only proposing multiple accurate initial solutions but by also defining a navigational structure over the solution space that can be used for extremely efficient gradient-free local search. We demonstrate the efficacy of our approach on the challenging problem of RGB Camera Relocalization. To make the RGB camera relocalization problem particularly challenging, we introduce a new dataset of 3D environments which are significantly larger than those found in other publicly-available datasets. Our experiments reveal that the proposed method is able to achieve state-of-the-art camera relocalization results. We also demonstrate the generalizability of our approach on Hand Pose Estimation and Image Retrieval tasks

    User Interface Device with Actuated Buttons

    Get PDF
    A user interface device with actuated buttons is described. In an embodiment, the user interface device comprises two or more buttons and the motion of the buttons is controlled by actuators under software control such that their motion is inter-related. The position or motion of the buttons may provide a user with feedback about the current state of a software program they are using or provide them with enhanced user input functionality. In another embodiment, the ability to move the buttons is used to reconfigure the user interface buttons and this may be performed dynamically, based on the current state of the software program, or may be performed dependent upon the software program being used. The user interface device may be a peripheral device, such as a mouse or keyboard, or may be integrated within a computing device such as a games device

    Quick and dirty : streamlined 3D scanning in archaeology

    Get PDF
    Capturing data is a key part of archaeological practice, whether for preserving records or to aid interpretation. But the technologies used are complex and expensive, resulting in time-consuming processes associated with their use. These processes force a separation between ongoing interpretive work and capture. Through two field studies we elicit more detail as to what is important about this interpretive work and what might be gained through a closer integration of capture technology with these practices. Drawing on these insights, we go on to present a novel, portable, wireless 3D modeling system that emphasizes "quick and dirty" capture. We discuss its design rational in relation to our field observations and evaluate this rationale further by giving the system to archaeological experts to explore in a variety of settings. While our device compromises on the resolution of traditional 3D scanners, its support of interpretation through emphasis on real-time capture, review and manipulability suggests it could be a valuable tool for the future of archaeology
    corecore