
    Learning to Navigate the Energy Landscape

    In this paper, we present a novel and efficient architecture for addressing computer vision problems that use 'analysis by synthesis'. Analysis by synthesis involves minimizing a reconstruction error that is typically a non-convex function of the latent target variables. State-of-the-art methods adopt a hybrid scheme in which discriminatively trained predictors, such as Random Forests or Convolutional Neural Networks, are used to initialize local search algorithms. While these methods have been shown to produce promising results, they often get stuck in local optima. Our method goes beyond this conventional hybrid architecture by not only proposing multiple accurate initial solutions but also defining a navigational structure over the solution space that enables extremely efficient gradient-free local search. We demonstrate the efficacy of our approach on the challenging problem of RGB Camera Relocalization. To make the problem particularly challenging, we introduce a new dataset of 3D environments that are significantly larger than those found in other publicly available datasets. Our experiments show that the proposed method achieves state-of-the-art camera relocalization results. We also demonstrate the generalizability of our approach on Hand Pose Estimation and Image Retrieval tasks.
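    The core loop pairs learned initializations with gradient-free descent over a neighborhood structure. The sketch below is a minimal illustration of that idea, not the paper's implementation; energy, candidates, and neighbors are hypothetical callables standing in for the reconstruction error, the discriminative predictor's proposals, and the learned navigational structure.

    ```python
    import numpy as np

    def navigate_energy_landscape(energy, candidates, neighbors, max_steps=100):
        """Gradient-free local search from several predicted starting points.

        energy     -- callable mapping a hypothesis to its reconstruction error
        candidates -- initial hypotheses, e.g. from a discriminative predictor
        neighbors  -- callable returning hypotheses adjacent to h in the
                      navigational structure (hypothetical interface)
        """
        best, best_e = None, np.inf
        for h in candidates:
            e = energy(h)
            for _ in range(max_steps):
                moves = [(energy(n), n) for n in neighbors(h)]
                if not moves:
                    break
                e_new, h_new = min(moves, key=lambda t: t[0])
                if e_new >= e:
                    break          # local optimum along the navigation graph
                e, h = e_new, h_new
            if e < best_e:
                best, best_e = h, e
        return best, best_e
    ```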

    Steered mixture-of-experts for light field images and video: representation and coding

    Research in light field (LF) processing has increased heavily over the last decade, largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids, which are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are ill-suited for high-dimensional data such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about the light rays arriving at a certain region from any angle. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application to 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art at low-to-mid bitrates with respect to the subjective visual quality of 4-D LF images. For 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4 in bitrate at the same quality. At least as important is the fact that our method inherently offers functionality for LF rendering that other state-of-the-art techniques lack: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
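    In its simplest form, an SMoE model evaluates a gated sum of local experts at each query coordinate: Gaussian kernels provide soft gating weights, and a linear expert per kernel regresses the sample value. Below is a minimal sketch of that evaluation for scalar (grayscale) data; the kernel parameters would normally be fit with an EM-style procedure and are taken as given here.

    ```python
    import numpy as np

    def smoe_reconstruct(coords, centers, covs, expert_w, expert_b):
        """Evaluate a steered mixture-of-experts model at query coordinates.

        coords   -- (N, d) query positions (d = 2 for images, 4 for LF images)
        centers  -- (K, d) kernel centers
        covs     -- (K, d, d) kernel covariances (the "steering")
        expert_w -- (K, d) slopes and expert_b -- (K,) offsets of linear experts
        Returns a continuous approximation of the plenoptic function at coords.
        """
        K = centers.shape[0]
        resp = np.empty((coords.shape[0], K))
        for k in range(K):
            diff = coords - centers[k]
            prec = np.linalg.inv(covs[k])
            # Unnormalized Gaussian responsibility of kernel k at each coordinate.
            resp[:, k] = np.exp(-0.5 * np.einsum('nd,de,ne->n', diff, prec, diff))
        gates = resp / (resp.sum(axis=1, keepdims=True) + 1e-12)  # soft gating
        experts = coords @ expert_w.T + expert_b                  # (N, K) experts
        return (gates * experts).sum(axis=1)                      # gated blend
    ```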

    Advances in identifying osseous fractured areas and virtually reducing bone fractures

    The aim of this work is the development of computer-assisted techniques for helping specialists in the pre-operative planning of bone fracture reduction. As a result, intervention time may be reduced and potential misinterpretations circumvented, with consequent benefits for the treatment and recovery time of the patient. The computer-assisted planning of a bone fracture reduction may be divided into three main stages: identification of bone fragments from medical images, computation of the reduction and subsequent stabilization of the fracture, and evaluation of the obtained results. The identification stage may also include the generation of 3D models of bone fragments, with the purpose of obtaining useful models for the two subsequent stages. This thesis deals with the identification of bone fragments from CT scans, the generation of 3D models of bone fragments, and the computation of the fracture reduction, excluding the use of fixation devices. Doctoral thesis, Univ. Jaén, Departamento de Informática. Defended 19 September 201
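    As an illustration of the identification stage, a first rough separation of bone fragments from a CT volume can be obtained by Hounsfield-unit thresholding followed by 3-D connected-component labeling. This is a generic sketch of that common preprocessing step, not the technique developed in the thesis; the threshold and minimum-size values are illustrative.

    ```python
    from scipy import ndimage

    def identify_bone_fragments(ct_volume, hu_threshold=300, min_voxels=500):
        """Label candidate bone fragments in a CT volume (illustrative only).

        Bone appears bright in CT (high Hounsfield units), so a global threshold
        followed by 3-D connected-component labeling yields a first, rough
        segmentation of separate osseous fragments.
        """
        mask = ct_volume > hu_threshold                  # bright voxels = bone
        labels, n = ndimage.label(mask)                  # 3-D connected components
        sizes = ndimage.sum(mask, labels, range(1, n + 1))
        for lbl, size in enumerate(sizes, start=1):
            if size < min_voxels:                        # drop small noise specks
                labels[labels == lbl] = 0
        return labels
    ```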

    InfiniCity: Infinite-Scale City Synthesis

    Toward infinite-scale 3D city synthesis, we propose a novel framework, InfiniCity, which constructs and renders an unconstrainedly large, 3D-grounded environment from random noise. InfiniCity decomposes this seemingly impractical task into three feasible modules, taking advantage of both 2D and 3D data. First, an infinite-pixel image synthesis module generates arbitrary-scale 2D maps from a bird's-eye view. Next, an octree-based voxel completion module lifts the generated 2D map to a 3D octree. Finally, a voxel-based neural rendering module texturizes the voxels and renders 2D images. InfiniCity can thus synthesize arbitrary-scale, traversable 3D city environments and allows flexible, interactive editing by users. We quantitatively and qualitatively demonstrate the efficacy of the proposed framework. Project page: https://hubert0527.github.io/infinicity
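    The three modules compose into a single sampling pipeline: noise to bird's-eye-view map, map to octree, octree to rendered views. The sketch below shows only that composition; map_generator, voxel_completer, and neural_renderer are hypothetical stand-ins for the paper's trained models.

    ```python
    def synthesize_city(map_generator, voxel_completer, neural_renderer,
                        noise, camera_trajectory):
        """Compose the three InfiniCity stages into one sampling pipeline.

        All module arguments are hypothetical stand-ins for the trained
        models described in the paper.
        """
        bev_map = map_generator(noise)           # 1. infinite-pixel 2-D BEV map
        octree = voxel_completer(bev_map)        # 2. lift the 2-D map to a 3-D octree
        frames = [neural_renderer(octree, cam)   # 3. texturize voxels, render views
                  for cam in camera_trajectory]
        return frames
    ```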

    ROAM: Robust and Object-aware Motion Generation using Neural Pose Descriptors

    Existing automatic approaches for 3D virtual character motion synthesis that support scene interactions do not generalise well to new objects outside the training distribution, even when trained on extensive motion capture datasets with diverse objects and annotated interactions. This paper addresses this limitation and shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object. We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object. Given an unseen object and a reference pose-object pair, we optimise for the object-aware pose that is closest in feature space to the reference pose. Finally, we use l-NSM, our motion generation model, which is trained to transition seamlessly from locomotion to object interaction using the proposed bidirectional pose blending scheme. Through comprehensive numerical comparisons to state-of-the-art methods and a user study, we demonstrate substantial improvements in 3D virtual character motion and interaction quality, as well as robustness in scenarios with unseen objects. Our project page is available at https://vcai.mpi-inf.mpg.de/projects/ROAM/.
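    The pose-transfer step amounts to minimizing the distance between the descriptor-field feature at a candidate pose and the feature of the reference pose. The sketch below assumes a generic descriptor_field callable and uses plain finite-difference gradient descent; the paper's actual optimiser and SE(3) parametrisation are not reproduced here.

    ```python
    import numpy as np

    def fit_object_aware_pose(descriptor_field, ref_feature, pose_init,
                              lr=1e-2, steps=200, eps=1e-4):
        """Fit a pose whose descriptor matches the reference pose's feature.

        descriptor_field -- callable: pose vector -> feature vector around the
                            unseen object (stand-in for the equivariant field)
        ref_feature      -- feature of the reference pose-object pair
        Minimizes ||descriptor_field(pose) - ref_feature||^2.
        """
        pose = np.asarray(pose_init, dtype=float).copy()

        def loss(p):
            d = descriptor_field(p) - ref_feature
            return float(d @ d)

        for _ in range(steps):
            grad = np.zeros_like(pose)
            for i in range(pose.size):          # central finite differences
                dp = np.zeros_like(pose)
                dp[i] = eps
                grad[i] = (loss(pose + dp) - loss(pose - dp)) / (2 * eps)
            pose -= lr * grad                   # gradient descent step
        return pose
    ```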