Learning to Navigate the Energy Landscape
In this paper, we present a novel and efficient architecture for addressing
computer vision problems that use 'Analysis by Synthesis'. Analysis by
synthesis involves the minimization of the reconstruction error, which is
typically a non-convex function of the latent target variables.
State-of-the-art methods adopt a hybrid scheme where discriminatively trained
predictors like Random Forests or Convolutional Neural Networks are used to
initialize local search algorithms. While these methods have been shown to
produce promising results, they often get stuck in local optima. Our method
goes beyond the conventional hybrid architecture by not only proposing multiple
accurate initial solutions but by also defining a navigational structure over
the solution space that can be used for extremely efficient gradient-free local
search. We demonstrate the efficacy of our approach on the challenging problem
of RGB Camera Relocalization. To make the RGB camera relocalization problem
particularly challenging, we introduce a new dataset of 3D environments which
are significantly larger than those found in other publicly-available datasets.
Our experiments reveal that the proposed method is able to achieve
state-of-the-art camera relocalization results. We also demonstrate the
generalizability of our approach on Hand Pose Estimation and Image Retrieval
tasks.
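As a rough illustration of the hybrid scheme described above, the sketch below combines multiple predictor-proposed initializations with greedy, gradient-free descent over a neighbourhood structure on the solution space. The function names, the greedy descent rule, and the `neighbours` callable are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def navigate_energy_landscape(energy, init_solutions, neighbours, max_steps=100):
    """Gradient-free local search over a navigational structure on the solution space.

    Illustrative sketch only: `energy` is the reconstruction error to minimise,
    `init_solutions` are the multiple predictor-proposed starting points, and
    `neighbours(s)` returns candidate solutions reachable from s in the
    navigational structure.
    """
    best, best_e = None, np.inf
    for s in init_solutions:
        e = energy(s)
        for _ in range(max_steps):
            # Greedily move to the lowest-energy neighbour; stop at a local minimum.
            cand = min(neighbours(s), key=energy, default=None)
            if cand is None:
                break
            cand_e = energy(cand)
            if cand_e >= e:
                break
            s, e = cand, cand_e
        if e < best_e:
            best, best_e = s, e
    return best, best_e
```

Restricting moves to a precomputed neighbourhood structure is what keeps the search gradient-free and cheap: each step only requires evaluating the reconstruction error at a handful of candidate solutions.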
Steered mixture-of-experts for light field images and video: representation and coding
Research in light field (LF) processing has grown considerably over the last decade, largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, such 2-D regular grids are poorly suited to high-dimensional data such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays arriving from any angle at a certain region. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application to 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art at low-to-mid bitrates with respect to subjective visual quality of 4-D LF images. For 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4x in bitrate at the same quality. At least equally important is the fact that our method inherently offers functionality for LF rendering that is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
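To make the kernel-based model concrete, the sketch below evaluates a strongly simplified SMoE regression over 2-D pixel coordinates: Gaussian kernels act as soft gates and affine experts are blended into a continuous approximation of the signal. The parameter layout and the affine-expert choice are assumptions for illustration; the actual model operates over higher-dimensional LF coordinates.

```python
import numpy as np

def smoe_reconstruct(coords, centers, covs, expert_w, expert_b):
    """Evaluate a simplified Steered Mixture-of-Experts model at pixel coordinates.

    Hedged sketch of the general SMoE idea: each kernel k is a Gaussian gate
    N(centers[k], covs[k]) over the coordinate space, and each expert is an
    affine function expert_w[k] @ x + expert_b[k]. The reconstruction is the
    softly gated sum of expert outputs.
    """
    K = centers.shape[0]
    gates = np.zeros((coords.shape[0], K))
    for k in range(K):
        diff = coords - centers[k]                     # (N, d) offsets to kernel centre
        prec = np.linalg.inv(covs[k])                  # kernel precision matrix
        gates[:, k] = np.exp(-0.5 * np.sum(diff @ prec * diff, axis=1))
    gates /= gates.sum(axis=1, keepdims=True) + 1e-12  # normalised soft gating weights
    experts = coords @ expert_w.T + expert_b           # (N, K) affine expert outputs
    return np.sum(gates * experts, axis=1)             # gated continuous reconstruction
```

Because the model is a continuous function of the coordinates, evaluating it at off-grid positions directly yields view interpolation and super-resolution, which is the rendering functionality highlighted in the abstract.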
Advances in identifying osseous fractured areas and virtually reducing bone fractures
The aim of this work is the development of computer-assisted techniques for helping specialists in the pre-operative planning of bone fracture reduction. As a result, intervention time may be reduced and potential misinterpretations avoided, with consequent benefits for the treatment and recovery time of the patient. The computer-assisted planning of a bone fracture reduction may be divided into three main stages: identification of bone fragments from medical images, computation of the reduction and subsequent stabilization of the fracture, and evaluation of the obtained results. The identification stage may also include the generation of 3D models of bone fragments, with the purpose of obtaining models that are useful for the two subsequent stages. This thesis deals with the identification of bone fragments from CT scans, the generation of 3D models of bone fragments, and the computation of the fracture reduction, excluding the use of fixation devices. Thesis, Univ. Jaén, Departamento de Informática. Defended 19 September 201
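The following sketch is a hedged, simplified illustration of the identification stage only, not the thesis' actual method: a Hounsfield-unit threshold segments bone, connected-component labelling separates candidate fragments, and marching cubes produces a surface mesh per fragment. The threshold and minimum-size values are placeholders.

```python
import numpy as np
from skimage import measure

def identify_fragments(ct_volume, bone_threshold=300, min_voxels=500):
    """Identify candidate bone fragments in a CT volume and mesh them.

    Simplified sketch: threshold, label connected components, discard small
    components as noise, and mesh each remaining fragment.
    """
    bone_mask = ct_volume > bone_threshold               # coarse bone segmentation
    labels = measure.label(bone_mask, connectivity=1)    # separate fragments
    fragments = []
    for region in measure.regionprops(labels):
        if region.area < min_voxels:                     # drop spurious specks
            continue
        fragment_mask = (labels == region.label).astype(float)
        verts, faces, _, _ = measure.marching_cubes(fragment_mask, level=0.5)
        fragments.append((verts, faces))                 # one surface mesh per fragment
    return fragments
```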
InfiniCity: Infinite-Scale City Synthesis
Toward infinite-scale 3D city synthesis, we propose a novel framework,
InfiniCity, which constructs and renders an unconstrained, arbitrarily large,
3D-grounded environment from random noise. InfiniCity decomposes this seemingly
impractical task into three feasible modules, taking advantage of both 2D and
3D data. First, an infinite-pixel image synthesis module generates
arbitrary-scale 2D maps from the bird's-eye view. Next, an octree-based voxel
completion module lifts the generated 2D map to 3D octrees. Finally, a
voxel-based neural rendering module texturizes the voxels and renders 2D
images. InfiniCity can thus synthesize arbitrary-scale, traversable 3D city
environments and allows flexible, interactive editing by users. We
quantitatively and qualitatively demonstrate the efficacy of the proposed
framework. Project page: https://hubert0527.github.io/infinicity
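To illustrate the lifting step in the decomposition above, the sketch below shows a greatly simplified, non-learned stand-in: a bird's-eye-view height map is expanded into a dense occupancy grid. The actual octree-based voxel completion module is a learned network over sparse octrees; the input names and the fill rule here are illustrative assumptions only.

```python
import numpy as np

def lift_bev_to_voxels(height_map, max_height):
    """Lift a bird's-eye-view height map to a dense occupancy voxel grid.

    Hedged stand-in for a 2D-to-3D lifting step: a pixel with height h simply
    fills all voxels up to h. The learned module instead completes a sparse
    octree, which scales far better for city-sized scenes.
    """
    z = np.arange(max_height).reshape(1, 1, -1)          # voxel heights along the z axis
    return z < height_map[:, :, None]                    # boolean occupancy grid (H, W, max_height)
```

Using an octree rather than a dense grid is what makes arbitrary-scale scenes tractable: only occupied regions of the city need to be stored and textured.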
ROAM: Robust and Object-aware Motion Generation using Neural Pose Descriptors
Existing automatic approaches for 3D virtual character motion synthesis
supporting scene interactions do not generalise well to new objects outside
training distributions, even when trained on extensive motion capture datasets
with diverse objects and annotated interactions. This paper addresses this
limitation and shows that robustness and generalisation to novel scene objects
in 3D object-aware character synthesis can be achieved by training a motion
model with as few as one reference object. We leverage an implicit feature
representation trained on object-only datasets, which encodes an
SE(3)-equivariant descriptor field around the object. Given an unseen object
and a reference pose-object pair, we optimise for the object-aware pose that is
closest in the feature space to the reference pose. Finally, we use l-NSM,
i.e., our motion generation model that is trained to seamlessly transition from
locomotion to object interaction with the proposed bidirectional pose blending
scheme. Through comprehensive numerical comparisons to state-of-the-art methods
and in a user study, we demonstrate substantial improvements in 3D virtual
character motion and interaction quality and robustness to scenarios with
unseen objects. Our project page is available at
https://vcai.mpi-inf.mpg.de/projects/ROAM/.
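As a hedged illustration of the pose optimisation described above, the sketch below treats the SE(3)-equivariant descriptor field as a black-box feature function and fits a pose so that its features match those of the reference pose-object pair, using finite-difference gradient descent. The function names and the optimiser are assumptions, not the authors' implementation.

```python
import numpy as np

def optimise_object_aware_pose(descriptor_field, ref_features, init_pose,
                               lr=1e-2, steps=200, eps=1e-4):
    """Fit a character pose so its descriptor-field features match a reference.

    `descriptor_field(pose)` is assumed to return the feature vector of the
    pose relative to the new object; the loss is the squared feature distance
    to the reference pose-object pair.
    """
    pose = np.asarray(init_pose, dtype=float).copy()

    def loss(p):
        return np.sum((descriptor_field(p) - ref_features) ** 2)

    for _ in range(steps):
        grad = np.zeros_like(pose)
        for i in range(pose.size):                 # numerical gradient per degree of freedom
            delta = np.zeros_like(pose)
            delta.flat[i] = eps
            grad.flat[i] = (loss(pose + delta) - loss(pose - delta)) / (2 * eps)
        pose -= lr * grad                          # descend toward the reference features
    return pose
```

The optimised object-aware pose can then serve as the interaction target that the motion model transitions into from locomotion.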