7 research outputs found
AQuaMaM: An Autoregressive, Quaternion Manifold Model for Rapidly Estimating Complex SO(3) Distributions
Accurately modeling complex, multimodal distributions is necessary for
optimal decision-making, but doing so for rotations in three dimensions, i.e.,
the SO(3) group, is challenging due to the curvature of the rotation manifold.
The recently described implicit-PDF (IPDF) is a simple, elegant, and effective
approach for learning arbitrary distributions on SO(3) up to a given precision.
However, inference with IPDF requires a large number of forward passes through
the network's final multilayer perceptron (a number that places an upper bound
on the likelihood that can be calculated by the model), which is prohibitively
slow for those without the computational resources necessary to parallelize the
queries. In
this paper, I introduce AQuaMaM, a neural network capable of both learning
complex distributions on the rotation manifold and calculating exact
likelihoods for query rotations in a single forward pass. Specifically, AQuaMaM
autoregressively models the projected components of unit quaternions as
mixtures of uniform distributions that partition their geometrically-restricted
domain of values. When trained on an "infinite" toy dataset with ambiguous
viewpoints, AQuaMaM rapidly converges to a sampling distribution closely
matching the true data distribution. In contrast, the sampling distribution for
IPDF dramatically diverges from the true data distribution, despite IPDF
approaching its theoretical minimum evaluation loss during training. When
trained on a constructed dataset of 500,000 renders of a die in different
rotations, AQuaMaM reaches a test log-likelihood 14% higher than IPDF. Further,
compared to IPDF, AQuaMaM uses 24% fewer parameters, has a prediction
throughput 52× faster on a single GPU, and converges in a similar amount
of time during training.
Implicit-PDF: Non-Parametric Representation of Probability Distributions on the Rotation Manifold
Single image pose estimation is a fundamental problem in many vision and
robotics tasks, and existing deep learning approaches suffer by not completely
modeling and handling: i) uncertainty about the predictions, and ii) symmetric
objects with multiple (sometimes infinite) correct poses. To this end, we
introduce a method to estimate arbitrary, non-parametric distributions on
SO(3). Our key idea is to represent the distributions implicitly, with a neural
network that estimates the probability given the input image and a candidate
pose. Grid sampling or gradient ascent can be used to find the most likely
pose, but it is also possible to evaluate the probability at any pose, enabling
reasoning about symmetries and uncertainty. This is the most general way of
representing distributions on manifolds, and to showcase the rich expressive
power, we introduce a dataset of challenging symmetric and nearly-symmetric
objects. We require no supervision on pose uncertainty -- the model trains only
with a single pose per example. Nonetheless, our implicit model is expressive
enough to handle complex distributions over 3D poses, while still obtaining
accurate pose estimates in standard, non-ambiguous environments, achieving
state-of-the-art performance on the Pascal3D+ and ModelNet10-SO(3) benchmarks.
On deep generative modelling methods for protein-protein interaction
Proteins form the basis for almost all biological processes; identifying the interactions that proteins have with themselves, the environment, and each other is critical to understanding their biological function in an organism, and thus the impact of drugs designed to affect them. Consequently, a significant body of research and development focuses on methods to analyse and predict protein structure and interactions. Due to the breadth of possible interactions and the complexity of structures, in silico methods are used to propose models of both interaction and structure that can then be verified experimentally. However, the computational complexity of protein interaction means that full physical simulation of these processes requires exceptional computational resources and is often infeasible. Recent advances in deep generative modelling have shown promise in correctly capturing complex conditional distributions. These models derive their basic principles from statistical mechanics and thermodynamic modelling. While the learned functions of these methods are not guaranteed to be physically accurate, they result in a sampling process similar to that suggested by the thermodynamic principles of protein folding and interaction. However, limited research has been applied to extending these models to work over the space of 3D rotations, limiting their applicability to protein models. In this thesis, we develop an accelerated sampling strategy for faster sampling of potential docking locations; we then address the rotational diffusion limitation by extending diffusion models to the space of 3D rotations; and finally we present a framework for applying this rotational diffusion model to the rigid docking of proteins.
Implicit Object Pose Estimation on RGB Images Using Deep Learning Methods
With the rise of robotic and camera systems and the success of deep learning in computer vision,
there is growing interest in precisely determining object positions and orientations. This is crucial for
tasks like automated bin picking, where a camera sensor analyzes images or point clouds to guide a
robotic arm in grasping objects. Pose recognition has broader applications, such as predicting a
car's trajectory in autonomous driving or adapting objects in virtual reality based on the viewer's
perspective.
This dissertation focuses on RGB-based pose estimation methods that use depth information only
for refinement, which is a challenging problem. Recent advances in deep learning have made it
possible to predict object poses in RGB images, despite challenges like object overlap, object
symmetries and more.
We introduce two implicit deep learning-based pose estimation methods for RGB images, covering
the entire process from data generation to pose selection. Furthermore, theoretical findings on
Fourier embeddings are shown to improve the performance of so-called implicit
neural representations, which are then successfully utilized for the task of implicit pose estimation.