
    AQuaMaM: An Autoregressive, Quaternion Manifold Model for Rapidly Estimating Complex SO(3) Distributions

    Accurately modeling complex, multimodal distributions is necessary for optimal decision-making, but doing so for rotations in three dimensions, i.e., the SO(3) group, is challenging due to the curvature of the rotation manifold. The recently described implicit-PDF (IPDF) is a simple, elegant, and effective approach for learning arbitrary distributions on SO(3) up to a given precision. However, inference with IPDF requires N forward passes through the network's final multilayer perceptron (where N places an upper bound on the likelihood that can be calculated by the model), which is prohibitively slow for those without the computational resources necessary to parallelize the queries. In this paper, I introduce AQuaMaM, a neural network capable of both learning complex distributions on the rotation manifold and calculating exact likelihoods for query rotations in a single forward pass. Specifically, AQuaMaM autoregressively models the projected components of unit quaternions as mixtures of uniform distributions that partition their geometrically restricted domain of values. When trained on an "infinite" toy dataset with ambiguous viewpoints, AQuaMaM rapidly converges to a sampling distribution closely matching the true data distribution. In contrast, the sampling distribution for IPDF dramatically diverges from the true data distribution, despite IPDF approaching its theoretical minimum evaluation loss during training. When trained on a constructed dataset of 500,000 renders of a die in different rotations, AQuaMaM reaches a test log-likelihood 14% higher than IPDF. Further, compared to IPDF, AQuaMaM uses 24% fewer parameters, has a prediction throughput 52× faster on a single GPU, and converges in a similar amount of time during training.
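    The mixture-of-uniforms idea above can be pictured with a short sketch: the density of a value is the weight of the bin containing it divided by the bin width. The bin count, the [-1, 1] domain, the weights, and the function name below are illustrative assumptions; AQuaMaM's actual bins, domains, and autoregressively predicted weights follow the paper, not this sketch.

```python
import numpy as np

def mixture_of_uniforms_pdf(x, weights, lo=-1.0, hi=1.0):
    """Density of x under a mixture of K uniform distributions that
    partition [lo, hi] into K equal-width bins (weights sum to 1)."""
    K = len(weights)
    width = (hi - lo) / K
    k = min(int((x - lo) // width), K - 1)  # index of the bin containing x
    return weights[k] / width               # uniform density inside bin k

# Toy example: 10 bins on [-1, 1], with most mass in the bin covering [0.4, 0.6)
weights = np.full(10, 0.02)
weights[7] = 0.84
weights /= weights.sum()
print(mixture_of_uniforms_pdf(0.5, weights))   # high density
print(mixture_of_uniforms_pdf(-0.9, weights))  # low density
```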

    Implicit-PDF: Non-Parametric Representation of Probability Distributions on the Rotation Manifold

    Single image pose estimation is a fundamental problem in many vision and robotics tasks, and existing deep learning approaches suffer from not completely modeling and handling: i) uncertainty about the predictions, and ii) symmetric objects with multiple (sometimes infinite) correct poses. To this end, we introduce a method to estimate arbitrary, non-parametric distributions on SO(3). Our key idea is to represent the distributions implicitly, with a neural network that estimates the probability given the input image and a candidate pose. Grid sampling or gradient ascent can be used to find the most likely pose, but it is also possible to evaluate the probability at any pose, enabling reasoning about symmetries and uncertainty. This is the most general way of representing distributions on manifolds, and to showcase the rich expressive power, we introduce a dataset of challenging symmetric and nearly-symmetric objects. We require no supervision on pose uncertainty: the model trains only with a single pose per example. Nonetheless, our implicit model is expressive enough to handle complex distributions over 3D poses, while still obtaining accurate pose estimation in standard non-ambiguous environments, achieving state-of-the-art performance on the Pascal3D+ and ModelNet10-SO(3) benchmarks.
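    As a rough illustration of the grid-sampling inference described above, the sketch below scores every candidate rotation, normalizes the scores into a discrete approximation of the distribution, and returns the highest-scoring pose. The scorer, the grid construction, and the normalization here are placeholder assumptions; IPDF itself evaluates a trained network over an equivolumetric SO(3) grid.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def estimate_pose(score_fn, image_feat, rotation_grid):
    """Score every candidate rotation, normalize over the grid, and return
    the most likely rotation plus the discrete probability estimates."""
    scores = np.array([score_fn(image_feat, R) for R in rotation_grid])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                      # discrete approximation of the pdf
    best = int(np.argmax(probs))
    return rotation_grid[best], probs

# Toy usage with a dummy scorer that prefers rotations close to the identity
def dummy_score(_feat, R):
    return np.trace(R)                        # the trace of a rotation peaks at the identity

grid = [np.eye(3)] + list(Rotation.random(511).as_matrix())
R_hat, probs = estimate_pose(dummy_score, image_feat=None, rotation_grid=grid)
print(np.allclose(R_hat, np.eye(3)))          # True: the identity wins this toy grid
```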

    On deep generative modelling methods for protein-protein interaction

    Proteins form the basis for almost all biological processes; identifying the interactions that proteins have with themselves, the environment, and each other is critical to understanding their biological function in an organism, and thus the impact of drugs designed to affect them. Consequently, a significant body of research and development focuses on methods to analyse and predict protein structure and interactions. Due to the breadth of possible interactions and the complexity of structures, in silico methods are used to propose models of both interaction and structure that can then be verified experimentally. However, the computational complexity of protein interaction means that full physical simulation of these processes requires exceptional computational resources and is often infeasible. Recent advances in deep generative modelling have shown promise in correctly capturing complex conditional distributions. These models derive their basic principles from statistical mechanics and thermodynamic modelling. While the learned functions of these methods are not guaranteed to be physically accurate, they result in a sampling process similar to that suggested by the thermodynamic principles of protein folding and interaction. However, limited research has been applied to extending these models to work over the space of 3D rotations, limiting their applicability to protein models. In this thesis, we develop an accelerated sampling strategy for faster sampling of potential docking locations; we then address this rotational limitation by extending diffusion models to the space of SO(3); and finally we present a framework for applying this rotational diffusion model to the rigid docking of proteins.
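    One way to picture a forward-noising step of a diffusion process on SO(3), as mentioned above, is to compose a rotation with a small random rotation whose axis-angle vector is drawn from a tangent-space Gaussian. The sketch below uses this common small-noise approximation; the thesis's actual forward and reverse kernels, noise schedule, and score parameterization may differ.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def noise_rotation(R, sigma):
    """Compose R with a random rotation whose axis-angle (tangent-space)
    vector is drawn from an isotropic Gaussian with scale sigma."""
    xi = np.random.normal(scale=sigma, size=3)        # tangent-space noise vector
    return Rotation.from_rotvec(xi).as_matrix() @ R   # left-compose the perturbation

# Toy usage: progressively noise the identity rotation with a growing schedule
R = np.eye(3)
for sigma in (0.05, 0.1, 0.2, 0.4):
    R = noise_rotation(R, sigma)
print(np.allclose(R @ R.T, np.eye(3)))                # still a valid rotation matrix
```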

    Implicit Object Pose Estimation on RGB Images Using Deep Learning Methods

    With the rise of robotic and camera systems and the success of deep learning in computer vision, there is growing interest in precisely determining object positions and orientations. This is crucial for tasks like automated bin picking, where a camera sensor analyzes images or point clouds to guide a robotic arm in grasping objects. Pose recognition has broader applications, such as predicting a car's trajectory in autonomous driving or adapting objects in virtual reality based on the viewer's perspective. This dissertation focuses on RGB-based pose estimation methods that use depth information only for refinement, which is a challenging problem. Recent advances in deep learning have made it possible to predict object poses in RGB images, despite challenges such as object overlap and object symmetries. We introduce two implicit deep learning-based pose estimation methods for RGB images, covering the entire process from data generation to pose selection. Furthermore, theoretical findings on Fourier embeddings are shown to improve the performance of so-called implicit neural representations, which are then successfully utilized for the task of implicit pose estimation.
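    A generic example of the Fourier embeddings mentioned above: input coordinates are mapped to sinusoids at geometrically spaced frequencies before being fed to an implicit network. The frequency schedule, scaling, and function name below are illustrative assumptions rather than the dissertation's exact formulation.

```python
import numpy as np

def fourier_embed(x, num_frequencies=6):
    """Map coordinates x (shape [..., D]) to Fourier features
    [sin(2**k * pi * x), cos(2**k * pi * x)] for k = 0..num_frequencies-1."""
    x = np.asarray(x, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_frequencies)) * np.pi   # geometrically spaced frequencies
    angles = x[..., None] * freqs                          # shape [..., D, K]
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*x.shape[:-1], -1)                # shape [..., 2 * D * K]

# Toy usage: embed a 3D query point before feeding it to an implicit network
print(fourier_embed(np.array([0.1, -0.3, 0.7])).shape)     # (36,)
```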