Toward Reliable Human Pose Forecasting with Uncertainty
Recently, there has been an arms race of pose forecasting methods aimed at
solving the spatio-temporal task of predicting a sequence of future 3D poses of
a person given a sequence of past observed ones. However, the lack of unified
benchmarks and limited uncertainty analysis have hindered progress in the
field. To address this, we first develop an open-source library for human pose
forecasting, featuring multiple models, datasets, and standardized evaluation
metrics, with the aim of promoting research and moving toward a unified and
fair evaluation. Second, we devise two types of uncertainty in the problem to
increase performance and convey better trust: 1) we propose a method for
modeling aleatoric uncertainty by using uncertainty priors to inject knowledge
about the behavior of uncertainty. This focuses the capacity of the model in
the direction of more meaningful supervision while reducing the number of
learned parameters and improving stability; 2) we introduce a novel approach
for quantifying the epistemic uncertainty of any model through clustering and
measuring the entropy of its assignments. Our experiments demonstrate
improvements in accuracy and better performance in uncertainty
estimation.
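The entropy-of-assignments idea in the abstract above can be illustrated with a minimal sketch. It assumes the clustering step already yields a soft assignment (a probability per cluster) for each prediction; the function name and the example numbers are hypothetical and are not taken from the paper's library.

```python
import math

def assignment_entropy(probs):
    """Shannon entropy (in nats) of one soft cluster-assignment distribution."""
    total = sum(probs)  # renormalise defensively
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)

# Hypothetical soft assignments of three predictions to four clusters.
confident = [0.97, 0.01, 0.01, 0.01]
ambiguous = [0.40, 0.30, 0.20, 0.10]
uniform = [0.25, 0.25, 0.25, 0.25]

entropies = [assignment_entropy(p) for p in (confident, ambiguous, uniform)]
```

Under this reading, a higher entropy means the prediction does not fall clearly into any cluster, which is treated as higher epistemic uncertainty.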
Parametric active learning techniques for 3D hand pose estimation
Active learning (AL) has recently gained popularity for deep learning (DL) models due to efficient and informative sampling, especially when the models
require large-scale datasets. The DL models designed for 3D hand pose
estimation (3D-HPE) demand accurate and diverse large-scale datasets that are
time-consuming and costly to build and require experts. This thesis aims to
explore AL, for the first time, primarily for the 3D-HPE task.
The thesis first presents an AL methodology customised for 3D-HPE learners to address this. Because the learners are predominantly regression-based algorithms, a Bayesian approximation of a DL architecture is presented to model uncertainties. This approximation generates data-dependent and model-dependent
uncertainties that are further combined with the data representativeness AL function, CoreSet, for sampling. Despite being the first work of its kind, it
selects informative samples and achieves minimal joint errors with less training data
on three well-known depth datasets.
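For readers unfamiliar with CoreSet, its representativeness criterion is commonly implemented as greedy k-center selection. The sketch below shows only that part, on hypothetical 2D feature embeddings. It does not include the uncertainty terms that the thesis combines with it, and the function name is illustrative.

```python
import math

def k_center_greedy(features, labelled_idx, budget):
    """Greedy k-center selection: repeatedly pick the point that is
    farthest from its nearest labelled or already-selected point."""
    # Distance from every point to its nearest labelled point.
    min_dist = [
        min(math.dist(f, features[j]) for j in labelled_idx)
        for f in features
    ]
    selected = []
    for _ in range(budget):
        pick = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(pick)
        # Refresh nearest-centre distances with the newly chosen point.
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[pick]))
    return selected

# Hypothetical 2D feature embeddings; index 0 is already labelled.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 4.0)]
chosen = k_center_greedy(features, labelled_idx=[0], budget=2)
```

The selection favours samples that cover regions of feature space far from the labelled set, which is why near-duplicates of labelled data (index 1 here) are not chosen.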
The second AL algorithm continues to improve the selection following a
new trend of parametric samplers. Specifically, the sampler is task-agnostic and uses a Graph Convolutional Network (GCN) to provide higher-order representations between labelled and unlabelled data. The newly selected unlabelled
images are ranked based on uncertainty or GCN feature distribution.
Another novel sampler extends this idea, and tackles encountered AL issues,
like cold-start and distribution shift, by training in a self-supervised way with
contrastive learning. It leverages the visual concepts from labelled
and unlabelled images while attaining state-of-the-art results.
The last part of the thesis brings prior AL insights and achievements together in a
unified parametric sampler for the multi-modal 3D-HPE task.
This sampler trains multi-variational auto-encoders to align the modalities
and provide better selection representation. Several query functions are
studied to open a new direction in deep AL sampling.
Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting
Hand trajectory forecasting from egocentric views is crucial for enabling a
prompt understanding of human intentions when interacting with AR/VR systems.
However, existing methods handle this problem in a 2D image space which is
inadequate for 3D real-world applications. In this paper, we set up an
egocentric 3D hand trajectory forecasting task that aims to predict hand
trajectories in a 3D space from early observed RGB videos in a first-person
view. To fulfill this goal, we propose an uncertainty-aware state space
Transformer (USST) that combines the merits of the attention mechanism and
aleatoric uncertainty within the framework of the classical state-space model.
The model can be further enhanced by the velocity constraint and visual prompt
tuning (VPT) on large vision transformers. Moreover, we develop an annotation
workflow to collect 3D hand trajectories with high quality. Experimental
results on H2O and EgoPAT3D datasets demonstrate the superiority of USST for
both 2D and 3D trajectory forecasting. The code and datasets are publicly
released: https://actionlab-cv.github.io/EgoHandTrajPred.
Comment: ICCV 2023 Accepted (Camera Ready)
Semi-Supervised Deep Regression with Uncertainty Consistency and Variational Model Ensembling via Bayesian Neural Networks
Deep regression is an important problem with numerous applications. These
range from computer vision tasks such as age estimation from photographs, to
medical tasks such as ejection fraction estimation from echocardiograms for
disease tracking. Semi-supervised approaches for deep regression are notably
under-explored compared to classification and segmentation tasks, however.
Unlike classification tasks, which rely on thresholding functions for
generating class pseudo-labels, regression tasks use real number target
predictions directly as pseudo-labels, making them more sensitive to prediction
quality. In this work, we propose a novel approach to semi-supervised
regression, namely Uncertainty-Consistent Variational Model Ensembling (UCVME),
which improves training by generating high-quality pseudo-labels and
uncertainty estimates for heteroscedastic regression. Given that aleatoric
uncertainty is only dependent on input data by definition and should be equal
for the same inputs, we present a novel uncertainty consistency loss for
co-trained models. Our consistency loss significantly improves uncertainty
estimates and allows higher quality pseudo-labels to be assigned greater
importance under heteroscedastic regression. Furthermore, we introduce a novel
variational model ensembling approach to reduce prediction noise and generate
more robust pseudo-labels. We analytically show our method generates higher
quality targets for unlabeled data and further improves training. Experiments
show that our method outperforms state-of-the-art alternatives on different
tasks and can be competitive with supervised methods that use full labels. Our
code is available at https://github.com/xmed-lab/UCVME.
Comment: Accepted by AAAI2
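The two ingredients named in this abstract, heteroscedastic regression and an uncertainty consistency term between co-trained models, can be sketched as follows. This is an illustration under assumptions, not the UCVME implementation. It assumes each model predicts a mean and a log-variance, uses the standard Gaussian negative log-likelihood, and takes the consistency term to be a squared difference of log-variances. All function names and numbers are hypothetical.

```python
import math

def heteroscedastic_nll(y, mean, log_var):
    """Gaussian negative log-likelihood (up to a constant) for one sample,
    with the model predicting a mean and a log-variance."""
    return 0.5 * math.exp(-log_var) * (y - mean) ** 2 + 0.5 * log_var

def uncertainty_consistency(log_var_a, log_var_b):
    """Assumed consistency term: squared difference between the
    log-variances that two co-trained models predict for the same input."""
    return (log_var_a - log_var_b) ** 2

# Hypothetical predictions from two co-trained models for one labelled input.
y = 2.0
mean_a, log_var_a = 1.8, -1.0
mean_b, log_var_b = 2.1, -0.5

loss = (
    heteroscedastic_nll(y, mean_a, log_var_a)
    + heteroscedastic_nll(y, mean_b, log_var_b)
    + uncertainty_consistency(log_var_a, log_var_b)
)
```

In the likelihood term, a larger predicted variance down-weights the squared residual. This is the mechanism by which lower-uncertainty pseudo-labels can receive greater importance during training.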
Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression
Visual-inertial localization is a key problem in computer vision and robotics
applications such as virtual reality, self-driving cars, and aerial vehicles.
The goal is to estimate an accurate pose of an object when either the
environment or the dynamics are known. Recent methods directly regress the pose
using convolutional and spatio-temporal networks. Absolute pose regression
(APR) techniques predict the absolute camera pose from an image input in a
known scene. Odometry methods perform relative pose regression (RPR) that
predicts the relative pose from a known object dynamic (visual or inertial
inputs). The localization task can be improved by retrieving information of
both data sources for a cross-modal setup, which is a challenging problem due
to contradictory tasks. In this work, we conduct a benchmark to evaluate deep
multimodal fusion based on pose graph optimization (PGO) and attention networks. Auxiliary and Bayesian
learning are integrated for the APR task. We show accuracy improvements for the
RPR-aided APR task and for the RPR-RPR task for aerial vehicles and hand-held
devices. We conduct experiments on the EuRoC MAV and PennCOSYVIO datasets, and
record a novel industry dataset.
Comment: Under review
Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks
Despite the state-of-the-art performance for medical image segmentation, deep
convolutional neural networks (CNNs) have rarely provided uncertainty
estimations regarding their segmentation outputs, e.g., model (epistemic) and
image-based (aleatoric) uncertainties. In this work, we analyze these different
types of uncertainties for CNN-based 2D and 3D medical image segmentation
tasks. We additionally propose a test-time augmentation-based aleatoric
uncertainty to analyze the effect of different transformations of the input
image on the segmentation output. Test-time augmentation has been previously
used to improve segmentation accuracy, yet it has not been formulated in a consistent
mathematical framework. Hence, we also propose a theoretical formulation of
test-time augmentation, where a distribution of the prediction is estimated by
Monte Carlo simulation with prior distributions of parameters in an image
acquisition model that involves image transformations and noise. We compare and
combine our proposed aleatoric uncertainty with model uncertainty. Experiments
with segmentation of fetal brains and brain tumors from 2D and 3D Magnetic
Resonance Images (MRI) showed that 1) the test-time augmentation-based
aleatoric uncertainty provides a better uncertainty estimation than calculating
the test-time dropout-based model uncertainty alone and helps to reduce
overconfident incorrect predictions, and 2) our test-time augmentation
outperforms a single-prediction baseline and dropout-based multiple
predictions.
Comment: 13 pages, 8 figures, accepted by Neurocomputing
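The Monte Carlo view of test-time augmentation described above can be sketched on a toy 1D "image". Everything here is an assumption made for illustration: the thresholding function stands in for a trained CNN, and the sampled transformations are a random flip plus Gaussian intensity noise. The paper's actual acquisition model and parameter priors are not reproduced.

```python
import math
import random

def toy_segmenter(image):
    """Stand-in for a trained network (hypothetical): thresholds each pixel."""
    return [1 if v > 0.5 else 0 for v in image]

def tta_predict(image, n_samples=200, noise_std=0.1, seed=0):
    """Monte Carlo test-time augmentation on a 1D 'image'.
    Returns per-pixel foreground frequency and binary entropy."""
    rng = random.Random(seed)
    counts = [0] * len(image)
    for _ in range(n_samples):
        flip = rng.random() < 0.5
        # Sample a spatial transform, then sample intensity noise.
        aug = image[::-1] if flip else list(image)
        aug = [v + rng.gauss(0.0, noise_std) for v in aug]
        pred = toy_segmenter(aug)
        # Undo the spatial transform so predictions align with the input.
        pred = pred[::-1] if flip else pred
        counts = [c + p for c, p in zip(counts, pred)]
    probs = [c / n_samples for c in counts]
    entropy = [
        0.0 if p in (0.0, 1.0)
        else -(p * math.log(p) + (1 - p) * math.log(1 - p))
        for p in probs
    ]
    return probs, entropy

image = [0.0, 0.0, 0.48, 0.52, 1.0, 1.0]
probs, entropy = tta_predict(image)
```

Pixels near the decision boundary (0.48 and 0.52) change label under the sampled noise and therefore receive high entropy, while pixels far from it stay stable. The averaged `probs` play the role of the augmentation-based prediction.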
Reasoning with Uncertainty in Deep Learning for Safer Medical Image Computing
Deep learning is now ubiquitous in the research field of medical image computing. As such technologies progress towards clinical translation, the question of safety becomes critical. Once deployed, machine learning systems unavoidably face situations where the correct decision or prediction is ambiguous. However, the current methods disproportionately rely on deterministic algorithms, lacking a mechanism to represent and manipulate uncertainty. In safety-critical applications such as medical imaging, reasoning under uncertainty is crucial for developing a reliable decision making system. Probabilistic machine learning provides a natural framework to quantify the degree of uncertainty over different variables of interest, be it the prediction, the model parameters and structures, or the underlying data (images and labels). Probability distributions are used to represent all the uncertain unobserved quantities in a model and how they relate to the data, and probability theory is used as a language to compute and manipulate these distributions. In this thesis, we explore probabilistic modelling as a framework to integrate uncertainty information into deep learning models, and demonstrate its utility in various high-dimensional medical imaging applications. In the process, we make several fundamental enhancements to current methods. We categorise our contributions into three groups according to the types of uncertainties being modelled: (i) predictive; (ii) structural and (iii) human uncertainty. Firstly, we discuss the importance of quantifying predictive uncertainty and understanding its sources for developing a risk-averse and transparent medical image enhancement application. We demonstrate how a measure of predictive uncertainty can be used as a proxy for the predictive accuracy in the absence of ground-truths. 
Furthermore, assuming the structure of the model is flexible enough for the task, we introduce a way to decompose the predictive uncertainty into its orthogonal sources, i.e., aleatoric and parameter uncertainty. We show the potential utility of such decoupling in providing quantitative “explanations” of the model performance. Secondly, we introduce our recent attempts at learning model structures directly from data. One work proposes a method based on variational inference to learn a posterior distribution over connectivity structures within a neural network architecture for multi-task learning, and shares some preliminary results in the MR-only radiotherapy planning application. Another work explores how the training algorithm of decision trees could be extended to grow the architecture of a neural network to adapt to the given availability of data and the complexity of the task. Lastly, we develop methods to model the “measurement noise” (e.g., biases and skill levels) of human annotators, and integrate this information into the learning process of the neural network classifier. In particular, we show that explicitly modelling the uncertainty involved in the annotation process not only leads to an improvement in robustness to label noise, but also yields useful insights into the patterns of errors that characterise individual experts.
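One common way to split predictive uncertainty into aleatoric and parameter components is the law of total variance. The sketch below uses it on hypothetical outputs and is not necessarily the exact procedure used in the thesis. It assumes several stochastic forward passes (for example, samples from an approximate posterior), each returning a predicted mean and a predicted variance for the same output.

```python
from statistics import fmean, pvariance

def decompose_uncertainty(means, variances):
    """Split predictive variance via the law of total variance:
    aleatoric = average predicted variance,
    parameter = variance of the predicted means across samples."""
    aleatoric = fmean(variances)
    parameter = pvariance(means)
    return aleatoric, parameter, aleatoric + parameter

# Hypothetical outputs of five stochastic forward passes for one output value.
means = [0.9, 1.1, 1.0, 1.2, 0.8]
variances = [0.04, 0.05, 0.03, 0.04, 0.04]

aleatoric, parameter, total = decompose_uncertainty(means, variances)
```

A large parameter term suggests that more training data could help. A large aleatoric term points to noise or ambiguity inherent in the input.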