LInKs "Lifting Independent Keypoints" -- Partial Pose Lifting for Occlusion Handling with Improved Accuracy in 2D-3D Human Pose Estimation
We present LInKs, a novel unsupervised learning method to recover 3D human
poses from 2D kinematic skeletons obtained from a single image, even when
occlusions are present. Our approach follows a unique two-step process, which
involves first lifting the occluded 2D pose to the 3D domain, followed by
filling in the occluded parts using the partially reconstructed 3D coordinates.
This lift-then-fill approach leads to significantly more accurate results
compared to models that complete the pose in 2D space alone. Additionally, we
improve the stability and likelihood estimation of normalising flows through a
custom sampling function that replaces the PCA dimensionality reduction used in
prior work. Furthermore, we are the first to investigate whether different
parts of the 2D kinematic skeleton can be lifted independently, which we find
by itself reduces the error of current lifting approaches. We attribute this to
the reduction of long-range keypoint correlations. In our detailed evaluation,
we quantify the error under various realistic occlusion scenarios, showcasing
the versatility and applicability of our model. Our results consistently
demonstrate the superiority of handling all types of occlusions in 3D space
when compared to others that complete the pose in 2D space. Our approach also
exhibits consistent accuracy in scenarios without occlusion, as evidenced by a
7.9% reduction in reconstruction error compared to prior works on the Human3.6M
dataset. Furthermore, our method excels in accurately retrieving complete 3D
poses even in the presence of occlusions, making it highly applicable in
situations where complete 2D pose information is unavailable.
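The lift-then-fill ordering described above can be sketched as a two-step pipeline. The helpers below are hypothetical stand-ins for the paper's learned lifting and completion networks; only the ordering (lift the visible 2D keypoints first, then complete the pose in 3D) reflects the abstract.

```python
import numpy as np

# Hypothetical sketch of the lift-then-fill ordering: step 1 lifts the
# *visible* 2D keypoints to 3D, step 2 fills occluded joints from the
# partial 3D pose. Both "models" here are illustrative stand-ins.
def lift_then_fill(pose2d, visible, lifter, filler):
    partial3d = lifter(pose2d[visible])   # step 1: lift the occluded 2D pose
    return filler(partial3d, visible)     # step 2: complete the pose in 3D

def toy_lifter(p2d):                      # stand-in: append a constant depth
    return np.concatenate([p2d, np.ones((len(p2d), 1))], axis=1)

def toy_filler(p3d, visible):             # stand-in: fill missing joints
    full = np.tile(p3d.mean(axis=0), (len(visible), 1))
    full[visible] = p3d                   # keep the lifted joints as-is
    return full

pose2d = np.random.rand(17, 2)            # a 17-joint 2D skeleton
visible = np.arange(17) != 3              # joint 3 is occluded
pose3d = lift_then_fill(pose2d, visible, toy_lifter, toy_filler)
print(pose3d.shape)  # (17, 3)
```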
MixerFlow for Image Modelling
Normalising flows are statistical models that transform a complex density
into a simpler density through the use of bijective transformations enabling
both density estimation and data generation from a single model. In the context
of image modelling, the predominant choice has been the Glow-based
architecture, whereas alternative architectures remain largely unexplored in
the research community. In this work, we propose a novel architecture called
MixerFlow, based on the MLP-Mixer architecture, further unifying the generative
and discriminative modelling architectures. MixerFlow offers an effective
mechanism for weight sharing for flow-based models. Our results demonstrate
better density estimation on image datasets under a fixed computational budget,
and MixerFlow scales well as the image resolution increases, making it a
powerful yet simple alternative to Glow-based architectures. We also show that
MixerFlow provides more informative embeddings than Glow-based architectures.
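The change-of-variables mechanics behind normalising flows, as described in the abstract above, can be illustrated with a single invertible affine map. This is a generic sketch, not MixerFlow's actual architecture or API:

```python
import numpy as np

# One invertible affine map z = (x - b) / a with elementwise scales a > 0:
#   log p(x) = log p_base(z) + log|det dz/dx| = log p_base(z) - sum(log a)
def log_prob(x, a, b):
    z = (x - b) / a
    log_base = -0.5 * (z**2 + np.log(2 * np.pi)).sum(axis=-1)  # N(0, I) base
    return log_base - np.log(a).sum()

def sample(a, b, n, rng):
    z = rng.standard_normal((n, a.shape[0]))  # draw from the base density
    return a * z + b                          # push through the inverse map

rng = np.random.default_rng(0)
a, b = np.array([2.0, 0.5]), np.array([1.0, -1.0])
x = sample(a, b, 5, rng)
print(log_prob(x, a, b).shape)  # one log-density per sample
```

Density estimation and generation both come from this single pair of maps, which is the "single model" property the abstract refers to.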
OverFlow: Putting flows on top of neural transducers for better TTS
Neural HMMs are a type of neural transducer recently proposed for
sequence-to-sequence modelling in text-to-speech. They combine the best
features of classic statistical speech synthesis and modern neural TTS,
requiring less data and fewer training updates, and are less prone to gibberish
output caused by neural attention failures. In this paper, we combine neural
HMM TTS with normalising flows for describing the highly non-Gaussian
distribution of speech acoustics. The result is a powerful, fully probabilistic
model of durations and acoustics that can be trained using exact maximum
likelihood. Experiments show that a system based on our proposal needs fewer
updates than comparable methods to produce accurate pronunciations and a
subjective speech quality close to natural speech. Please see
https://shivammehta25.github.io/OverFlow/ for audio examples and code.Comment: 5 pages, 2 figures. Accepted for publication at Interspeech 202
Region-based Appearance and Flow Characteristics for Anomaly Detection in Infrared Surveillance Imagery
Anomaly detection is a classical problem within automated visual surveillance, namely the determination of the normal from the abnormal when operational data availability is highly biased towards one class (normal) due to both insufficient sample size and inadequate distribution coverage for the other class (abnormal). In this work, we propose the dual use of both visual appearance and localized motion characteristics, derived from optic flow, applied on a per-region basis to facilitate object-wise anomaly detection within this context. Leveraging established object localization techniques from a region proposal network, optic flow is extracted from each object region and combined with appearance in the far infrared (thermal) band to give a 3-channel spatiotemporal tensor representation for each object (1 × thermal - spatial appearance; 2 × optic flow magnitude as x and y components - temporal motion). This formulation is used as the basis for training contemporary semi-supervised anomaly detection approaches in a region-based manner such that anomalous objects can be detected as a combination of appearance and/or motion within the scene. Evaluation is performed using the LongTerm infrared (thermal) Imaging (LTD) benchmark dataset, against which successful detection of both anomalous object appearance and motion characteristics is demonstrated using a range of semi-supervised anomaly detection approaches.
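Assembling the 3-channel per-region tensor described above (one thermal appearance channel plus x/y optic-flow magnitude channels) can be sketched as follows; the function and box format are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Hypothetical sketch of the 3-channel per-region tensor: channel 0 is the
# thermal appearance crop, channels 1-2 are optic-flow x/y magnitudes.
def region_tensor(thermal, flow, box):
    x0, y0, x1, y1 = box                  # region from a proposal network
    t = thermal[y0:y1, x0:x1]             # spatial appearance (thermal band)
    fx = np.abs(flow[y0:y1, x0:x1, 0])    # temporal motion, x magnitude
    fy = np.abs(flow[y0:y1, x0:x1, 1])    # temporal motion, y magnitude
    return np.stack([t, fx, fy], axis=0)  # (3, H, W) spatiotemporal tensor

thermal = np.random.rand(64, 64)          # toy thermal frame
flow = np.random.randn(64, 64, 2)         # toy dense optic-flow field
tensor = region_tensor(thermal, flow, (10, 10, 42, 42))
print(tensor.shape)  # (3, 32, 32)
```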
Probabilistic and Semantic Descriptions of Image Manifolds and Their Applications
This paper begins with a description of methods for estimating probability
density functions for images that reflects the observation that such data is
usually constrained to lie in restricted regions of the high-dimensional image
space - not every pattern of pixels is an image. It is common to say that
images lie on a lower-dimensional manifold in the high-dimensional space.
However, although images may lie on such lower-dimensional manifolds, it is not
the case that all points on the manifold have an equal probability of being
images. Images are unevenly distributed on the manifold, and our task is to
devise ways to model this distribution as a probability distribution. In
pursuing this goal, we consider generative models that are popular in the AI
and computer vision community. For our purposes, generative/probabilistic models
should have the properties of 1) sample generation: it should be possible to
sample from this distribution according to the modelled density function, and
2) probability computation: given a previously unseen sample from the dataset
of interest, one should be able to compute the probability of the sample, at
least up to a normalising constant. To this end, we investigate the use of
methods such as normalising flow and diffusion models. We then show that such
probabilistic descriptions can be used to construct defences against
adversarial attacks. In addition to describing the manifold in terms of
density, we also consider how semantic interpretations can be used to describe
points on the manifold. To this end, we consider an emergent language framework
which makes use of variational encoders to produce a disentangled
representation of points that reside on a given manifold. Trajectories between
points on a manifold can then be described in terms of evolving semantic
descriptions.
Comment: 23 pages, 17 figures, 1 table
Learning Disentangled Representations in the Imaging Domain
Disentangled representation learning has been proposed as an approach to
learning general representations even in the absence of, or with limited,
supervision. A good general representation can be fine-tuned for new target
tasks using modest amounts of data, or used directly in unseen domains
achieving remarkable performance in the corresponding task. This alleviation of
the data and annotation requirements offers tantalising prospects for
applications in computer vision and healthcare. In this tutorial paper, we
motivate the need for disentangled representations, present key theory, and
detail practical building blocks and criteria for learning such
representations. We discuss applications in medical imaging and computer vision
emphasising choices made in exemplar key works. We conclude by presenting
remaining challenges and opportunities.
Comment: Submitted. This paper follows a tutorial style but also surveys a
considerable number of works (more than 200 citations)
Normalizing Flows for Human Pose Anomaly Detection
Video anomaly detection is an ill-posed problem because it relies on many
parameters such as appearance, pose, camera angle, background, and more. We
distill the problem to anomaly detection of human pose, thus reducing the risk
of nuisance parameters such as appearance affecting the result. Focusing on
pose alone also has the side benefit of reducing bias against distinct minority
groups. Our model works directly on human pose graph sequences and is
exceptionally lightweight ( parameters), capable of running on any
machine able to run the pose estimation with negligible additional resources.
We leverage the highly compact pose representation in a normalizing flows
framework, which we extend to tackle the unique characteristics of
spatio-temporal pose data and show its advantages in this use case. Our
algorithm uses normalizing flows to learn a bijective mapping between the pose
data distribution and a Gaussian distribution, using spatio-temporal graph
convolution blocks. The algorithm is quite general and can handle training data
of only normal examples, as well as a supervised dataset that consists of
labeled normal and abnormal examples. We report state-of-the-art results on two
anomaly detection benchmarks - the unsupervised ShanghaiTech dataset and the
recent supervised UBnormal dataset.
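Using a flow's log-likelihood as a normality score, as the abstract above describes, can be sketched generically. The bijection below is a toy stand-in, not the paper's spatio-temporal graph-convolutional model:

```python
import numpy as np

# Sketch: a trained flow maps pose data to a Gaussian, so the model's
# log-likelihood acts as a normality score; low scores flag anomalies.
def normality_score(x, flow_forward):
    z, log_det = flow_forward(x)          # z = f(x) and log|det df/dx|
    log_pz = -0.5 * (z**2 + np.log(2 * np.pi)).sum()
    return log_pz + log_det               # change of variables: log p(x)

def toy_flow(x):                          # placeholder bijection: scale by 2
    return 2.0 * x, x.size * np.log(2.0)

typical_pose = np.zeros(4)                # near the centre of the density
unusual_pose = np.full(4, 3.0)            # far out in the tails
print(normality_score(typical_pose, toy_flow)
      > normality_score(unusual_pose, toy_flow))  # True
```

The same scoring rule applies whether the flow was trained on normal examples only or with supervision, which is the generality the abstract claims.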
Learning Likelihoods with Conditional Normalizing Flows
Normalizing Flows (NFs) are able to model complicated distributions p(y) with
strong inter-dimensional correlations and high multimodality by transforming a
simple base density p(z) through an invertible neural network under the change
of variables formula. Such behavior is desirable in multivariate structured
prediction tasks, where handcrafted per-pixel loss-based methods inadequately
capture strong correlations between output dimensions. We present a study of
conditional normalizing flows (CNFs), a class of NFs where the base density to
output space mapping is conditioned on an input x, to model conditional
densities p(y|x). CNFs are efficient in sampling and inference, they can be
trained with a likelihood-based objective, and CNFs, being generative flows, do
not suffer from mode collapse or training instabilities. We provide an
effective method to train continuous CNFs for binary problems and in
particular, we apply these CNFs to super-resolution and vessel segmentation
tasks demonstrating competitive performance on standard benchmark datasets in
terms of likelihood and conventional metrics.
Comment: 18 pages, 8 Tables, 9 Figures, Preprint
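The conditioning mechanism described above (the base-to-output mapping depends on an input x) can be sketched with a conditional affine transform. The weight matrices stand in for a learned conditioner network; this is an illustrative assumption, not the paper's model:

```python
import numpy as np

# Hypothetical conditional affine flow: the scale and shift of the y -> z
# map are functions of the conditioning input x, so p(y|x) varies with x.
def cond_log_prob(y, x, W_s, W_t):
    s = np.tanh(x @ W_s)                  # conditioner output: log-scales
    t = x @ W_t                           # conditioner output: shifts
    z = (y - t) * np.exp(-s)              # inverse affine transform
    log_base = -0.5 * (z**2 + np.log(2 * np.pi)).sum(axis=-1)
    return log_base - s.sum(axis=-1)      # + log|det dz/dy| = -sum(s)

rng = np.random.default_rng(1)
x = rng.normal(size=(1, 3))               # conditioning input
W_s, W_t = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
mode = x @ W_t                            # y at the conditional mode
print((cond_log_prob(mode, x, W_s, W_t)
       > cond_log_prob(mode + 5.0, x, W_s, W_t)).item())  # True
```

Because the transform stays bijective for every x, both exact likelihood training and efficient sampling carry over from the unconditional case.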