Latent Embedding Clustering for Occlusion Robust Head Pose Estimation
Head pose estimation has become a crucial area of research in computer vision
given its usefulness in a wide range of applications, including robotics,
surveillance, and driver attention monitoring. One of the most difficult
challenges in this field is managing the head occlusions that frequently occur
in real-world scenarios. In this paper, we propose a novel and efficient
framework that is robust to real-world head occlusions. In particular,
we propose an unsupervised latent embedding clustering with regression and
classification components for each pose angle. The model optimizes latent
feature representations for occluded and non-occluded images through a
clustering term while improving fine-grained angle predictions. Experimental
evaluation on in-the-wild head pose benchmark datasets reveals performance
competitive with state-of-the-art methodologies, with the advantage of a
significant reduction in data. We observe a substantial improvement in occluded
head pose estimation. An ablation study is also conducted to ascertain the
impact of the clustering term within our proposed framework.
Comment: Accepted at the 18th IEEE International Conference on Automatic Face
and Gesture Recognition (FG'24).
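Note: the abstract does not give the training objective in closed form. As a rough, hedged sketch of how per-angle classification and regression can be combined with an unsupervised clustering term over latent embeddings (the bin layout, loss weights, and all names below are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def head_pose_loss(latent, angle_logits, angle_pred, angle_gt,
                   cluster_centers, lambda_cluster=0.1, bin_width=3.0):
    """Hypothetical combined objective for one pose angle; all weights,
    names, and the bin layout are assumptions."""
    # Coarse classification over discretized angle bins (assumed layout:
    # angles in [-99, 99] degrees, 3-degree bins, as in common HPE setups)
    bins = ((angle_gt + 99.0) / bin_width).long()
    cls_loss = F.cross_entropy(angle_logits, bins)

    # Fine-grained regression of the continuous angle
    reg_loss = F.l1_loss(angle_pred, angle_gt)

    # Unsupervised clustering term: mean distance of each latent embedding
    # to its nearest centroid, shaping the latent space for occluded and
    # non-occluded images alike
    dists = torch.cdist(latent, cluster_centers)    # (batch, n_clusters)
    cluster_loss = dists.min(dim=1).values.mean()

    return cls_loss + reg_loss + lambda_cluster * cluster_loss
```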
2D Image head pose estimation via latent space regression under occlusion settings
Head orientation is a challenging computer vision problem that has been
extensively researched owing to its wide variety of applications. However,
current state-of-the-art systems still underperform in the presence of
occlusions and are unreliable for many tasks in such scenarios. This work proposes
a novel deep learning approach for the problem of head pose estimation under
occlusions. The strategy is based on latent space regression as the
fundamental mechanism for better structuring the problem in occluded
scenarios. Our model surpasses several state-of-the-art methodologies for
occluded head pose estimation (HPE) and achieves similar
accuracy for non-occluded scenarios. We demonstrate the usefulness of the
proposed approach with: (i) two synthetically occluded versions of the BIWI and
AFLW2000 datasets, (ii) real-life occlusions of the Pandora dataset, and (iii)
a real-life application to human-robot interaction scenarios where face
occlusions often occur: specifically, autonomous feeding from a robotic arm.
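As a purely illustrative sketch of the latent space regression idea (module names, dimensions, and loss weights are assumptions, not the paper's architecture), one could pull the latent code of an occluded image toward that of its non-occluded counterpart while regressing the pose:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentRegressionHPE(nn.Module):
    """Illustrative sketch only: encoder to a latent code plus a pose head."""
    def __init__(self, backbone, latent_dim=128):
        super().__init__()
        self.backbone = backbone                    # any CNN feature extractor
        self.to_latent = nn.LazyLinear(latent_dim)  # infers input size on first call
        self.pose_head = nn.Linear(latent_dim, 3)   # yaw, pitch, roll

    def forward(self, x):
        z = self.to_latent(self.backbone(x).flatten(1))
        return z, self.pose_head(z)

def training_loss(model, occluded, clean, pose_gt, alpha=1.0):
    """Regress pose while matching the occluded latent to the clean one;
    alpha is an assumed weighting."""
    z_occ, pose_occ = model(occluded)
    with torch.no_grad():                 # treat the clean latent as a target
        z_clean, _ = model(clean)
    latent_loss = F.mse_loss(z_occ, z_clean)
    pose_loss = F.l1_loss(pose_occ, pose_gt)
    return pose_loss + alpha * latent_loss
```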
Efficient Optimization Algorithm for Space-Variant Mixture of Vector Fields
This paper presents a new algorithm for trajectory classification of human activities. The presented framework uses a mixture of parametric space-variant vector fields to describe pedestrians' trajectories. An advantage of the proposed method is that the vector fields are not constant and depend on the pedestrian's localization. This means that the switching motion among vector fields may occur at any image location and should be accurately estimated. In this paper, the model is equipped with a novel methodology to estimate the switching probabilities among motion regimes. More specifically, we propose an iterative optimization of switching probabilities based on the natural gradient vector with respect to the Fisher information metric. This approach follows an information geometric framework and contrasts with more traditional constrained optimization approaches, in which Euclidean gradient-based methods are combined with probability simplex constraints. We demonstrate the superior performance of the proposed approach in the classification of pedestrians' trajectories in synthetic and real data sets concerning far-field surveillance scenarios.
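For concreteness, a minimal sketch of one natural gradient step on the probability simplex under the Fisher information metric of a categorical distribution (the step size and function names are assumptions; this is not the authors' exact iterative scheme). Unlike a projected Euclidean step, this update respects the simplex geometry directly, which is the contrast the abstract draws:

```python
import numpy as np

def natural_gradient_step(p, grad, lr=0.1):
    """One natural gradient ascent step for switching probabilities p on the
    simplex; grad is the Euclidean gradient of the objective w.r.t. p.
    The step size lr is an illustrative assumption."""
    # On the simplex interior the Fisher metric is diag(1/p), so the natural
    # gradient is p * grad, re-centered so the update direction sums to zero
    # and the iterate stays on the simplex.
    nat_grad = p * (grad - np.dot(p, grad))
    p_new = p + lr * nat_grad
    p_new = np.clip(p_new, 1e-12, None)   # guard against numerical drift
    return p_new / p_new.sum()
```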
MDF-Net for Abnormality Detection by Fusing X-Rays with Clinical Data
This study investigates the effects of including patients' clinical
information on the performance of deep learning (DL) classifiers for disease
localization in chest X-ray images. Although current classifiers achieve high
performance using chest X-ray images alone, our interviews with radiologists
indicate that clinical data is highly informative and essential for
interpreting images and making proper diagnoses.
In this work, we propose a novel architecture consisting of two fusion
methods that enable the model to simultaneously process patients' clinical data
(structured data) and chest X-rays (image data). Since these data modalities
are in different dimensional spaces, we propose a spatial arrangement strategy,
spatialization, to facilitate the multimodal learning process in a Mask R-CNN
model. We performed an extensive experimental evaluation using MIMIC-Eye, a
dataset comprising the following modalities: MIMIC-CXR (chest X-ray images), MIMIC-IV-ED
(patients' clinical data), and REFLACX (annotations of disease locations in
chest X-rays).
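A hedged sketch of the spatialization idea described above (layer sizes and names are assumptions, not the paper's configuration): project the clinical vector into channel space and broadcast it over the spatial grid of the image feature map, so the two modalities can be concatenated channel-wise inside a detector such as Mask R-CNN:

```python
import torch
import torch.nn as nn

class Spatialization(nn.Module):
    """Hypothetical sketch: broadcast a clinical feature vector over the
    spatial grid of an image feature map for channel-wise fusion."""
    def __init__(self, clinical_dim, out_channels):
        super().__init__()
        self.project = nn.Linear(clinical_dim, out_channels)

    def forward(self, clinical, image_feats):
        # clinical: (B, clinical_dim); image_feats: (B, C, H, W)
        b, _, h, w = image_feats.shape
        tab = self.project(clinical)                      # (B, out_channels)
        tab_map = tab[:, :, None, None].expand(b, -1, h, w)
        # The fused map can then feed the detection heads
        return torch.cat([image_feats, tab_map], dim=1)
```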
Results show that incorporating patients' clinical data in a DL model
together with the proposed fusion methods improves the disease localization in
chest X-rays by 12% in terms of Average Precision compared to a standard Mask
R-CNN using only chest X-rays. Further ablation studies also emphasize the
importance of multimodal DL architectures and the incorporation of patients'
clinical data in disease localization. The architecture proposed in this work
is publicly available to promote the scientific reproducibility of our study
(https://github.com/ChihchengHsieh/multimodal-abnormalities-detection).