
    Latent Embedding Clustering for Occlusion Robust Head Pose Estimation

    Head pose estimation has become a crucial area of research in computer vision, given its usefulness in a wide range of applications, including robotics, surveillance, and driver attention monitoring. One of the most difficult challenges in this field is handling the head occlusions that frequently occur in real-world scenarios. In this paper, we propose a novel and efficient framework that is robust in real-world head occlusion scenarios. In particular, we propose unsupervised latent embedding clustering with regression and classification components for each pose angle. The model optimizes latent feature representations for occluded and non-occluded images through a clustering term while improving fine-grained angle predictions. Experimental evaluation on in-the-wild head pose benchmark datasets reveals competitive performance compared to state-of-the-art methodologies, with the advantage of a significant data reduction. We observe a substantial improvement in occluded head pose estimation. An ablation study is also conducted to ascertain the impact of the clustering term within the proposed framework.
    Comment: Accepted at the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG'24)
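    An objective of this kind, combining binned-angle classification/regression heads with an unsupervised clustering term on the latent space, can be sketched as follows. This is an illustrative numpy sketch, not the paper's implementation: the soft-argmax decoding of angle-bin logits, the fixed cluster centers, and the weighting `lam` are assumptions made for the example.

```python
import numpy as np

def soft_argmax_angle(logits, bin_centers):
    """Expected angle from classification logits over discretized bins
    (softmax-weighted average of the bin centers)."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    return probs @ bin_centers

def clustering_term(z, centers):
    """Mean squared distance from each latent embedding to its nearest
    cluster center -- the unsupervised clustering penalty."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def total_loss(z, logits, targets, centers, bin_centers, lam=0.1):
    """Regression loss on the decoded angle plus a weighted clustering
    term that shapes the latent space for occluded/non-occluded inputs."""
    pred = soft_argmax_angle(logits, bin_centers)
    reg = ((pred - targets) ** 2).mean()
    return reg + lam * clustering_term(z, centers)
```

    In practice one such head would be instantiated per pose angle (yaw, pitch, roll), with the clustering term shared across them.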

    2D Image head pose estimation via latent space regression under occlusion settings

    Head orientation is a challenging computer vision problem that has been extensively researched and has a wide variety of applications. However, current state-of-the-art systems still underperform in the presence of occlusions and are unreliable for many tasks in such scenarios. This work proposes a novel deep learning approach to head pose estimation (HPE) under occlusions. The strategy is based on latent space regression as the fundamental key to better structuring the problem for occluded scenarios. Our model surpasses several state-of-the-art methodologies for occluded HPE and achieves similar accuracy in non-occluded scenarios. We demonstrate the usefulness of the proposed approach with: (i) two synthetically occluded versions of the BIWI and AFLW2000 datasets, (ii) real-life occlusions of the Pandora dataset, and (iii) a real-life application to human-robot interaction scenarios where face occlusions often occur; specifically, the autonomous feeding from a robotic arm.
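    To illustrate how latent space regression can structure the occluded problem, the following numpy sketch pairs a reconstruction target (the non-occluded image) with a pose regression head on a shared latent code, so the latent is pushed toward an occlusion-invariant representation. The linear encoder/decoder and the equal loss weighting are assumptions for the example, not the paper's architecture.

```python
import numpy as np

def encode(x, W_enc):
    """Map a (possibly occluded) image vector to a low-dimensional latent code."""
    return np.tanh(x @ W_enc)

def hpe_losses(x_occ, x_clean, pose_gt, W_enc, W_dec, W_pose):
    """Illustrative latent-space-regression objective: reconstruct the
    clean (non-occluded) image from the occluded input, and regress the
    pose (yaw, pitch, roll) from the same latent code."""
    z = encode(x_occ, W_enc)
    recon = z @ W_dec              # decoder toward the non-occluded image
    pose = z @ W_pose              # pose regression head
    rec_loss = ((recon - x_clean) ** 2).mean()
    pose_loss = ((pose - pose_gt) ** 2).mean()
    return rec_loss + pose_loss
```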

    Efficient Optimization Algorithm for Space-Variant Mixture of Vector Fields

    This paper presents a new algorithm for trajectory classification of human activities. The presented framework uses a mixture of parametric space-variant vector fields to describe pedestrians' trajectories. An advantage of the proposed method is that the vector fields are not constant and depend on the pedestrian's localization. This means that the switching motion among vector fields may occur at any image location and should be accurately estimated. In this paper, the model is equipped with a novel methodology to estimate the switching probabilities among motion regimes. More specifically, we propose an iterative optimization of the switching probabilities based on the natural gradient vector with respect to the Fisher information metric. This approach follows an information-geometric framework and contrasts with more traditional constrained-optimization approaches, in which Euclidean gradient-based methods are combined with probability simplex constraints. We demonstrate the superior performance of the proposed approach in the classification of pedestrians' trajectories in synthetic and real data sets concerning far-field surveillance scenarios.
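    A common practical realization of a natural-gradient step on the probability simplex is the multiplicative (exponentiated-gradient) update, which respects the Fisher geometry of the categorical distribution and never leaves the simplex, so no projection is needed. The sketch below illustrates this general idea; the step size and the toy objective are assumptions, not details from the paper.

```python
import numpy as np

def natural_gradient_step(p, grad, eta=0.1):
    """One multiplicative (exponentiated-gradient) update of a probability
    vector p given the Euclidean gradient of the objective. The update
    moves against the gradient in the simplex geometry: components stay
    strictly positive and renormalization keeps them summing to one."""
    q = p * np.exp(-eta * grad)   # multiplicative step against the gradient
    return q / q.sum()            # renormalize back onto the simplex
```

    For example, minimizing the cross-entropy f(p) = -sum_i r_i log p_i (gradient -r/p) with these updates drives p toward the target distribution r, with every iterate remaining a valid probability vector.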

    MDF-Net for Abnormality Detection by Fusing X-Rays with Clinical Data

    This study investigates the effects of including patients' clinical information on the performance of deep learning (DL) classifiers for disease localization in chest X-ray images. Although current classifiers achieve high performance using chest X-ray images alone, our interviews with radiologists indicate that clinical data are highly informative and essential for interpreting images and making proper diagnoses. In this work, we propose a novel architecture consisting of two fusion methods that enable the model to simultaneously process patients' clinical data (structured data) and chest X-rays (image data). Since these data modalities are in different dimensional spaces, we propose a spatial arrangement strategy, spatialization, to facilitate the multimodal learning process in a Mask R-CNN model. We performed an extensive experimental evaluation using MIMIC-Eye, a dataset comprising the modalities MIMIC-CXR (chest X-ray images), MIMIC-IV-ED (patients' clinical data), and REFLACX (annotations of disease locations in chest X-rays). Results show that incorporating patients' clinical data in a DL model together with the proposed fusion methods improves disease localization in chest X-rays by 12% in terms of Average Precision compared to a standard Mask R-CNN using only chest X-rays. Further ablation studies also emphasize the importance of multimodal DL architectures and of incorporating patients' clinical data in disease localization. The architecture proposed in this work is publicly available to promote the scientific reproducibility of our study (https://github.com/ChihchengHsieh/multimodal-abnormalities-detection).
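    The spatialization idea, broadcasting a 1-D clinical feature vector into constant feature maps so it can be concatenated channel-wise with CNN image features, can be illustrated as follows. This is a minimal numpy sketch; the feature shapes and the example clinical variables are hypothetical, and the actual fusion points inside Mask R-CNN are described in the paper and repository.

```python
import numpy as np

def spatialize(clinical, feat_hw):
    """Broadcast a 1-D clinical feature vector of length C into a (C, H, W)
    tensor: one constant feature map per clinical variable."""
    h, w = feat_hw
    c = clinical.shape[0]
    return np.broadcast_to(clinical[:, None, None], (c, h, w)).copy()

def fuse(image_feats, clinical):
    """Channel-wise concatenation of CNN image features (C1, H, W) with the
    spatialized clinical data (C2, H, W), giving a (C1 + C2, H, W) tensor."""
    clin_maps = spatialize(clinical, image_feats.shape[1:])
    return np.concatenate([image_feats, clin_maps], axis=0)
```

    The fused tensor has the same spatial layout as the image features, so downstream convolutional heads can consume it without architectural changes beyond the input channel count.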