    Long Wave Infrared Image Colorization for Person Re-Identification

    Person re-identification (ReID) across color and thermal images requires matching an object's color with its temperature. While thermal cameras increase the performance of ReID systems at night, identifying corresponding features in the visible and the long-wave infrared range is challenging. The biggest challenge arises from the multimodal relationship between an object's color and its temperature. Modern ReID methods provide state-of-the-art results for person matching in the visible range. Hence, multimodal matching can be performed by translating a thermal probe image to the color domain, after which the synthetic color probe image is matched against images from the real color gallery set. This paper focuses on the development of the ThermalReID multispectral person ReID framework. The framework performs matching in two steps. First, it colorizes the input thermal probe image using a Generative Adversarial Network (GAN). Second, it matches images in the color domain using color histograms and MSCR features. We evaluate the ThermalReID framework on the RegDB and ThermalWorld datasets. The results of the evaluation are twofold. First, the developed GAN performs realistic colorization of thermal images. Second, the ThermalReID framework provides person matching across color and thermal images that competes with and surpasses the state of the art. The developed ThermalReID framework can be used in video surveillance systems for effective person ReID at night.
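    To make the two-step pipeline concrete, the sketch below illustrates a colorize-then-match flow in PyTorch and OpenCV. The generator architecture is a minimal placeholder, not the paper's GAN, and the MSCR feature branch is omitted; only colorization followed by color-histogram ranking is shown.

        # Hedged sketch of a colorize-then-match ReID pipeline.
        # The generator is a placeholder encoder-decoder, NOT the ThermalReID
        # GAN; the MSCR feature branch is omitted for brevity.
        import cv2
        import numpy as np
        import torch
        import torch.nn as nn

        class ThermalColorizer(nn.Module):
            """Toy generator: 1-channel thermal -> 3-channel RGB in [-1, 1]."""
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
                    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
                )

            def forward(self, thermal):          # thermal: (B, 1, H, W)
                return self.net(thermal)         # rgb:     (B, 3, H, W)

        def color_histogram(img_bgr, bins=16):
            """Normalized 2D hue-saturation histogram as the appearance cue."""
            hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, [bins, bins], [0, 180, 0, 256])
            cv2.normalize(hist, hist)
            return hist

        def rank_gallery(probe_bgr, gallery_bgr):
            """Rank gallery images by histogram distance to the colorized probe."""
            ph = color_histogram(probe_bgr)
            dists = [cv2.compareHist(ph, color_histogram(g), cv2.HISTCMP_BHATTACHARYYA)
                     for g in gallery_bgr]
            return np.argsort(dists)             # lower distance = better match

    In a full system, the thermal probe would first pass through the trained generator (and be converted to a BGR uint8 image) before rank_gallery is called, and the histogram distance would be combined with an MSCR-based score.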

    Learning Photometric Consistency for Multi-View Shape Reconstruction

    With the rise of augmented and virtual reality, estimating accurate shapes from multi-view RGB images is becoming an important task in computer vision. The dominant strategy employed for this purpose in recent years relies on depth map estimation followed by depth fusion, as depth maps prove efficient at recovering local surface details. Motivated by the recent success of convolutional neural networks, we take this strategy a step further and present a novel solution for depth map estimation that consists of sweeping a volume along rays projected from a camera and inferring the surface presence probability at each point, as seen by an arbitrary number of cameras. A strong motivation behind this work is to study the ability of learning-based features to outperform traditional 2D features when estimating depth from multi-view cues, especially with real-life dynamic scenes containing multiple moving subjects with complex surface details, scenarios where previous image-based MVS methods fail to recover accurate detail. Our results demonstrate this ability, showing that a CNN, trained on a standard static dataset, can help recover surface details on dynamic scenes that are not visible to traditional 2D-feature-based methods. In addition, our evaluation includes a comparison to existing reconstruction pipelines on the standard evaluation dataset used to train our network, showing that our solution performs on par with or better than these approaches.

    The rise of virtual and augmented reality technologies comes with a growing need for content suited to these technologies and their visualization methods. In particular, the ability to produce real-world content that can be viewed in 3D is becoming essential. In this article, we consider the problem of reconstructing dynamic 3D scenes from color images. We are particularly interested in whether convolutional neural networks can be leveraged to effectively improve this reconstruction process. The most recent multi-view reconstruction methods estimate per-view depth maps and then fuse these maps into an implicit 3D shape. A key step of these methods lies in the estimation of the depth maps, traditionally performed by searching for multi-view correspondences using photo-consistency criteria. We propose here to learn this photo-consistency function from examples instead of defining it through the correlation of photometric descriptors, as is the case in most current methods. The intuition is that the correlation of image descriptors is inherently constrained and limited, and that deep networks have the capacity to learn broader configurations. Our results on real data demonstrate that this is the case. Trained on a standard static dataset, convolutional networks allow us to recover details on a moving shape that classical image descriptors cannot extract. Comparative evaluations on these standard data are moreover favorable to the proposed method.
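    The plane-sweep formulation with a learned photo-consistency score can be sketched as follows. The warp_to_reference helper (projecting each source view onto the fronto-parallel plane at a given depth) is assumed given, and the tiny network fixes the number of views for simplicity, whereas the paper's solution handles an arbitrary number of cameras.

        # Hedged sketch of learned photo-consistency inside a plane sweep.
        # `warp_to_reference` is an assumed helper; the small CNN below is a
        # placeholder for the learned consistency function, not the paper's net.
        import torch
        import torch.nn as nn

        class PhotoConsistencyNet(nn.Module):
            """Maps a stack of warped view patches to surface-presence probability."""
            def __init__(self, n_views):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(3 * n_views, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
                )

            def forward(self, warped_views):                # (B, V, 3, H, W)
                b, v, c, h, w = warped_views.shape
                return self.net(warped_views.view(b, v * c, h, w))  # (B, 1, H, W)

        def sweep_depths(src_views, ref_cam, depths, net, warp_to_reference):
            """Score each depth hypothesis; keep the most probable per pixel."""
            scores = []
            for d in depths:                                # depths: 1D tensor (D,)
                warped = warp_to_reference(src_views, ref_cam, d)  # (1, V, 3, H, W)
                scores.append(net(warped))                         # (1, 1, H, W)
            probs = torch.cat(scores, dim=1)                       # (1, D, H, W)
            best = probs.argmax(dim=1)                             # (1, H, W)
            return depths[best]                                    # per-pixel depth map

    Training would supervise the per-pixel probability against ground-truth surface presence derived from a static multi-view dataset, matching the strategy described above.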

    Multi-View Dynamic Shape Refinement Using Local Temporal Integration

    We consider 4D shape reconstruction in multi-view environments and investigate how to exploit temporal redundancy to refine precision. In addition to benefiting many dynamic multi-view scenarios, this also enables larger scenes, where the increased precision can compensate for the reduced spatial resolution per image frame. With precision and scalability in mind, we propose a symmetric (non-causal) local time-window geometric integration scheme over temporal sequences, in which shape reconstructions are refined frame-wise by warping local, reliable geometric regions of neighboring frames onto them. This contrasts with recent comparable approaches that target a different context with more compact scenes and real-time applications. These usually use a single dense volumetric update space or geometric template, which they causally track and update globally frame by frame, with limitations in scalability for larger scenes and, for template-based strategies, in topology and precision. Our template-less and local approach is a first step towards temporal shape super-resolution. We show that it improves reconstruction accuracy by considering multiple frames. To this purpose, and in addition to real-data examples, we introduce a multi-camera synthetic dataset that provides ground-truth data for mid-scale dynamic scenes.
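    The symmetric time-window fusion can be sketched as a weighted integration of neighboring reconstructions warped onto the current frame. The warp_to_frame alignment and the per-region reliability weights are assumed inputs, and a signed-distance volume stands in for the paper's local geometric regions purely for illustration.

        # Hedged sketch of symmetric (non-causal) local-window integration.
        # `warp_to_frame` (non-rigid alignment of frame s onto frame t) and the
        # reliability maps are assumed inputs; only the fusion step is shown.
        import numpy as np

        def refine_frame(t, sdf_volumes, reliabilities, warp_to_frame, radius=2):
            """Refine frame t using frames t-radius .. t+radius."""
            acc = np.zeros_like(sdf_volumes[t])
            wacc = np.zeros_like(sdf_volumes[t])
            for s in range(max(0, t - radius), min(len(sdf_volumes), t + radius + 1)):
                sdf_s = warp_to_frame(sdf_volumes[s], s, t)  # geometry of s seen at t
                w_s = warp_to_frame(reliabilities[s], s, t)  # its reliability, warped too
                acc += w_s * sdf_s
                wacc += w_s
            # weighted average where evidence exists; keep the original elsewhere
            return np.where(wacc > 0, acc / np.maximum(wacc, 1e-8), sdf_volumes[t])

    Because the window extends both backward and forward in time, every frame draws on the same amount of temporal evidence, unlike causal tracking schemes that only integrate the past.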