Similarity learning for person re-identification and semantic video retrieval

Démeunier, Jean Nicolas; Gaultier de Biauzat, Jean-François; Mirabeau, Honoré-Gabriel Riquetti, comte de; Target, Guy Jean-Baptiste; Thouret, Jacques Guillaume; Verdet, Louis

thesis

Authors: Jean Nicolas Démeunier
Jean-François Gaultier de Biauzat
Honoré-Gabriel Riquetti, comte de Mirabeau
Guy Jean-Baptiste Target
Jacques Guillaume Thouret
Louis Verdet
Publication date: 1 January 1877

Abstract

Many computer vision problems reduce to learning a good visual similarity function, one that scores how likely two instances are to share the same semantic concept. In this thesis, we focus on two problems related to similarity learning: Person Re-Identification and Semantic Video Retrieval.

Person Re-Identification aims to maintain the identity of an individual across diverse locations observed through non-overlapping camera views. Starting with two cameras, we propose a novel appearance model based on visual word co-occurrence to measure the similarities between pedestrian images. This model naturally accounts for spatial similarities and for variations caused by pose, illumination and configuration changes across camera views. As a generalization to multiple camera views, we introduce the Group Membership Prediction (GMP) problem, which involves predicting whether a collection of instances shares the same semantic property. In this context, we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. Tested on various benchmarks, our method achieves higher accuracy than the state of the art.

Semantic Video Retrieval seeks to match complex activities in a surveillance video to user-described queries. In surveillance scenarios, where noise and clutter are usually present, the visual uncertainties introduced by error-prone low-level detectors, classifiers and trackers make up a significant part of the semantic gap between user-defined queries and the archived video. To bridge this gap, we propose a novel probabilistic activity localization formulation that incorporates learning of object attributes, between-object relationships, and object re-identification without activity-level training data. Our experiments demonstrate that introducing similarity learning components effectively compensates for noise and error in previous stages and improves performance on both aerial and ground surveillance videos.

Given the computational complexity of our similarity learning models, we also explore how to train complex models efficiently while maintaining good performance. As a proof of concept, we propose training deep neural networks for supervised learning of hash codes. With slight changes to the optimization formulation, this training framework could potentially be applied to Person Re-Identification and related problems.
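
To make the idea of hash-code-based similarity concrete, the following is a minimal sketch of how binary codes might be compared at retrieval time. It is not the thesis's method: the abstract does not describe the network or the optimization, so the sign-based binarization and the function names (`to_hash_code`, `hamming_distance`, `rank_gallery`) are assumptions chosen purely for illustration. A learned model would produce the real-valued features that are binarized here.

```python
# Illustrative sketch only. Sign-based binarization and Hamming ranking are
# assumptions; the thesis's trained hashing network is not reproduced here.

def to_hash_code(features):
    """Binarize a real-valued feature vector: 1 if the entry is positive, else 0."""
    return tuple(1 if x > 0 else 0 for x in features)

def hamming_distance(code_a, code_b):
    """Count the positions at which two equal-length binary codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))

def rank_gallery(query_code, gallery_codes):
    """Return gallery indices ordered from smallest to largest Hamming distance."""
    return sorted(
        range(len(gallery_codes)),
        key=lambda i: hamming_distance(query_code, gallery_codes[i]),
    )

# Hypothetical feature vectors standing in for the outputs of a learned model.
query = to_hash_code([0.9, -0.2, 0.4, -0.7])
gallery = [
    to_hash_code([-0.5, 0.3, -0.1, 0.8]),  # differs from the query in all four bits
    to_hash_code([0.7, -0.1, 0.2, -0.9]),  # same code as the query
    to_hash_code([0.6, 0.4, 0.3, -0.2]),   # differs from the query in one bit
]
ranking = rank_gallery(query, gallery)
```

Under these assumptions, a smaller Hamming distance stands in for a higher similarity score, which is what makes compact binary codes attractive when matching a query image against a large gallery.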

Available Versions

Périodiques Scientifiques en Édition Électronique

oai:persee:article/arcpa_0000-...

Last time updated on 09/04/2018

Boston University Institutional Repository (OpenBU)

oai:open.bu.edu:2144/23572

Last time updated on 19/12/2017

Persee: Revues scientifique en sciences humaines et sociales

oai:persee:article/arcpa_0000-...

Last time updated on 21/04/2018