Semantically Selective Augmentation for Deep Compact Person Re-Identification
We present a deep person re-identification approach that combines semantically selective deep data augmentation with clustering-based network compression to generate high-performance, lightweight, and fast inference networks. In
particular, we propose to augment limited training data via sampling from a
deep convolutional generative adversarial network (DCGAN), whose discriminator
is constrained by a semantic classifier to explicitly control the domain
specificity of the generation process. Thereby, we encode information in the
classifier network which can be utilized to steer adversarial synthesis, and
which fuels our CondenseNet ID-network training. We provide a quantitative and
qualitative analysis of the approach and its variants on a number of datasets,
obtaining results that outperform the state-of-the-art on the LIMA dataset for
long-term monitoring in indoor living spaces
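The discriminator constraint described above can be illustrated with a minimal numpy sketch of a combined objective: the usual real/fake adversarial term plus a semantic cross-entropy term whose weight steers domain specificity. Function and variable names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def semantic_discriminator_loss(d_real, d_fake, sem_logits, sem_labels, lam=1.0):
    """Sketch of a DCGAN discriminator objective constrained by a
    semantic classifier: adversarial real/fake loss plus a weighted
    semantic cross-entropy on real samples.

    d_real, d_fake : raw discriminator scores for real / generated samples
    sem_logits     : (N, C) class logits from the semantic head
    sem_labels     : (N,) integer class labels
    lam            : assumed trade-off hyper-parameter
    """
    # Standard binary cross-entropy adversarial term for the discriminator.
    adv = -np.mean(np.log(sigmoid(d_real) + 1e-12)) \
          - np.mean(np.log(1.0 - sigmoid(d_fake) + 1e-12))
    # Numerically stable log-softmax cross-entropy for the semantic head.
    z = sem_logits - sem_logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    sem = -np.mean(log_p[np.arange(len(sem_labels)), sem_labels])
    return adv + lam * sem
```

Minimising this objective pushes the discriminator to be both a real/fake critic and a correct semantic classifier, which is one way a classifier can "steer" what the generator is rewarded for synthesising.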
Agent-based framework for person re-identification
In computer-based human object re-identification, a detected human is recognised to a level sufficient to re-identify a tracked person, either in a different camera capturing the same individual, often at a different angle, or in the same camera at a different time and/or with the person approaching the camera at a different angle. Instead of relying on face recognition technology, such systems study the clothing of the individuals being monitored and/or the objects being carried to establish correspondence and hence re-identify the human object.
Unfortunately, present human object re-identification systems consider the entire human object as one connected region when deciding on the similarity of two objects being matched. This assumption has a major drawback: when a person is partially occluded, part of the occluding foreground will be picked up and used in matching. Our research revealed that when a human observer carries out a manual re-identification task, attention is often drawn to some parts of the human figure/body more than others, e.g. the face, a brightly coloured shirt, or the presence of texture patterns in clothing, while occluding parts are ignored.
In this thesis, a novel multi-agent framework is proposed for the design of a human object re-identification system. Initially, HOG-based feature extraction is used in an SVM-based classification of a human object as either full-body or half-body. Subsequently, the relative visual significance of the top and bottom parts of the human in re-identification is quantified by analysing Gray Level Co-occurrence Matrix based texture features and colour histograms obtained in the HSV colour space. Accordingly, different weights are assigned to the top and bottom of the human body using a novel probabilistic approach. The weights are then used to modify the adopted Hybrid Spatiogram and Covariance Descriptor (HSCD) feature-based re-identification algorithm.
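A minimal sketch of the weighted part-based matching idea, assuming Bhattacharyya distances between HSV histograms of the top and bottom body parts; the actual HSCD descriptor and the learned probabilistic weights in the thesis are more elaborate, and all names here are illustrative.

```python
import numpy as np

def hist_distance(h1, h2):
    """Bhattacharyya distance between two histograms (normalised here)."""
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    bc = np.sum(np.sqrt(h1 * h2))          # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))

def weighted_body_distance(probe, gallery, w_top):
    """Combine top/bottom part distances with a visual-significance weight.

    probe, gallery : dicts with 'top' and 'bottom' HSV colour histograms
    w_top          : weight in [0, 1] given to the upper body; (1 - w_top)
                     goes to the lower body
    """
    d_top = hist_distance(probe['top'], gallery['top'])
    d_bot = hist_distance(probe['bottom'], gallery['bottom'])
    return w_top * d_top + (1.0 - w_top) * d_bot
```

Setting `w_top` close to 0 for a person whose upper body is occluded makes the match rely on the visible lower body, which is the intuition behind weighting parts by their visual significance.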
A significant novelty of the human object re-identification systems proposed in this thesis is the agent-based design procedure adopted, which separates the use of computer vision algorithms for feature extraction, comparison, etc., from the decision-making process of re-identification. Multiple agents are assigned to execute different algorithmic tasks, and the agents communicate to make the required logical decisions.
Detailed experimental results are provided to show that the proposed multi-agent framework for human object re-identification performs significantly better than state-of-the-art algorithms. Further, it is shown that the design flexibility and scalability of the proposed system allow it to be effectively utilised in more complex computer-vision-based video analytic/forensic tasks often conducted within distributed, multi-camera systems
Dimensionality reduction and sparse representations in computer vision
The proliferation of camera-equipped devices, such as netbooks, smartphones and game stations, has led to a significant increase in the production of visual content. This visual information could be used for understanding the environment and offering a natural interface between users and their surroundings. However, the massive amounts of data and the high computational cost associated with them encumber the transfer of sophisticated vision algorithms to real-life systems, especially ones that exhibit resource limitations such as restrictions in available memory, processing power and bandwidth. One approach to tackling these issues is to generate compact and descriptive representations of image data by exploiting inherent redundancies. We propose the investigation of dimensionality reduction and sparse representations in order to accomplish this task. In dimensionality reduction, the aim is to reduce the dimensions of the space where image data reside in order to allow resource-constrained systems to handle them and, ideally, provide a more insightful description. This goal is achieved by exploiting the inherent redundancies that many classes of images, such as faces under different illumination conditions and objects from different viewpoints, exhibit. We explore the description of natural images by low-dimensional non-linear models called image manifolds and investigate the performance of computer vision tasks such as recognition and classification using these low-dimensional models. In addition to dimensionality reduction, we study a novel approach to representing images as a sparse linear combination of dictionary examples. We investigate how sparse image representations can be used for a variety of tasks including low-level image modeling and higher-level semantic information extraction.
Using tools from dimensionality reduction and sparse representation, we propose the application of these methods in three hierarchical image layers, namely low-level features, mid-level structures and high-level attributes. Low-level features are image descriptors that can be extracted directly from the raw image pixels and include pixel intensities, histograms, and gradients. In the first part of this work, we explore how various techniques in dimensionality reduction, ranging from traditional image compression to the recently proposed Random Projections method, affect the performance of computer vision algorithms such as face detection and face recognition. In addition, we discuss a method that is able to increase the spatial resolution of a single image, without using any training examples, according to the sparse representations framework. In the second part, we explore mid-level structures, such as image manifolds and sparse models, which are produced by abstracting information from low-level features and offer compact modeling of high-dimensional data. We propose novel techniques for generating more descriptive image representations and investigate their application in face recognition and object tracking. In the third part of this work, we propose the investigation of a novel framework for representing the semantic contents of images. This framework employs high-level semantic attributes that aim to bridge the gap between the visual information of an image and its textual description by utilizing low-level features and mid-level structures. This innovative paradigm offers revolutionary possibilities, including recognizing the category of an object from purely textual information without providing any explicit visual example
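Representing a signal as a sparse linear combination of dictionary examples can be sketched with textbook Orthogonal Matching Pursuit, one standard sparse-coding solver (not necessarily the one used in this work):

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal Matching Pursuit: approximate x as a k-sparse linear
    combination of columns (atoms) of the dictionary D.

    D : (d, n) dictionary, assumed to have unit-norm columns
    x : (d,) signal to be represented
    k : sparsity level (number of atoms to select)
    """
    residual = x.copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    for _ in range(k):
        # Greedily pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs[support] = sol
    return coeffs
```

The returned coefficient vector is mostly zeros, so storing or transmitting `(support, sol)` instead of the raw signal is what makes sparse representations attractive for resource-constrained systems.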
Real-time person re-identification for interactive environments
The work presented in this thesis was motivated by a vision of the future in which intelligent environments in public spaces, such as galleries and museums, deliver useful and personalised services to people via natural interaction, that is, without the need for people to provide explicit instructions via tangible interfaces. Delivering the right services to the right people requires a means of biometrically identifying individuals and then re-identifying them as they move freely through the environment. Delivering the service they desire requires sensing their context, for example, sensing their location or proximity to resources.
This thesis presents both a context-aware system and a person re-identification method. A tabletop display was designed and prototyped with an infrared person-sensing context function. In experimental evaluation it exhibited tracking performance comparable to other more complex systems. A real-time, viewpoint invariant, person re-identification method is proposed based on a novel set of Viewpoint Invariant Multi-modal (ViMM) feature descriptors collected from depth-sensing cameras. The method uses colour and a combination of anthropometric properties logged as a function of body orientation. A neural network classifier is used to perform re-identification
Soft Biometric Analysis: Multi-Person and Real-Time Pedestrian Attribute Recognition in Crowded Urban Environments
Traditionally, recognition systems were only based on human hard biometrics. However, the ubiquitous CCTV cameras have raised the desire to analyze human biometrics from far distances, without people's attendance in the acquisition process. High-resolution face close-shots are rarely available at far distances, such that face-based systems cannot provide reliable results in surveillance applications. Human soft biometrics such as body and clothing attributes are believed to be more effective in analyzing human data collected by security cameras.
This thesis contributes to human soft biometric analysis in uncontrolled environments and mainly focuses on two tasks: Pedestrian Attribute Recognition (PAR) and person re-identification (re-id). We first review the literature of both tasks and highlight the history of advancements, recent developments, and the existing benchmarks. PAR and person re-id difficulties are due to significant distances between intra-class samples, which originate from variations in several factors such as body pose, illumination, background, occlusion, and data resolution. Recent state-of-the-art approaches present end-to-end models that can extract discriminative and comprehensive feature representations from people. The correlation between different regions of the body and dealing with limited learning data are also the objectives of many recent works. Moreover, class imbalance and correlation between human attributes are specific challenges associated with the PAR problem.
We collect a large surveillance dataset to train a novel gender recognition model suitable for uncontrolled environments. We propose a deep residual network that extracts several pose-wise patches from samples and obtains a comprehensive feature representation. In the next step, we develop a model for recognising multiple attributes at once. Considering the correlation between human semantic attributes and class imbalance, we respectively use a multi-task model and a weighted loss function. We also propose a multiplication layer on top of the backbone feature extraction layers to exclude background features from the final representation of samples and draw the model's attention to the foreground area.
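The weighted-loss idea for class imbalance can be sketched as a per-attribute weighted binary cross-entropy; the exponential weighting below is one common scheme and is an assumption for illustration, not necessarily the thesis's exact formulation.

```python
import numpy as np

def weighted_bce(probs, labels, pos_freq):
    """Weighted binary cross-entropy for multi-attribute recognition.

    probs    : (N, A) predicted probabilities for A attributes
    labels   : (N, A) ground truth in {0, 1}
    pos_freq : (A,) fraction of positive samples per attribute in the
               training set; rare positives get larger weights
    """
    w_pos = np.exp(1.0 - pos_freq)   # up-weight rare positive attributes
    w_neg = np.exp(pos_freq)         # up-weight rare negative attributes
    eps = 1e-12
    loss = -(w_pos * labels * np.log(probs + eps)
             + w_neg * (1.0 - labels) * np.log(1.0 - probs + eps))
    return loss.mean()
```

Under this weighting, missing a positive for a rare attribute (e.g. one present in 10% of samples) is penalised more heavily than the same mistake on a common attribute, which counteracts the imbalance during training.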
We address the problem of person re-id by implicitly defining the receptive fields of deep learning classification frameworks. The receptive fields of deep learning models determine the most significant regions of the input data for providing correct decisions. Therefore, we synthesize a set of learning data in which the destructive regions (e.g., background) in each pair of instances are interchanged. A segmentation module determines the destructive and useful regions in each sample, and the label of each synthesized instance is inherited from the sample that contributed the useful regions to the synthesized image. The synthesized learning data are then used in the learning phase and help the model rapidly learn that the identity and background regions are not correlated. Meanwhile, the proposed solution can be seen as a data augmentation approach that fully preserves the label information and is compatible with other data augmentation techniques.
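The background-interchange synthesis can be sketched as a mask-guided swap, assuming binary foreground masks produced by the segmentation module; function and variable names are illustrative, not the thesis's implementation.

```python
import numpy as np

def swap_backgrounds(img_a, img_b, fg_mask_a, fg_mask_b, label_a, label_b):
    """Synthesize a pair of training images whose backgrounds are
    interchanged while each foreground (the useful region) keeps its
    identity label.

    img_a, img_b       : (H, W, 3) images of two different identities
    fg_mask_a, fg_mask_b : (H, W) boolean foreground (person) masks
    label_a, label_b   : identity labels of the two samples
    """
    m_a = fg_mask_a[..., None]
    m_b = fg_mask_b[..., None]
    # Person A pasted onto B's background, and vice versa.
    syn_a = np.where(m_a, img_a, img_b)
    syn_b = np.where(m_b, img_b, img_a)
    # Labels are inherited from the sample providing the foreground.
    return (syn_a, label_a), (syn_b, label_b)
```

Because the identity label follows the foreground while the background changes, a classifier trained on such pairs cannot profit from background cues, which is exactly the decorrelation the text describes.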
When re-id methods are learned in scenarios where the target person appears with identical garments in the gallery, the visual appearance of clothes is given the most importance in the final feature representation. Cloth-based representations are not reliable in long-term re-id settings, as people may change their clothes. Therefore, solutions that ignore clothing cues and focus on identity-relevant features are in demand. We transform the original data such that the identity-relevant information of people (e.g., face and body shape) is removed, while the identity-unrelated cues (i.e., color and texture of clothes) remain unchanged. A model learned on the synthesized dataset predicts the identity-unrelated (short-term) cues. We then train a second model, coupled with the first, that learns embeddings of the original data such that the similarity between the embeddings of the original and synthesized data is minimized. This way, the second model predicts based on the identity-related (long-term) representation of people.
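The coupling between the two models can be sketched as an objective that adds a penalty on the similarity between the second model's embedding of the original image and the first model's clothing-cue embedding; `alpha` and the hinge on cosine similarity are assumptions for illustration, not the thesis's exact loss.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def long_term_loss(emb_orig, emb_cloth, id_loss, alpha=1.0):
    """Sketch of the coupled objective for the second (long-term) model.

    emb_orig  : embedding of the original image from the second model
    emb_cloth : short-term (clothing-cue) embedding from the first model
    id_loss   : the second model's identity classification loss
    alpha     : assumed trade-off hyper-parameter
    """
    # Penalise positive similarity to the clothing embedding, pushing the
    # second model toward identity-related (long-term) cues.
    sim = cosine(emb_orig, emb_cloth)
    return id_loss + alpha * max(0.0, sim)
```

When the two embeddings are orthogonal the penalty vanishes and only the identity loss remains, so minimising this objective drives the long-term representation away from clothing information.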
To evaluate the performance of the proposed models, we use PAR and person re-id datasets, namely BIODI, PETA, RAP, Market-1501, MSMT-V2, PRCC, LTCC, and MIT, and compare our experimental results with state-of-the-art methods in the field.
In conclusion, the data collected from surveillance cameras have low resolution, such that the extraction of hard biometric features is not possible and face-based approaches produce poor results. In contrast, soft biometrics are robust to variations in data quality. We therefore propose approaches for both PAR and person re-id that learn discriminative features from each instance, and evaluate our proposed solutions on several publicly available benchmarks. This thesis was prepared at the University of Beira Interior, IT Instituto de Telecomunicações, Soft Computing and Image Analysis Laboratory (SOCIA Lab), Covilhã Delegation, and was submitted to the University of Beira Interior for defense in a public examination session
Weakly Supervised Learning of Objects and Attributes
This thesis presents weakly supervised learning approaches to directly
exploit image-level tags (e.g. objects, attributes) for comprehensive
image understanding, including tasks such as object localisation, image
description, image retrieval, semantic segmentation, person re-identification
and person search, etc. Unlike the conventional approaches which tackle
weakly supervised problem by learning a discriminative model, a generative
Bayesian framework is proposed which provides better mechanisms
to resolve the ambiguity problem. The proposed model significantly differentiates
from the existing approaches in that: (1) All foreground object
classes are modelled jointly in a single generative model that encodes multiple
objects co-existence so that “explaining away” inference can resolve
ambiguity and lead to better learning. (2) Image backgrounds are shared
across classes to better learn varying surroundings and “push out” objects
of interest. (3) The Bayesian formulation enables the exploitation of various
types of prior knowledge to compensate for the limited supervision
offered by weakly labelled data, as well as Bayesian domain adaptation
for transfer learning.
Detecting objects is the first and critical component of the image understanding paradigm. Unlike conventional fully supervised object detection
approaches, the proposed model aims to train an object detector
from weakly labelled data. A novel framework based on a Bayesian latent topic model is proposed to address the problem of localising objects as bounding boxes in images and videos with image-level object labels.
The inferred object location can then be used as the annotation to train a
classic object detector with conventional approaches.
However, objects cannot tell the whole story in an image. Beyond detecting
objects, a general visual model should be able to describe objects
and segment them at a pixel level. Another limitation of the initial model is
that it still requires an additional object detector. To remedy the above two
drawbacks, a novel weakly supervised non-parametric Bayesian model is
presented to model objects, attributes and their associations automatically
from weakly labelled images. Once learned, given a new image, the proposed
model can describe the image with the combination of objects and
attributes, as well as their locations and segmentation.
Finally, this thesis further tackles the weakly supervised learning problem
from a transfer learning perspective, by considering the fact that there
are always some fully labelled or weakly labelled data available in a related
domain while only insufficient labelled data exist for training in the
target domain. A powerful semantic description is transferred from the existing
fashion photography datasets to surveillance data to solve the person
re-identification problem
Human Body Pose Estimation for Gait Identification: A Comprehensive Survey of Datasets and Models
Person identification is a problem that has received substantial attention, particularly in security domains. Gait recognition is one of the most convenient approaches, enabling person identification at a distance without the need for high-quality images. There are several review studies addressing person identification via facial images, silhouette images, and wearable sensors. Despite skeleton-based person identification gaining popularity while overcoming the challenges of traditional approaches, existing survey studies lack a comprehensive review of skeleton-based approaches to gait identification. We present a detailed review of the human pose estimation and gait analysis methods that make skeleton-based approaches possible. The study covers various types of related datasets, tools, methodologies, and evaluation metrics with associated challenges, limitations, and application domains. Detailed comparisons are presented for each of these aspects with recommendations for potential research and alternatives. A common trend throughout this paper is the positive impact that deep learning techniques are beginning to have on topics such as human pose estimation and gait identification. The survey outcomes might be useful for the related research community and other stakeholders in terms of performance analysis of existing methodologies, potential research gaps, application domains, and possible contributions in the future
Advances in Computer Recognition, Image Processing and Communications, Selected Papers from CORES 2021 and IP&C 2021
As almost all human activities have been moved online due to the pandemic, novel robust and efficient approaches and further research have been in higher demand in the field of computer science and telecommunication. Therefore, this (reprint) book contains 13 high-quality papers presenting advancements in theoretical and practical aspects of computer recognition, pattern recognition, image processing and machine learning (shallow and deep), including, in particular, novel implementations of these techniques in the areas of modern telecommunications and cybersecurity