Improving CNN-based Person Re-identification Using Score Normalization
Person re-identification (PRe-ID) is a crucial task in security,
surveillance, and retail analysis, which involves identifying an individual
across multiple cameras and views. However, it is a challenging task due to
changes in illumination, background, and viewpoint. Efficient feature
extraction and metric learning algorithms are essential for a successful PRe-ID
system. This paper proposes a novel approach for PRe-ID, which combines a
Convolutional Neural Network (CNN) based feature extraction method with
Cross-view Quadratic Discriminant Analysis (XQDA) for metric learning.
Additionally, a matching algorithm that employs Mahalanobis distance and a
score normalization process to address inconsistencies between camera scores is
implemented. The proposed approach is tested on four challenging datasets,
VIPeR, GRID, CUHK01, and PRID450S, with promising results. For example,
without normalization, the rank-20 accuracies on the GRID, CUHK01, VIPeR, and
PRID450S datasets were 61.92%, 83.90%, 92.03%, and 96.22%, respectively;
after score normalization, they increased to 64.64%, 89.30%, 92.78%, and
98.76%. These consistent gains across all four datasets indicate the
effectiveness of the proposed approach.
Comment: 5 pages, 6 figures and 2 tables
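The abstract gives no implementation details, but as a rough sketch the matching and normalization steps might look like the following, where the CNN feature arrays, the XQDA-style metric matrix `M`, and the min-max normalization scheme are illustrative assumptions rather than the paper's exact design:

```python
import numpy as np

def mahalanobis_scores(query_feats, gallery_feats, M):
    """Squared Mahalanobis distance between each query and every gallery feature.

    M is a positive semi-definite metric (e.g. one learned by XQDA);
    smaller distances mean more similar identities.
    """
    diffs = query_feats[:, None, :] - gallery_feats[None, :, :]   # (Q, G, D)
    return np.einsum('qgd,de,qge->qg', diffs, M, diffs)           # (Q, G)

def min_max_normalize(scores, axis=1, eps=1e-12):
    """Min-max normalize each query's scores to [0, 1] so that scores
    produced under different cameras become comparable before ranking."""
    lo = scores.min(axis=axis, keepdims=True)
    hi = scores.max(axis=axis, keepdims=True)
    return (scores - lo) / (hi - lo + eps)

# Hypothetical usage: CNN features for 5 query and 100 gallery images (128-D),
# with an identity matrix standing in for the learned metric.
rng = np.random.default_rng(0)
q, g = rng.normal(size=(5, 128)), rng.normal(size=(100, 128))
M = np.eye(128)
ranked = np.argsort(min_max_normalize(mahalanobis_scores(q, g, M)), axis=1)
```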
Person Re-identification: Past, Present and Future
Person re-identification (re-ID) has become increasingly popular in the community due to its application and research significance. It aims at spotting a person of interest in other cameras. In the early days, hand-crafted algorithms and small-scale evaluation were predominantly reported. Recent years have witnessed the emergence of large-scale datasets and deep learning systems which make use of large data volumes. Considering different tasks, we classify most current re-ID methods into two classes, i.e., image-based and video-based; in both tasks, hand-crafted and deep learning systems will be reviewed. Moreover, two new re-ID tasks which are much closer to real-world applications are described and discussed, i.e., end-to-end re-ID and fast re-ID in very large galleries. This paper: 1) introduces the history of person re-ID and its relationship with image classification and instance retrieval; 2) surveys a broad selection of the hand-crafted systems and the large-scale methods in both image- and video-based re-ID; 3) describes critical future directions in end-to-end re-ID and fast retrieval in large galleries; and 4) finally briefs some important yet under-developed issues.
Robust subspace learning for static and dynamic affect and behaviour modelling
Machine analysis of human affect and behavior in naturalistic contexts has witnessed growing attention in the last decade from various disciplines ranging from social and cognitive sciences to machine learning and computer vision. Endowing machines with the ability to seamlessly detect, analyze, model, predict as well as simulate and synthesize manifestations of internal emotional and behavioral states in real-world data is deemed essential for the deployment of next-generation, emotionally- and socially-competent human-centered interfaces. In this thesis, we are primarily motivated by the problem of modeling, recognizing and predicting spontaneous expressions of non-verbal human affect and behavior manifested through either low-level facial attributes in static images or high-level semantic events in image sequences. Both visual data and annotations of naturalistic affect and behavior naturally contain noisy measurements of unbounded magnitude at random locations, commonly referred to as ‘outliers’. We present here machine learning methods that are robust to such gross, sparse noise.
First, we deal with static analysis of face images, viewing the latter as a superposition of mutually-incoherent, low-complexity components corresponding to facial attributes, such as facial identity, expressions and activation of atomic facial muscle actions. We develop a robust, discriminant dictionary learning framework to extract these components from grossly corrupted training data and combine it with sparse representation to recognize the associated attributes. We demonstrate that our framework can jointly address interrelated classification tasks such as face and facial expression recognition.
Inspired by the well-documented importance of the temporal aspect in perceiving affect and behavior, we direct the bulk of our research efforts into continuous-time modeling of dimensional affect and social behavior. Having identified a gap in the literature, namely the lack of data containing annotations of social attitudes in continuous time and scale, we first curate a new audio-visual database of multi-party conversations from political debates annotated frame-by-frame in terms of real-valued conflict intensity and use it to conduct the first study on continuous-time conflict intensity estimation. Our experimental findings corroborate previous evidence indicating the inability of existing classifiers to capture the hidden temporal structures of affective and behavioral displays. We present here a novel dynamic behavior analysis framework which models temporal dynamics in an explicit way, based on the natural assumption that continuous-time annotations of smoothly-varying affect or behavior can be viewed as outputs of a low-complexity linear dynamical system when behavioral cues (features) act as system inputs. A novel robust structured rank minimization framework is proposed to estimate the system parameters in the presence of gross corruptions and partially missing data. Experiments on prediction of dimensional conflict and affect as well as multi-object tracking from detection validate the effectiveness of our predictive framework and demonstrate for the first time that complex human behavior and affect can be learned and predicted based on small training sets of person(s)-specific observations.
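Purely to illustrate that modelling assumption, and not the thesis' robust structured rank-minimization estimator, a minimal input-driven autoregressive fit could look like the sketch below; the model orders, regularizer, and toy data are all hypothetical:

```python
import numpy as np

def fit_arx(y, u, p=2, q=2, lam=1e-3):
    """Fit a simple ARX model y[t] ~ sum_i a_i*y[t-i] + sum_j b_j^T u[t-j]
    by ridge regression -- a crude, non-robust stand-in for the idea that
    continuous annotations follow low-order, input-driven linear dynamics."""
    T = len(y)
    rows, targets = [], []
    for t in range(max(p, q), T):
        past_y = y[t - p:t][::-1]                 # autoregressive terms
        past_u = u[t - q:t][::-1].ravel()         # input (feature) terms
        rows.append(np.concatenate([past_y, past_u]))
        targets.append(y[t])
    X, z = np.asarray(rows), np.asarray(targets)
    # Ridge-regularized least squares for the stacked coefficient vector.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ z)

# Hypothetical usage: a frame-level conflict-intensity annotation y driven by
# a synthetic 10-D audio-visual feature sequence u.
rng = np.random.default_rng(1)
u = rng.normal(size=(500, 10))
y = np.convolve(u @ rng.normal(size=10), np.ones(5) / 5, mode='same')
theta = fit_arx(y, u)
```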
Recent Advances in Deep Learning Techniques for Face Recognition
In recent years, researchers have proposed many deep learning (DL) methods
for various tasks, and particularly face recognition (FR) made an enormous leap
using these techniques. Deep FR systems benefit from the hierarchical
architecture of the DL methods to learn discriminative face representation.
Therefore, DL techniques significantly improve state-of-the-art performance on
FR systems and encourage diverse and efficient real-world applications. In this
paper, we present a comprehensive analysis of various FR systems that leverage
different types of DL techniques, and for this study we summarize 168 recent
contributions from the area. We discuss papers covering different algorithms,
architectures, loss functions, activation functions, datasets, challenges,
improvement ideas, and current and future trends of DL-based FR
systems. We provide a detailed discussion of various DL methods to understand
the current state-of-the-art, and then we discuss various activation and loss
functions for the methods. Additionally, we summarize different datasets used
widely for FR tasks and discuss challenges related to illumination, expression,
pose variations, and occlusion. Finally, we discuss improvement ideas and
current and future trends of FR tasks.
Comment: 32 pages; citation: M. T. H. Fuad et al., "Recent Advances in Deep
Learning Techniques for Face Recognition," in IEEE Access, vol. 9, pp.
99112-99142, 2021, doi: 10.1109/ACCESS.2021.309613
People re-identification using deep appearance, feature and attribute learning
Person Re-Identification (Re-ID) is the act of matching one or more query images of an individual with images of the same individual in a gallery set. We propose various methods to improve Re-ID performance via foreground modelling, skeleton prediction and attribute detection.
Foreground modelling is an important preprocessing step in Re-ID, allowing more representative features to be extracted. We propose two foreground modelling methods which learn a mapping between a set of training images and skeleton keypoints. The first utilises Partial Least Squares (PLS) regression to learn a mapping between Histogram of Oriented Gradients (HOG) features extracted from person images, and skeleton keypoints. The second instead learns the mapping using a deep convolutional neural network (CNN). Using a CNN has been shown to generalise better, particularly for unusual pedestrian poses.
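As an illustrative sketch only, not the thesis' exact pipeline, the first mapping could be fitted with off-the-shelf HOG features and PLS regression; the image size, keypoint count, and training data below are hypothetical:

```python
import numpy as np
from skimage.feature import hog
from sklearn.cross_decomposition import PLSRegression

def hog_features(images, pixels_per_cell=(8, 8)):
    """Extract HOG descriptors from equally sized grayscale pedestrian crops."""
    return np.stack([hog(img, pixels_per_cell=pixels_per_cell) for img in images])

# Hypothetical training data: 200 grayscale 128x64 crops and, for each, 14
# skeleton keypoints flattened to (x1, y1, ..., x14, y14).
rng = np.random.default_rng(0)
train_imgs = rng.random((200, 128, 64))
train_keypoints = rng.random((200, 28))

# Learn a linear latent mapping from HOG descriptors to keypoint coordinates.
pls = PLSRegression(n_components=20)
pls.fit(hog_features(train_imgs), train_keypoints)

# Predict keypoints for new crops; these can then be turned into a binary
# foreground mask for feature weighting.
test_imgs = rng.random((5, 128, 64))
predicted_keypoints = pls.predict(hog_features(test_imgs))
```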
We then utilise the predicted skeleton to generate a binary mask, separating the foreground from the background. This is useful for weighting image features extracted from foreground areas higher than those extracted from background areas. We apply this weighting during the feature extraction stage to increase matching rates.
The predicted skeleton can be used to divide a pedestrian image into multiple parts, such as head and torso. We propose using the divided images as input to an attribute prediction network. We then use this network to generate robust feature descriptors, and demonstrate competitive Re-ID matching rates.
We evaluate on a number of different Re-ID data sets, each possessing significant variations in visual characteristics. We validate our proposals by measuring the rank-n score, which is equivalent to the percentage of identities correctly predicted within n attempts. We evaluate our skeleton prediction network using root mean square error (RMSE), and our attribute prediction network using accuracy. Experiments demonstrate that our proposed methods can supplement traditional Re-ID approaches to increase rank-n matching rates.
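For reference, the rank-n score described above can be computed from a query-gallery distance matrix as in this minimal sketch; all arrays here are synthetic placeholders:

```python
import numpy as np

def rank_n_score(dist, query_ids, gallery_ids, n=1):
    """Fraction of queries whose correct identity appears among the n closest
    gallery entries (the rank-n matching rate)."""
    order = np.argsort(dist, axis=1)                  # ascending distance
    top_n_ids = gallery_ids[order[:, :n]]             # (num_queries, n)
    hits = (top_n_ids == query_ids[:, None]).any(axis=1)
    return hits.mean()

# Hypothetical usage with a random distance matrix over 10 identities.
rng = np.random.default_rng(0)
dist = rng.random((50, 200))
query_ids = rng.integers(0, 10, size=50)
gallery_ids = rng.integers(0, 10, size=200)
print(rank_n_score(dist, query_ids, gallery_ids, n=5))
```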
From Data to Software to Science with the Rubin Observatory LSST
The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) dataset
will dramatically alter our understanding of the Universe, from the origins of
the Solar System to the nature of dark matter and dark energy. Much of this
research will depend on the existence of robust, tested, and scalable
algorithms, software, and services. Identifying and developing such tools ahead
of time has the potential to significantly accelerate the delivery of early
science from LSST. Developing these collaboratively, and making them broadly
available, can enable more inclusive and equitable collaboration on LSST
science.
To facilitate such opportunities, a community workshop entitled "From Data to
Software to Science with the Rubin Observatory LSST" was organized by the LSST
Interdisciplinary Network for Collaboration and Computing (LINCC) and partners,
and held at the Flatiron Institute in New York, March 28-30, 2022. The
workshop included over 50 in-person attendees invited from over 300
applications. It identified seven key software areas of need: (i) scalable
cross-matching and distributed joining of catalogs, (ii) robust photometric
redshift determination, (iii) software for determination of selection
functions, (iv) frameworks for scalable time-series analyses, (v) services for
image access and reprocessing at scale, (vi) object image access (cutouts) and
analysis at scale, and (vii) scalable job execution systems.
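To make need (i) concrete, a single-node cross-match of two small catalogs can already be written in a few lines with astropy (the catalogs below are synthetic); the workshop's challenge is scaling this nearest-neighbour matching, and the subsequent catalog joins, to LSST-sized data distributed over many nodes:

```python
import numpy as np
from astropy import units as u
from astropy.coordinates import SkyCoord

# Hypothetical catalogs: RA/Dec (degrees) for two surveys to be cross-matched.
rng = np.random.default_rng(0)
cat_a = SkyCoord(ra=rng.uniform(0, 1, 1000) * u.deg,
                 dec=rng.uniform(-1, 0, 1000) * u.deg)
cat_b = SkyCoord(ra=rng.uniform(0, 1, 5000) * u.deg,
                 dec=rng.uniform(-1, 0, 5000) * u.deg)

# For each source in cat_a, find its nearest neighbour in cat_b and keep
# pairs closer than a 1-arcsecond matching radius.
idx, sep2d, _ = cat_a.match_to_catalog_sky(cat_b)
matched = sep2d < 1 * u.arcsec
pairs = np.column_stack([np.nonzero(matched)[0], idx[matched]])
```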
This white paper summarizes the discussions of this workshop. It considers
the motivating science use cases, identified cross-cutting algorithms,
software, and services, their high-level technical specifications, and the
principles of inclusive collaborations needed to develop them. We provide it as
a useful roadmap of needs, as well as to spur action and collaboration between
groups and individuals looking to develop reusable software for early LSST
science.
Comment: White paper from the "From Data to Software to Science with the
Rubin Observatory LSST" workshop