Person Re-identification in Identity Regression Space
This work was partially supported by the China Scholarship Council, Vision Semantics Ltd, Royal Society Newton Advanced Fellowship Programme (NA150459), and Innovate UK Industrial Challenge Project on Developing and Commercialising Intelligent Video Analytics Solutions for Public Safety (98111-571149)
Cross-View Learning
Key to achieving more efficient machine intelligence is the capability to analyse and understand
data across different views, which can be camera views or modality views (such as
visual and textual). One generic learning paradigm for automated understanding of data from different
views is called cross-view learning, which includes cross-view matching, cross-view fusion
and cross-view generation. Specifically, this thesis investigates two of them, cross-view matching
and cross-view generation, by developing new methods for addressing the following specific
computer vision problems.
The first problem is cross-view matching for person re-identification, in which a person is captured
by multiple non-overlapping camera views and the objective is to match him/her across views
among a large number of imposters. Typically a person’s appearance is represented using features
of thousands of dimensions, whilst only hundreds of training samples are available due
to the difficulties in collecting matched training samples. With the number of training samples
much smaller than the feature dimension, the existing methods thus face the classic small sample
size (SSS) problem and have to resort to dimensionality reduction techniques and/or matrix
regularisation, which lead to loss of discriminative power for cross-view matching. To that end,
this thesis proposes to overcome the SSS problem in subspace learning by matching cross-view
data in a discriminative null space of the training data.
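To make the null-space idea concrete, the following is a minimal numerical sketch, not the thesis implementation: in the small-sample-size regime, the within-class scatter matrix has a non-trivial null space, and projecting into it collapses each identity to a single point while keeping identities separable. All data and dimensions are toy values chosen for illustration.

```python
import numpy as np

# Illustrative sketch (in the spirit of the Null Foley-Sammon Transform):
# match in a null space of the within-class scatter of the training data.
rng = np.random.default_rng(0)
d, n_classes, per_class = 50, 4, 3          # feature dim >> sample count (SSS regime)
X = rng.normal(size=(n_classes * per_class, d))
y = np.repeat(np.arange(n_classes), per_class)

# Within-class scatter S_w: sum of centred outer products per class.
Sw = np.zeros((d, d))
for c in range(n_classes):
    Xc = X[y == c] - X[y == c].mean(axis=0)
    Sw += Xc.T @ Xc

# Null space of S_w: eigenvectors with (numerically) zero eigenvalue.
evals, evecs = np.linalg.eigh(Sw)
null_basis = evecs[:, evals < 1e-8]          # d x k basis of the null space

# In this space all training samples of one class collapse to a single
# point, so matching reduces to nearest neighbour over class prototypes.
Z = X @ null_basis
for c in range(n_classes):
    assert np.allclose(Z[y == c].std(axis=0), 0.0, atol=1e-5)
```

The key property exploited is that in the SSS regime such a null space always exists, so no discriminative information has to be discarded by prior dimensionality reduction.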
The second problem is cross-view matching for zero-shot learning where data are drawn
from different modalities each for a different view (e.g. visual or textual), versus single-modal
data considered in the first problem. This is inherently more challenging as the gap between
different views becomes larger. Specifically, the zero-shot learning problem can be solved if
the visual representation/view of the data (object) and its textual view are matched. Moreover,
it requires learning a joint embedding space where different view data can be projected to for
nearest neighbour search. This thesis argues that the key to making zero-shot learning models succeed
is choosing the right embedding space. Unlike most existing zero-shot learning
models, which utilise a textual or an intermediate space as the embedding space for achieving cross-view
matching, the proposed method uniquely explores the visual space as the embedding space.
This thesis finds that in the visual space, the subsequent nearest neighbour search suffers
much less from the hubness problem and thus becomes more effective. Moreover, the model provides a natural mechanism
for optimising multiple textual modalities jointly in an end-to-end manner, which
demonstrates significant advantages over existing methods.
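A toy sketch of the core idea, with illustrative names and dimensions rather than the thesis model: regress class semantic (textual) vectors into the visual feature space, then classify a visual feature by nearest projected class prototype in that space, where hubness is milder.

```python
import numpy as np

# Toy zero-shot setup: one semantic vector per class, mapped into the
# visual space by least squares; classification is nearest-prototype
# search performed in the VISUAL space (the embedding space advocated here).
rng = np.random.default_rng(1)
n_classes, sem_dim, vis_dim = 5, 10, 20
S = rng.normal(size=(n_classes, sem_dim))        # semantic class descriptions
W_true = rng.normal(size=(sem_dim, vis_dim))
V = S @ W_true                                   # toy visual class prototypes

# Fit the semantic -> visual mapping on "seen" classes.
W, *_ = np.linalg.lstsq(S, V, rcond=None)

def classify(x_vis):
    """Nearest projected class prototype in visual space."""
    protos = S @ W
    return int(np.argmin(np.linalg.norm(protos - x_vis, axis=1)))

x = V[3] + 0.01 * rng.normal(size=vis_dim)       # noisy visual feature, class 3
assert classify(x) == 3
```

In a real model the mapping would be a learned deep regressor and the prototypes would come from unseen-class text, but the direction of the mapping (textual into visual, rather than the reverse) is the point being illustrated.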
The last problem is cross-view generation for image captioning, which aims to automatically
generate textual sentences from visual images. Most existing image captioning studies are limited
to investigating variants of deep learning-based image encoders, improving the inputs for the
subsequent deep sentence decoders. Existing methods have two limitations: (i) They are trained
to maximise the likelihood of each ground-truth word given the previous ground-truth words and
the image, termed Teacher-Forcing. This strategy may cause a mismatch between training and
testing since at test-time the model uses the previously generated words from the model distribution
to predict the next word. This exposure bias can result in error accumulation in sentence
generation during test time, since the model has never been exposed to its own predictions. (ii)
The training supervision metric, such as the widely used cross entropy loss, is different from
the evaluation metrics at test time. In other words, the model is not directly optimised towards
the task expectation. This learned model is therefore suboptimal. One main underlying reason
responsible is that the evaluation metrics are non-differentiable and therefore much harder to
optimise against. This thesis overcomes the above problems by exploring ideas from reinforcement
learning. Specifically, a novel actor-critic based learning approach is formulated to directly
maximise the reward, namely the actual natural language processing quality metrics of interest.
Compared to existing reinforcement learning based captioning models, the new method uniquely
enables per-token advantage and value computation, leading to better
model training.
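The per-token advantage computation can be sketched schematically as follows. This is a toy numerical illustration with hand-set numbers, not the thesis model, where both actor and critic are learned networks: the sentence-level metric score arrives only at the end, the critic predicts a value per generated token, and each token's advantage is its return minus that value.

```python
import numpy as np

# Schematic per-token advantage for actor-critic caption training.
reward = 0.8                    # sentence-level metric score (e.g. CIDEr), terminal
T = 5                           # number of generated tokens
values = np.array([0.5, 0.55, 0.6, 0.7, 0.75])   # critic's per-token value estimates

# With no discounting within a caption, the return from every timestep is
# simply the terminal reward.
returns = np.full(T, reward)
advantages = returns - values   # one advantage PER TOKEN, not per sentence

# The actor's policy-gradient loss weights each token's log-probability by
# its advantage (dummy probabilities here):
log_probs = np.log(np.array([0.3, 0.2, 0.4, 0.25, 0.5]))
actor_loss = -(advantages * log_probs).sum()
```

The contrast with sentence-level baselines is that every token gets its own learning signal, rather than all tokens sharing one scalar advantage.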
Quadratic Discriminant Analysis Revisited
In this thesis, we revisit quadratic discriminant analysis (QDA), a standard classification method. Specifically, we investigate the parameter estimation and dimension reduction problems for QDA.
Traditionally, the parameters of QDA are estimated generatively; that is, the parameters are estimated by maximizing the joint likelihood of observations and their labels. In practice, classical QDA, though computationally efficient, often underperforms discriminative classifiers, such as SVM, Boosting methods, and logistic regression. Motivated by recent research on hybrid generative/discriminative learning, we propose to estimate the parameters of QDA by minimizing a convex combination of the negative joint log-likelihood and the negative conditional log-likelihood of observations and their labels. For this purpose, we propose an iterative majorize-minimize (MM) algorithm for classifiers whose conditional distributions are from the exponential family; in each iteration of the MM algorithm, a convex optimization problem needs to be solved. To solve the convex problem specially derived for QDA, we propose a block-coordinate descent algorithm that sequentially updates the parameters of QDA; in each update, we present a trust region method for solving the optimal estimates, for which we have closed-form solutions in each iteration. Numerical experiments show: 1) the hybrid approach to QDA is competitive with, and in some cases significantly better than, other approaches to QDA, SVM with polynomial kernels, and logistic regression with linear and quadratic features; 2) in many cases, our optimization method converges faster to equal or better optima than the conjugate gradient method used in the literature.
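The hybrid objective itself is simple to state. Below is a sketch that only evaluates it for given Gaussian class parameters (the thesis optimizes it with the MM / block-coordinate scheme, which is not reproduced here); `alpha` trades off the generative and discriminative terms, and all data are toy values.

```python
import numpy as np

def gauss_logpdf(X, mean, cov):
    """Multivariate Gaussian log-density, row-wise over X."""
    X = np.atleast_2d(X)
    d = len(mean)
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum('ij,ij->i', diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def hybrid_nll(X, y, means, covs, priors, alpha=0.5):
    """alpha * neg. joint log-lik + (1 - alpha) * neg. conditional log-lik."""
    log_all = np.stack([
        np.log(priors[c]) + gauss_logpdf(X, means[c], covs[c])
        for c in range(len(priors))
    ], axis=1)                                   # n x n_classes: log p(x, c)
    log_joint = log_all[np.arange(len(y)), y]    # log p(x_i, y_i)
    log_cond = log_joint - np.logaddexp.reduce(log_all, axis=1)  # log p(y_i | x_i)
    n = len(y)
    return -(alpha * log_joint.sum() + (1 - alpha) * log_cond.sum()) / n

# Toy usage with hand-set parameters for two Gaussian classes.
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
means = [np.zeros(2), np.full(2, 3.0)]
covs = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]
loss = hybrid_nll(X, y, means, covs, priors, alpha=0.5)
```

At `alpha = 1` this reduces to classical generative QDA training; at `alpha = 0` it is purely discriminative, akin to logistic regression with quadratic features.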
Dimension reduction methods are commonly used to extract more compact features in the hope of building more efficient and possibly more robust classifiers. It is well known that Fisher's discriminant analysis generates optimal lower-dimensional features for linear discriminant analysis. However, for QDA there has so far been no universally accepted dimension reduction technique in the literature, though considerable efforts have been made. To construct a dimension reduction method for QDA, we generalize the Fukunaga-Koontz transformation, and propose novel affine feature extraction (AFE) methods for binary QDA. The proposed AFE methods have closed-form solutions and thus can be solved efficiently. We show that 1) the AFE methods have desirable geometrical, statistical and information-theoretical properties; and 2) the AFE methods generalize dimension reduction methods for LDA and for QDA with equal means. Numerical experiments show that the newly proposed AFE method is competitive with, and in some cases significantly better than, some commonly used linear dimension reduction techniques for QDA in the literature.
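The classical Fukunaga-Koontz transformation that the AFE methods generalize can be sketched in a few lines (toy covariances; this shows the standard transform, not the thesis's affine generalization): whitening the sum of the two class covariances makes them share eigenvectors, with eigenvalues that pairwise sum to one, so directions most informative for one class are least informative for the other.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
A1 = rng.normal(size=(40, d))
A2 = rng.normal(size=(40, d)) @ np.diag(np.linspace(0.5, 2.0, d))
C1 = A1.T @ A1 / 40            # class-1 covariance (toy)
C2 = A2.T @ A2 / 40            # class-2 covariance (toy)

# Whitening transform of the summed covariance: P (C1 + C2) P = I.
evals, U = np.linalg.eigh(C1 + C2)
P = U @ np.diag(evals ** -0.5) @ U.T

# In the whitened space the two covariances are simultaneously
# diagonalized and their eigenvalues sum to one pairwise.
l1, V = np.linalg.eigh(P @ C1 @ P)
l2 = np.diag(V.T @ (P @ C2 @ P) @ V)
assert np.allclose(l1 + l2, 1.0)
```

Ranking directions by how far their eigenvalue is from 1/2 then yields the features most useful for discriminating the two classes.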
View-invariant gait person re-identification with spatial and temporal attention
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Person re-identification at a distance across multiple non-overlapping cameras has
been an active research area for years. In the past ten years, short-term person re-id
techniques have made great strides in terms of accuracy using only appearance features
in limited environments. However, massive intra-class variations and inter-class
confusion limit their ability to be used in practical applications. Moreover, appearance
consistency can only be assumed in a short time span from one camera to the other.
Since holistic appearance changes drastically over days and weeks, the techniques
mentioned above become ineffective. Practical applications usually require a
long-term solution in which the subject appearance and clothing might have changed
after a significant period has elapsed. Facing these problems, soft biometric features
such as gait have been proposed in the past. Nevertheless, even gait can vary with
illness, ageing, changes in emotional state, walking surface, shoe
type, clothing type, objects carried by the subject and even clutter in the scene.
Gait is therefore considered a temporal cue that provides biometric motion information.
On the other hand, the shape of the human body could be viewed as a spatial signal
which can produce valuable information. So, extracting discriminative features from
both spatial and temporal domains would be very beneficial to this research. Therefore,
this thesis focuses on finding the best and most robust method to tackle the gait-based human re-identification problem and solve it for practical applications. In real-world
surveillance scenarios, the human gait cycle is frequently abnormal. These abnormalities
include, but are not limited to, changes in temporal and spatial characteristics such as
walking speed, broken gait phases and, most importantly, varied camera angles. Our
work performed an extensive literature study of spatial and temporal gait feature extraction
methods, with a focus on deep learning. Next, we conducted a comparative
study and proposed a spatial-temporal approach for gait feature extraction using the
fusion of multiple modalities, including optical flow, raw silhouettes and RGB images.
This approach was tested on two of the most challenging publicly available datasets for
gait recognition, TUM-GAID and CASIA-B, with excellent results presented in Chapter
3.
Furthermore, a spatial-temporal attention mechanism was proposed and
tested on the CASIA-B and OULP datasets, which learns salient features independent of
gait cycle and view variations. The spatial attention layer in the proposed method
extracts spatial feature maps using a two-layered architecture whose outputs are combined by
late fusion; using these feature maps, it discriminatively attends to the identity-related salient regions in silhouette
sequences. The temporal attention layer
consists of an LSTM that encodes the temporal motion of silhouette sequences. It
uses the encoded output vectors in the temporal attention architecture to focus on the
most critical timesteps in the gait cycle and discard the rest. Furthermore, we improved
the performance of our method by mapping our extracted spatial-temporal gait
features to a discriminative null space for use in our Siamese architecture for cross-matching.
We also conducted an ablation experiment, removing each segment of our
spatial-temporal attentional network in turn to gain insight into each component's contribution to the performance. Our method showed outstanding robustness against abnormal
gait cycles as well as viewpoint variations on both benchmark datasets.
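The temporal attention step described above can be illustrated numerically (toy values; the thesis feeds LSTM encodings into this stage and the scoring vector is learned): each timestep's encoding receives a relevance score, the scores are softmax-normalised, and the sequence is summarised by the attention-weighted sum, concentrating on the most critical timesteps of the gait cycle.

```python
import numpy as np

rng = np.random.default_rng(3)
T, h = 8, 16                       # timesteps in a gait cycle, feature size
H = rng.normal(size=(T, h))        # per-timestep encodings (e.g. LSTM outputs)
w = rng.normal(size=h)             # scoring vector (learnable; random here)

scores = H @ w                     # one relevance score per timestep
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()               # softmax attention weights over timesteps
summary = alpha @ H                # weighted sum emphasising critical timesteps

assert np.isclose(alpha.sum(), 1.0) and summary.shape == (h,)
```

Timesteps with low weight are effectively discarded, which is the mechanism by which the layer ignores broken or uninformative phases of the cycle.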
Re-identifying people in the crowd
Developing an automated surveillance system is of great interest for various reasons including forensic and security applications. In the case of a network of surveillance cameras with non-overlapping fields of view, person detection and tracking alone are insufficient to track a subject of interest across the network. In this case, instances of a person captured in one camera view need to be retrieved among a gallery of different people, in other camera views. This vision problem is commonly known as person re-identification (re-id).
Cross-view instances of pedestrians exhibit varied levels of illumination, viewpoint, and pose variations which makes the problem very challenging. Despite recent progress towards improving accuracy, existing systems suffer from low applicability to real-world scenarios. This is mainly caused by the need for large amounts of annotated data from pairwise camera views to be available for training. Given the difficulty of obtaining such data and annotating it, this thesis aims to bring the person re-id problem a step closer to real-world deployment.
In the first contribution, the single-shot protocol, where each individual is represented by a pair of images that need to be matched, is considered. Following the extensive annotation of four datasets for six attributes, an evaluation of the most widely used feature extraction schemes is conducted. The results reveal two high-performing descriptors among those evaluated, and show illumination variation to have the most impact on re-id accuracy.
Motivated by the wide availability of videos from surveillance cameras and the additional visual and temporal information they provide, video-based person re-id is then investigated, and a supervised system is developed. This is achieved by improving and extending the best performing image-based person descriptor into three dimensions and combining it with distance metric learning. The system obtained achieves state-of-the-art results on two widely used datasets.
Given the cost and difficulty of obtaining labelled data from pairwise cameras in a network to train the model, an unsupervised video-based person re-id method is also developed. It is based on a set-based distance measure that leverages rank vectors to estimate the similarity scores between person tracklets. The proposed system outperforms other unsupervised methods by a large margin on two datasets while competing with deep learning methods on another large-scale dataset.
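One way to picture a rank-vector-based tracklet similarity is the loose sketch below. This is illustrative only, not the exact measure from the thesis: each tracklet is described by how it ranks a shared reference set, and two tracklets are compared by the agreement of their rank vectors, which needs no labelled cross-camera pairs.

```python
import numpy as np

rng = np.random.default_rng(4)
refs = rng.normal(size=(10, 8))            # shared reference descriptors

def rank_vector(tracklet):
    """Mean frame feature -> distances to refs -> rank of each reference."""
    mean_feat = tracklet.mean(axis=0)
    d = np.linalg.norm(refs - mean_feat, axis=1)
    return np.argsort(np.argsort(d))

def rank_similarity(t1, t2):
    """Higher when the two tracklets rank the reference set alike."""
    r1, r2 = rank_vector(t1), rank_vector(t2)
    return -np.abs(r1 - r2).sum()

base = rng.normal(size=(5, 8))                 # a tracklet: 5 frame features
same = base + 0.01 * rng.normal(size=(5, 8))   # near-duplicate tracklet
diff = rng.normal(size=(5, 8)) + 3.0           # a clearly different person
assert rank_similarity(base, same) >= rank_similarity(base, diff)
```

Because only rank order matters, such measures are naturally robust to per-camera scale and illumination offsets, which is attractive in the unsupervised setting.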
Classification task-driven efficient feature extraction from tensor data
Automatic classification of complex data is an area of great interest as it allows efficient use of the increasingly data-intensive environment that characterizes our modern world. This thesis presents two contributions to this research area.
Firstly, it addresses the problem of discriminative feature extraction for data organized in multidimensional arrays. In machine learning, Linear Discriminant Analysis (LDA) is a popular discriminative feature extraction method based on optimizing a Fisher-type criterion to find the most discriminative data projection. Various extensions of LDA to high-order tensor data have been developed. The method proposed here is called the Efficient Greedy Feature Extraction (EGFE) method. This method avoids solving optimization problems of very high dimension. Also, it can be stopped when the extracted features are deemed sufficient for a proper discrimination of the classes.
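As background for the Fisher criterion that EGFE builds on, here is plain vector LDA in a few lines (the greedy tensor version in the thesis is more involved and is not reproduced here; data are toy values): the most discriminative direction solves the generalized eigenproblem S_b v = lambda S_w v.

```python
import numpy as np

rng = np.random.default_rng(5)
# Two toy classes separated along the first feature axis.
X0 = rng.normal(size=(30, 4)) + np.array([2.0, 0, 0, 0])
X1 = rng.normal(size=(30, 4)) - np.array([2.0, 0, 0, 0])
X = np.vstack([X0, X1])
y = np.array([0] * 30 + [1] * 30)

mu = X.mean(axis=0)
Sw = np.zeros((4, 4))              # within-class scatter
Sb = np.zeros((4, 4))              # between-class scatter
for c in (0, 1):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    Sb += len(Xc) * np.outer(mc - mu, mc - mu)

# Leading eigenvector of Sw^{-1} Sb is the most discriminative projection.
evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
v = np.real(evecs[:, np.argmax(np.real(evals))])
assert abs(v[0]) > abs(v[1])       # dominated by the separating axis
```

For tensor data, solving this eigenproblem on the full vectorized representation becomes prohibitively large, which is the cost the greedy mode-wise strategy of EGFE is designed to avoid.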
Secondly, it presents an application of the EGFE method to the early detection of dementia. For the early detection task, four cognitive scores are used as the original data, while the greedy feature extraction method is employed to derive discriminative privileged-information features from fMRI data. The results from the experiments presented in this thesis demonstrate the advantage of using privileged information for the early detection task.