Robust Person Re-identification by Modelling Feature Uncertainty
We aim to learn deep person re-identification (ReID) models that are robust against noisy training data. Two types of noise are prevalent in practice: (1) label noise caused by human annotator errors and (2) data outliers caused by person detector errors or occlusion. Both types of noise pose serious problems for training ReID models, yet have been largely ignored so far. In this paper, we propose a novel deep network termed DistributionNet for robust ReID. Instead of representing each person image as a feature vector, DistributionNet models it as a Gaussian distribution with its variance representing the uncertainty of the extracted features. A carefully designed loss is formulated in DistributionNet to unevenly allocate uncertainty across training samples. Consequently, noisy samples are assigned large variance/uncertainty, which effectively alleviates their negative impacts on model fitting. Extensive experiments demonstrate that our model is more effective than alternative noise-robust deep models. The source code is available at: https://github.com/TianyuanYu/DistributionNet
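To make the Gaussian-feature idea concrete, here is a minimal Python sketch. It assumes a diagonal Gaussian whose mean and log-variance an encoder has already predicted for one image, and shows only the reparameterised sampling step and how a larger variance scatters the sampled features. DistributionNet's actual network and its uncertainty-allocating loss are not reproduced, and the function names sample_feature and feature_spread are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sample_feature(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterisation).

    mu and log_var are the per-dimension mean and log-variance that an
    encoder would predict for one image (diagonal Gaussian assumed).
    """
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    # sigma = exp(0.5 * log_var); the randomness lives only in eps.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def feature_spread(mu: Sequence[float], log_var: Sequence[float],
                   n_samples: int = 200, seed: int = 0) -> float:
    """Average Euclidean distance between sampled features and the mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = sample_feature(mu, log_var, rng)
        total += math.dist(z, mu)
    return total / n_samples


if __name__ == "__main__":
    mu = [0.5, -1.0, 2.0, 0.0]
    clean = feature_spread(mu, [-6.0] * 4)  # confident sample: small variance
    noisy = feature_spread(mu, [1.0] * 4)   # uncertain sample: large variance
    print(f"clean spread={clean:.3f}, noisy spread={noisy:.3f}")
```

During training a classifier would see z rather than mu, so a sample assigned a large variance yields scattered, inconsistent features. This is the intuition behind reducing the influence of noisy samples; which samples receive large variance is decided by the paper's loss, which this sketch does not model.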
Beyond simulation: designing for uncertainty and robust solutions
Simulation is an increasingly essential tool in the design of our environment, but any model is only as good as the initial assumptions on which it is built. This paper aims to outline some of the limits and potential dangers of reliance on simulation, and suggests how to make our models, and our buildings, more robust with respect to the uncertainty we face in design. It argues that the single analyses provided by most simulations display too precise and too narrow a result to be maximally useful in design, and instead a broader description is required, as might be provided by many differing simulations. Increased computing power now allows this in many areas. Suggestions are made for the further development of simulation tools for design, in that these increased resources should be dedicated not simply to the accuracy of single solutions, but to a bigger picture that takes account of a design's robustness to change, multiple phenomena that cannot be predicted, and the wider range of possible solutions. Methods for doing so, including statistical methods, adaptive modelling, machine learning and pattern recognition algorithms for identifying persistent structures in models, will be identified. We propose a number of avenues for future research and how these fit into the design process, particularly in the case of the design of very large buildings.
Pigment Melanin: Pattern for Iris Recognition
Iris recognition based on Visible Light (VL) imaging is a difficult
problem because of light reflection from the cornea. Nonetheless, pigment
melanin provides a rich feature source in VL that is unavailable in Near-Infrared
(NIR) imaging. This is due to the biological spectroscopy of eumelanin, a chemical
that is not stimulated in NIR. In this case, a plausible solution for observing such
patterns may be provided by an adaptive procedure using a variational technique
on the image histogram. To describe the patterns, a shape analysis method is
used to derive a feature code for each subject. An important question is how much
the melanin patterns, extracted from VL, are independent of iris texture in
NIR. With this question in mind, the present investigation proposes fusion of
features extracted from NIR and VL to boost the recognition performance. We
have collected our own database (UTIRIS) consisting of both NIR and VL images
of 158 eyes of 79 individuals. This investigation demonstrates that the
proposed algorithm is highly sensitive to the patterns of chromophores and
improves the iris recognition rate. Comment: To be published in the Special Issue on Biometrics, IEEE Transactions on
Instrumentation and Measurement, Volume 59, Issue number 4, April 201
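Feature-level fusion of the two modalities can be sketched as follows, assuming (hypothetically) that each modality yields a binary feature code and that codes are compared by normalised Hamming distance, as in conventional iris-code matching. The concatenation rule, the helper names and the 0.32 decision threshold are illustrative assumptions, not the paper's shape-analysis features or its fusion scheme.

```python
from typing import List, Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of disagreeing bits between two equal-length binary codes."""
    if len(a) != len(b) or not a:
        raise ValueError("codes must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def fuse_codes(nir_code: Sequence[int], vl_code: Sequence[int]) -> List[int]:
    """Feature-level fusion by concatenating the NIR and VL codes."""
    return list(nir_code) + list(vl_code)


def is_match(probe_nir: Sequence[int], probe_vl: Sequence[int],
             gallery_nir: Sequence[int], gallery_vl: Sequence[int],
             threshold: float = 0.32) -> bool:
    """Accept the pair if the fused codes differ in few enough bits."""
    probe = fuse_codes(probe_nir, probe_vl)
    gallery = fuse_codes(gallery_nir, gallery_vl)
    return hamming_distance(probe, gallery) <= threshold


if __name__ == "__main__":
    probe_nir, probe_vl = [1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0]
    gallery_nir, gallery_vl = [1, 0, 1, 1, 0, 1, 1, 0], [0, 1, 0, 0]
    print(is_match(probe_nir, probe_vl, gallery_nir, gallery_vl))
```

Concatenation lets VL melanin bits confirm or contradict NIR texture bits in a single distance; a score-level alternative would compute one distance per modality and combine them afterwards.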
Activity understanding and unusual event detection in surveillance videos
PhD thesis
Computer scientists have made ceaseless efforts to replicate the cognitive video understanding abilities
of the human brain in autonomous vision systems. As video surveillance cameras become
ubiquitous, there is a surge in studies on automated activity understanding and unusual event detection
in surveillance videos. Nevertheless, video content analysis in public scenes remains a
formidable challenge due to intrinsic difficulties such as severe inter-object occlusion in crowded
scenes and the poor quality of recorded surveillance footage. Moreover, it is nontrivial to achieve
robust detection of unusual events, which are rare, ambiguous, and easily confused with noise.
This thesis proposes solutions for resolving ambiguous visual observations and overcoming the unreliability
of conventional activity analysis methods by exploiting multi-camera visual context
and human feedback.
The thesis first demonstrates the importance of learning visual context for establishing reliable
reasoning about observed activity in a camera network. In the proposed approach, a new Cross
Canonical Correlation Analysis (xCCA) is formulated to discover and quantify time-delayed pairwise
correlations of regional activities observed within and across multiple camera views. This
thesis shows that learning time-delayed pairwise activity correlations offers valuable contextual
information for (1) spatial and temporal topology inference of a camera network, (2) robust person
re-identification, and (3) accurate activity-based video temporal segmentation. Crucially, in
contrast to conventional methods, the proposed approach does not rely on either intra-camera or
inter-camera object tracking; it can thus be applied to low-quality surveillance videos featuring
severe inter-object occlusions.
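The notion of a time-delayed pairwise correlation can be illustrated with a simplified stand-in. Instead of xCCA, which correlates multidimensional regional activity features, the sketch below scans candidate delays and computes a plain Pearson correlation between two one-dimensional activity series. The function names and the single-feature series are assumptions made for brevity.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_delay(src: Sequence[float], dst: Sequence[float],
               max_delay: int) -> Tuple[int, float]:
    """Return (delay, corr) maximising corr(src[t], dst[t + delay])."""
    if len(src) != len(dst):
        raise ValueError("series must have the same length")
    best = (0, -2.0)
    for d in range(max_delay + 1):
        # Align src at time t with dst at time t + d.
        x = src[:len(src) - d]
        y = dst[d:]
        if len(x) < 3:
            break  # too few overlapping samples for a meaningful estimate
        r = pearson(x, y)
        if r > best[1]:
            best = (d, r)
    return best


if __name__ == "__main__":
    region_a = [0, 1, 0, 0, 3, 0, 0, 1, 0, 2, 0, 0, 0, 1, 0, 0]
    region_b = [0, 0, 0] + region_a[:-3]  # same activity, arriving 3 frames later
    print(best_delay(region_a, region_b, max_delay=6))
```

The delay with the highest correlation is one plausible estimate of the travel time between two regions, which is the kind of cue camera-topology inference relies on.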
Second, to detect global unusual events across multiple disjoint cameras, this thesis extends
visual context learning from pairwise relationships to global time-delayed dependencies between
regional activities. Specifically, a Time Delayed Probabilistic Graphical Model (TD-PGM) is
proposed to model the multi-camera activities and their dependencies. Subtle global unusual
events are detected and localised using the model as context-incoherent patterns across multiple
camera views. In the model, different nodes represent activities in different decomposed regions
from different camera views, and the directed links between nodes encode time-delayed
dependencies between activities observed within and across camera views. In order to learn optimised
time-delayed dependencies in a TD-PGM, a novel two-stage structure learning approach
is formulated by combining both constraint-based and score-and-search-based structure learning
methods.
Third, to cope with visual context changes over time, this two-stage structure learning approach
is extended to permit tractable incremental updates of both the TD-PGM parameters and its
structure. As opposed to most existing studies that assume a static model once learned, the proposed
incremental learning allows a model to adapt itself to reflect the changes in the current
visual context, such as subtle behaviour drift over time or removal/addition of cameras. Importantly,
the incremental structure learning is achieved without either exhaustive search in a large
graph structure space or storing all past observations in memory, making the proposed solution
memory- and time-efficient.
Fourth, an active learning approach is presented to incorporate human feedback for on-line
unusual event detection. Contrary to most existing unsupervised methods that perform passive
mining for unusual events, the proposed approach automatically requests supervision for critical
points to resolve ambiguities of interest, leading to more robust detection of subtle unusual
events. The active learning strategy is formulated as a stream-based solution, i.e. it decides
on the fly whether to request a label for each unlabelled sample observed in sequence.
It adaptively selects between two active learning criteria, namely a likelihood criterion and an uncertainty criterion,
to achieve (1) discovery of unknown event classes and (2) refinement of classification
boundary.
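A stream-based query rule combining the two criteria might look like the sketch below. It assumes the model provides a log-likelihood for each incoming sample and a posterior over known event classes; the thresholds, the top-two margin as the uncertainty measure, and the name decide_query are hypothetical. The thesis selects between the criteria adaptively, whereas this sketch simply checks them in a fixed order.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueryDecision:
    query: bool
    reason: str  # "likelihood", "uncertainty" or "none"


def decide_query(log_likelihood: float, posterior: Sequence[float],
                 ll_threshold: float, margin_threshold: float) -> QueryDecision:
    """Decide on the fly whether to ask a human to label one sample."""
    # Likelihood criterion: the sample is poorly explained by the current
    # model, so it may belong to an unknown event class.
    if log_likelihood < ll_threshold:
        return QueryDecision(True, "likelihood")
    # Uncertainty criterion: the two most probable classes are close,
    # so the sample lies near the current classification boundary.
    top = sorted(posterior, reverse=True)
    margin = top[0] - (top[1] if len(top) > 1 else 0.0)
    if margin < margin_threshold:
        return QueryDecision(True, "uncertainty")
    return QueryDecision(False, "none")


if __name__ == "__main__":
    stream = [(-45.0, [0.7, 0.3]), (-8.0, [0.52, 0.48]), (-6.0, [0.95, 0.05])]
    for i, (ll, post) in enumerate(stream):
        d = decide_query(ll, post, ll_threshold=-30.0, margin_threshold=0.1)
        print(i, d.query, d.reason)
```

Each sample is judged once as it arrives and is never revisited, which is what distinguishes this stream-based setting from pool-based active learning.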
The effectiveness of the proposed approaches is validated using videos captured from busy
public scenes such as underground stations and traffic intersections.
Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based Object Re-Identification
Object Re-IDentification (ReID), one of the most significant problems in
biometrics and surveillance systems, has been extensively studied by the image
processing and computer vision communities over the past decades. Learning a
robust and discriminative feature representation is a crucial challenge for
object ReID. The problem is even more challenging in ReID based on Unmanned
Aerial Vehicle (UAV) as the images are characterized by continuously varying
camera parameters (e.g., view angle, altitude, etc.) of a flying drone. To
address this challenge, multiscale feature representation has been considered
to characterize images captured from UAV flying at different altitudes. In this
work, we propose a multitask learning approach, which employs a new multiscale
architecture without convolution, Pyramid Vision Transformer (PVT), as the
backbone for UAV-based object ReID. By modeling the uncertainty of intraclass
variations, our proposed model can be jointly optimized using both
uncertainty-aware object ID and camera ID information. Experimental results are
reported on PRAI and VRAI, two ReID data sets from aerial surveillance, to
verify the effectiveness of our proposed approach.
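One common way to jointly optimise two tasks under learned uncertainty is homoscedastic uncertainty weighting (Kendall, Gal and Cipolla, 2018), shown below in its widely used simplified form sum_i exp(-s_i) * L_i + s_i with s_i = log(sigma_i^2). The abstract does not say that the paper uses this form (it describes modeling the uncertainty of intraclass variations), so treat this as a swapped-in illustration; the task keys object_id and camera_id and the function name are hypothetical.

```python
import math
from typing import Dict


def uncertainty_weighted_loss(task_losses: Dict[str, float],
                              log_vars: Dict[str, float]) -> float:
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i.

    s_i = log(sigma_i^2) would be a learnable scalar per task; the
    exp(-s_i) factor down-weights uncertain tasks, and the + s_i term
    stops the optimiser from making every sigma_i arbitrarily large.
    """
    total = 0.0
    for task, loss in task_losses.items():
        s = log_vars[task]
        total += math.exp(-s) * loss + s
    return total


if __name__ == "__main__":
    losses = {"object_id": 2.0, "camera_id": 0.5}
    print(uncertainty_weighted_loss(losses, {"object_id": 0.0, "camera_id": 0.0}))
    # At s_i = log(L_i) each task contributes 1 + log(L_i).
    optimal = {k: math.log(v) for k, v in losses.items()}
    print(uncertainty_weighted_loss(losses, optimal))
```

Because the derivative with respect to s_i vanishes at s_i = log(L_i), the effective weight exp(-s_i) becomes 1 / L_i there, so a task whose loss stays high is automatically down-weighted rather than dominating the total.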