202 research outputs found
Activity understanding and unusual event detection in surveillance videos
PhD
Computer scientists have made ceaseless efforts to replicate the cognitive video understanding abilities
of the human brain in autonomous vision systems. As video surveillance cameras become
ubiquitous, there is a surge in studies on automated activity understanding and unusual event detection
in surveillance videos. Nevertheless, video content analysis in public scenes remains a
formidable challenge due to intrinsic difficulties such as severe inter-object occlusion in crowded
scenes and the poor quality of recorded surveillance footage. Moreover, it is nontrivial to achieve
robust detection of unusual events, which are rare, ambiguous, and easily confused with noise.
This thesis proposes solutions for resolving ambiguous visual observations and overcoming the unreliability
of conventional activity analysis methods by exploiting multi-camera visual context
and human feedback.
The thesis first demonstrates the importance of learning visual context for establishing reliable
reasoning on observed activity in a camera network. In the proposed approach, a new Cross
Canonical Correlation Analysis (xCCA) is formulated to discover and quantify time delayed pairwise
correlations of regional activities observed within and across multiple camera views. This
thesis shows that learning time delayed pairwise activity correlations offers valuable contextual
information for (1) spatial and temporal topology inference of a camera network, (2) robust person
re-identification, and (3) accurate activity-based video temporal segmentation. Crucially, in
contrast to conventional methods, the proposed approach does not rely on either intra-camera or
inter-camera object tracking; it can thus be applied to low-quality surveillance videos featuring
severe inter-object occlusions.
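To make the notion of a time delayed pairwise activity correlation concrete, the following minimal sketch estimates the delay between two 1-D regional activity profiles by scanning candidate lags and keeping the one with the highest Pearson correlation. It is only an illustration of the underlying idea, not the thesis's full xCCA formulation (which operates on multivariate regional activity features); the function name and toy data are illustrative.

import numpy as np

def time_delayed_correlation(a, b, max_lag):
    # Scan candidate lags and keep the one with the highest Pearson correlation
    # between the two regional activity profiles. A positive lag means activity
    # in region `a` tends to precede activity in region `b`.
    best_lag, best_corr = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[:len(a) - lag], b[lag:]
        else:
            x, y = a[-lag:], b[:len(b) + lag]
        corr = np.corrcoef(x, y)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr

# Toy check: region B repeats region A's activity 5 time bins later.
rng = np.random.default_rng(0)
a = rng.random(500)
b = np.roll(a, 5) + 0.05 * rng.random(500)
print(time_delayed_correlation(a, b, max_lag=20))  # expected lag of about 5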
Second, to detect global unusual events across multiple disjoint cameras, this thesis extends
visual context learning from pairwise relationships to global time delayed dependencies between
regional activities. Specifically, a Time Delayed Probabilistic Graphical Model (TD-PGM) is
proposed to model the multi-camera activities and their dependencies. Subtle global unusual
events are detected and localised using the model as context-incoherent patterns across multiple
camera views. In the model, different nodes represent activities in different decomposed regions
from different camera views, and the directed links between nodes encode time delayed
dependencies between activities observed within and across camera views. In order to learn optimised
time delayed dependencies in a TD-PGM, a novel two-stage structure learning approach
is formulated by combining both constraint-based and score-based structure learning
methods.
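As a rough illustration of the two-stage idea, the sketch below first prunes candidate parents with a constraint-like pairwise correlation test and then selects each node's parent set with a BIC-style score. It is only an illustration: it uses a toy linear-Gaussian node score, ignores the explicit time delays carried by TD-PGM edges, and does not enforce acyclicity, so it is not the actual structure learning algorithm of the thesis.

import numpy as np
from itertools import combinations

def bic_score(child, parents, data):
    # BIC of regressing one activity series on a candidate parent set
    # (toy linear-Gaussian stand-in for the per-node score of a TD-PGM).
    y = data[:, child]
    n = len(y)
    X = np.column_stack([data[:, p] for p in parents] + [np.ones(n)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = max(resid.var(), 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return loglik - 0.5 * X.shape[1] * np.log(n)

def two_stage_structure(data, corr_thresh=0.3, max_parents=2):
    # Stage 1 (constraint-like): prune candidate parents by pairwise correlation.
    # Stage 2 (score-based): keep the best-scoring pruned parent set per node.
    n_nodes = data.shape[1]
    corr = np.corrcoef(data, rowvar=False)
    parents = {}
    for child in range(n_nodes):
        cand = [p for p in range(n_nodes)
                if p != child and abs(corr[child, p]) > corr_thresh]
        best, best_score = (), bic_score(child, (), data)
        for k in range(1, min(max_parents, len(cand)) + 1):
            for subset in combinations(cand, k):
                s = bic_score(child, subset, data)
                if s > best_score:
                    best, best_score = subset, s
        parents[child] = best
    return parents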
Third, to cope with visual context changes over time, this two-stage structure learning approach
is extended to permit tractable incremental update of both TD-PGM parameters and its
structure. As opposed to most existing studies that assume a static model once learned, the proposed
incremental learning allows a model to adapt itself to reflect the changes in the current
visual context, such as subtle behaviour drift over time or removal/addition of cameras. Importantly,
the incremental structure learning is achieved without either exhaustive search in a large
graph structure space or storing all past observations in memory, making the proposed solution
memory and time efficient.
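One way to picture the memory-efficient aspect is that the statistics needed to refresh the constraint-based pruning stage can be maintained online rather than recomputed from stored observations. The Welford-style running covariance below is a generic sketch of that idea, not the thesis's actual incremental structure learning procedure.

import numpy as np

class RunningCovariance:
    # Online sufficient statistics over regional activity vectors: enough to
    # refresh pairwise-correlation pruning without storing past observations.
    def __init__(self, n_nodes):
        self.n = 0
        self.mean = np.zeros(n_nodes)
        self.M2 = np.zeros((n_nodes, n_nodes))  # accumulated outer-product deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.M2 += np.outer(delta, x - self.mean)

    def correlation(self):
        cov = self.M2 / max(self.n - 1, 1)
        d = np.sqrt(np.clip(np.diag(cov), 1e-12, None))
        return cov / np.outer(d, d)

After each new observation vector is folded in with update(), only the nodes whose candidate parent sets change need to be rescored, so neither an exhaustive search nor the raw history is required.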
Fourth, an active learning approach is presented to incorporate human feedback for on-line
unusual event detection. Contrary to most existing unsupervised methods that perform passive
mining for unusual events, the proposed approach automatically requests supervision for critical
points to resolve ambiguities of interest, leading to more robust detection of subtle unusual
events. The active learning strategy is formulated as a stream-based solution, i.e. it decides
on-the-fly whether to request a label for each unlabelled sample observed in sequence.
It adaptively selects between two active learning criteria, namely the likelihood criterion and the
uncertainty criterion, to achieve (1) discovery of unknown event classes and (2) refinement of the
classification boundary.
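A highly simplified version of such a stream-based query rule is sketched below. The fixed thresholds stand in for the adaptive criterion selection described in the thesis, and both threshold values and the function name are placeholders.

import numpy as np

def query_decision(log_likelihood, class_posteriors,
                   lik_thresh=-20.0, entropy_thresh=0.8):
    # Ask for a label when the sample looks unlike anything seen so far
    # (likelihood criterion, for discovering unknown event classes) or when the
    # classifier is unsure between known classes (uncertainty criterion, for
    # refining the decision boundary). Assumes at least two known classes.
    if log_likelihood < lik_thresh:
        return True, "likelihood"          # possible unknown event class
    p = np.asarray(class_posteriors)
    entropy = -np.sum(p * np.log(p + 1e-12)) / np.log(len(p))  # normalised entropy
    if entropy > entropy_thresh:
        return True, "uncertainty"         # ambiguous between known classes
    return False, "none"                   # accept the model's prediction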
The effectiveness of the proposed approaches is validated using videos captured from busy
public scenes such as underground stations and traffic intersections.
Instance-level Facial Attributes Transfer with Geometry-Aware Flow
We address the problem of instance-level facial attribute transfer without
paired training data, e.g. faithfully transferring the exact mustache from a
source face to a target face. This is a more challenging task than the
conventional semantic-level attribute transfer, which only preserves the
generic attribute style instead of instance-level traits. We propose the use of
geometry-aware flow, which serves as a well-suited representation for modeling
the transformation between instance-level facial attributes. Specifically, we
leverage the facial landmarks as the geometric guidance to learn the
differentiable flows automatically, despite the large pose gap between the source and target faces.
Geometry-aware flow is able to warp the source face attribute into the target
face context and generate a warp-and-blend result. To compensate for the
potential appearance gap between source and target faces, we propose a
hallucination sub-network that produces an appearance residual to further
refine the warp-and-blend result. Finally, a cycle-consistency framework
consisting of both attribute transfer module and attribute removal module is
designed, so that abundant unpaired face images can be used as training data.
Extensive evaluations validate the capability of our approach in transferring
instance-level facial attributes faithfully across large pose and appearance
gaps. Thanks to the flow representation, our approach can readily be applied to
generate realistic details on high-resolution images.
Comment: To appear in AAAI 2019. Code and models are available at:
https://github.com/wdyin/GeoGA
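The warp-and-blend step can be pictured roughly as follows in PyTorch, assuming a dense flow field, a soft attribute mask, and an optional appearance residual have already been predicted by upstream networks. This is a generic sketch of flow-based warping, not the authors' released implementation.

import torch
import torch.nn.functional as F

def warp_and_blend(source, target, flow, mask, residual=None):
    # source, target: (N, 3, H, W) images in [0, 1]
    # flow:           (N, 2, H, W) per-pixel offsets in normalised [-1, 1] coordinates
    # mask:           (N, 1, H, W) soft attribute mask in the target frame
    # residual:       optional (N, 3, H, W) correction, e.g. from a hallucination net
    n, _, h, w = source.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2).to(source)
    grid = base + flow.permute(0, 2, 3, 1)          # displace the sampling grid by the flow
    warped = F.grid_sample(source, grid, mode="bilinear", align_corners=True)
    out = mask * warped + (1 - mask) * target       # warp-and-blend into the target face
    if residual is not None:
        out = (out + residual).clamp(0, 1)          # refine the remaining appearance gap
    return out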
Aesthetic-Driven Image Enhancement by Adversarial Learning
We introduce EnhanceGAN, an adversarial learning based model that performs
automatic image enhancement. Traditional image enhancement frameworks typically
involve training models in a fully-supervised manner, which require expensive
annotations in the form of aligned image pairs. In contrast to these
approaches, our proposed EnhanceGAN only requires weak supervision (binary
labels on image aesthetic quality) and is able to learn enhancement operators
for the task of aesthetic-based image enhancement. In particular, we show the
effectiveness of a piecewise color enhancement module trained with weak
supervision, and extend the proposed EnhanceGAN framework to learning a deep
filtering-based aesthetic enhancer. The full differentiability of our image
enhancement operators enables the training of EnhanceGAN in an end-to-end
manner. We further demonstrate the capability of EnhanceGAN in learning
aesthetic-based image cropping without any groundtruth cropping pairs. Our
weakly-supervised EnhanceGAN reports competitive quantitative results on
aesthetic-based color enhancement as well as automatic image cropping, and a
user study confirms that our image enhancement results are on par with or even
preferred over professional enhancement.
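To illustrate what a differentiable colour enhancement operator might look like, the sketch below implements a monotonic piecewise-linear tone curve per RGB channel. In EnhanceGAN the curve parameters would be predicted per image and trained end-to-end against a weak adversarial aesthetic signal; here they are plain global parameters for brevity, so this is only an assumed stand-in.

import torch
import torch.nn as nn

class PiecewiseColorCurve(nn.Module):
    # Differentiable piecewise-linear tone curve per RGB channel. Positive
    # increments keep the curve monotonic; at initialisation it is the identity.
    def __init__(self, n_knots=8):
        super().__init__()
        self.n_knots = n_knots
        self.raw = nn.Parameter(torch.zeros(3, n_knots))  # raw increments per channel

    def forward(self, img):                              # img: (N, 3, H, W) in [0, 1]
        inc = torch.nn.functional.softplus(self.raw) + 1e-6
        inc = inc / inc.sum(dim=1, keepdim=True)          # curve ends exactly at 1
        knots = torch.cumsum(inc, dim=1)                  # (3, K) increasing values
        knots = torch.cat([knots.new_zeros(3, 1), knots], dim=1)  # prepend 0
        x = img.clamp(0, 1) * self.n_knots
        idx = x.floor().clamp(max=self.n_knots - 1).long()        # segment index
        frac = x - idx.to(x.dtype)                                # position inside segment
        ch = torch.arange(3, device=img.device).view(1, 3, 1, 1)
        lo = knots[ch, idx]
        hi = knots[ch, idx + 1]
        return lo + frac * (hi - lo)                              # linear interpolation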
DifFace: Blind Face Restoration with Diffused Error Contraction
While deep learning-based methods for blind face restoration have achieved
unprecedented success, they still suffer from two major limitations. First,
most of them deteriorate when facing complex degradations out of their training
data. Second, these methods require multiple constraints, e.g., fidelity,
perceptual, and adversarial losses, which require laborious hyper-parameter
tuning to stabilize and balance their influences. In this work, we propose a
novel method named DifFace that is capable of coping with unseen and complex
degradations more gracefully without complicated loss designs. The key of our
method is to establish a posterior distribution from the observed low-quality
(LQ) image to its high-quality (HQ) counterpart. In particular, we design a
transition distribution from the LQ image to the intermediate state of a
pre-trained diffusion model, and then gradually transition from this intermediate
state to the HQ target by recursively applying the pre-trained diffusion model.
The transition distribution only relies on a restoration backbone that is
trained with loss on some synthetic data, which favorably avoids the
cumbersome training process in existing methods. Moreover, the transition
distribution can contract the error of the restoration backbone and thus makes
our method more robust to unknown degradations. Comprehensive experiments show
that DifFace is superior to current state-of-the-art methods, especially in
cases with severe degradations. Our code and model are available at
https://github.com/zsyOAOA/DifFace.
Comment: 21 pages
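The diffuse-then-denoise procedure described above can be summarised by the following sketch, where backbone, diffusion.q_sample, and diffusion.p_sample are assumed interfaces (common in diffusion codebases) rather than the released DifFace API, and start_t is an arbitrary intermediate timestep.

import torch

@torch.no_grad()
def diffuse_then_denoise(lq, backbone, diffusion, start_t=400):
    # 1) A restoration backbone gives a rough clean estimate from the LQ input.
    # 2) The estimate is forward-diffused to an intermediate timestep of a
    #    pre-trained diffusion model, which contracts the backbone's error.
    # 3) The reverse process is run from there down to t = 0 to obtain the HQ output.
    x0_rough = backbone(lq)                                  # rough HQ estimate
    t = torch.full((lq.size(0),), start_t, dtype=torch.long, device=lq.device)
    x_t = diffusion.q_sample(x0_rough, t)                    # diffuse to the intermediate state
    for step in range(start_t, 0, -1):                       # reverse diffusion, N .. 1
        t = torch.full((lq.size(0),), step, dtype=torch.long, device=lq.device)
        x_t = diffusion.p_sample(x_t, t)
    return x_t.clamp(-1, 1)                                  # assumes images scaled to [-1, 1]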
- …