Multitarget Tracking in Nonoverlapping Cameras Using a Reference Set
Tracking multiple targets in nonoverlapping cameras is challenging since the observations of the same targets are often separated by time and space. There might be significant appearance change of a target across camera views caused by variations in illumination conditions, poses, and camera imaging characteristics. Consequently, the same target may appear very different in two cameras. Therefore, associating tracks in different camera views directly based on their appearance similarity is difficult and prone to error. In most previous methods, the appearance similarity is computed either using color histograms or based on a pretrained brightness transfer function that maps color between cameras. In this paper, a novel reference-set-based appearance model is proposed to improve multitarget tracking in a network of nonoverlapping cameras. Contrary to previous work, a reference set is constructed for a pair of cameras, containing subjects appearing in both camera views. For track association, instead of directly comparing the appearance of two targets in different camera views, they are compared indirectly via the reference set. Besides global color histograms, texture and shape features are extracted at different locations of a target, and AdaBoost is used to learn the discriminative power of each feature. The effectiveness of the proposed method over the state of the art is demonstrated by thorough experiments on two challenging real-world multicamera video data sets.
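The indirect comparison via a reference set can be sketched as follows. This is a simplified illustration, not the paper's exact formulation: the paper additionally uses texture and shape features with AdaBoost-learned weights, while here a single feature vector, a Gaussian distance-to-similarity conversion, and a correlation-style agreement score are assumed for clarity.

```python
import numpy as np

def reference_set_similarity(feat_a, feat_b, ref_cam1, ref_cam2):
    """Compare two targets indirectly via a reference set (sketch).

    feat_a: feature vector of a target observed in camera 1.
    feat_b: feature vector of a target observed in camera 2.
    ref_cam1, ref_cam2: (K, D) features of the same K reference
        subjects as they appear in camera 1 and camera 2.
    """
    def profile(feat, refs):
        # Similarity of the target to each reference subject,
        # computed within its *own* camera, so no cross-camera
        # colour mapping is needed.
        d = np.linalg.norm(refs - feat, axis=1)
        return np.exp(-d)  # distances -> similarities (assumed kernel)

    p_a = profile(feat_a, ref_cam1)
    p_b = profile(feat_b, ref_cam2)
    # Two observations of the same person should agree with the
    # reference subjects in the same way: correlate the profiles.
    p_a = (p_a - p_a.mean()) / (p_a.std() + 1e-9)
    p_b = (p_b - p_b.mean()) / (p_b.std() + 1e-9)
    return float(np.dot(p_a, p_b) / len(p_a))
```

With this scheme a genuine cross-camera match produces a high profile correlation even when the raw features differ strongly between views, since each target is only ever compared to subjects imaged by the same camera.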
Person re-Identification over distributed spaces and time
PhD
Replicating the human visual system and cognitive abilities that the brain uses to process the
information it receives is an area of substantial scientific interest. With the prevalence of video
surveillance cameras a portion of this scientific drive has been into providing useful automated
counterparts to human operators. A prominent task in visual surveillance is that of matching
people between disjoint camera views, or re-identification. This allows operators to locate people
of interest, to track people across cameras and can be used as a precursory step to multi-camera
activity analysis. However, due to the contrasting conditions between camera views and their
effects on the appearance of people, re-identification is a non-trivial task. This thesis proposes
solutions for reducing the visual ambiguity in observations of people between camera views.
This thesis first looks at a method for mitigating the effects on the appearance of people under
differing lighting conditions between camera views. This thesis builds on work modelling
inter-camera illumination based on known pairs of images. A Cumulative Brightness Transfer
Function (CBTF) is proposed to estimate the mapping of colour brightness values based on limited
training samples. Unlike previous methods that use a mean-based representation for a set of
training samples, the cumulative nature of the CBTF retains colour information from underrepresented
samples in the training set. Additionally, the bi-directionality of the mapping function
is explored to maximise re-identification accuracy by ensuring samples are accurately
mapped between cameras.
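The CBTF idea above can be sketched in a few lines. This is a minimal stand-in, assuming pooled 1-D brightness samples and a simple inverse-CDF lookup; the thesis's full method operates per colour channel and handles bi-directional mapping, which are omitted here.

```python
import numpy as np

def cbtf(samples_cam1, samples_cam2, levels=256):
    """Cumulative Brightness Transfer Function (sketch).

    samples_cam1/2: lists of 1-D brightness arrays, one per training
    subject, taken from the two camera views.  Pixels from all
    subjects are pooled *before* the cumulative histogram is built,
    so under-represented subjects still contribute (unlike averaging
    a per-pair transfer function over the training set).
    """
    def cum_hist(samples):
        pooled = np.concatenate([np.ravel(s) for s in samples])
        h, _ = np.histogram(pooled, bins=levels, range=(0, levels))
        c = np.cumsum(h).astype(float)
        return c / c[-1]

    h1, h2 = cum_hist(samples_cam1), cum_hist(samples_cam2)
    # f(v) = H2^{-1}(H1(v)): for each brightness level in camera 1,
    # find the camera-2 level with the same cumulative mass.
    return np.searchsorted(h2, h1, side="left").clip(0, levels - 1)
```

For a uniform brightness shift between cameras the lookup table recovers that shift directly, which is the behaviour one would expect from histogram specification.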
Secondly, an extension is proposed to the CBTF framework that addresses the issue of changing
lighting conditions within a single camera. As the CBTF requires manually labelled training
samples it is limited to static lighting conditions and is less effective if the lighting changes. This
Adaptive CBTF (A-CBTF) differs from previous approaches that either do not consider lighting
change over time, or rely on camera transition time information to update. By utilising contextual
information drawn from the background in each camera view, an estimation of the lighting
change within a single camera can be made. This background lighting model allows the mapping
of colour information back to the original training conditions and thus removes the need for
retraining.
Thirdly, a novel reformulation of re-identification as a ranking problem is proposed. Previous
methods use a score based on a direct distance measure of set features to form a correct/incorrect
match result. Rather than offering an operator a single outcome, the ranking paradigm is to give
the operator a ranked list of possible matches and allow them to make the final decision. By utilising
a Support Vector Machine (SVM) ranking method, a weighting on the appearance features
can be learned that capitalises on the fact that not all image features are equally important to
re-identification. Additionally, an Ensemble-RankSVM is proposed to address scalability issues
by separating the training samples into smaller subsets and boosting the trained models.
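The ranking reformulation can be sketched with a tiny pairwise learner in the spirit of RankSVM. The hinge-loss update below is a simplified stand-in for a full SVM solver, and the absolute-difference features and synthetic data are illustrative assumptions, not the thesis's setup.

```python
import numpy as np

def train_ranker(pairs, n_dim, epochs=100, lr=0.1):
    """Tiny pairwise ranking sketch (RankSVM-flavoured).

    pairs: list of (x_pos, x_neg) absolute-difference feature
    vectors, where x_pos comes from a correct cross-camera match
    and x_neg from an incorrect one.  We learn weights w so that
    correct matches receive a *smaller* score w . x than wrong
    ones, via hinge-loss updates on each violated pair.
    """
    w = np.zeros(n_dim)
    for _ in range(epochs):
        for x_pos, x_neg in pairs:
            # Want a margin: w . (x_neg - x_pos) >= 1.
            if w @ (x_neg - x_pos) < 1.0:
                w += lr * (x_neg - x_pos)
    return w

def rank_gallery(w, probe, gallery):
    """Return gallery indices sorted best-first for the operator."""
    scores = [w @ np.abs(probe - g) for g in gallery]
    return np.argsort(scores)  # smaller difference score = better
```

The learned weights capture exactly the point made above: features whose differences reliably separate correct from incorrect matches receive large weights, while uninformative features are down-weighted.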
Finally, the thesis looks at a practical, real-world application of the ranking paradigm.
The system encompasses both the re-identification stage and the precursory extraction
and tracking stages to form an aid for CCTV operators. Segmentation and detection are combined
to extract relevant information from the video, while several matching techniques are combined
with temporal priors to form a more comprehensive overall matching criterion.
The effectiveness of the proposed approaches is tested on datasets obtained from a variety
of challenging environments including offices, apartment buildings, airports and outdoor public
spaces.
Activity understanding and unusual event detection in surveillance videos
PhD
Computer scientists have made ceaseless efforts to replicate cognitive video understanding abilities
of human brains onto autonomous vision systems. As video surveillance cameras become
ubiquitous, there is a surge in studies on automated activity understanding and unusual event detection
in surveillance videos. Nevertheless, video content analysis in public scenes remains a
formidable challenge due to intrinsic difficulties such as severe inter-object occlusion in crowded
scenes and poor quality of recorded surveillance footage. Moreover, it is nontrivial to achieve
robust detection of unusual events, which are rare, ambiguous, and easily confused with noise.
This thesis proposes solutions for resolving ambiguous visual observations and overcoming unreliability
of conventional activity analysis methods by exploiting multi-camera visual context
and human feedback.
The thesis first demonstrates the importance of learning visual context for establishing reliable
reasoning on observed activity in a camera network. In the proposed approach, a new Cross
Canonical Correlation Analysis (xCCA) is formulated to discover and quantify time delayed pairwise
correlations of regional activities observed within and across multiple camera views. This
thesis shows that learning time delayed pairwise activity correlations offers valuable contextual
information for (1) spatial and temporal topology inference of a camera network, (2) robust person
re-identification, and (3) accurate activity-based video temporal segmentation. Crucially, in
contrast to conventional methods, the proposed approach does not rely on either intra-camera or
inter-camera object tracking; it can thus be applied to low-quality surveillance videos featuring
severe inter-object occlusions.
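The time-delayed correlation idea can be illustrated with a scan over lags. This is a deliberately simplified stand-in for the thesis's xCCA, which operates on multivariate regional activity patterns via canonical correlation; here single 1-D activity signals and plain Pearson correlation are assumed.

```python
import numpy as np

def delayed_correlation(a, b, max_lag):
    """Time-delayed correlation between two regional activity
    signals (simplified stand-in for xCCA).

    a, b: 1-D activity time series from two regions, possibly in
    different camera views.  Returns (best_lag, best_corr): the
    delay at which b best follows a, and the correlation there.
    """
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        n = len(a) - lag
        # Correlate a against a lag-shifted window of b.
        corr = float(np.dot(a[:n], b[lag:lag + n]) / n)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```

A strong correlation at a non-zero lag is exactly the kind of contextual evidence described above: it suggests activity in one region systematically precedes activity in the other, e.g. along a walkway linking two camera views.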
Second, to detect global unusual events across multiple disjoint cameras, this thesis extends
visual context learning from pairwise relationships to global time delayed dependencies between
regional activities. Specifically, a Time Delayed Probabilistic Graphical Model (TD-PGM) is
proposed to model the multi-camera activities and their dependencies. Subtle global unusual
events are detected and localised using the model as context-incoherent patterns across multiple
camera views. In the model, different nodes represent activities in different decomposed regions
from different camera views, and the directed links between nodes encode time delayed
dependencies between activities observed within and across camera views. In order to learn optimised
time delayed dependencies in a TD-PGM, a novel two-stage structure learning approach
is formulated by combining constraint-based and score-based search structure learning
methods.
Third, to cope with visual context changes over time, this two-stage structure learning approach
is extended to permit tractable incremental update of both TD-PGM parameters and its
structure. As opposed to most existing studies that assume a static model once learned, the proposed
incremental learning allows a model to adapt itself to reflect the changes in the current
visual context, such as subtle behaviour drift over time or removal/addition of cameras. Importantly,
the incremental structure learning is achieved without either exhaustive search in a large
graph structure space or storing all past observations in memory, making the proposed solution
memory and time efficient.
Fourth, an active learning approach is presented to incorporate human feedback for on-line
unusual event detection. Contrary to most existing unsupervised methods that perform passive
mining for unusual events, the proposed approach automatically requests supervision for critical
points to resolve ambiguities of interest, leading to more robust detection of subtle unusual
events. The active learning strategy is formulated as a stream-based solution, i.e. it decides
on-the-fly whether to request a label for each unlabelled sample observed in sequence.
It adaptively selects between two active learning criteria, namely a likelihood criterion and an
uncertainty criterion, to achieve (1) discovery of unknown event classes and (2) refinement of
the classification boundary.
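The two query criteria can be sketched as a per-sample decision rule. The thresholds and the use of raw per-class likelihoods are illustrative assumptions for this sketch, not values from the thesis.

```python
def should_query(likelihoods, low_lik_thresh=0.05, margin_thresh=0.1):
    """Stream-based active learning decision (sketch).

    likelihoods: per-class likelihoods of the incoming sample under
    the current model.  Returns (query?, reason).
    """
    ranked = sorted(likelihoods, reverse=True)
    # Likelihood criterion: the sample fits no known class well,
    # so it may belong to an undiscovered event class.
    if ranked[0] < low_lik_thresh:
        return True, "likelihood"
    # Uncertainty criterion: two classes explain the sample almost
    # equally well, i.e. it lies near the classification boundary.
    if len(ranked) > 1 and ranked[0] - ranked[1] < margin_thresh:
        return True, "uncertainty"
    return False, None
```

The first branch drives discovery of new event classes, the second refines the boundary between known classes; samples triggering neither are processed without bothering the operator, which keeps the labelling budget small.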
The effectiveness of the proposed approaches is validated using videos captured from busy
public scenes such as underground stations and traffic intersections.
Trajectory based video analysis in multi-camera setups
PhD
This thesis presents an automated framework for activity analysis in multi-camera
setups. We start with the calibration of cameras, particularly those without overlapping
views. An algorithm is presented that exploits trajectory observations in each view
and works iteratively on camera pairs. First, outliers are identified and removed
from observations of each camera. Next, spatio-temporal information derived from
the available trajectory is used to estimate unobserved trajectory segments in areas
uncovered by the cameras. The unobserved trajectory estimates are used to estimate
the relative position of each camera pair, whereas the exit-entrance direction of
each object is used to estimate their relative orientation. The process continues and
iteratively approximates the configuration of all cameras with respect to each other.
Finally, we refine the initial configuration estimates with bundle adjustment, based
on the observed and estimated trajectory segments. For cameras with overlapping
views, state-of-the-art homography based approaches are used for calibration.
Next we establish object correspondence across multiple views. Our algorithm
consists of three steps, namely association, fusion and linkage. For association,
local trajectory pairs corresponding to the same physical object are estimated using
multiple spatio-temporal features on a common ground plane. To disambiguate
spurious associations, we employ a hybrid approach that utilises the matching results
on the image plane and ground plane. The trajectory segments after association
are fused by adaptive averaging. Trajectory linkage then integrates segments and generates a single trajectory of an object across the entire observed area.
Finally, for activity analysis, clustering is applied to complete trajectories. Our
clustering algorithm is based on four main steps, namely the extraction of a set of
representative trajectory features, non-parametric clustering, cluster merging and
information fusion for the identification of normal and rare object motion patterns.
First we transform the trajectories into a set of feature spaces on which Mean-shift
identifies the modes and the corresponding clusters. Furthermore, a merging
procedure is devised to refine these results by combining similar adjacent clusters.
The final common patterns are estimated by fusing the clustering results across all
feature spaces. Clusters corresponding to reoccurring trajectories are considered as
normal, whereas sparse trajectories are associated with abnormal and rare events.
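The mode-seeking step can be sketched with a minimal mean-shift. This assumes a Gaussian kernel and flat feature vectors; the thesis runs the procedure separately in each trajectory feature space and then fuses the results, which is omitted here.

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=50):
    """Minimal mean-shift mode seeking (sketch of the clustering
    step; run once per trajectory feature space in the framework).

    points: (N, D) feature vectors.  Each point hill-climbs to the
    nearest density mode under a Gaussian kernel; points ending at
    the same mode share a cluster label.
    """
    shifted = points.astype(float).copy()
    for _ in range(iters):
        for i, p in enumerate(shifted):
            # Kernel-weighted mean of the original samples.
            w = np.exp(-np.sum((points - p) ** 2, axis=1)
                       / (2 * bandwidth ** 2))
            shifted[i] = (w[:, None] * points).sum(0) / w.sum()
    # Merge converged points closer than the bandwidth into labels.
    labels, modes = [], []
    for p in shifted:
        for j, m in enumerate(modes):
            if np.linalg.norm(p - m) < bandwidth:
                labels.append(j)
                break
        else:
            modes.append(p)
            labels.append(len(modes) - 1)
    return np.array(labels)
```

Being non-parametric, this step needs no preset cluster count: dense bundles of recurring trajectories form their own modes (normal patterns), while sparse trajectories end up in small or singleton clusters, matching the normal/rare split described above.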
The performance of the proposed framework is evaluated on standard data-sets
and compared with state-of-the-art techniques. Experimental results show that
the proposed framework outperforms state-of-the-art algorithms in terms of both
accuracy and robustness.