Zero-Shot Crowd Behavior Recognition
Understanding crowd behavior in video is challenging for computer vision.
There have been increasing attempts at modeling crowded scenes by introducing
ever larger property ontologies (attributes) and annotating ever larger
training datasets. However, in contrast to still images, manually annotating
video attributes needs to consider spatiotemporal evolution which is inherently
much harder and more costly. Critically, the most interesting crowd behaviors
captured in surveillance videos (e.g., street fighting, flash mobs) are either
rare, and thus offer few examples for model training, or have never been seen
before. Existing
crowd analysis techniques are not readily scalable to recognize novel (unseen)
crowd behaviors. To address this problem, we investigate and develop methods
for recognizing visual crowd behavioral attributes without any training
samples, i.e., zero-shot crowd behavior recognition. To that end, we
relax the common assumption that each individual crowd video instance is only
associated with a single crowd attribute. Instead, our model learns to jointly
recognize multiple crowd behavioral attributes in each video instance by
exploring multi-attribute co-occurrence as contextual knowledge for optimizing
individual crowd attribute recognition. Joint multi-label attribute prediction
in zero-shot learning is inherently nontrivial because co-occurrence statistics
do not exist for unseen attributes. To solve this problem, we learn to
predict cross-attribute co-occurrence from both an online text corpus and the
multi-label annotation of videos with known attributes. Our experiments show
that this approach to modeling multi-attribute context not only improves
zero-shot crowd behavior recognition on the WWW crowd video dataset, but also
generalizes cross-domain to novel behavior (violence) detection on the Violence
Flow video dataset.
Comment: Group and Crowd Behavior for Computer Vision 2017, Pages 341-36
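The joint prediction idea above can be sketched roughly as follows: each attribute first gets an independent zero-shot score, and a co-occurrence matrix (which the paper predicts from text corpora and known multi-label annotations) then redistributes support between attributes before thresholding. Note this is a minimal illustration, not the paper's model: the attribute names, scores, co-occurrence values, and the blending rule are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical attribute vocabulary for one crowd video.
attributes = ["fight", "mob", "parade", "queue", "panic"]

# Placeholder per-attribute scores from independent zero-shot
# classifiers (a real system would derive these from visual
# features matched against semantic attribute embeddings).
scores = np.array([0.8, 0.6, 0.1, 0.05, 0.3])

# Illustrative co-occurrence matrix C[i, j]: estimated strength with
# which attribute j accompanies attribute i, as might be predicted
# from a text corpus or multi-label annotations of known attributes.
C = np.array([
    [0.0, 0.7, 0.0, 0.0, 0.6],
    [0.7, 0.0, 0.2, 0.0, 0.4],
    [0.0, 0.2, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0, 0.0],
])

def contextual_refine(scores, C, alpha=0.5):
    """Blend each attribute's own score with evidence from
    co-occurring attributes, then rescale so the max is 1."""
    context = C @ scores                       # support from co-occurring attributes
    refined = (1 - alpha) * scores + alpha * context
    return refined / refined.max()

refined = contextual_refine(scores, C)
predicted = [a for a, s in zip(attributes, refined) if s > 0.5]
```

With these placeholder numbers, "fight" and its frequent companions ("mob", "panic") survive the threshold jointly, while isolated low-scoring attributes ("queue") are suppressed, which is the intuition behind using co-occurrence as context.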
Cross-Domain Traffic Scene Understanding by Motion Model Transfer
This paper proposes a novel framework for cross-domain traffic scene understanding. Existing learning-based outdoor wide-area scene interpretation models require long-term data collection in order to acquire statistically sufficient training samples for every new scene. This makes installation costly, and prevents models from being easily relocated or deployed on UAVs whose scenes change continuously. In contrast, our method adopts a geometrical matching approach to relate motion models learned from a database of source scenes (source domains) to a handful of sparse observations in a new target scene (target domain). This framework is capable of online “sparse-shot” anomaly detection and motion event classification in the unseen target domain, without the need for extensive data collection, labelling, and offline model training for each new target domain. That is, models trained in different source domains can be deployed to a new target domain with only a few unlabelled observations and without any training in the new target domain. Crucially, to provide cross-domain interpretation without risk of dramatic negative transfer, we introduce and formulate a scene association criterion that quantifies the transferability of motion models from one scene to another. Extensive experiments show the effectiveness of the proposed framework for cross-domain motion event classification, anomaly detection, and scene association.
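The scene association idea can be illustrated with a small sketch: summarise each scene's motion model as a distribution over motion directions, score how well each source model matches the target's few observations, and refuse to transfer when no score clears a threshold (guarding against negative transfer). Everything here is an assumption for illustration: the scene names, histograms, the Bhattacharyya-coefficient similarity, and the threshold are not the paper's actual formulation.

```python
import numpy as np

# Hypothetical source-scene motion models: counts over 8 motion
# directions per scene (illustrative numbers, not learned data).
source_models = {
    "junction_A": np.array([4, 1, 0, 0, 5, 1, 0, 0], dtype=float),
    "roundabout": np.array([2, 2, 2, 2, 2, 2, 2, 2], dtype=float),
}

# A handful of sparse observations from the new target scene.
target_obs = np.array([3, 1, 0, 0, 4, 0, 0, 0], dtype=float)

def association_score(p, q, eps=1e-9):
    """Transferability via the Bhattacharyya coefficient between
    normalised direction distributions (1.0 = identical)."""
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(np.sqrt(p * q)))

scores = {name: association_score(model, target_obs)
          for name, model in source_models.items()}
best, best_score = max(scores.items(), key=lambda kv: kv[1])

# Guard against negative transfer: reuse a source model only when
# its association score clears a (hypothetical) threshold.
THRESHOLD = 0.8
transferable = best_score >= THRESHOLD
```

Here the junction-like source scene matches the target's dominant two directions far better than the uniform roundabout model, so its motion model would be the one transferred; if no source scored above the threshold, the framework would decline to transfer rather than risk dramatic negative transfer.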