5 research outputs found
Learning to Detect Carried Objects with Minimal Supervision
We propose a learning-based method for detecting carried objects that
generates candidate image regions from protrusion, color contrast and
occlusion boundary cues, and uses a classifier to filter out the regions
unlikely to be carried objects. The method achieves higher accuracy than
state of the art, which can only detect protrusions from the human
shape, and the discriminative model it builds for the silhouette
context-based region features generalizes well. To reduce annotation
effort, we investigate training the model in a Multiple Instance
Learning framework where the only available supervision is "walk" and
"carry" labels associated with intervals of human tracks, i.e., the
spatial extent of carried objects is not annotated. We present an
extension to the miSVM algorithm that uses knowledge of the fraction of
positive instances in positive bags and that scales to training sets of
hundreds of thousands of instances
IMPROVING EFFICIENCY AND SCALABILITY IN VISUAL SURVEILLANCE APPLICATIONS
We present four contributions to visual surveillance: (a) an action recognition method based on the characteristics of human motion in image space; (b) a study of the strengths of five regression techniques for monocular pose estimation that highlights the advantages of kernel PLS; (c) a learning-based method for detecting objects carried by humans requiring minimal annotation; (d) an interactive video segmentation system that reduces supervision by using occlusion and long term spatio-temporal structure information.
We propose a representation for human actions that is based solely on motion information and that leverages the characteristics of human movement in the image space. The representation is best suited to visual surveillance settings in which the actions of interest are highly constrained, but also works on more general problems if the actions are ballistic in nature. Our computationally efficient representation achieves good recognition performance on both a commonly used action recognition dataset and on a dataset we collected to simulate a checkout counter.
We study discriminative methods for 3D human pose estimation from single images, which build a map from image features to pose. The main difficulty with these methods is the insufficiency of training data due to the high dimensionality of the pose space. However, real datasets can be augmented with data from character animation software, so the scalability of existing approaches becomes important. We argue that Kernel Partial Least Squares approximates Gaussian Process regression robustly, enabling the use of larger datasets, and we show in experiments that kPLS outperforms two state-of-the-art methods based on GP.
The high variability in the appearance of carried objects suggests using their relation to the human silhouette to detect them. We adopt a generate-and-test approach that produces candidate regions from protrusion, color contrast and occlusion boundary cues and then filters them with a kernel SVM classifier on context features. Our method exceeds state of the art accuracy and has good generalization capability. We also propose a Multiple Instance Learning framework for the classifier that reduces annotation effort by two orders of magnitude while maintaining comparable accuracy.
Finally, we present an interactive video segmentation system that trades off a small amount of segmentation quality for significantly less supervision than necessary in systems in the literature. While applications like video editing could not directly use the output of our system, reasoning about the trajectories of objects in a scene or learning coarse appearance models is still possible. The unsupervised segmentation component at the base of our system effectively employs occlusion boundary cues and achieves competitive results on an unsupervised segmentation dataset. On videos used to evaluate interactive methods, our system requires less interaction time than others, does not rely on appearance information and can extract multiple objects at the same time
Learning to Detect Carried Objects with Minimal Supervision ∗
We propose a learning-based method for detecting carried objects that generates candidate image regions from protrusion, color contrast and occlusion boundary cues, and uses a classifier to filter out the regions unlikely to be carried objects. The method achieves higher accuracy than state of the art, which can only detect protrusions from the human shape, and the discriminative model it builds for the silhouette context-based region features generalizes well. To reduce annotation effort, we investigate training the model in a Multiple Instance Learning framework where the only available supervision is “walk ” and “carry ” labels associated with intervals of human tracks, i.e., the spatial extent of carried objects is not annotated. We present an extension to the miSVM algorithm that uses knowledge of the fraction of positive instances in positive bags and that scales to training sets of hundreds of thousands of instances. 1
Carried baggage detection and recognition in video surveillance with foreground segmentation
Security cameras installed in public spaces or in private organizations continuously
record video data with the aim of detecting and preventing crime. For that reason,
video content analysis applications, either for real time (i.e. analytic) or post-event
(i.e. forensic) analysis, have gained high interest in recent years. In this thesis,
the primary focus is on two key aspects of video analysis, reliable moving object
segmentation and carried object detection & identification.
A novel moving object segmentation scheme by background subtraction is presented
in this thesis. The scheme relies on background modelling which is based
on multi-directional gradient and phase congruency. As a post processing step,
the detected foreground contours are refined by classifying the edge segments as
either belonging to the foreground or background. Further contour completion
technique by anisotropic diffusion is first introduced in this area. The proposed
method targets cast shadow removal, gradual illumination change invariance, and
closed contour extraction.
A state of the art carried object detection method is employed as a benchmark
algorithm. This method includes silhouette analysis by comparing human temporal
templates with unencumbered human models. The implementation aspects of
the algorithm are improved by automatically estimating the viewing direction of
the pedestrian and are extended by a carried luggage identification module. As
the temporal template is a frequency template and the information that it provides
is not sufficient, a colour temporal template is introduced. The standard
steps followed by the state of the art algorithm are approached from a different
extended (by colour information) perspective, resulting in more accurate carried
object segmentation.
The experiments conducted in this research show that the proposed closed
foreground segmentation technique attains all the aforementioned goals. The incremental
improvements applied to the state of the art carried object detection
algorithm revealed the full potential of the scheme. The experiments demonstrate
the ability of the proposed carried object detection algorithm to supersede the
state of the art method