1,083 research outputs found
Boosted Multiple Kernel Learning for First-Person Activity Recognition
Activity recognition from first-person (ego-centric) videos has recently
gained attention due to the increasing ubiquity of the wearable cameras. There
has been a surge of efforts adapting existing feature descriptors and designing
new descriptors for the first-person videos. An effective activity recognition
system requires selection and use of complementary features and appropriate
kernels for each feature. In this study, we propose a data-driven framework for
first-person activity recognition which effectively selects and combines
features and their respective kernels during the training. Our experimental
results show that use of Multiple Kernel Learning (MKL) and Boosted MKL in
first-person activity recognition problem exhibits improved results in
comparison to the state-of-the-art. In addition, these techniques enable the
expansion of the framework with new features in an efficient and convenient
way.Comment: First published in the Proceedings of the 25th European Signal
Processing Conference (EUSIPCO-2017) in 2017, published by EURASI
Recommended from our members
Multitarget Tracking in Nonoverlapping Cameras Using a Reference Set
Tracking multiple targets in nonoverlapping cameras are challenging since the observations of the same targets are often separated by time and space. There might be significant appearance change of a target across camera views caused by variations in illumination conditions, poses, and camera imaging characteristics. Consequently, the same target may appear very different in two cameras. Therefore, associating tracks in different camera views directly based on their appearance similarity is difficult and prone to error. In most previous methods, the appearance similarity is computed either using color histograms or based on pretrained brightness transfer function that maps color between cameras. In this paper, a novel reference set based appearance model is proposed to improve multitarget tracking in a network of nonoverlapping cameras. Contrary to previous work, a reference set is constructed for a pair of cameras, containing subjects appearing in both camera views. For track association, instead of directly comparing the appearance of two targets in different camera views, they are compared indirectly via the reference set. Besides global color histograms, texture and shape features are extracted at different locations of a target, and AdaBoost is used to learn the discriminative power of each feature. The effectiveness of the proposed method over the state of the art on two challenging real-world multicamera video data sets is demonstrated by thorough experiments
IMPROVING EFFICIENCY AND SCALABILITY IN VISUAL SURVEILLANCE APPLICATIONS
We present four contributions to visual surveillance: (a) an action recognition method based on the characteristics of human motion in image space; (b) a study of the strengths of five regression techniques for monocular pose estimation that highlights the advantages of kernel PLS; (c) a learning-based method for detecting objects carried by humans requiring minimal annotation; (d) an interactive video segmentation system that reduces supervision by using occlusion and long term spatio-temporal structure information.
We propose a representation for human actions that is based solely on motion information and that leverages the characteristics of human movement in the image space. The representation is best suited to visual surveillance settings in which the actions of interest are highly constrained, but also works on more general problems if the actions are ballistic in nature. Our computationally efficient representation achieves good recognition performance on both a commonly used action recognition dataset and on a dataset we collected to simulate a checkout counter.
We study discriminative methods for 3D human pose estimation from single images, which build a map from image features to pose. The main difficulty with these methods is the insufficiency of training data due to the high dimensionality of the pose space. However, real datasets can be augmented with data from character animation software, so the scalability of existing approaches becomes important. We argue that Kernel Partial Least Squares approximates Gaussian Process regression robustly, enabling the use of larger datasets, and we show in experiments that kPLS outperforms two state-of-the-art methods based on GP.
The high variability in the appearance of carried objects suggests using their relation to the human silhouette to detect them. We adopt a generate-and-test approach that produces candidate regions from protrusion, color contrast and occlusion boundary cues and then filters them with a kernel SVM classifier on context features. Our method exceeds state of the art accuracy and has good generalization capability. We also propose a Multiple Instance Learning framework for the classifier that reduces annotation effort by two orders of magnitude while maintaining comparable accuracy.
Finally, we present an interactive video segmentation system that trades off a small amount of segmentation quality for significantly less supervision than necessary in systems in the literature. While applications like video editing could not directly use the output of our system, reasoning about the trajectories of objects in a scene or learning coarse appearance models is still possible. The unsupervised segmentation component at the base of our system effectively employs occlusion boundary cues and achieves competitive results on an unsupervised segmentation dataset. On videos used to evaluate interactive methods, our system requires less interaction time than others, does not rely on appearance information and can extract multiple objects at the same time
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and allows to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas are described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research
- …