Search CORE

991 research outputs found

Video Acceleration Magnification

Author: Pintea Silvia L.
van Gemert Jan C.
Zhang Yichao
Publication venue
Publication date: 22/04/2017
Field of study

The ability to amplify or reduce subtle image changes over time is useful in contexts such as video editing, medical video analysis, product quality control and sports. In these contexts there is often large motion present which severely distorts current video amplification methods that magnify change linearly. In this work we propose a method to cope with large motions while still magnifying small changes. We make the following two observations: i) large motions are linear on the temporal scale of the small changes; ii) small changes deviate from this linearity. We ignore linear motion and propose to magnify acceleration. Our method is pure Eulerian and does not require any optical flow, temporal alignment or region annotations. We link temporal second-order derivative filtering to spatial acceleration magnification. We apply our method to moving objects where we show motion magnification and color magnification. We provide quantitative as well as qualitative evidence for our method while comparing to the state-of-the-art.Comment: Accepted paper at CVPR 2017. Project webpage: http://acceleration-magnification.github.io

arXiv.org e-Print Archive

Crossref

Vision-based Detection of Acoustic Timed Events: a Case Study on Clarinet Note Onsets

Author: Bazzica A.
van Gemert J. C.
Liem C. C. S.
Hanjalic A.
Publication venue
Publication date: 01/01/1910
Field of study

Acoustic events often have a visual counterpart. Knowledge of visual information can aid the understanding of complex auditory scenes, even when only a stereo mixdown is available in the audio domain, \eg identifying which musicians are playing in large musical ensembles. In this paper, we consider a vision-based approach to note onset detection. As a case study we focus on challenging, real-world clarinetist videos and carry out preliminary experiments on a 3D convolutional neural network based on multiple streams and purposely avoiding temporal pooling. We release an audiovisual dataset with 4.5 hours of clarinetist videos together with cleaned annotations which include about 36,000 onsets and the coordinates for a number of salient points and regions of interest. By performing several training trials on our dataset, we learned that the problem is challenging. We found that the CNN model is highly sensitive to the optimization algorithm and hyper-parameters, and that treating the problem as binary classification may prevent the joint optimization of precision and recall. To encourage further research, we publicly share our dataset, annotations and all models and detail which issues we came across during our preliminary experiments.Comment: Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [cs.NE]

arXiv.org e-Print Archive

Biblioteca Virtual del Patrimonio Bibliográfico (Virtual Library of Bibliographical Heritage)

No Spare Parts: Sharing Part Detectors for Image Categorization

Author: Mettes Pascal
Snoek Cees G. M.
van Gemert Jan C.
Publication venue
Publication date: 01/01/2016
Field of study

This work aims for image categorization using a representation of distinctive parts. Different from existing part-based work, we argue that parts are naturally shared between image categories and should be modeled as such. We motivate our approach with a quantitative and qualitative analysis by backtracking where selected parts come from. Our analysis shows that in addition to the category parts defining the class, the parts coming from the background context and parts from other image categories improve categorization performance. Part selection should not be done separately for each category, but instead be shared and optimized over all categories. To incorporate part sharing between categories, we present an algorithm based on AdaBoost to jointly optimize part sharing and selection, as well as fusion with the global image representation. We achieve results competitive to the state-of-the-art on object, scene, and action categories, further improving over deep convolutional neural networks

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Objects2action: Classifying and localizing actions without any video example

Author: Jain Mihir
Mensink Thomas
Snoek Cees G. M.
van Gemert Jan C.
Publication venue
Publication date: 01/01/2015
Field of study

The goal of this paper is to recognize actions in video without the need for examples. Different from traditional zero-shot approaches we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen classes. Our key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories. Action labels are assigned to an object encoding of unseen video based on a convex combination of action and object affinities. Our semantic embedding has three main characteristics to accommodate for the specifics of actions. First, we propose a mechanism to exploit multiple-word descriptions of actions and objects. Second, we incorporate the automated selection of the most responsive objects per action. And finally, we demonstrate how to extend our zero-shot approach to the spatio-temporal localization of actions in video. Experiments on four action datasets demonstrate the potential of our approach

arXiv.org e-Print Archive

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications