Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques moving
from heavily constrained motion capture scenarios towards more challenging,
realistic, "in the wild" videos. The proposed organization is based on the
representation used as input for the recognition task, emphasizing the
hypotheses assumed and, thus, the constraints imposed on the type of video
that each technique is able to address. Making the hypotheses and constraints
explicit renders the framework particularly useful for selecting a method for
a given application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly alongside
traditional ones, while providing an insightful perspective on the evolution
of the action recognition task up to now. That perspective is the basis for
the discussion at the end of the paper, where we also present the main open
issues in the area.
Comment: Preprint submitted to CVIU; survey paper; 46 pages, 2 figures, 4 tables
Strategies for Searching Video Content with Text Queries or Video Examples
The large number of user-generated videos uploaded to the Internet every day
has led to many commercial video search engines, which mainly rely on text
metadata for search. However, metadata is often lacking for user-generated
videos, making these videos unsearchable by current search engines.
Content-based video retrieval (CBVR) tackles this metadata-scarcity problem by
directly analyzing the visual and audio streams of each video. CBVR
encompasses multiple research topics, including low-level feature design,
feature fusion, semantic detector training, and video search/reranking. We
present novel strategies in these topics to enhance CBVR in both accuracy and
speed under different query inputs, including pure textual queries and queries
by video example. Our proposed strategies were incorporated into our
submission for the TRECVID 2014 Multimedia Event Detection evaluation, where
our system outperformed the other submissions on both text queries and video
example queries, demonstrating the effectiveness of our proposed approaches.
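
As an illustration of the feature fusion step mentioned above, the sketch below shows score-level (late) fusion, one common way to combine per-modality detector outputs in a CBVR pipeline. The function name, weights, and z-normalization choice are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def late_fuse(score_lists, weights):
    """Weighted score-level (late) fusion of per-modality ranking scores.

    score_lists: list of 1-D arrays, one per modality (e.g. visual, audio,
    text-to-concept matching), each holding one score per candidate video.
    Scores are z-normalized per modality so differently scaled detectors
    can be combined on an equal footing.
    """
    fused = np.zeros_like(np.asarray(score_lists[0], dtype=float))
    for scores, w in zip(score_lists, weights):
        s = np.asarray(scores, dtype=float)
        s = (s - s.mean()) / (s.std() + 1e-8)  # z-normalize per modality
        fused += w * s
    return fused

# Hypothetical usage: fuse visual and audio detector scores for 5 videos.
visual = np.array([0.9, 0.2, 0.4, 0.8, 0.1])
audio  = np.array([0.6, 0.3, 0.5, 0.7, 0.2])
fused = late_fuse([visual, audio], weights=[0.7, 0.3])
print(np.argsort(-fused))  # best-matching videos first
```

In practice the fusion weights would typically be tuned on a validation set rather than fixed by hand.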
Discriminatively Trained Latent Ordinal Model for Video Classification
We study the problem of video classification for facial analysis and human
action recognition. We propose a novel weakly supervised learning method that
models the video as a sequence of automatically mined, discriminative
sub-events (e.g., the onset and offset phases for "smile", or running and
jumping for "highjump"). The proposed model is inspired by recent work on
Multiple Instance Learning and latent SVM/HCRF; it extends such frameworks to
approximately model the ordinal aspect of the videos. We obtain consistent
improvements over relevant competitive baselines on four challenging and
publicly available video-based facial analysis datasets for prediction of
expression, clinical pain, and intent in dyadic conversations, and on three
challenging human action datasets. We also validate the method with
qualitative results and show that they largely support the intuitions behind
the method.
Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150
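
The ordinal sub-event idea lends itself to a compact inference sketch: score a video by assigning each of K latent sub-event templates to one frame, subject to the temporal ordering constraint, via dynamic programming. This is a minimal illustration under assumed linear templates (the function name and array shapes are hypothetical); the paper's actual model is trained discriminatively in a latent SVM/HCRF-style framework.

```python
import numpy as np

def ordinal_score(frame_feats, templates):
    """Score a video as an ordered sequence of K latent sub-events.

    frame_feats: (T, D) per-frame features; templates: (K, D) sub-event
    weight vectors. Each sub-event k is latently assigned to one frame t_k,
    with the ordinal constraint t_1 < t_2 < ... < t_K enforced by dynamic
    programming (O(T*K) after a cumulative-max pass).
    """
    resp = frame_feats @ templates.T          # (T, K) per-frame responses
    T, K = resp.shape
    dp = np.full((T, K), -np.inf)
    dp[:, 0] = resp[:, 0]
    for k in range(1, K):
        # best score of placing sub-events 0..k-1 strictly before frame t
        best_prev = np.maximum.accumulate(dp[:, k - 1])
        dp[1:, k] = resp[1:, k] + best_prev[:-1]
    return dp[:, -1].max()

# Hypothetical usage on random data: 50 frames, 16-D features, 3 sub-events.
rng = np.random.default_rng(0)
print(ordinal_score(rng.normal(size=(50, 16)), rng.normal(size=(3, 16))))
```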
A Framework for Surveillance Video Indexing and Retrieval
We propose a framework for surveillance video indexing and retrieval. In this paper, we focus on the following features: (1) combining recognized video contents (output from a video analysis module) with visual words (computed over all the raw video frames) to enrich the video indexing in a complementary way; using this scheme, users can make queries about objects of interest even when the video analysis output is not available; (2) supporting interactive feature generation (currently color histogram and trajectory), which gives users a facility to make queries at different levels according to the a priori available information and the expected retrieval results; (3) developing a relevance feedback module adapted to the proposed indexing scheme and to the specific properties of surveillance videos. Results emphasizing these three aspects demonstrate a good integration of video analysis with interactive indexing and retrieval for video surveillance.
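
A minimal sketch of point (1)'s combined indexing idea, assuming a toy in-memory index: each clip carries both analysis-module labels and a coarse visual signature (here a color histogram, one of the interactive features mentioned), so a query can fall back on visual similarity when analysis output is missing. All names and the scoring scheme are illustrative assumptions.

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Coarse RGB color histogram over one frame (H x W x 3 uint8 array)."""
    hist, _ = np.histogramdd(frame.reshape(-1, 3),
                             bins=(bins,) * 3, range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-8)

# Hypothetical index: one entry per clip, mixing semantic labels with a
# visual signature so queries work even without analysis output.
index = []  # entries: {"video": str, "labels": set, "hist": np.ndarray}

def query(index, label=None, example_hist=None, top_k=5):
    """Rank indexed clips by label match and/or histogram intersection."""
    scored = []
    for entry in index:
        score = 0.0
        if label is not None and label in entry["labels"]:
            score += 1.0                                     # semantic match
        if example_hist is not None:                          # visual match
            score += float(np.minimum(entry["hist"], example_hist).sum())
        scored.append((score, entry["video"]))
    return sorted(scored, reverse=True)[:top_k]
```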
Leveraging large scale data for video retrieval
Ankara: The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2014. Thesis (Master's), Bilkent University, 2014. Includes bibliographical references, leaves 75-82.
The large amount of video data shared on the web has resulted in increased
interest in retrieving videos using visual cues, since textual cues alone are
not sufficient for satisfactory results. We address the problem of leveraging
large-scale image and video data for capturing important characteristics in
videos. We focus on three different problems, namely finding common patterns
in unusual videos, large-scale multimedia event detection, and semantic
indexing of videos.
Unusual events are important as possible indicators of undesired consequences.
The discovery of unusual events in videos is generally attacked as a problem
of finding usual patterns. With this challenging problem at hand, we propose a
novel descriptor to encode the rapid motions in videos, utilizing densely
extracted trajectories. The proposed descriptor, trajectory snippet
histograms, is used to distinguish unusual videos from usual videos, and is
further exploited to discover the snapshots in which the unusualness happens.
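
A minimal sketch of what a trajectory snippet histogram could look like, assuming dense trajectories are given as (N, L, 2) point tracks; the bin count, displacement cap, and normalization are illustrative assumptions rather than the thesis's exact design.

```python
import numpy as np

def snippet_histogram(trajectories, n_bins=10, max_disp=20.0):
    """Histogram of per-frame displacement magnitudes over one snippet.

    trajectories: (N, L, 2) array of N tracked points over L frames.
    Rapid motions put mass in the high-displacement bins, which is the
    cue used to separate unusual from usual videos.
    """
    disp = np.diff(trajectories, axis=1)        # (N, L-1, 2) per-frame steps
    mag = np.linalg.norm(disp, axis=2).ravel()  # step magnitudes
    hist, _ = np.histogram(mag, bins=n_bins, range=(0.0, max_disp))
    return hist / (hist.sum() + 1e-8)

# Hypothetical usage: 100 trajectories tracked for 15 frames.
rng = np.random.default_rng(1)
tracks = np.cumsum(rng.normal(scale=2.0, size=(100, 15, 2)), axis=1)
print(snippet_histogram(tracks))
```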
Next, we attack the Multimedia Event Detection (MED) task. We approach this
problem by representing the videos in the form of prototypes, each
corresponding to a model that describes a different visual characteristic of a
video shot. Finally, we approach the Semantic Indexing (SIN) problem, and
collect web images to train models for each concept.
Armağan, Anıl. M.S.
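
A hedged sketch of the prototype representation described in the MED paragraph above: cluster shot-level features from a corpus into prototypes (here with k-means) and describe each video as a normalized histogram of its shots' prototype assignments. The clustering choice, dimensions, and function names are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_prototypes(shot_feats, n_prototypes=64, seed=0):
    """Learn prototypes by clustering shot-level features from a corpus."""
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=seed)
    km.fit(shot_feats)
    return km

def video_signature(km, video_shot_feats):
    """Represent a video as a histogram over its shots' prototype IDs."""
    ids = km.predict(video_shot_feats)
    hist = np.bincount(ids, minlength=km.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-8)

# Hypothetical usage: 500 corpus shots and one 12-shot video, 32-D features.
rng = np.random.default_rng(2)
km = build_prototypes(rng.normal(size=(500, 32)))
print(video_signature(km, rng.normal(size=(12, 32))))
```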