Circulant temporal encoding for video retrieval and temporal alignment
We address the problem of specific video event retrieval. Given a query video
of a specific event, e.g., a concert of Madonna, the goal is to retrieve other
videos of the same event that temporally overlap with the query. Our approach
encodes the frame descriptors of a video to jointly represent their appearance
and temporal order. It exploits the properties of circulant matrices to
efficiently compare the videos in the frequency domain. This significantly
reduces the computational cost and accurately localizes the matching parts of
videos. The descriptors can be compressed in the frequency domain with a
product quantizer adapted to complex numbers. In this case, video retrieval is
performed without decompressing the descriptors. We also consider the temporal
alignment of a set of videos. We exploit the matching confidence and an
estimate of the temporal offset computed for all pairs of videos by our
retrieval approach. Our robust algorithm aligns the videos on a global timeline
by maximizing the set of temporally consistent matches. The global temporal
alignment enables synchronous playback of the videos of a given scene.
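The frequency-domain comparison rests on the correlation theorem: the inner products between a query and every circular temporal shift of a database video can be obtained from element-wise products of their discrete Fourier transforms, instead of by scoring each shift separately. The Python sketch below is a minimal illustration under our own assumptions: both videos have the same number n of frame descriptors (n a power of two, e.g. after zero-padding), shifts are circular, and there is no regularization, normalization, or product quantization, all of which the actual method may handle differently. The function names are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey DFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def temporal_scores(query, db):
    """Return s[delta] = sum_t <query[t], db[(t + delta) % n]> for every shift delta.

    query, db: lists of n frame descriptors, each a list of d floats.
    """
    n, d = len(query), len(query[0])
    acc = [0j] * n
    for j in range(d):
        # One pair of FFTs per descriptor dimension, O(n log n) each.
        q_hat = fft([frame[j] for frame in query])
        db_hat = fft([frame[j] for frame in db])
        for k in range(n):
            # Correlation theorem: conj(Q) * D is the DFT of the circular cross-correlation.
            acc[k] += q_hat[k].conjugate() * db_hat[k]
    # A single inverse transform of the summed spectra scores all n shifts at once.
    return [v.real / n for v in fft(acc, inverse=True)]


def best_offset(query, db):
    """Return (delta, score) for the highest-scoring circular shift."""
    scores = temporal_scores(query, db)
    delta = max(range(len(scores)), key=scores.__getitem__)
    return delta, scores[delta]


query = [[1, 0], [0, 1], [2, 0], [0, 3], [1, 1], [3, 0], [0, 2], [1, 2]]
db = [query[(u - 3) % 8] for u in range(8)]  # query circularly shifted by 3 frames
print(best_offset(query, db))
```

Here `db` is the query shifted by three frames, so `best_offset` reports `delta = 3` with a peak score of about 35 (the sum of the squared descriptor norms). In this sketch the arg-max shift acts as a temporal offset estimate and the peak score as a matching confidence, the two quantities the abstract says the alignment step consumes.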
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and, thus, the constraints imposed on the type of video
that each technique is able to address. Making these hypotheses and
constraints explicit makes the framework particularly useful for selecting a method
for a given application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly alongside traditional ones, while
providing an insightful perspective on the evolution of the action recognition
task to date. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables
Fine-grained Incident Video Retrieval with Video Similarity Learning
PhD Theses
In this thesis, we address the problem of Fine-grained Incident Video Retrieval (FIVR)
using video similarity learning methods. FIVR is a video retrieval task that aims to
retrieve all videos that depict the same incident given a query video. Related video
retrieval tasks adopt either very narrow or very broad scopes, considering only near-duplicate
or same-event videos. To formulate the case of same-incident videos, we
define three video associations that take into account the spatio-temporal spans captured
by video pairs. To cover the benchmarking needs of FIVR, we construct a large-scale
dataset, called FIVR-200K, consisting of 225,960 YouTube videos from major news
events crawled from Wikipedia. The dataset contains four annotation labels according
to the FIVR definitions; hence, it can simulate several retrieval scenarios with the same
video corpus. To address FIVR, we propose two video-level approaches leveraging
features extracted from intermediate layers of Convolutional Neural Networks (CNN).
The first is an unsupervised method that relies on a modified Bag-of-Words scheme,
which generates video representations from the aggregation of the frame descriptors
based on learned visual codebooks. The second is a supervised method based on Deep
Metric Learning, which learns an embedding function that maps videos into a feature
space where relevant video pairs are closer than irrelevant ones. However, video-level
approaches generate global video representations, losing all spatial and temporal
relations between compared videos. Therefore, we propose a video similarity learning
approach that captures fine-grained relations between videos for accurate similarity
calculation. We train a CNN architecture to compute video-to-video similarity from
refined frame-to-frame similarity matrices derived from a pairwise region-level similarity
function. The proposed approaches have been extensively evaluated on FIVR-200K
and other large-scale datasets, demonstrating their superiority over other video
retrieval methods and highlighting the challenging nature of the FIVR problem.
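A frame-to-frame similarity matrix built from region-level similarities can be made concrete with a short sketch. Under our own assumptions, each frame is a list of region vectors, region pairs are compared by cosine similarity, and both the region-to-frame and the frame-to-video aggregation use a chamfer (mean of row-wise maxima) rule. The learned CNN that refines the matrix in the thesis is replaced here by that fixed rule, so this is not the proposed method; all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def chamfer(matrix):
    """Mean over rows of each row's maximum; note that this is asymmetric."""
    return sum(max(row) for row in matrix) / len(matrix)


def frame_similarity(regions_a, regions_b):
    """Region-level similarity of two frames, each given as a list of region vectors."""
    return chamfer([[cosine(ra, rb) for rb in regions_b] for ra in regions_a])


def frame_to_frame_matrix(video_a, video_b):
    """Entry (i, j) compares frame i of video_a with frame j of video_b."""
    return [[frame_similarity(fa, fb) for fb in video_b] for fa in video_a]


def video_similarity(video_a, video_b):
    # Fixed stand-in for the learned CNN: chamfer aggregation of the matrix.
    return chamfer(frame_to_frame_matrix(video_a, video_b))


video = [[[1, 0], [0, 1]], [[1, 1], [1, 0]]]  # 2 frames x 2 regions x 2 dims
other = [[[-1, 0], [0, -1]]]                  # 1 frame x 2 regions x 2 dims
print(video_similarity(video, video), video_similarity(video, other))
```

Unlike a single global video vector, the intermediate matrix still records which frame matches which, which is the kind of temporal structure that the abstract says video-level representations discard.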
Efficient video collection association using geometry-aware Bag-of-Iconics representations
Recent years have witnessed a dramatic growth in visual data volume and processing capabilities. For example, technical advances have enabled 3D modeling from large-scale crowdsourced photo collections. Compared to static image datasets, the exploration and exploitation of Internet video collections remain largely unsolved. To address this challenge, we first propose to represent video content using a histogram of iconic imagery obtained from relevant visual datasets. We then develop a data-driven framework for the fully unsupervised extraction of such representations. Our novel Bag-of-Iconics (BoI) representation efficiently analyzes individual videos within a large-scale video collection. We demonstrate the proposed BoI representation with two novel applications: (1) finding video sequences that connect adjacent landmarks and aligning reconstructed 3D models, and (2) retrieving geometrically relevant clips from video collections. Results on crowdsourced datasets illustrate the efficiency and effectiveness of the proposed Bag-of-Iconics representation.
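A histogram representation of this kind can be sketched in a few lines. The sketch assumes that frames and iconic images are already described by fixed-length vectors, that each frame is hard-assigned to its nearest iconic image under squared Euclidean distance, and that two videos are compared by histogram intersection. These choices and the function names are ours for illustration and need not match how the BoI framework extracts or compares its representations.

```python
def nearest_iconic(frame, iconics):
    """Index of the iconic descriptor closest to `frame` (squared Euclidean distance)."""
    return min(range(len(iconics)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(frame, iconics[i])))


def bag_of_iconics(frames, iconics):
    """L1-normalized histogram of nearest-iconic assignments over a video's frames."""
    hist = [0.0] * len(iconics)
    for frame in frames:
        hist[nearest_iconic(frame, iconics)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two L1-normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


iconics = [[0, 0], [10, 0], [0, 10]]        # hypothetical iconic-image descriptors
frames = [[1, 1], [9, 1], [8, -1], [0, 9]]  # hypothetical frame descriptors
print(bag_of_iconics(frames, iconics))      # [0.25, 0.5, 0.25]
```

Because every video collapses to a vector of length equal to the number of iconic images, comparing two videos costs the same regardless of how many frames each contains, which is one plausible reason such a representation scales to large collections.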
Searching Heterogeneous Document Image Collections
A decrease in data storage costs and the widespread use of scanning devices have led to massive quantities of scanned digital documents in corporations, organizations, and governments around the world. Automatically processing these large heterogeneous collections can be difficult due to considerable variation in resolution, quality, font, layout, noise, and content. To make this data available to a wide audience, efficient retrieval and analysis of large collections of document images remains an open and important area of research. In this proposal, we present research in three areas that advance the current state of the art in the retrieval and analysis of large heterogeneous document image collections.
First, we explore an efficient approach to document image retrieval that allows users to search large image collections in a query-by-example manner. We compare our approach against text retrieval over OCR output on a collection of 7 million document images collected from lawsuits against tobacco companies. Next, we present research in document verification and change detection, where one may want to quickly determine whether two document images contain any differences (document verification) and, if so, to determine precisely what changed and where (change detection). A motivating example is legal contracts, whose scanned images are often e-mailed back and forth and in which small changes can have severe ramifications. Finally, we examine approaches that exploit the biometric properties of handwriting to perform writer identification and retrieval in document images.
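The distinction between verification and change detection can be illustrated with a pixel-wise comparison of two aligned binary images. This is a deliberately simplified sketch, not the proposal's method: it assumes the scans have already been registered to each other and binarized, and it has no tolerance for scanner noise, both of which real document images would require. The function name is hypothetical.

```python
def changed_region(img_a, img_b):
    """Compare two aligned binary images (lists of rows of 0/1 ints).

    Returns None if the images are identical (verification passes); otherwise
    the inclusive bounding box (top, left, bottom, right) of all differing pixels.
    """
    if len(img_a) != len(img_b) or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have the same dimensions")
    diffs = [(r, c)
             for r, (row_a, row_b) in enumerate(zip(img_a, img_b))
             for c, (pa, pb) in enumerate(zip(row_a, row_b))
             if pa != pb]
    if not diffs:
        return None  # verification: no differences at all
    rows = [r for r, _ in diffs]
    cols = [c for _, c in diffs]
    # Change detection: where the differences are located.
    return min(rows), min(cols), max(rows), max(cols)


original = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
edited = [[0, 0, 0, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 1]]
print(changed_region(original, original), changed_region(original, edited))
```

Here the first call returns `None` and the second returns `(1, 2, 2, 3)`. A single bounding box merges unrelated edits; labeling connected components of the difference mask would separate them, at the cost of a longer example.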
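A Study on Semantic Image Understanding Using Image Analysis and Related Information (Gazo bunseki to kanren joho o riyoshita gazo imi rikai ni kansuru kenkyu)
Degree system: new; Report number: Kō No. 3514; Degree: Doctorate (Global Information and Telecommunication Studies); Date conferred: 2012/2/8; Waseda University degree number: Shin 585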
- …