7,677 research outputs found
Multimedia search without visual analysis: the value of linguistic and contextual information
This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content, and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features
A Deep Learning Framework for Unsupervised Affine and Deformable Image Registration
Image registration, the process of aligning two or more images, is the core
technique of many (semi-)automatic medical image analysis tasks. Recent studies
have shown that deep learning methods, notably convolutional neural networks
(ConvNets), can be used for image registration. Thus far training of ConvNets
for registration was supervised using predefined example registrations.
However, obtaining example registrations is not trivial. To circumvent the need
for predefined examples, and thereby to increase convenience of training
ConvNets for image registration, we propose the Deep Learning Image
Registration (DLIR) framework for \textit{unsupervised} affine and deformable
image registration. In the DLIR framework ConvNets are trained for image
registration by exploiting image similarity analogous to conventional
intensity-based image registration. After a ConvNet has been trained with the
DLIR framework, it can be used to register pairs of unseen images in one shot.
We propose flexible ConvNets designs for affine image registration and for
deformable image registration. By stacking multiple of these ConvNets into a
larger architecture, we are able to perform coarse-to-fine image registration.
We show for registration of cardiac cine MRI and registration of chest CT that
performance of the DLIR framework is comparable to conventional image
registration while being several orders of magnitude faster.Comment: Accepted: Medical Image Analysis - Elsevie
Real-time Monocular Object SLAM
We present a real-time object-based SLAM system that leverages the largest
object database to date. Our approach comprises two main components: 1) a
monocular SLAM algorithm that exploits object rigidity constraints to improve
the map and find its real scale, and 2) a novel object recognition algorithm
based on bags of binary words, which provides live detections with a database
of 500 3D objects. The two components work together and benefit each other: the
SLAM algorithm accumulates information from the observations of the objects,
anchors object features to especial map landmarks and sets constrains on the
optimization. At the same time, objects partially or fully located within the
map are used as a prior to guide the recognition algorithm, achieving higher
recall. We evaluate our proposal on five real environments showing improvements
on the accuracy of the map and efficiency with respect to other
state-of-the-art techniques
Circulant temporal encoding for video retrieval and temporal alignment
We address the problem of specific video event retrieval. Given a query video
of a specific event, e.g., a concert of Madonna, the goal is to retrieve other
videos of the same event that temporally overlap with the query. Our approach
encodes the frame descriptors of a video to jointly represent their appearance
and temporal order. It exploits the properties of circulant matrices to
efficiently compare the videos in the frequency domain. This offers a
significant gain in complexity and accurately localizes the matching parts of
videos. The descriptors can be compressed in the frequency domain with a
product quantizer adapted to complex numbers. In this case, video retrieval is
performed without decompressing the descriptors. We also consider the temporal
alignment of a set of videos. We exploit the matching confidence and an
estimate of the temporal offset computed for all pairs of videos by our
retrieval approach. Our robust algorithm aligns the videos on a global timeline
by maximizing the set of temporally consistent matches. The global temporal
alignment enables synchronous playback of the videos of a given scene
- …