Using segmented objects in ostensive video shot retrieval
This paper presents a system for video shot retrieval in which shots are retrieved by matching video objects using a combination of colour, shape and texture. Rather than matching on individual objects, our system supports sets of query objects which together reflect the user's object-based information need. Our work also adapts to a shifting information need by partitioning a user's search into two or more distinct search threads, which the user can follow in sequence. This automatic process maps neatly onto the ostensive model of information retrieval: it allows a user to place a virtual checkpoint on their search, explore one thread or aspect of their information need, and then return to that checkpoint to explore an alternative thread. Our system is fully operational, and in this paper we illustrate several design decisions made in building it.
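The set-based matching idea above can be sketched as follows. This is a minimal illustrative sketch, not the paper's system: the feature extractors, weights, and nearest-object matching strategy are all assumptions introduced here.

```python
import numpy as np

def object_distance(q, c, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of colour, shape and texture distances between two
    objects, each represented as a dict of feature vectors.
    The per-feature weights are illustrative, not the paper's values."""
    d = 0.0
    for w, key in zip(weights, ("colour", "shape", "texture")):
        d += w * np.linalg.norm(q[key] - c[key])
    return d

def shot_score(query_objects, shot_objects):
    """Score a shot against a *set* of query objects: each query object is
    matched to its nearest object in the shot and the distances are summed,
    so lower is better."""
    return sum(min(object_distance(q, c) for c in shot_objects)
               for q in query_objects)
```

Shots would then be ranked by ascending `shot_score`, so a shot containing close matches for every query object ranks ahead of one matching only some of them.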
Weakly Supervised Localization using Deep Feature Maps
Object localization is an important computer vision problem with a variety of
applications. The lack of large-scale object-level annotations and the relative
abundance of image-level labels make a compelling case for weak supervision in
the object localization task. Deep Convolutional Neural Networks are a class of
state-of-the-art methods for the related problem of object recognition. In this
paper, we describe a novel object localization algorithm which uses
classification networks trained on only image labels. This weakly supervised
method leverages local spatial and semantic patterns captured in the
convolutional layers of classification networks. We propose an efficient beam
search based approach to detect and localize multiple objects in images. The
proposed method significantly outperforms the state-of-the-art in standard
object localization datasets, with an 8-point increase in mAP scores.
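The beam-search idea can be illustrated on a toy activation map. This is a sketch under stated assumptions, not the paper's algorithm: the box scoring function (activation sum minus an area penalty) and the grow-by-one-row-or-column moves are invented here for illustration; the paper searches over convolutional feature maps.

```python
import numpy as np

def box_score(amap, box, area_penalty=0.5):
    """Score a box on an activation map: activation inside minus an area
    penalty, so the search cannot trivially pick the whole image."""
    r0, c0, r1, c1 = box  # inclusive grid coordinates
    inside = amap[r0:r1 + 1, c0:c1 + 1].sum()
    area = (r1 - r0 + 1) * (c1 - c0 + 1)
    return inside - area_penalty * area

def beam_search_localize(amap, beam_width=3, steps=20):
    """Beam search: start at the activation peak, grow the box one row or
    column at a time, keep the best `beam_width` boxes per step, and return
    the best-scoring box seen."""
    h, w = amap.shape
    r, c = np.unravel_index(np.argmax(amap), amap.shape)
    beams = [(box_score(amap, (r, c, r, c)), (r, c, r, c))]
    best = beams[0]
    for _ in range(steps):
        candidates = {}
        for _, (r0, c0, r1, c1) in beams:
            for box in ((r0 - 1, c0, r1, c1), (r0, c0 - 1, r1, c1),
                        (r0, c0, r1 + 1, c1), (r0, c0, r1, c1 + 1)):
                nr0, nc0, nr1, nc1 = box
                if 0 <= nr0 and 0 <= nc0 and nr1 < h and nc1 < w:
                    candidates[box] = box_score(amap, box)
        if not candidates:
            break
        beams = sorted(((s, b) for b, s in candidates.items()),
                       reverse=True)[:beam_width]
        if beams[0][0] > best[0]:
            best = beams[0]
    return tuple(int(v) for v in best[1])  # (r0, c0, r1, c1)
```

On a map with a single bright region, the search grows the box to cover exactly that region, since expanding into low-activation cells lowers the score.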
Edge Preserving Filters using Geodesic Distances on Weighted Orthogonal Domains
We introduce a framework for image enhancement which smooths images while preserving edge information. Domain (spatial) and range (feature) information are combined into a single measure in a principled way. This measure turns out to be the geodesic distance between pixels, calculated on weighted orthogonal domains. The weight function is computed to capture the underlying structure of the image manifold, while still allowing the eikonal equation to be solved efficiently, using the Fast Marching algorithm on orthogonal domains, to obtain the geodesic distances. We show promising results in edge-preserving denoising of gray-scale, color and texture images. Index Terms — Adaptive smoothing filters, geodesic distance, Fast Marching Method, edge-preserving filtering.
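The core mechanism, geodesic distances that combine spatial and range information, can be sketched without the Fast Marching machinery. The sketch below is an assumption-laden simplification: it approximates geodesic distances with Dijkstra's algorithm on a 4-connected grid (not the paper's eikonal solver), with an edge cost of one spatial step plus a penalty `beta` times the intensity change, and then smooths by averaging within a geodesic radius.

```python
import heapq
import numpy as np

def geodesic_distances(img, seed, beta=5.0):
    """Geodesic distance from `seed` to every pixel, approximated by
    Dijkstra on the 4-neighbour grid; stepping across a strong intensity
    change costs 1 + beta*|dI|, so edges act as barriers."""
    h, w = img.shape
    dist = np.full((h, w), np.inf)
    dist[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale heap entry
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + 1.0 + beta * abs(img[nr, nc] - img[r, c])
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist

def geodesic_smooth(img, radius=3.0, beta=5.0):
    """Replace each pixel by the mean of pixels within a geodesic radius:
    flat regions are averaged, but strong edges are too 'far' to cross."""
    out = np.empty_like(img)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            d = geodesic_distances(img, (r, c), beta)
            out[r, c] = img[d <= radius].mean()
    return out
```

On a clean step image the filter is the identity: the geodesic cost of crossing the step exceeds the radius, so averaging never mixes the two sides. This per-pixel Dijkstra is quadratic and only suitable for toy images, which is exactly the efficiency problem the paper's Fast Marching formulation addresses.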
Search Tracker: Human-derived object tracking in-the-wild through large-scale search and retrieval
Humans use context and scene knowledge to easily localize moving objects in
conditions of complex illumination changes, scene clutter and occlusions. In
this paper, we present a method to leverage human knowledge in the form of
annotated video libraries in a novel search and retrieval based setting to
track objects in unseen video sequences. For every video sequence, a document
that represents motion information is generated. Documents of the unseen video
are queried against the library at multiple scales to find videos with similar
motion characteristics. This provides us with coarse localization of objects in
the unseen video. We further adapt these retrieved object locations to the new
video using an efficient warping scheme. The proposed method is validated on
in-the-wild video surveillance datasets where we outperform state-of-the-art
appearance-based trackers. We also introduce a new challenging dataset with
complex object appearance changes.
Comment: Under review with the IEEE Transactions on Circuits and Systems for Video Technology.
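The motion-document retrieval idea can be sketched in a few lines. This is a hypothetical simplification, not the paper's pipeline: here a "document" is just a histogram over quantized flow directions, and library videos are ranked by cosine similarity; the choice of quantization and similarity is an assumption.

```python
import numpy as np

def motion_document(flow, bins=8):
    """Summarize a video's motion as a 'document': flow is an (N, 2) array
    of motion vectors, reduced to a normalized histogram over direction bins."""
    angles = np.arctan2(flow[:, 1], flow[:, 0])            # in [-pi, pi]
    idx = ((angles + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    hist = np.bincount(idx, minlength=bins).astype(float)
    return hist / max(hist.sum(), 1.0)

def retrieve(query_doc, library_docs):
    """Rank library documents by cosine similarity to the query document,
    most similar first."""
    sims = []
    for doc in library_docs:
        denom = np.linalg.norm(query_doc) * np.linalg.norm(doc)
        sims.append(float(query_doc @ doc / denom) if denom else 0.0)
    return list(np.argsort(sims)[::-1])
```

A query video with rightward motion then retrieves the library video with rightward motion ahead of one with upward motion; the annotated object locations of the retrieved videos would supply the coarse localization described above.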
Accurate 3D Cell Segmentation using Deep Feature and CRF Refinement
We consider the problem of accurately identifying cell boundaries and
labeling individual cells in confocal microscopy images, specifically, 3D image
stacks of cells with tagged cell membranes. Precise identification of cell
boundaries, their shapes, and quantifying inter-cellular space leads to a
better understanding of cell morphogenesis. Towards this, we outline a cell
segmentation method that uses a deep neural network architecture to extract a
confidence map of cell boundaries, followed by a 3D watershed algorithm and a
final refinement using a conditional random field. In addition to improving the
accuracy of segmentation compared to other state-of-the-art methods, the
proposed approach also generalizes well to different datasets without the need
to retrain the network for each dataset. Detailed experimental results are
provided, and the source code is available on GitHub.
Comment: 5 pages, 5 figures, 3 tables.
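The watershed stage of the pipeline can be illustrated on a toy 2D example. This sketch makes several assumptions: the paper operates on 3D stacks with a learned boundary-confidence map and a CRF refinement, both omitted here; seeds are found by labeling low-confidence regions with a simple BFS, then flooded in order of increasing confidence.

```python
import heapq
import numpy as np

def connected_components(mask):
    """4-connected labeling of True pixels via BFS; labels start at 1."""
    labels = np.zeros(mask.shape, dtype=int)
    nxt = 0
    for r, c in zip(*np.nonzero(mask)):
        if labels[r, c]:
            continue
        nxt += 1
        labels[r, c] = nxt
        stack = [(r, c)]
        while stack:
            y, x = stack.pop()
            for ny, nx_ in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx_ < mask.shape[1]
                        and mask[ny, nx_] and not labels[ny, nx_]):
                    labels[ny, nx_] = nxt
                    stack.append((ny, nx_))
    return labels

def watershed(confidence, seed_thresh=0.2):
    """Marker-based watershed on a boundary-confidence map: seed cells where
    boundary confidence is low, then flood unlabeled pixels in order of
    increasing confidence, so labels meet along high-confidence boundaries."""
    labels = connected_components(confidence < seed_thresh)
    heap = [(confidence[r, c], r, c) for r, c in zip(*np.nonzero(labels))]
    heapq.heapify(heap)
    h, w = confidence.shape
    while heap:
        _, r, c = heapq.heappop(heap)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and labels[nr, nc] == 0:
                labels[nr, nc] = labels[r, c]
                heapq.heappush(heap, (confidence[nr, nc], nr, nc))
    return labels
```

A confidence map with a single high-confidence ridge thus splits into two labeled cells, one on each side; the CRF refinement would then clean up the resulting boundaries.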
Eye-CU: Sleep Pose Classification for Healthcare using Multimodal Multiview Data
Manual analysis of body poses of bed-ridden patients requires staff to
continuously track and record patient poses. Two limitations in the
dissemination of pose-related therapies are scarce human resources and
unreliable automated systems. This work addresses these issues by introducing a
new method and a new system for robust automated classification of sleep poses
in an Intensive Care Unit (ICU) environment. The new method,
coupled-constrained Least-Squares (cc-LS), uses multimodal and multiview (MM)
data and finds the set of modality trust values that minimizes the difference
between expected and estimated labels. The new system, Eye-CU, is an affordable
multi-sensor modular system for unobtrusive data collection and analysis in
healthcare. Experimental results indicate that the performance of cc-LS matches
the performance of existing methods in ideal scenarios. This method outperforms
the latest techniques in challenging scenarios by 13% for those with poor
illumination and by 70% for those with both poor illumination and occlusions.
Results also show that a reduced Eye-CU configuration can classify poses
without pressure information with only a slight drop in performance.
Comment: Ten-page manuscript including references and ten figures.
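The "modality trust" idea, finding weights that minimize the difference between expected and estimated labels, can be sketched as an ordinary least-squares fit. This is an illustrative simplification: the paper's cc-LS adds coupling constraints that are not reproduced here, and the score matrices below are invented toy data.

```python
import numpy as np

def modality_trust(scores, labels):
    """scores: list of M arrays, each (N, K) of per-modality class scores.
    labels: (N, K) one-hot expected labels.
    Returns the (M,) weight vector w minimizing
    || sum_m w_m * scores_m - labels ||_F via ordinary least squares."""
    A = np.stack([s.ravel() for s in scores], axis=1)   # (N*K, M)
    w, *_ = np.linalg.lstsq(A, labels.ravel(), rcond=None)
    return w

def fuse_and_predict(scores, w):
    """Fuse modalities with the learned trust weights and predict classes."""
    fused = sum(wi * s for wi, s in zip(w, scores))
    return fused.argmax(axis=1)
```

With one reliable modality and one anti-correlated modality, the fit assigns all trust to the reliable one, which is the intended behavior under occlusions or poor illumination where some modalities become uninformative.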