8,131 research outputs found
Articulated Clinician Detection Using 3D Pictorial Structures on RGB-D Data
Reliable human pose estimation (HPE) is essential to many clinical
applications, such as surgical workflow analysis, radiation safety monitoring
and human-robot cooperation. Proposed methods for the operating room (OR) rely
either on foreground estimation using a multi-camera system, which is a
challenge in real ORs due to color similarities and frequent illumination
changes, or on wearable sensors or markers, which are invasive and therefore
difficult to introduce in the room. Instead, we propose a novel approach based
on Pictorial Structures (PS) and on RGB-D data, which can be easily deployed in
real ORs. We extend the PS framework in two ways. First, we build robust and
discriminative part detectors using both color and depth images. We also
present a novel descriptor for depth images, called histogram of depth
differences (HDD). Second, we extend PS to 3D by proposing 3D pairwise
constraints and a new method that makes exact inference tractable. Our approach
is evaluated for pose estimation and clinician detection on a challenging RGB-D
dataset recorded in a busy operating room during live surgeries. We conduct
series of experiments to study the different part detectors in conjunction with
the various 2D or 3D pairwise constraints. Our comparisons demonstrate that 3D
PS with RGB-D part detectors significantly improves the results in a visually
challenging operating environment.Comment: The supplementary video is available at https://youtu.be/iabbGSqRSg
Convolutional Neural Networks for Counting Fish in Fisheries Surveillance Video
We present a computer vision tool that analyses video from a CCTV system installed on fishing trawlers to monitor discarded fish catch. The system aims to support expert observers who review the footage and verify numbers, species and sizes of discarded fish. The operational environment presents a significant challenge for these tasks. Fish are processed below deck under fluorescent lights, they are randomly oriented and there are multiple occlusions. The scene is unstructured and complicated by the presence of fishermen processing the catch. We describe an approach to segmenting the scene and counting fish that exploits the -Fields algorithm. We performed extensive tests of the algorithm on a data set comprising 443 frames from 6 belts. Results indicate the relative count error (for individual fish) ranges from 2\% to 16\%. We believe this is the first system that is able to handle footage from operational trawlers
Low Complexity Interpolation Filters for Motion Estimation and Application to the H.264 Encoders
Techniques for image super-resolution play an important role in a plethora of applications, which include video compression and motion estimation. The detection of the fractional displacements among frames facilitates the removal of temporal redundancy and improves the video quality by 2-4 dB PSNR. However, the increased complexity of the Fractional Motion Estimation (FME) process adds a significant computational load to the encoder and sets constraints to real-time designs. Researchers have performed timing analysis for the motion estimation process and they reported that FME accounts for almost half of the entire motion estimation period, which in turn accounts for 60-90% of the total encoding time depending on the design configuration
A Multiple-Expert Binarization Framework for Multispectral Images
In this work, a multiple-expert binarization framework for multispectral
images is proposed. The framework is based on a constrained subspace selection
limited to the spectral bands combined with state-of-the-art gray-level
binarization methods. The framework uses a binarization wrapper to enhance the
performance of the gray-level binarization. Nonlinear preprocessing of the
individual spectral bands is used to enhance the textual information. An
evolutionary optimizer is considered to obtain the optimal and some suboptimal
3-band subspaces from which an ensemble of experts is then formed. The
framework is applied to a ground truth multispectral dataset with promising
results. In addition, a generalization to the cross-validation approach is
developed that not only evaluates generalizability of the framework, it also
provides a practical instance of the selected experts that could be then
applied to unseen inputs despite the small size of the given ground truth
dataset.Comment: 12 pages, 8 figures, 6 tables. Presented at ICDAR'1
- …