RGB-D-based Action Recognition Datasets: A Survey
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has
attracted increasing attention since the first work reported in 2010. Over this
period, many benchmark datasets have been created to facilitate the development
and evaluation of new algorithms. This raises the question of which dataset to
select and how to use it in providing a fair and objective comparative
evaluation against state-of-the-art methods. To address this issue, this paper
provides a comprehensive review of the most commonly used action recognition
related RGB-D video datasets, including 27 single-view datasets, 10 multi-view
datasets, and 7 multi-person datasets. The detailed information and analysis of
these datasets provide a useful resource for guiding an insightful selection of
datasets for future research. In addition, issues with current algorithm evaluation
vis-à-vis the limitations of the available datasets and evaluation protocols
are also highlighted, resulting in a number of recommendations for the collection
of new datasets and the use of evaluation protocols.
Dynamic Rate Allocation for View Switch Prediction in Interactive Multi-View Video
In Interactive Multi-View Video (IMVV), the video is captured by a number of
cameras positioned in an array, and those camera views are transmitted to users. The
user can interact with the transmitted video content by choosing viewpoints (views from
different cameras in the array), with the expectation of minimal transmission delay when
changing among views. View switching delay is one of the primary concerns
addressed in this thesis, whose contribution is to minimize the transmission delay
of the new view-switch frame through a novel process of predicted-view selection
and compression that accounts for transmission efficiency. The work mainly considers
real-time IMVV streaming, where view switching is modeled as a discrete Markov chain
whose transition probabilities are derived from a Zipf distribution, providing
information for view-switch prediction. To eliminate the Round-Trip Time (RTT)
transmission delay, Quantization Parameters (QP) are adaptively allocated to the
remaining redundant transmitted frames to keep the view switching time minimal,
trading off video quality over the RTT time span. Experimental results show that the
proposed method achieves superior PSNR and view switching delay, and hence better viewing quality, than existing methods.
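The Markov-chain view-switch model described above can be sketched in a few lines. The formulation below is a hypothetical illustration, not the thesis's exact derivation: it ranks candidate views by their distance from the current view and assigns Zipf-law transition probabilities (rank 1 = staying on the current view), so nearby views are predicted as more likely switch targets.

```python
import numpy as np

def zipf_transition_matrix(n_views: int, s: float = 1.0) -> np.ndarray:
    """Markov transition matrix for view switching: the probability of
    moving from view i to view j follows a Zipf law in the rank of j's
    distance from i (rank 1 = staying on the current view).
    Illustrative formulation only; the exponent s is a free parameter."""
    P = np.zeros((n_views, n_views))
    for i in range(n_views):
        ranks = np.abs(np.arange(n_views) - i) + 1  # distance-based rank
        weights = 1.0 / ranks.astype(float) ** s    # Zipf weight 1 / rank^s
        P[i] = weights / weights.sum()              # normalise row to a distribution
    return P

P = zipf_transition_matrix(n_views=5, s=1.2)
# each row is a probability distribution; from any view, staying put is
# most likely and nearby views are likelier targets than distant ones
```

A predictor built on such a matrix would pre-encode (and QP-adapt) the highest-probability views in each row, which is the kind of prediction the abstract exploits to hide the RTT delay.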
Fuzzy-based Propagation of Prior Knowledge to Improve Large-Scale Image Analysis Pipelines
Many automatically analyzable scientific questions are well-posed and offer a
variety of information about the expected outcome a priori. Although often
being neglected, this prior knowledge can be systematically exploited to make
automated analysis operations sensitive to a desired phenomenon or to evaluate
extracted content with respect to this prior knowledge. For instance, the
performance of processing operators can be greatly enhanced by a more focused
detection strategy and the direct information about the ambiguity inherent in
the extracted data. We present a new concept for the estimation and propagation
of uncertainty involved in image analysis operators. This allows using simple
processing operators that are suitable for analyzing large-scale 3D+t
microscopy images without compromising the result quality. On the foundation of
fuzzy set theory, we transform available prior knowledge into a mathematical
representation and extensively use it to enhance the result quality of various
processing operators. All presented concepts are illustrated on a typical
bioimage analysis pipeline comprised of seed point detection, segmentation,
multiview fusion and tracking. Furthermore, the functionality of the proposed
approach is validated on a comprehensive simulated 3D+t benchmark data set that
mimics embryonic development and on large-scale light-sheet microscopy data of
a zebrafish embryo. The general concept introduced in this contribution
represents a new approach to efficiently exploit prior knowledge to improve the
result quality of image analysis pipelines. Especially, the automated analysis
of terabyte-scale microscopy data will benefit from sophisticated and efficient
algorithms that enable a quantitative and fast readout. The generality of the
concept, however, makes it also applicable to practically any other field with
processing strategies that are arranged as linear pipelines.
Comment: 39 pages, 12 figures
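The core fuzzy-set idea — encoding prior knowledge as a membership function and using it to weight extracted content — can be sketched with a standard trapezoidal membership function. The specific prior (a plausible object-size range) and its parameter values are hypothetical examples, not taken from the paper:

```python
import numpy as np

def trapezoidal_membership(x, a, b, c, d):
    """Trapezoidal fuzzy membership: 0 below a, rising to 1 on [b, c],
    falling back to 0 at d. A standard fuzzy-set primitive for encoding
    prior knowledge such as an expected size or intensity range."""
    x = np.asarray(x, dtype=float)
    rise = np.clip((x - a) / max(b - a, 1e-12), 0.0, 1.0)
    fall = np.clip((d - x) / max(d - c, 1e-12), 0.0, 1.0)
    return np.minimum(rise, fall)

# hypothetical prior: plausible cell diameters are 8-20 px, ideally 10-15 px
sizes = np.array([5.0, 9.0, 12.0, 18.0, 25.0])
mu = trapezoidal_membership(sizes, 8, 10, 15, 20)
# mu grades each detection's agreement with the prior in [0, 1];
# detections can be weighted by mu, and memberships from several
# operators combined (e.g. via minimum or product) as they propagate
# through the pipeline
```

Propagating such membership values through seed detection, segmentation, fusion, and tracking — rather than thresholding early — is what lets simple operators remain usable on large data without losing track of ambiguity.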
End-to-End Multiview Gesture Recognition for Autonomous Car Parking System
The use of hand gestures can be the most intuitive human-machine interaction medium.
The early approaches for hand gesture recognition used device-based methods. These
methods use mechanical or optical sensors attached to a glove or markers, which hinders
the natural human-machine communication. On the other hand, vision-based methods are
not restrictive and allow for more spontaneous communication without the need for an
intermediary between human and machine. Therefore, vision-based gesture recognition has
been a popular area of research for the past thirty years.
Hand gesture recognition finds its application in many areas, particularly the automotive
industry where advanced automotive human-machine interface (HMI) designers are
using gesture recognition to improve driver and vehicle safety. However, technology advances
go beyond active/passive safety and into convenience and comfort. In this context,
one of America’s big three automakers has partnered with the Centre of Pattern Analysis
and Machine Intelligence (CPAMI) at the University of Waterloo to investigate expanding
their product segment through machine learning to provide an increased driver convenience
and comfort with the particular application of hand gesture recognition for autonomous
car parking.
In this thesis, we leverage state-of-the-art deep learning and optimization techniques
to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking
system. We propose a 3DCNN gesture model architecture that we train on a publicly
available hand gesture database. We apply transfer learning methods to fine-tune the
pre-trained gesture model on custom-made data, which significantly improves the proposed
system's performance in a real-world environment. We adapt the architecture of the
end-to-end solution to expand the state-of-the-art video classifier from a single-image
input (fed by a monocular camera) to a multiview 360° feed offered by a six-camera
module. Finally, we optimize the proposed solution to run on a resource-limited embedded
platform (Nvidia Jetson TX2) that is used by automakers for vehicle-based features,
without sacrificing the accuracy, robustness, and real-time functionality of the system.
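One way to extend a single-camera classifier to a multi-camera feed, as the abstract describes, is late fusion of per-view predictions. The sketch below is a minimal illustration of that idea, assuming each camera's 3DCNN emits class logits; it is not the thesis's actual architecture, and the camera labels and class count are invented for the example:

```python
import numpy as np

def fuse_multiview_logits(view_logits: np.ndarray) -> np.ndarray:
    """Late fusion of per-camera gesture classifier outputs.
    view_logits: shape (n_views, n_classes), one logit row per camera
    of a (hypothetical) multi-camera 360-degree module.
    Returns a single class-probability vector by averaging per-view softmaxes."""
    z = view_logits - view_logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # per-view softmax
    return probs.mean(axis=0)                                  # average over views

logits = np.array([[2.0, 0.5, 0.1],   # e.g. front camera
                   [1.8, 0.7, 0.2],   # e.g. rear camera
                   [2.2, 0.4, 0.0]])  # e.g. side camera
fused = fuse_multiview_logits(logits)
predicted_gesture = int(fused.argmax())
```

Late fusion keeps the per-view backbone unchanged (convenient when fine-tuning a pre-trained single-view model), whereas early fusion would stack views at the input and require retraining the whole network.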
Holoscopic 3D imaging and display technology: Camera/ processing/ display
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Holoscopic 3D imaging ("integral imaging") was first proposed by Lippmann in 1908. It has become an attractive technique for creating full-colour 3D scenes that exist in space. It promotes a single camera aperture for recording the spatial information of a real scene, and it uses regularly spaced microlens arrays to simulate the principle of the fly's-eye technique, which creates physical duplicates of the light field ("true 3D imaging" technique).
While stereoscopic and multiview 3D imaging systems, which simulate the human-eye technique, are widely available on the commercial market, holoscopic 3D imaging technology is still in the research phase. The aim of this research is to investigate the spatial resolution of holoscopic 3D imaging and display technology, covering the holoscopic 3D camera, processing, and display.
A smart microlens array architecture is proposed that doubles the spatial resolution of the holoscopic 3D camera horizontally by trading horizontal and vertical resolution. In particular, it overcomes the unbalanced pixel aspect ratio of unidirectional holoscopic 3D images. In addition, omnidirectional holoscopic 3D computer graphics rendering techniques are proposed that simplify rendering complexity and facilitate holoscopic 3D content generation.
A holoscopic 3D image stitching algorithm is proposed that widens the overall viewing angle of the holoscopic 3D camera aperture, and pre-processing holoscopic 3D image filters are proposed for spatial data alignment and 3D image data processing. In addition, a dynamic hyperlinker tool is developed that offers interactive holoscopic 3D video content searchability and browsability.
Novel pixel mapping techniques are proposed that improve spatial resolution and visual definition in space. For instance, 4D-DSPM enhances 3D pixels per inch from 44 3D-PPI to 176 3D-PPI horizontally and achieves a spatial resolution of 1365 × 384 3D pixels, whereas the traditional spatial resolution is 341 × 1536 3D pixels. In addition, distributed pixel mapping is proposed that improves the quality of the holoscopic 3D scene in space by creating RGB-colour-channel elemental images.
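The resolution trade behind the reported 4D-DSPM figures can be checked with a line of arithmetic: the fourfold gain in horizontal 3D-pixel density (44 → 176 3D-PPI) is paid for by a fourfold reduction in vertical resolution (1536 → 384 3D pixels). A minimal sanity check using only the numbers quoted above:

```python
# Figures reported for the 4D-DSPM pixel mapping vs. the traditional mapping
traditional_ppi, dspm_ppi = 44, 176           # horizontal 3D pixels per inch
trad_h, trad_v = 341, 1536                     # traditional 3D-pixel resolution
dspm_h, dspm_v = 1365, 384                     # reported 4D-DSPM resolution

horizontal_gain = dspm_ppi / traditional_ppi   # fourfold horizontal density gain
vertical_trade = trad_v // dspm_v              # vertical resolution reduced fourfold
```

The two factors match, confirming that the mapping redistributes rather than creates 3D pixels: horizontal density is quadrupled at the cost of one quarter of the vertical resolution.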