Histogram of Oriented Principal Components for Cross-View Action Recognition
Existing techniques for 3D action recognition are sensitive to viewpoint
variations because they extract features from depth images which are viewpoint
dependent. In contrast, we directly process pointclouds for cross-view action
recognition from unknown and unseen views. We propose the Histogram of Oriented
Principal Components (HOPC) descriptor that is robust to noise, viewpoint,
scale and action speed variations. At a 3D point, HOPC is computed by
projecting the three scaled eigenvectors of the pointcloud within its local
spatio-temporal support volume onto the vertices of a regular dodecahedron.
HOPC is also used for the detection of Spatio-Temporal Keypoints (STK) in 3D
pointcloud sequences so that view-invariant STK descriptors (or Local HOPC
descriptors) at these key locations only are used for action recognition. We
also propose a global descriptor computed from the normalized spatio-temporal
distribution of STKs in 4-D, which we refer to as STK-D. We have evaluated the
performance of our proposed descriptors against nine existing techniques on two
cross-view and three single-view human action recognition datasets. The
experimental results show that our techniques provide significant improvement
over state-of-the-art methods.
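The projection step described in this abstract can be illustrated with a minimal sketch. It assumes the three eigenvalues and unit eigenvectors of the local support volume are already available, and it keeps only positive projections; that clipping and the names `dodecahedron_vertices` and `project_scaled_eigenvectors` are assumptions made for illustration, not details taken from the abstract.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def dodecahedron_vertices():
    """Return the 20 vertices of a regular dodecahedron as unit vectors."""
    verts = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
    for a in (-1 / PHI, 1 / PHI):
        for b in (-PHI, PHI):
            # Cyclic permutations of (0, +-1/phi, +-phi)
            verts += [(0, a, b), (a, b, 0), (b, 0, a)]
    norm = math.sqrt(3)  # every vertex above has length sqrt(3)
    return [(x / norm, y / norm, z / norm) for x, y, z in verts]


def project_scaled_eigenvectors(eigvals, eigvecs):
    """Project each eigenvalue-scaled eigenvector onto the 20 vertex directions.

    Returns a 60-value list (3 eigenvectors x 20 directions). Negative
    projections are clipped to zero, which is an illustrative assumption.
    """
    verts = dodecahedron_vertices()
    descriptor = []
    for lam, vec in zip(eigvals, eigvecs):
        for u in verts:
            dot = lam * sum(ui * vi for ui, vi in zip(u, vec))
            descriptor.append(max(dot, 0.0))
    return descriptor
```

Calling `project_scaled_eigenvectors` with three eigenvalues and three unit eigenvectors yields one histogram-like vector per 3D point; any quantization or normalization applied afterwards is outside the scope of this sketch.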
Multi-View Video Packet Scheduling
In multiview applications, multiple cameras acquire the same scene from
different viewpoints and generally produce correlated video streams. This
results in large amounts of highly redundant data. In order to save resources,
it is critical to handle this correlation properly during encoding and
transmission of the multiview data. In this work, we propose a
correlation-aware packet scheduling algorithm for multi-camera networks, where
information from all cameras is transmitted over a bottleneck channel to
clients that reconstruct the multiview images. The scheduling algorithm relies
on a new rate-distortion model that captures the importance of each view in the
scene reconstruction. We propose a problem formulation for the optimization of
the packet scheduling policies, which adapt to variations in the scene content.
Then, we design a low complexity scheduling algorithm based on a trellis search
that selects the subset of candidate packets to be transmitted towards
effective multiview reconstruction at clients. Extensive simulation results
confirm the gain of our scheduling algorithm when inter-source correlation
information is used in the scheduler, compared to scheduling policies with no
information about the correlation or non-adaptive scheduling policies. We
finally show that increasing the optimization horizon in the packet scheduling
algorithm improves the transmission performance, especially in scenarios where
the level of correlation varies rapidly with time.
Loss-resilient Coding of Texture and Depth for Free-viewpoint Video Conferencing
Free-viewpoint video conferencing allows a participant to observe the remote
3D scene from any freely chosen viewpoint. An intermediate virtual viewpoint
image is commonly synthesized using two pairs of transmitted texture and depth
maps from two neighboring captured viewpoints via depth-image-based rendering
(DIBR). To maintain high quality of synthesized images, it is imperative to
contain the adverse effects of network packet losses that may arise during
texture and depth video transmission. Towards this end, we develop an
integrated approach that exploits the representation redundancy inherent in the
multiple streamed videos: a voxel in the 3D scene visible to two captured views
is sampled and coded twice in the two views. In particular, at the receiver we
first develop an error concealment strategy that adaptively blends
corresponding pixels in the two captured views during DIBR, so that pixels from
the more reliable transmitted view are weighted more heavily. We then couple it
with a sender-side optimization of reference picture selection (RPS) during
real-time video coding, so that blocks containing samples of voxels that are
visible in both views are more error-resiliently coded in one view only, given
that adaptive blending will erase errors in the other view. Further, synthesized
view distortion sensitivities to texture versus depth errors are analyzed, so
that the relative importance of texture and depth code blocks can be computed for
system-wide RPS optimization. Experimental results show that the proposed
scheme can outperform the use of a traditional feedback channel by up to 0.82
dB on average at 8% packet loss rate, and by as much as 3 dB for particular
frames.
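The receiver-side blending idea in this abstract can be sketched as a reliability-weighted average of the two corresponding pixels. The function name `blend_pixel` and the scalar reliability scores are hypothetical; the abstract does not say how reliability is estimated, so this only shows the weighting itself.

```python
def blend_pixel(left_value, right_value, left_reliability, right_reliability):
    """Blend two corresponding pixel values, weighting the more reliable view more.

    Reliability scores are assumed to be non-negative numbers (hypothetical
    inputs, for example derived from packet-loss history).
    """
    total = left_reliability + right_reliability
    if total == 0:
        # No information about either view: fall back to a plain average.
        return 0.5 * (left_value + right_value)
    return (left_reliability * left_value + right_reliability * right_value) / total
```

With equal scores this reduces to plain averaging; as one view's score approaches zero, the output approaches the other view's pixel, which is the behavior the abstract relies on to hide errors in the less reliable view.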
A content based method for perceptually driven joint color/depth compression
Multi-view Video plus Depth (MVD) data refer to a set of conventional color video sequences and an associated set of depth video sequences, all acquired at slightly different viewpoints. This huge amount of data necessitates a reliable compression method. However, there is no standardized compression method for MVD sequences. The H.264/MVC compression method, which was standardized for Multi-View Video (MVV) representation, has been the subject of many adaptations to MVD. However, it has been shown that MVC is not well adapted to encoding multi-view depth data. We propose a novel option for compression of MVD data. Its main purpose is to preserve joint color/depth consistency. The originality of the proposed method lies in the use of the decoded color data as a prior for the associated depth compression. This is meant to ensure consistency in both types of data after decoding. Our strategy is motivated by previous studies of artifacts occurring in synthesized views: the most annoying distortions are located around strong depth discontinuities, and these distortions are due to misalignment of depth and color edges in decoded images. Thus the method is meant to preserve edges and to ensure consistent localization of color edges and depth edges. To ensure compatibility, color sequences are encoded with H.264. Depth map compression is based on a 2D still image codec, namely LAR (Locally Adaptive Resolution). It consists of a quad-tree representation of the images. The quad-tree representation contributes to the preservation of edges in both color and depth data. The adopted strategy is meant to be more perceptually driven than state-of-the-art methods. The proposed approach is compared to H.264 encoding of depth images. Objective metric scores are similar for H.264 and the proposed method, and the visual quality of synthesized views is improved with the proposed approach.
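A quad-tree representation of the kind mentioned in this abstract can be sketched as a recursive split of a square image. The split criterion used here (maximum minus minimum value above a threshold) and the function name `quadtree_leaves` are assumptions for illustration; the abstract does not describe the criterion the LAR codec actually uses.

```python
def quadtree_leaves(img, x=0, y=0, size=None, threshold=10):
    """Return quad-tree leaves as (x, y, size) tuples for a square image.

    img is a list of rows whose side length is a power of two (assumed).
    A block is split into four quadrants while its value range exceeds
    the threshold, so small blocks concentrate around edges.
    """
    if size is None:
        size = len(img)
    block = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size == 1 or max(block) - min(block) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(img, x + dx, y + dy, half, threshold)
    return leaves
```

Flat regions end up as a few large leaves, while blocks that straddle a depth discontinuity are subdivided down to small leaves, which is why such a representation tends to keep edge locations precise.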
TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo
One of the most successful approaches in Multi-View Stereo estimates a depth
map and a normal map for each view via PatchMatch-based optimization and fuses
them into a consistent 3D point cloud. This approach relies on
photo-consistency to evaluate the goodness of a depth estimate. It generally
produces very accurate results; however, the reconstructed model often lacks
completeness, especially in broad untextured areas where the
photo-consistency metrics are unreliable. Assuming the untextured areas to be
piecewise planar, in this paper we generate novel PatchMatch hypotheses so as to
expand reliable depth estimates into neighboring untextured regions. At the same
time, we modify the photo-consistency measure so as to favor standard or novel
PatchMatch depth hypotheses depending on the textureness of the considered
area. We also propose a depth refinement step to filter wrong estimates and to
fill the gaps on both the depth maps and normal maps while preserving the
discontinuities. The effectiveness of our new methods has been tested against
several state-of-the-art algorithms on the publicly available ETH3D dataset
containing a wide variety of high and low-resolution images.
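The piecewise-planar assumption in this abstract implies that a plane fitted to reliable estimates determines a depth for every pixel in an untextured region. The sketch below computes that implied depth for a pinhole camera; the function name `plane_depth`, the plane parameterization `normal . X = offset` in camera coordinates, and the intrinsics `fx, fy, cx, cy` are assumptions, and how the planes themselves are fitted is not covered.

```python
def plane_depth(u, v, normal, offset, fx, fy, cx, cy):
    """Depth at pixel (u, v) implied by the plane normal . X = offset.

    The plane is expressed in camera coordinates and the camera is an ideal
    pinhole with focal lengths fx, fy and principal point (cx, cy).
    Returns None if the viewing ray is parallel to the plane or the
    intersection lies behind the camera.
    """
    # Back-projected ray with unit z-component, so X = depth * ray.
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-12:
        return None
    depth = offset / denom
    return depth if depth > 0 else None
```

A depth obtained this way could serve as an additional candidate alongside the usual PatchMatch candidates for pixels where photo-consistency is unreliable.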
Navigation domain representation for interactive multiview imaging
Enabling users to interactively navigate through different viewpoints of a
static scene is an interesting new functionality in 3D streaming systems. While
it opens exciting perspectives towards rich multimedia applications, it
requires the design of novel representations and coding techniques in order to
solve the new challenges imposed by interactive navigation. Interactivity
clearly brings new design constraints: the encoder is unaware of the exact
decoding process, while the decoder has to reconstruct information from
incomplete subsets of data since the server can generally not transmit images
for all possible viewpoints due to resource constraints. In this paper, we
propose a novel multiview data representation that satisfies bandwidth
and storage constraints in an interactive multiview streaming system. In
particular, we partition the multiview navigation domain into segments, each of
which is described by a reference image and some auxiliary information. The
auxiliary information enables the client to recreate any viewpoint in the
navigation segment via view synthesis. The decoder is then able to navigate
freely in the segment without further data requests to the server; it requests
additional data only when it moves to a different segment. We discuss the
benefits of this novel representation in interactive navigation systems and
further propose a method to optimize the partitioning of the navigation domain
into independent segments, under bandwidth and storage constraints.
Experimental results confirm the potential of the proposed representation;
namely, our system achieves compression performance similar to that of classical
inter-view coding, while it provides the high level of flexibility that is
required for interactive streaming. Hence, our new framework represents a
promising solution for 3D data representation in novel interactive multimedia
services.