65,720 research outputs found
Sparse Coding on Stereo Video for Object Detection
Deep Convolutional Neural Networks (DCNN) require millions of labeled
training examples for image classification and object detection tasks, which
restrict these models to domains where such datasets are available. In this
paper, we explore the use of unsupervised sparse coding applied to stereo-video
data to help alleviate the need for large amounts of labeled data. We show that
replacing a typical supervised convolutional layer with an unsupervised
sparse-coding layer within a DCNN allows for better performance on a car
detection task when only a limited number of labeled training examples is
available. Furthermore, the network that incorporates sparse coding allows for
more consistent performance over varying initializations and ordering of
training examples when compared to a fully supervised DCNN. Finally, we compare
activations between the unsupervised sparse-coding layer and the supervised
convolutional layer, and show that the sparse representation exhibits an
encoding that is depth selective, whereas encodings from the convolutional
layer do not exhibit such selectivity. These result indicates promise for using
unsupervised sparse-coding approaches in real-world computer vision tasks in
domains with limited labeled training data
Low-latency compression of mocap data using learned spatial decorrelation transform
Due to the growing needs of human motion capture (mocap) in movie, video
games, sports, etc., it is highly desired to compress mocap data for efficient
storage and transmission. This paper presents two efficient frameworks for
compressing human mocap data with low latency. The first framework processes
the data in a frame-by-frame manner so that it is ideal for mocap data
streaming and time critical applications. The second one is clip-based and
provides a flexible tradeoff between latency and compression performance. Since
mocap data exhibits some unique spatial characteristics, we propose a very
effective transform, namely learned orthogonal transform (LOT), for reducing
the spatial redundancy. The LOT problem is formulated as minimizing square
error regularized by orthogonality and sparsity and solved via alternating
iteration. We also adopt a predictive coding and temporal DCT for temporal
decorrelation in the frame- and clip-based frameworks, respectively.
Experimental results show that the proposed frameworks can produce higher
compression performance at lower computational cost and latency than the
state-of-the-art methods.Comment: 15 pages, 9 figure
- …