Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement
In this paper, we propose a Hierarchical Learned Video Compression (HLVC)
method with three hierarchical quality layers and a recurrent enhancement
network. The frames in the first layer are compressed by an image compression
method with the highest quality. Using these frames as references, we propose
the Bi-Directional Deep Compression (BDDC) network to compress the second layer
with relatively high quality. Then, the third layer frames are compressed with
the lowest quality, by the proposed Single Motion Deep Compression (SMDC)
network, which adopts a single motion map to estimate the motions of multiple
frames, thus saving bits for motion information. In our deep decoder, we
develop the Weighted Recurrent Quality Enhancement (WRQE) network, which takes
both compressed frames and the bit stream as inputs. In the recurrent cell of
WRQE, the memory and update signal are weighted by quality features to
reasonably leverage multi-frame information for enhancement. In our HLVC
approach, hierarchical quality benefits coding efficiency, since the
high-quality information facilitates the compression and enhancement of
low-quality frames at the encoder and decoder sides, respectively. Finally, the
experiments validate that our HLVC approach advances the state-of-the-art of
deep video compression methods, and outperforms the "Low-Delay P (LDP) very
fast" mode of x265 in terms of both PSNR and MS-SSIM. The project page is at
https://github.com/RenYang-home/HLVC.
Comment: Published in CVPR 2020; corrected a minor typo in the footnote of
Table 1; corrected Figure 1.
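The quality-weighted recurrent cell described in the WRQE abstract can be illustrated with a minimal, non-convolutional sketch. Note this is an assumption-laden toy: the paper uses convolutional recurrent cells on frame features, while the dense matrices, the scalar quality weight `q`, and the exact gating form below are illustrative choices, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def quality_weighted_gru_step(h_prev, x, q, Wz, Uz, Wh, Uh):
    """One GRU-like step in which the memory (h_prev) and the update
    signal are scaled by a quality feature q in [0, 1], so that
    higher-quality reference frames contribute more to the enhancement.

    Hypothetical shapes: x and h_prev are (d,) feature vectors; q is a
    scalar quality weight (in WRQE it would be derived from the bit stream).
    """
    z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate
    h_tilde = np.tanh(Wh @ x + Uh @ (q * h_prev))   # candidate state, quality-weighted memory
    h = (1 - q * z) * h_prev + (q * z) * h_tilde    # quality-weighted update
    return h
```

With `q = 0` the step returns `h_prev` unchanged, i.e. a low-quality frame is prevented from overwriting memory accumulated from high-quality frames; with `q = 1` it reduces to an ordinary gated update.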
Deep Learning based Recommender System: A Survey and New Perspectives
With the ever-growing volume of online information, recommender systems have
been an effective strategy to overcome such information overload. The utility
of recommender systems cannot be overstated, given their widespread adoption in
many web applications, along with their potential to ameliorate many
problems related to over-choice. In recent years, deep learning has garnered
considerable interest in many research fields such as computer vision and
natural language processing, owing not only to stellar performance but also the
attractive property of learning feature representations from scratch. The
influence of deep learning is also pervasive, recently demonstrating its
effectiveness when applied to information retrieval and recommender systems
research. Evidently, the field of deep learning in recommender systems is
flourishing. This article aims to provide a comprehensive review of recent
research efforts on deep learning based recommender systems. More concretely,
we provide and devise a taxonomy of deep learning based recommendation models,
along with providing a comprehensive summary of the state-of-the-art. Finally,
we expand on current trends and provide new perspectives pertaining to this
exciting new development of the field.
Comment: The paper has been accepted by ACM Computing Surveys.
https://doi.acm.org/10.1145/328502
StairNet: Visual Recognition of Stairs for Human-Robot Locomotion
Human-robot walking with prosthetic legs and exoskeletons, especially over
complex terrains such as stairs, remains a significant challenge. Egocentric
vision has the unique potential to detect the walking environment prior to
physical interactions, which can improve transitions to and from stairs. This
motivated us to create the StairNet initiative to support the development of
new deep learning models for visual sensing and recognition of stairs, with an
emphasis on lightweight and efficient neural networks for onboard real-time
inference. In this study, we present an overview of the development of our
large-scale dataset with over 515,000 manually labeled images, as well as our
development of different deep learning models (e.g., 2D and 3D CNN, hybrid CNN
and LSTM, and ViT networks) and training methods (e.g., supervised learning
with temporal data and semi-supervised learning with unlabeled images) using
our new dataset. We consistently achieved high classification accuracy (i.e.,
up to 98.8%) with different designs, offering trade-offs between model accuracy
and size. When deployed on mobile devices with GPU and NPU accelerators, our
deep learning models achieved inference times as low as 2.8 ms. We also deployed
our models on custom-designed CPU-powered smart glasses. However, limitations
in the embedded hardware yielded slower inference times of 1.5 seconds,
presenting a trade-off between human-centered design and performance. Overall,
we showed that StairNet can be an effective platform to develop and study new
visual perception systems for human-robot locomotion with applications in
exoskeleton and prosthetic leg control.