HNeRV: A Hybrid Neural Representation for Videos
Implicit neural representations store videos as neural networks and have
performed well for various vision tasks such as video compression and
denoising. With a frame index or positional index as input, implicit
representations (NeRV, E-NeRV, etc.) reconstruct video from fixed,
content-agnostic embeddings. Such embeddings largely limit the regression
capacity and internal generalization for video interpolation. In this paper, we
propose a Hybrid Neural Representation for Videos (HNeRV), where a learnable
encoder generates content-adaptive embeddings, which act as the decoder input.
Besides the input embedding, we introduce HNeRV blocks, which ensure model
parameters are evenly distributed across the entire network, such that higher
layers (layers near the output) can have more capacity to store high-resolution
content and video details. With content-adaptive embeddings and re-designed
architecture, HNeRV outperforms implicit methods in video regression tasks for
both reconstruction quality (higher PSNR) and convergence speed, and shows
better internal generalization. As a simple and efficient
video representation, HNeRV also shows decoding advantages for speed,
flexibility, and deployment, compared to traditional codecs~(H.264, H.265) and
learning-based compression methods. Finally, we explore the effectiveness of
HNeRV on downstream tasks such as video compression and video inpainting. We
provide a project page at https://haochen-rye.github.io/HNeRV, and code at
https://github.com/haochen-rye/HNeRV.
Comment: CVPR 2023.
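As an illustration of the hybrid design described above, here is a minimal PyTorch sketch (not the authors' implementation): a small convolutional encoder maps each frame to a tiny content-adaptive embedding, and a decoder of upsampling blocks regresses the frame back from it. The class names, layer widths, kernel sizes, and strides are illustrative assumptions.

```python
# Minimal sketch of a hybrid (encoder + decoder) video representation.
# All shapes and widths below are toy assumptions, not the paper's values.
import torch
import torch.nn as nn

class EncoderSketch(nn.Module):
    """Maps a frame (3, 128, 128) to a small content-adaptive embedding (16, 2, 2)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=4), nn.GELU(),   # 128 -> 32
            nn.Conv2d(32, 64, 4, stride=4), nn.GELU(),  # 32 -> 8
            nn.Conv2d(64, 16, 4, stride=4),             # 8 -> 2
        )
    def forward(self, x):
        return self.net(x)

class DecoderBlockSketch(nn.Module):
    """Upsampling block (conv + PixelShuffle), loosely mimicking an HNeRV block."""
    def __init__(self, c_in, c_out, up):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out * up * up, 3, padding=1)
        self.up = nn.PixelShuffle(up)
        self.act = nn.GELU()
    def forward(self, x):
        return self.act(self.up(self.conv(x)))

class HybridRepSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = EncoderSketch()
        self.decoder = nn.Sequential(
            DecoderBlockSketch(16, 64, 4),   # 2 -> 8
            DecoderBlockSketch(64, 32, 4),   # 8 -> 32
            DecoderBlockSketch(32, 16, 4),   # 32 -> 128
        )
        self.head = nn.Conv2d(16, 3, 3, padding=1)
    def forward(self, frame):
        emb = self.encoder(frame)            # content-adaptive embedding per frame
        return torch.sigmoid(self.head(self.decoder(emb)))

frames = torch.rand(4, 3, 128, 128)          # toy video clip
model = HybridRepSketch()
recon = model(frames)                        # in practice, overfit per video with L1/MSE loss
print(recon.shape)                           # torch.Size([4, 3, 128, 128])
```

In this setup the per-frame embeddings plus the decoder weights together form the video representation, in contrast to purely implicit methods whose input embedding is fixed and content-agnostic.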
Soft-Label Dataset Distillation and Text Dataset Distillation
Dataset distillation is a method for reducing dataset sizes by learning a
small number of synthetic samples containing all the information of a large
dataset. This has several benefits like speeding up model training, reducing
energy consumption, and reducing required storage space. Currently, each
synthetic sample is assigned a single `hard' label, and dataset distillation
can only be used with image data.
We propose to simultaneously distill both images and their labels, thus
assigning each synthetic sample a `soft' label (a distribution of labels). Our
algorithm increases accuracy by 2-4% over the original algorithm for several
image classification tasks. Using `soft' labels also enables distilled datasets
to consist of fewer samples than there are classes, since each sample can encode
information for multiple classes. For example, training a LeNet model with 10
distilled images (one per class) results in over 96% accuracy on MNIST, and
almost 92% accuracy when trained on just 5 distilled images.
We also extend the dataset distillation algorithm to distill sequential
datasets including texts. We demonstrate that text distillation outperforms
other methods across multiple datasets. For example, models attain almost their
original accuracy on the IMDB sentiment analysis task using just 20 distilled
sentences.
Our code can be found at
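The following toy sketch (a plain meta-learning loop with a linear learner, not the paper's code) shows the core mechanism: both the synthetic inputs and their soft-label logits are learnable tensors, a learner is trained on them for one differentiable step, and the learner's loss on real data is backpropagated into the distilled data. The shapes, learning rates, and the random stand-in "real" batch are assumptions.

```python
import torch
import torch.nn.functional as F

# Learnable distilled data: fewer samples (5) than classes (10) is possible
# because each sample carries a soft label (a distribution over classes).
num_classes, num_distilled = 10, 5
distilled_x = torch.randn(num_distilled, 784, requires_grad=True)           # synthetic inputs
distilled_y = torch.randn(num_distilled, num_classes, requires_grad=True)   # soft-label logits
meta_opt = torch.optim.Adam([distilled_x, distilled_y], lr=0.01)

# Stand-in for a batch of real data (would be MNIST, IMDB, etc. in practice).
real_x, real_y = torch.randn(64, 784), torch.randint(0, num_classes, (64,))

for step in range(100):
    # Freshly initialized linear learner (a toy stand-in for LeNet).
    w = torch.zeros(784, num_classes, requires_grad=True)
    # Inner step: train the learner on the distilled set with soft labels
    # (cross-entropy against a distribution rather than a hard class index).
    soft = distilled_y.softmax(dim=1)
    inner_loss = -(soft * F.log_softmax(distilled_x @ w, dim=1)).sum(dim=1).mean()
    (g,) = torch.autograd.grad(inner_loss, w, create_graph=True)
    w_trained = w - 0.1 * g
    # Outer step: evaluate the trained learner on real data and push that
    # loss back into the distilled inputs and their soft labels.
    outer_loss = F.cross_entropy(real_x @ w_trained, real_y)
    meta_opt.zero_grad()
    outer_loss.backward()
    meta_opt.step()
```

The same loop applies to sequential data once the distilled inputs live in an embedding space, which is how the text-distillation extension can be read at a high level.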
Network Sketching: Exploiting Binary Structure in Deep CNNs
Convolutional neural networks (CNNs) with deep architectures have
substantially advanced the state-of-the-art in computer vision tasks. However,
deep networks are typically resource-intensive and thus difficult to deploy
on mobile devices. Recently, CNNs with binary weights have shown compelling
efficiency, but the accuracy of such models is usually unsatisfactory in
practice. In this paper, we introduce network sketching as a novel technique
for pursuing binary-weight CNNs, targeting more faithful inference and a
better trade-off for practical applications. Our
basic idea is to exploit binary structure directly in pre-trained filter banks
and produce binary-weight models via tensor expansion. The whole process can be
treated as a coarse-to-fine model approximation, akin to the pencil drawing
steps of outlining and shading. To further speed up the generated models, namely
the sketches, we also propose an associative implementation of binary tensor
convolutions. Experimental results demonstrate that a proper sketch of AlexNet
(or ResNet) outperforms the existing binary-weight models by large margins on
the ImageNet large-scale classification task, while the memory committed to
network parameters increases only slightly.
Comment: To appear in CVPR 2017.
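The coarse-to-fine approximation lends itself to a short illustration. The sketch below (an interpretation in PyTorch, not the paper's code) greedily fits a sum of binary tensors with scalar scales to a pre-trained filter bank, with each new binary tensor fitting the residual left by the previous ones.

```python
# Greedy binary expansion: approximate a real-valued filter bank W by
# sum_i a_i * B_i with B_i in {-1, +1}. Illustrative only; m and the toy
# filter bank are assumptions.
import torch

def sketch_filter(w, m=3):
    residual = w.clone()
    binaries, scales = [], []
    for _ in range(m):
        b = residual.sign()
        b[b == 0] = 1.0                      # keep the tensor strictly binary
        a = (residual * b).mean()            # least-squares optimal scale for this B
        binaries.append(b)
        scales.append(a)
        residual = residual - a * b          # refine on what is left (coarse-to-fine)
    return binaries, scales

w = torch.randn(64, 3, 3, 3)                 # stand-in for a pre-trained conv filter bank
bs, a = sketch_filter(w, m=3)
approx = sum(ai * bi for ai, bi in zip(a, bs))
print(torch.norm(w - approx) / torch.norm(w))  # relative error shrinks as m grows
```

At inference, each term of the expansion is a binary convolution, which is what the proposed associative implementation is designed to accelerate.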
Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together
Neural networks equipped with self-attention have parallelizable computation,
light-weight structure, and the ability to capture both long-range and local
dependencies. Further, their expressive power and performance can be boosted by
using a vector to measure pairwise dependency, but this requires expanding the
alignment matrix to a tensor, which results in memory and computation
bottlenecks. In this paper, we propose a novel attention mechanism called
"Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as
memory-efficient as a CNN, but significantly outperforms previous
CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token)
and global (source2token) dependencies by a novel compatibility function
composed of dot-product and additive attentions, 2) uses a tensor to represent
the feature-wise alignment scores for better expressive power but only requires
parallelizable matrix multiplications, and 3) combines multi-head with
multi-dimensional attentions, and applies a distinct positional mask to each
head (subspace), so the memory and computation can be distributed to multiple
heads, each with sequential information encoded independently. The experiments
show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or
competitive performance on nine NLP benchmarks with compelling memory- and
time-efficiency.
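To make the compatibility function concrete, here is a single-head toy sketch (an interpretation, not the authors' implementation). It adds a scalar token2token dot-product score to a feature-wise source2token additive score and applies a directional positional mask. For clarity it materializes the full (n, n, d) score tensor, whereas MTSA's contribution is computing an equivalent result with parallelizable matrix multiplications only; all layer names and sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTSAHeadSketch(nn.Module):
    def __init__(self, d_model, d_head):
        super().__init__()
        self.q = nn.Linear(d_model, d_head)
        self.k = nn.Linear(d_model, d_head)
        self.v = nn.Linear(d_model, d_head)
        # Feature-wise (source2token) additive score for each token.
        self.s2t = nn.Sequential(nn.Linear(d_model, d_head), nn.Tanh(), nn.Linear(d_head, d_head))
        self.scale = d_head ** -0.5

    def forward(self, x, pos_mask):
        # x: (n, d_model); pos_mask: (n, n), 0 where attention is allowed, -inf where blocked.
        q, k, v = self.q(x), self.k(x), self.v(x)
        pairwise = (q @ k.t()) * self.scale               # (n, n) token2token scores
        global_ = self.s2t(x)                              # (n, d_head) source2token scores
        # score[i, j, :] = pairwise[i, j] + global_[j, :] + mask[i, j]
        scores = pairwise.unsqueeze(-1) + global_.unsqueeze(0) + pos_mask.unsqueeze(-1)
        attn = F.softmax(scores, dim=1)                     # normalize over source tokens j, per feature
        return (attn * v.unsqueeze(0)).sum(dim=1)           # (n, d_head)

n, d = 6, 32
x = torch.randn(n, d)
# A forward directional mask: each token may only attend to itself and earlier tokens.
fwd_mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
out = MTSAHeadSketch(d, 16)(x, fwd_mask)
print(out.shape)   # torch.Size([6, 16])
```

In the full model, several such heads run in parallel with distinct positional masks (e.g., forward, backward, local), so the feature-wise computation is distributed across subspaces.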