108 research outputs found
GeneFormer: Learned Gene Compression using Transformer-based Context Modeling
With the development of gene sequencing technology, gene data has grown explosively, and its storage has become an important issue. Traditional gene data compression methods rely on general-purpose software such as gzip, which fails to exploit the dependencies within nucleotide sequences. Recently, many researchers have begun to investigate deep learning based gene data compression methods. In this paper, we propose a transformer-based gene compression method named GeneFormer. Specifically, we first introduce a modified transformer structure to fully explore nucleotide sequence dependencies. Then, we propose fixed-length parallel grouping to accelerate the decoding of our autoregressive model. Experimental results on real-world datasets show that our method saves 29.7% bit rate compared with the state-of-the-art method, and its decoding is significantly faster than that of all existing learning-based gene compression methods.
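The decoding-speed claim rests on the grouping trick, which can be illustrated independently of the model. Below is a minimal sketch (the toy decoder and all names are ours, not the paper's code): the sequence is split into fixed-length groups that are decoded simultaneously, so the sequential loop runs for the group length rather than the full sequence length.

```python
import numpy as np

def parallel_group_decode(decode_symbol, seq_len, group_len):
    """Decode a sequence in fixed-length groups.

    decode_symbol(context) -> next symbol; a stand-in for arithmetic
    decoding driven by the model's predicted probabilities. All groups
    advance together, so the sequential loop runs group_len times
    instead of seq_len times.
    """
    n_groups = (seq_len + group_len - 1) // group_len
    groups = [[] for _ in range(n_groups)]
    for step in range(group_len):      # sequential within a group
        for g in range(n_groups):      # parallel across groups (batched in practice)
            pos = g * group_len + step
            if pos < seq_len:
                groups[g].append(decode_symbol(groups[g]))
    return [s for grp in groups for s in grp]

# Toy "decoder": ignores context and samples uniformly over A/C/G/T.
rng = np.random.default_rng(0)
toy = lambda ctx: "ACGT"[rng.integers(4)]
print("".join(parallel_group_decode(toy, seq_len=20, group_len=5)))
```

In a real codec the inner loop over groups would be one batched transformer forward pass, which is where the speed-up comes from.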
Noise Dimension of GAN: An Image Compression Perspective
A generative adversarial network (GAN) is a type of generative model that maps high-dimensional noise to samples in a target distribution. However, the noise dimension required by a GAN is not well understood. Previous approaches view a GAN as a mapping from one continuous distribution to another. In this paper, we propose to view a GAN as a discrete sampler instead. From this perspective, we build a connection between the minimum noise required and the bits needed to losslessly compress the images. Furthermore, to understand the behaviour of a GAN when the noise dimension is limited, we propose the divergence-entropy trade-off, which characterizes the best divergence achievable when the noise is limited. Like the rate-distortion trade-off, it can be solved numerically when the source distribution is known. Finally, we verify our theory with experiments on image generation.
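The discrete-sampler view admits a quick numerical illustration (ours, not the paper's): a generator driven by $n$ noise bits can emit at most $2^n$ distinct samples, so reproducing a finite target distribution exactly requires at least its Shannon entropy in noise bits, the same quantity that lower-bounds lossless compression of samples.

```python
import numpy as np

def min_noise_bits(probs):
    """Shannon entropy of a discrete target distribution, in bits.

    Under the discrete-sampler view, a GAN whose noise carries fewer
    bits than this cannot reproduce the distribution with zero
    divergence; it must trade divergence against noise entropy.
    """
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Four equally likely images need 2 noise bits; a skewed
# distribution can be matched with less noise entropy.
print(min_noise_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(min_noise_bits([0.7, 0.1, 0.1, 0.1]))      # ~1.36
```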
ECM-OPCC: Efficient Context Model for Octree-based Point Cloud Compression
Recently, deep learning methods have shown promising results in point cloud compression. For octree-based point cloud compression, previous works show that the information from ancestor nodes and sibling nodes is equally important for predicting the current node. However, those works either adopt insufficient context or incur intolerable decoding complexity (e.g., >600 s). To address this problem, we propose a sufficient yet efficient context model and design an efficient deep learning codec for point clouds. Specifically, we first propose a window-constrained multi-group coding strategy to exploit the autoregressive context while maintaining decoding efficiency. Then, we propose a dual transformer architecture to exploit the dependency of the current node on its ancestors and siblings. We also propose a random-masking pre-training method to enhance our model. Experimental results show that our approach achieves state-of-the-art performance for both lossy and lossless point cloud compression. Moreover, our multi-group coding strategy saves 98% of decoding time compared with previous octree-based compression methods.
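As a rough illustration of the window-constrained multi-group idea (a toy schedule with invented parameters, not the paper's model): nodes at one octree level are partitioned into groups, each group conditions only on a window of previously decoded groups, and the number of sequential decoding steps drops from the node count to the group count.

```python
def group_schedule(n_nodes, n_groups, window):
    """Assign octree nodes at one level to coding groups and list,
    for each group, which earlier groups its context may read.

    Decoding runs n_groups sequential steps; nodes inside a group are
    coded in parallel because they never condition on each other.
    """
    groups = [list(range(g, n_nodes, n_groups)) for g in range(n_groups)]
    context = {g: list(range(max(0, g - window), g)) for g in range(n_groups)}
    return groups, context

groups, context = group_schedule(n_nodes=16, n_groups=4, window=2)
for g, nodes in enumerate(groups):
    print(f"group {g}: nodes {nodes}, context groups {context[g]}")
```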
Flexible Neural Image Compression via Code Editing
Neural image compression (NIC) has outperformed traditional image codecs in rate-distortion (R-D) performance. However, it usually requires a dedicated encoder-decoder pair for each point on the R-D curve, which greatly hinders its practical deployment. While some recent works have enabled bitrate control via conditional coding, they impose a strong prior during training and provide limited flexibility. In this paper, we propose Code Editing, a highly flexible coding method for NIC based on semi-amortized inference and adaptive quantization. Our work is a new paradigm for variable-bitrate NIC. Furthermore, experimental results show that our method surpasses existing variable-rate methods, and achieves ROI coding and a multi-distortion trade-off with a single decoder.
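A minimal sketch of the semi-amortized inference at the heart of such a method (schematic only; `encoder`, `decoder`, and `rate` are stand-ins, not the paper's API): the amortized encoder supplies an initial latent, which is then refined by gradient descent on a per-image rate-distortion objective, so a single decoder can serve any trade-off chosen at encoding time.

```python
import torch

def edit_code(encoder, decoder, rate, image, lam, steps=100, lr=1e-2):
    """Refine the amortized latent by per-image R-D optimization.

    lam sets the rate-distortion trade-off at encoding time, so the
    bitrate can vary without retraining the encoder or decoder.
    """
    y = encoder(image).detach().requires_grad_(True)   # amortized init
    opt = torch.optim.Adam([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = rate(y) + lam * torch.mean((decoder(y) - image) ** 2)
        loss.backward()
        opt.step()
    return y.detach()

# Toy stand-ins to make the sketch executable.
enc, dec = torch.nn.Linear(8, 4), torch.nn.Linear(4, 8)
rate = lambda y: torch.sum(y ** 2)        # crude proxy for coded bits
x = torch.randn(1, 8)
y = edit_code(enc, dec, rate, x, lam=10.0)
print(y.shape)                            # torch.Size([1, 4])
```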
Tracking of Human Arm Based on MEMS Sensors
This paper studies a method for motion tracking of the human arm using a triaxial accelerometer, a triaxial gyroscope, and an electronic compass. A motion model of the arm is established, and the hardware of the arm tracking system is designed. A tracking method for arm gesture based on multi-sensor data fusion is analyzed, and a compensation algorithm for motion accelerations is investigated. Experimental results demonstrate that the motion acceleration compensation algorithm is effective and improves the dynamic measurement precision of the arm gesture angle.
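For intuition, gyroscope/accelerometer fusion of this kind is often realized as a complementary filter; the sketch below is a generic illustration under that assumption, not the paper's algorithm. Gyroscope integration tracks fast motion but drifts; the accelerometer provides a drift-free gravity reference; and the accelerometer's weight is reduced when its magnitude deviates from gravity, a crude form of motion-acceleration compensation.

```python
import math

def complementary_filter(gyro_rate, acc_x, acc_z, angle, dt, alpha=0.98):
    """One update of a complementary filter for a single joint angle.

    gyro_rate: angular rate (rad/s); acc_x, acc_z: accelerometer axes (m/s^2).
    When |acc| deviates from gravity (extra motion acceleration), the
    accelerometer is trusted less: a simple compensation heuristic.
    """
    g = 9.81
    acc_norm = math.hypot(acc_x, acc_z)
    acc_angle = math.atan2(acc_x, acc_z)
    trust = max(0.0, 1.0 - abs(acc_norm - g) / g)
    a = alpha + (1.0 - alpha) * (1.0 - trust)
    return a * (angle + gyro_rate * dt) + (1.0 - a) * acc_angle

angle = 0.0
for _ in range(100):    # static arm: estimate settles near the gravity angle
    angle = complementary_filter(0.0, 0.05, 9.80, angle, dt=0.01)
print(round(angle, 4))
```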
Correcting the Sub-optimal Bit Allocation
In this paper, we investigate the problem of bit allocation in Neural Video Compression (NVC). First, we reveal that a recent bit allocation approach claimed to be optimal is, in fact, sub-optimal due to its implementation. Specifically, we find that its sub-optimality lies in the improper application of semi-amortized variational inference (SAVI) to latents with a non-factorized variational posterior. Then, we show that the corrected version of SAVI on non-factorized latents requires recursively applying back-propagation through gradient ascent, based on which we derive the corrected optimal bit allocation algorithm. Due to the computational infeasibility of the corrected bit allocation, we design an efficient approximation to make it practical. Empirical results show that our proposed correction significantly improves the incorrect bit allocation in terms of R-D performance and bitrate error, and outperforms all other bit allocation methods by a large margin. The source code is provided in the supplementary material.
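The key technical ingredient, back-propagating through gradient ascent, can be sketched generically (a toy objective and our own variable names, not the paper's code): because the inner ascent on $y_2$ depends on $y_1$, each outer step must differentiate through the unrolled inner updates, which `create_graph=True` enables in PyTorch.

```python
import torch

def inner_ascent(elbo, y1, steps=5, lr=0.1):
    """Ascend the ELBO in y2 given y1, keeping the graph so that
    gradients w.r.t. y1 can flow back through every inner step."""
    y2 = torch.zeros((), requires_grad=True)
    for _ in range(steps):
        g, = torch.autograd.grad(elbo(y1, y2), y2, create_graph=True)
        y2 = y2 + lr * g                    # graph-retaining update
    return y2

# Toy non-factorized objective coupling y1 and y2.
elbo = lambda y1, y2: -(y1 - 1.0) ** 2 - (y2 - 0.5 * y1) ** 2

y1 = torch.zeros((), requires_grad=True)
for _ in range(50):                         # outer ascent on y1
    y2 = inner_ascent(elbo, y1)
    g, = torch.autograd.grad(elbo(y1, y2), y1)
    y1 = (y1 + 0.1 * g).detach().requires_grad_(True)
print(round(y1.item(), 3))                  # approaches the joint optimum y1 = 1
```

Detaching $y_2$ before the outer gradient, as the criticized implementation effectively does, drops the path through the inner ascent and yields the sub-optimal allocation described above.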
Unified learning-based lossy and lossless JPEG recompression
JPEG is still the most widely used image compression algorithm. Most image compression algorithms only consider the uncompressed original image, ignoring the large number of already existing JPEG images. Recently, JPEG recompression approaches have been proposed to further reduce the size of JPEG files. However, those methods only consider lossless JPEG recompression, which is just a special case under the rate-distortion theorem. In this paper, we propose a unified lossy and lossless JPEG recompression framework, which consists of a learned quantization table and Markovian hierarchical variational autoencoders. Experiments show that our method can achieve arbitrarily low distortion as the bitrate approaches its upper bound, namely the bitrate of the lossless compression model. To the best of our knowledge, this is the first learned method that bridges the gap between lossy and lossless recompression of JPEG images.
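One way to make a quantization table learnable, sketched here under our own assumptions (a straight-through estimator; the paper may use a different relaxation): hard rounding of DCT coefficients is non-differentiable, so the backward pass treats it as the identity, letting the 8x8 table train jointly with the autoencoders.

```python
import torch

class LearnedQuantTable(torch.nn.Module):
    """Quantize 8x8 DCT blocks with a learnable per-frequency table.

    Forward uses hard rounding, as a JPEG coder would; the
    straight-through trick routes gradients around the round().
    """
    def __init__(self):
        super().__init__()
        self.log_table = torch.nn.Parameter(torch.zeros(8, 8))

    def forward(self, dct_blocks):                     # (N, 8, 8)
        table = torch.exp(self.log_table)              # positive step sizes
        scaled = dct_blocks / table
        rounded = scaled + (torch.round(scaled) - scaled).detach()
        return rounded * table                         # dequantized values

q = LearnedQuantTable()
blocks = torch.randn(4, 8, 8) * 50
recon = q(blocks)
print(torch.mean((recon - blocks) ** 2).item())  # loss is differentiable w.r.t. the table
```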
EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection
In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint. Currently, two major challenges persist in vehicle-infrastructure cooperative 3D (VIC3D) object detection: inherent pose errors when fusing multi-view images, caused by time asynchrony across cameras; and information loss during transmission resulting from limited communication bandwidth. To address these issues, we propose a novel camera-based 3D detection framework for the VIC3D task, Enhanced Multi-scale Image Feature Fusion (EMIFF). To fully exploit holistic perspectives from both vehicles and infrastructure, we propose Multi-scale Cross Attention (MCA) and Camera-aware Channel Masking (CCM) modules to enhance infrastructure and vehicle features at the scale, spatial, and channel levels, correcting the pose error introduced by camera asynchrony. We also introduce a Feature Compression (FC) module with channel and spatial compression blocks for transmission efficiency. Experiments show that EMIFF achieves state-of-the-art results on the DAIR-V2X-C dataset, significantly outperforming previous early-fusion and late-fusion methods with comparable transmission costs.
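Of the modules named above, Camera-aware Channel Masking is the simplest to sketch in isolation (a schematic reading; the dimensions and the camera-parameter embedding are our assumptions, not the paper's design): camera parameters are mapped to a per-channel sigmoid gate that reweights that view's feature channels.

```python
import torch

class CameraAwareChannelMask(torch.nn.Module):
    """Gate feature channels with a mask predicted from camera parameters."""
    def __init__(self, cam_dim, channels):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(cam_dim, channels),
            torch.nn.ReLU(),
            torch.nn.Linear(channels, channels),
        )

    def forward(self, feat, cam_params):   # feat: (N, C, H, W), cam: (N, cam_dim)
        mask = torch.sigmoid(self.mlp(cam_params))     # (N, C) gates in [0, 1]
        return feat * mask[:, :, None, None]

ccm = CameraAwareChannelMask(cam_dim=16, channels=64)
feat = torch.randn(2, 64, 32, 32)          # features from two camera views
cam = torch.randn(2, 16)                   # flattened intrinsics/extrinsics
print(ccm(feat, cam).shape)                # torch.Size([2, 64, 32, 32])
```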
Conditional Perceptual Quality Preserving Image Compression
We propose conditional perceptual quality, an extension of the perceptual
quality defined in \citet{blau2018perception}, by conditioning it on user
defined information. Specifically, we extend the original perceptual quality $d(p_{X}, p_{\hat{X}})$ to the conditional perceptual quality $d(p_{X|Y}, p_{\hat{X}|Y})$, where $X$ is the original image, $\hat{X}$ is the reconstructed image, $Y$ is side information defined by the user, and $d(\cdot, \cdot)$ is a divergence. We show that conditional perceptual quality has similar theoretical
properties to the rate-distortion-perception trade-off \citep{blau2019rethinking}.
Based on these theoretical results, we propose an optimal framework for
conditional perceptual quality preserving compression. Experimental results
show that our codec successfully maintains high perceptual quality and semantic
quality at all bitrates. Besides, by providing a lower bound on the common randomness required, we settle the previous argument about whether randomness should be incorporated into the generator for (conditional) perceptual quality compression. The source code is provided in the supplementary material.
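For discrete toy distributions, the conditional perceptual quality reduces to an expected per-condition divergence, $\mathbb{E}_Y[d(p_{X|Y}, p_{\hat{X}|Y})]$. A small numerical check (our example; KL divergence stands in for the unspecified $d$):

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete distributions (nats)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def conditional_perceptual_quality(p_y, p_x_given_y, p_xhat_given_y):
    """E_Y[ d(p_{X|Y}, p_{Xhat|Y}) ] with d = KL divergence."""
    return sum(py * kl(px, pxh)
               for py, px, pxh in zip(p_y, p_x_given_y, p_xhat_given_y))

# Two user-defined conditions Y; the codec matches X|Y=0 exactly
# but distorts X|Y=1, so only that condition contributes.
p_y = [0.5, 0.5]
p_x = [[0.5, 0.5], [0.9, 0.1]]
p_xhat = [[0.5, 0.5], [0.7, 0.3]]
print(conditional_perceptual_quality(p_y, p_x, p_xhat))  # ~0.058
```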