67 research outputs found
Direct Acyclic Graph based Ledger for Internet of Things: Performance and Security Analysis
Direct Acyclic Graph (DAG)-based ledgers and their corresponding consensus
algorithms have been identified as a promising technology for Internet of Things
(IoT). Compared with Proof-of-Work (PoW) and Proof-of-Stake (PoS) that have
been widely used in blockchain, the consensus mechanism designed on DAG
structure (simply called DAG consensus) can overcome some shortcomings such
as high resource consumption, high transaction fee, low transaction throughput
and long confirmation delay. However, theoretical analysis of DAG consensus
remains largely unexplored. To this end, based on one of the
most typical DAG consensuses, Tangle, we investigate the impact of network load
on the performance and security of the DAG-based ledger. Considering unsteady
network load, we first propose a Markov chain model to capture the behavior of
DAG consensus process under dynamic load conditions. The key performance
metrics, i.e., cumulative weight and confirmation delay, are analysed based on
the proposed model. Then, we leverage a stochastic model to analyse the
probability of a successful double-spending attack in different network load
regimes. The results can provide an insightful understanding of DAG consensus
process, e.g., how the network load affects the confirmation delay and the
probability of a successful attack. Meanwhile, we also demonstrate the
trade-off between security level and confirmation delay, which can serve as
guidance for practical deployment of DAG-based ledgers.
Comment: accepted by IEEE Transactions on Networking
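To make the cumulative weight metric concrete, here is a minimal sketch. It assumes the usual Tangle definition (a transaction's own weight plus the own weights of all transactions that directly or indirectly approve it) and gives every transaction an own weight of 1. The function name `cumulative_weights` and the toy graph are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def cumulative_weights(approves, own_weight=1):
    """Return the cumulative weight of every transaction in a small DAG.

    `approves` maps each transaction id to the list of transactions it
    directly approves. Every transaction is assumed to have the same own
    weight (an assumption made for this sketch).
    """
    # Reverse the edges: approvers[t] holds the transactions that directly approve t.
    approvers = defaultdict(set)
    for tx, parents in approves.items():
        for parent in parents:
            approvers[parent].add(tx)

    weights = {}
    for tx in approves:
        # Depth-first search over all direct and indirect approvers of tx.
        seen = set()
        stack = list(approvers[tx])
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(approvers[current])
        # Own weight plus the own weight of every approver found.
        weights[tx] = own_weight + own_weight * len(seen)
    return weights


# Toy ledger: each new transaction approves up to two earlier ones.
tangle = {
    "genesis": [],
    "a": ["genesis"],
    "b": ["genesis"],
    "c": ["a", "b"],
    "d": ["c", "b"],
}
weights = cumulative_weights(tangle)
```

In this toy graph the cumulative weight grows as later transactions arrive, which is why a higher arrival rate (network load) tends to shorten the time a transaction needs to reach a given confirmation threshold.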
Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning
Existing fine-tuning methods either tune all parameters of the pre-trained
model (full fine-tuning), which is not efficient, or only tune the last linear
layer (linear probing), which suffers a significant accuracy drop compared to
the full fine-tuning. In this paper, we propose a new parameter-efficient
fine-tuning method termed as SSF, representing that researchers only need to
Scale and Shift the deep Features extracted by a pre-trained model to catch up
with the performance of full fine-tuning. In this way, SSF also surprisingly
outperforms other parameter-efficient fine-tuning approaches even with a
smaller number of tunable parameters. Furthermore, different from some existing
parameter-efficient fine-tuning methods (e.g., Adapter or VPT) that introduce
extra parameters and computational cost in both the training and inference
stages, SSF only adds learnable parameters during the training stage, and these
additional parameters can be merged into the original pre-trained model weights
via re-parameterization in the inference phase. With the proposed SSF, our
model obtains 2.46% (90.72% vs. 88.54%) and 11.48% (73.10% vs. 65.57%)
relative performance improvement on FGVC and VTAB-1k in terms of Top-1 accuracy
compared to full fine-tuning while fine-tuning only about 0.3M parameters. We also
conduct extensive experiments across various model families (CNNs, Transformers,
and MLPs) and datasets. Results on 26 image classification datasets in total
and 3 robustness & out-of-distribution datasets show the effectiveness of SSF.
Code is available at https://github.com/dongzelian/SSF.
Comment: Accepted by NeurIPS202
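The re-parameterization claim can be checked with a small worked example. The sketch below assumes the scale and shift are applied per output channel right after a linear layer, so that gamma * (W x + b) + beta equals (gamma * W) x + (gamma * b + beta). The helper names (`linear`, `scale_shift`, `merge`) and the numbers are illustrative only and are not taken from the SSF code base.

```python
def linear(W, b, x):
    """Plain linear layer: y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def scale_shift(y, gamma, beta):
    """Training-time SSF-style step: per-channel scale and shift of the features."""
    return [g * yi + s for g, yi, s in zip(gamma, y, beta)]


def merge(W, b, gamma, beta):
    """Fold the scale and shift into the preceding linear layer's weights."""
    W_merged = [[g * w for w in row] for g, row in zip(gamma, W)]
    b_merged = [g * bi + s for g, bi, s in zip(gamma, b, beta)]
    return W_merged, b_merged


# Frozen pre-trained layer (illustrative values).
W = [[1.0, 2.0], [0.5, -1.0]]
b = [0.1, -0.2]
# Learned scale and shift parameters (illustrative values).
gamma = [2.0, 0.5]
beta = [0.3, 0.4]
x = [1.5, -2.0]

# Training-time path: frozen layer followed by scale and shift.
train_out = scale_shift(linear(W, b, x), gamma, beta)

# Inference-time path: a single linear layer with merged weights.
W_m, b_m = merge(W, b, gamma, beta)
infer_out = linear(W_m, b_m, x)
```

Both paths produce the same output up to floating-point rounding, so under this assumption the merged layer adds no parameters or computation at inference time.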
Shunted Self-Attention via Multi-Scale Token Aggregation
Recent Vision Transformer (ViT) models have demonstrated encouraging results
across various computer vision tasks, thanks to their competence in modeling
long-range dependencies of image patches or tokens via self-attention. These
models, however, usually assign similar receptive fields to each token
feature within each layer. Such a constraint inevitably limits the ability of
each self-attention layer to capture multi-scale features, thereby leading to
performance degradation in handling images with multiple objects of different
scales. To address this issue, we propose a novel and generic strategy, termed
shunted self-attention (SSA), that allows ViTs to model the attentions at
hybrid scales per attention layer. The key idea of SSA is to inject
heterogeneous receptive field sizes into tokens: before computing the
self-attention matrix, it selectively merges tokens to represent larger object
features while keeping certain tokens to preserve fine-grained features. This
novel merging scheme enables the self-attention to learn relationships between
objects of different sizes while simultaneously reducing the number of tokens and
the computational cost. Extensive experiments across various tasks demonstrate
the superiority of SSA. Specifically, the SSA-based transformer achieves 84.0%
Top-1 accuracy and outperforms the state-of-the-art Focal Transformer on
ImageNet with only half of the model size and computation cost, and surpasses
Focal Transformer by 1.3 mAP on COCO and 2.9 mIOU on ADE20K under similar
parameter and computation cost. Code has been released at
https://github.com/OliverRensu/Shunted-Transformer
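As a rough illustration of multi-scale token aggregation, the sketch below merges consecutive tokens by average pooling at different rates, so that one attention head could attend to fine-grained tokens while another attends to fewer, coarser ones. Average pooling over a 1-D sequence is an assumption made for brevity and stands in for whatever merging the released code uses; `merge_tokens` is a hypothetical helper.

```python
def merge_tokens(tokens, rate):
    """Average every `rate` consecutive tokens into one coarser token.

    `tokens` is a list of equal-length feature vectors. A rate of 1 keeps
    the tokens unchanged (fine-grained); larger rates give fewer tokens
    that each summarize a wider region.
    """
    merged = []
    for start in range(0, len(tokens), rate):
        group = tokens[start:start + rate]
        # Column-wise mean over the tokens in this group.
        merged.append([sum(column) / len(group) for column in zip(*group)])
    return merged


# Eight toy tokens with two features each.
tokens = [[float(i), float(2 * i)] for i in range(8)]

# Keys/values for a "fine" head and a "coarse" head (rates are illustrative).
fine = merge_tokens(tokens, 1)
coarse = merge_tokens(tokens, 4)
```

With a rate of 4, the coarse head sees 2 tokens instead of 8, which is where the reduction in token count and attention cost comes from in this simplified picture.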
xURLLC-Aware Service Provisioning in Vehicular Networks: A Semantic Communication Perspective
Semantic communication (SemCom), as an emerging paradigm focusing on meaning
delivery, has recently been considered a promising solution for the inevitable
crisis of scarce communication resources. This trend stimulates us to explore
the potential of applying SemCom to wireless vehicular networks, which normally
consume a tremendous amount of resources to meet stringent reliability and
latency requirements. Unfortunately, the unique background knowledge matching
mechanism in SemCom makes it challenging to simultaneously realize efficient
service provisioning for multiple users in vehicle-to-vehicle networks. To this
end, this paper identifies and jointly addresses two fundamental problems of
knowledge base construction (KBC) and vehicle service pairing (VSP) inherently
existing in SemCom-enabled vehicular networks in alignment with the
next-generation ultra-reliable and low-latency communication (xURLLC)
requirements. Concretely, we first derive the knowledge matching based queuing
latency specific for semantic data packets, and then formulate a
latency-minimization problem subject to several KBC and VSP related reliability
constraints. Afterward, a SemCom-empowered Service Supplying Solution
(S^4) is proposed along with the theoretical analysis of its
optimality guarantee and computational complexity. Numerical results
demonstrate the superiority of S^4 in terms of average queuing
latency, semantic data packet throughput, user knowledge matching degree and
knowledge preference satisfaction compared with two benchmarks.
Comment: This paper has been submitted to IEEE Transactions on Wireless
Communications for the second round of peer review after a major revision
Expanding Small-Scale Datasets with Guided Imagination
The power of DNNs relies heavily on the quantity and quality of training
data. However, collecting and annotating data on a large scale is often
expensive and time-consuming. To address this issue, we explore a new task,
termed dataset expansion, aimed at expanding a ready-to-use small dataset by
automatically creating new labeled samples. To this end, we present a Guided
Imagination Framework (GIF) that leverages cutting-edge generative models like
DALL-E2 and Stable Diffusion (SD) to "imagine" and create informative new data
from the input seed data. Specifically, GIF conducts data imagination by
optimizing the latent features of the seed data in the semantically meaningful
space of the prior model, resulting in the creation of photo-realistic images
with new content. To guide the imagination towards creating informative samples
for model training, we introduce two key criteria, i.e., class-maintained
information boosting and sample diversity promotion. These criteria are
verified to be essential for effective dataset expansion: GIF-SD obtains 13.5%
higher model accuracy on natural image datasets than unguided expansion with
SD. With these essential criteria, GIF successfully expands small datasets in
various scenarios, boosting model accuracy by 36.9% on average over six natural
image datasets and by 13.5% on average over three medical datasets. The source
code is available at https://github.com/Vanint/DatasetExpansion.
Comment: NeurIPS 2023.
MagicVideo: Efficient Video Generation With Latent Diffusion Models
We present an efficient text-to-video generation framework based on latent
diffusion models, termed MagicVideo. Given a text description, MagicVideo can
generate photo-realistic video clips with high relevance to the text content.
With the proposed efficient latent 3D U-Net design, MagicVideo can generate
video clips with 256x256 spatial resolution on a single GPU card, which is 64x
faster than the recent video diffusion model (VDM). Unlike previous works that
train video generation from scratch in the RGB space, we propose to generate
video clips in a low-dimensional latent space. We further utilize all the
convolution operator weights of pre-trained text-to-image generative U-Net
models for faster training. To achieve this, we introduce two new designs to
adapt the U-Net decoder to video data: a framewise lightweight adaptor for the
image-to-video distribution adjustment and a directed temporal attention module
to capture frame temporal dependencies. The whole generation process takes place within
the low-dimensional latent space of a pre-trained variational auto-encoder. We
demonstrate that MagicVideo can generate both realistic video content and
imaginary content in a photo-realistic style with a trade-off in terms of
quality and computational cost. Refer to https://magicvideo.github.io/# for
more examples.