Iterative Data Refinement for Self-Supervised MR Image Reconstruction
Magnetic Resonance Imaging (MRI) has become an important technique in the
clinic for the visualization, detection, and diagnosis of various diseases.
However, one key limitation of MRI is its relatively slow data
acquisition process. Fast MRI based on k-space undersampling and high-quality
image reconstruction has been widely utilized, and many deep learning-based
methods have been developed in recent years. Although promising results have
been achieved, most existing methods require fully-sampled reference data for
training the deep learning models. Unfortunately, fully-sampled MRI data are
difficult if not impossible to obtain in real-world applications. To address
this issue, we propose a data refinement framework for self-supervised MR image
reconstruction. Specifically, we first analyze the reason for the performance
gap between self-supervised and supervised methods and identify the bias
between their training datasets as one major factor. Then, we design
an effective self-supervised training data refinement method to reduce this
data bias. With the data refinement, an enhanced self-supervised MR image
reconstruction framework is developed to promote accurate MR imaging. We
evaluate our method on an in-vivo MRI dataset. Experimental results show that
without utilizing any fully sampled MRI data, our self-supervised framework
possesses strong capabilities in capturing image details and structures at high
acceleration factors. Comment: 5 pages, 2 figures, 1 table
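To make the k-space undersampling setting concrete, here is a minimal NumPy sketch of retrospective Cartesian undersampling followed by a zero-filled reconstruction, the kind of degraded input such reconstruction networks start from. It is not the data refinement method described above; the random column mask with a fully sampled low-frequency band, and the `acceleration` and `center_fraction` parameters, are illustrative assumptions.

```python
import numpy as np

def undersample_kspace(image, acceleration=4, center_fraction=0.08, seed=0):
    """Retrospectively undersample a 2D image's k-space along its columns.

    Assumes a 1D random Cartesian mask with a fully sampled low-frequency band
    (an illustrative choice, not necessarily the paper's sampling pattern).
    """
    rng = np.random.default_rng(seed)
    ny, nx = image.shape
    # Centered k-space via an orthonormal 2D FFT.
    kspace = np.fft.fftshift(np.fft.fft2(image, norm="ortho"))
    # Always keep a contiguous band of low-frequency columns.
    num_center = int(round(nx * center_fraction))
    mask_1d = np.zeros(nx, dtype=bool)
    start = (nx - num_center) // 2
    mask_1d[start:start + num_center] = True
    # Randomly keep extra columns so that about nx / acceleration are sampled in total.
    num_extra = max(nx // acceleration - num_center, 0)
    candidates = np.flatnonzero(~mask_1d)
    mask_1d[rng.choice(candidates, size=num_extra, replace=False)] = True
    mask = np.broadcast_to(mask_1d, (ny, nx))
    undersampled = kspace * mask
    # Zero-filled reconstruction: magnitude of the inverse FFT of the masked k-space.
    zero_filled = np.abs(np.fft.ifft2(np.fft.ifftshift(undersampled), norm="ortho"))
    return undersampled, mask, zero_filled
```

With `acceleration=1` every column is kept and the zero-filled image matches a non-negative input, which is a quick sanity check before moving to higher acceleration factors.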
Learning Sparse Neural Networks with Identity Layers
Sparsity in deep neural networks has been widely investigated as a way to
maximize performance while reducing the size of overparameterized networks as
much as possible. Existing methods focus on pruning parameters during training
by using thresholds and metrics. Meanwhile, feature similarity between
different layers has received little attention, although, as this paper
rigorously proves, it is highly correlated with network sparsity. Inspired by
interlayer feature similarity in overparameterized models, we investigate the
intrinsic link between network sparsity and interlayer feature similarity.
Specifically, using information bottleneck theory, we prove that reducing
interlayer feature similarity measured by Centered Kernel Alignment (CKA)
improves network sparsity. Applying this theory, we propose a plug-and-play
CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR,
which utilizes CKA to reduce feature similarity between layers and increase
network sparsity. In other words, the layers of our sparse network tend to
develop identities distinct from one another. Experimentally, we plug the proposed
CKA-SR into the training process of sparse network training methods and find
that CKA-SR consistently improves the performance of several state-of-the-art
sparse training methods, especially at extremely high sparsity. Code is
included in the supplementary materials.
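CKA-SR penalizes similarity between the features of different layers. As a sketch, the standard linear CKA between two feature matrices computed on the same batch, and a pairwise penalty summed over layers, can be written as below. Using the linear (rather than kernel) variant, summing over all layer pairs, and the weight `beta` in the hypothetical `cka_penalty` helper are assumptions, not details taken from the paper; in real training the same expressions would be written in an autodiff framework so the penalty can be backpropagated.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between feature matrices x (n, p1) and y (n, p2) for the same n inputs."""
    # Center each feature column over the batch.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return cross / (norm_x * norm_y)

def cka_penalty(layer_features, beta=0.1):
    """Hypothetical regularizer: beta times the sum of CKA over all pairs of layers."""
    total = 0.0
    for i in range(len(layer_features)):
        for j in range(i + 1, len(layer_features)):
            total += linear_cka(layer_features[i], layer_features[j])
    return beta * total
```

A training loop would add `cka_penalty(features)` to the task loss. Because linear CKA is invariant to isotropic scaling and constant offsets of the features, a layer compared with a rescaled copy of itself still scores 1, so the penalty only drops when the layers' representations genuinely differ.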
Knowledge Prompt-tuning for Sequential Recommendation
Pre-trained language models (PLMs), which are used to extract general
knowledge, have demonstrated strong performance in sequential recommendation
(SR). However, existing methods still lack domain knowledge and struggle
to capture users' fine-grained preferences. Meanwhile, many traditional SR
methods address this issue by integrating side information but suffer from
information loss. In short, we believe that a good recommendation system
should utilize both general and domain knowledge simultaneously. Therefore, we
introduce an external knowledge base and propose Knowledge Prompt-tuning for
Sequential Recommendation (KP4SR). Specifically, we construct a set of
relationship templates and transform a structured knowledge graph (KG) into
knowledge prompts to bridge the semantic gap. However, knowledge
prompts disrupt the original data structure and introduce a significant amount
of noise. We further construct a knowledge tree and propose a knowledge tree
mask, which restores the data structure in a mask matrix form, thus mitigating
the noise problem. We evaluate KP4SR on three real-world datasets, and
experimental results show that our approach outperforms state-of-the-art
methods on multiple evaluation metrics. Specifically, compared with PLM-based
methods, our method improves NDCG@5 and HR@5 by 40.65% and 36.42% on the
books dataset, 11.17% and 11.47% on the music dataset, and 22.17% and
19.14% on the movies dataset, respectively. Our code is publicly available at
https://github.com/zhaijianyang/KP4SR.
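The template step can be pictured as filling in a relation-specific sentence for each knowledge-graph triple. The relation names, template strings, and the `triples_to_prompt` helper below are hypothetical; KP4SR's actual templates, and its knowledge tree mask, are not reproduced here.

```python
# Hypothetical relation templates; KP4SR's actual templates may differ.
TEMPLATES = {
    "directed_by": "{head} is directed by {tail}.",
    "has_genre": "The genre of {head} is {tail}.",
    "written_by": "{head} is written by {tail}.",
}

def triples_to_prompt(triples, templates=TEMPLATES):
    """Turn (head, relation, tail) KG triples into a natural-language knowledge prompt.

    Triples whose relation has no template are skipped.
    """
    sentences = [
        templates[rel].format(head=head, tail=tail)
        for head, rel, tail in triples
        if rel in templates
    ]
    return " ".join(sentences)
```

Concatenating sentences like this flattens the graph into one sequence, which illustrates why, as noted above, the prompts lose the original data structure and add noise that the knowledge tree mask is designed to counter.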
Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning
Recent advances in robust semi-supervised learning (SSL) typically filter
out-of-distribution (OOD) information at the sample level. We argue that an
overlooked problem in robust SSL is corrupted information at the semantic
level, which limits progress in the field. In this paper, we take an initial
step toward exploring this problem and propose a unified framework termed OOD
Semantic Pruning (OSP), which aims at pruning OOD semantics out from
in-distribution (ID) features. Specifically, (i) we propose an aliasing OOD
matching module to pair each ID sample with an OOD sample that has semantic
overlap with it. (ii) We design a soft orthogonality regularization, which first
transforms each ID feature by suppressing its semantic component that is
collinear with its paired OOD sample. It then forces the predictions before and
after soft orthogonality decomposition to be consistent. Despite its
simplicity, our method performs strongly in OOD detection and ID
classification on challenging benchmarks. In particular, OSP surpasses the
previous state of the art by 13.7% in ID classification accuracy and 5.9%
in OOD detection AUROC on the TinyImageNet dataset. The source code is
publicly available at https://github.com/rain305f/OSP. Comment: Accepted by CVPR 202
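The feature transformation inside the soft orthogonality regularization can be illustrated as removing the projection of each ID feature onto its paired OOD feature. In this sketch, the strength parameter `alpha` (with `alpha=1` giving an exact orthogonal projection) is an assumption about what makes the decomposition "soft", and the consistency term between predictions is omitted.

```python
import numpy as np

def suppress_collinear(id_feat, ood_feat, alpha=1.0, eps=1e-8):
    """Remove a fraction alpha of each ID feature's component along its paired OOD feature.

    id_feat, ood_feat: arrays of shape (batch, dim); row i of ood_feat is paired with row i of id_feat.
    alpha=1 is an exact orthogonal projection; 0 < alpha < 1 is a hypothetical 'soft' variant.
    """
    dot = np.sum(id_feat * ood_feat, axis=1, keepdims=True)
    sq_norm = np.sum(ood_feat * ood_feat, axis=1, keepdims=True) + eps  # eps avoids division by zero
    collinear = (dot / sq_norm) * ood_feat  # projection of each ID feature onto its OOD partner
    return id_feat - alpha * collinear
```

The consistency constraint described above would then compare the classifier's predictions on `id_feat` and on the returned features.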
DLIP: Distilling Language-Image Pre-training
Vision-Language Pre-training (VLP) has made remarkable progress with the
help of extremely large numbers of parameters, which makes deployment in real
applications challenging. Knowledge distillation is widely recognized as an
essential procedure in model compression. However, existing knowledge distillation
techniques lack an in-depth investigation and analysis of VLP, and practical
guidelines for VLP-oriented distillation have yet to be explored. In this
paper, we present DLIP, a simple yet efficient Distilling Language-Image
Pre-training framework, through which we investigate how to distill a light VLP
model. Specifically, we dissect the model distillation from multiple
dimensions, such as the architecture characteristics of different modules and
the information transfer of different modalities. We conduct comprehensive
experiments and provide insights on distilling a light but performant VLP
model. Experimental results reveal that DLIP can achieve a state-of-the-art
accuracy/efficiency trade-off across diverse cross-modal tasks, e.g.,
image-text retrieval, image captioning and visual question answering. For
example, DLIP compresses BLIP by 1.9x, from 213M to 108M parameters, while
achieving comparable or better performance. Furthermore, DLIP retains more
than 95% of the teacher model's performance with 22.4% of its parameters and
24.8% of its FLOPs, and accelerates inference by 2.7x.
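DLIP studies distillation at the level of individual modules and modalities; those specific design choices are not shown here. For orientation only, the generic logit distillation objective of Hinton et al., a common starting point for such frameworks, looks like this; the `temperature` value and batch averaging are conventional choices, not DLIP settings.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions, averaged over the batch."""
    p_t = softmax(teacher_logits / temperature)
    log_p_t = np.log(p_t + 1e-12)
    log_p_s = np.log(softmax(student_logits / temperature) + 1e-12)
    kl = np.sum(p_t * (log_p_t - log_p_s), axis=-1)
    return float(temperature ** 2 * kl.mean())
```

The `temperature ** 2` factor keeps the gradient scale of the softened term roughly comparable to an ordinary cross-entropy term when the two are combined.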
Data-Efficient Image Quality Assessment with Attention-Panel Decoder
Blind Image Quality Assessment (BIQA) is a fundamental task in computer
vision, which nevertheless remains unresolved due to complex distortion
conditions and diverse image content. To confront this challenge, in
this paper we propose a novel BIQA pipeline based on the Transformer architecture,
which achieves an efficient quality-aware feature representation with much
less data. More specifically, we consider the traditional fine-tuning in BIQA
as an interpretation of the pre-trained model. In this way, we further
introduce a Transformer decoder to refine the perceptual information of the CLS
token from different perspectives. This enables our model to establish the
quality-aware feature manifold efficiently while attaining a strong
generalization capability. Meanwhile, inspired by the subjective evaluation
behavior of humans, we introduce a novel attention panel mechanism, which
improves the model performance and reduces the prediction uncertainty
simultaneously. The proposed BIQA method maintains a lightweight design with
only one layer of the decoder, yet extensive experiments on eight standard BIQA
datasets (both synthetic and authentic) demonstrate its superior performance over
state-of-the-art BIQA methods, e.g., achieving SRCC values of 0.875
(vs. 0.859) on LIVEC and 0.980 (vs. 0.969) on LIVE. Comment: Accepted by AAAI 202
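The SRCC figures above are Spearman rank-order correlation coefficients between predicted quality scores and subjective scores (such as mean opinion scores). A small NumPy sketch, with tied values given their average rank, is below; `scipy.stats.spearmanr` computes the same quantity, and the helper names here are illustrative.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values given the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def srcc(predicted, subjective):
    """Spearman rank-order correlation: the Pearson correlation of the two rank vectors."""
    rp = average_ranks(predicted)
    rs = average_ranks(subjective)
    rp -= rp.mean()
    rs -= rs.mean()
    return float(np.sum(rp * rs) / np.sqrt(np.sum(rp ** 2) * np.sum(rs ** 2)))
```

Because only ranks enter the formula, SRCC rewards a model for ordering images correctly by quality even when its raw scores are on a different scale from the subjective ratings.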
Binarized Neural Architecture Search
Neural architecture search (NAS) can have a significant impact on computer
vision by automatically designing optimal neural network architectures for
various tasks. A variant, binarized neural architecture search (BNAS), with a
search space of binarized convolutions, can produce extremely compressed
models. Unfortunately, this area remains largely unexplored. BNAS is more
challenging than NAS due to the learning inefficiency caused by optimization
requirements and the huge architecture space. To address these issues, we
introduce channel sampling and operation space reduction into a differentiable
NAS to significantly reduce the cost of searching. This is accomplished through
a performance-based strategy that discards less promising operations. Two
optimization methods for binarized neural networks are used to validate the
effectiveness of our BNAS. Extensive experiments demonstrate that the proposed
BNAS achieves a performance comparable to NAS on both CIFAR and ImageNet
databases. An accuracy of vs. is achieved on the CIFAR-10
dataset, but with a significantly compressed model, and a faster search
than the state-of-the-art PC-DARTS.
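BNAS searches over binarized convolutions. The abstract does not name the two binarized-network optimization methods it uses, so as a stand-in, here is XNOR-Net-style weight binarization, where each output channel is replaced by a scaling factor times the sign of its weights. This is an illustrative assumption, not BNAS's own procedure.

```python
import numpy as np

def binarize_weights(w):
    """XNOR-Net-style binarization of conv weights w with shape (out_ch, in_ch, kh, kw).

    Each output channel becomes alpha * sign(w), with alpha = mean |w| over that channel,
    which minimizes ||w - alpha * b||^2 for b = sign(w).
    """
    signs = np.where(w >= 0, 1.0, -1.0)  # map 0 to +1 so every entry is exactly +/-1
    alpha = np.abs(w).mean(axis=(1, 2, 3), keepdims=True)  # one scale per output channel
    return alpha * signs, alpha
```

Storing only the signs plus one float per output channel is what makes such models extremely compressed, at the cost of the optimization difficulties the abstract mentions.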
Dynamic Distribution Pruning for Efficient Network Architecture Search
Network architectures obtained by Neural Architecture Search (NAS) have shown
state-of-the-art performance in various computer vision tasks. Despite the
exciting progress, the computational complexity of the forward-backward
propagation and the search process makes it difficult to apply NAS in practice.
In particular, most previous methods require thousands of GPU days for the
search process to converge. In this paper, we propose a dynamic distribution
pruning method towards extremely efficient NAS, which samples architectures
from a joint categorical distribution. The search space is dynamically pruned
every few epochs to update this distribution, and the optimal neural
architecture is obtained when only one structure remains. We conduct
experiments on two widely-used datasets in NAS. On CIFAR-10, the optimal
structure obtained by our method achieves the state-of-the-art % test
error, while the search process is more than times faster (only
GPU hours on a Tesla V100) than the state-of-the-art NAS algorithms. On
ImageNet, our model achieves 75.2% top-1 accuracy under the MobileNet
settings, with a time cost of only GPU days that is acceleration
over the fastest NAS algorithm. The code is available at
https://github.com/tanglang96/DDPNAS.
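The search loop described above can be sketched as follows, under several assumptions: the joint categorical distribution is taken to factorize into one categorical per edge, the probability update simply adds a reward-proportional bonus to the sampled operations and renormalizes, and the lowest-probability operation on each edge is pruned every `epochs_per_prune` epochs. `evaluate` is a hypothetical callback returning a non-negative score (for example, validation accuracy) for a sampled architecture; the paper's actual update rule may differ.

```python
import numpy as np

def dynamic_distribution_pruning(num_edges, num_ops, evaluate, epochs_per_prune=3,
                                 samples_per_epoch=8, lr=0.1, seed=0):
    """Sketch of search-space pruning driven by per-edge categorical distributions.

    evaluate(arch) -> float (assumed >= 0) scores an architecture given as one op index per edge.
    Returns the single op index left on each edge.
    """
    rng = np.random.default_rng(seed)
    # Remaining candidate ops per edge and their sampling probabilities (kept index-aligned).
    alive = [list(range(num_ops)) for _ in range(num_edges)]
    probs = [np.full(num_ops, 1.0 / num_ops) for _ in range(num_edges)]
    epoch = 0
    while any(len(ops) > 1 for ops in alive):
        for _ in range(samples_per_epoch):
            picks = [rng.choice(len(ops), p=p) for ops, p in zip(alive, probs)]
            arch = [ops[k] for ops, k in zip(alive, picks)]
            reward = evaluate(arch)
            for e, k in enumerate(picks):
                probs[e][k] += lr * reward  # reinforce ops that appeared in good samples
                probs[e] /= probs[e].sum()
        epoch += 1
        if epoch % epochs_per_prune == 0:
            # Prune the least likely op on every edge that still has a choice.
            for e in range(num_edges):
                if len(alive[e]) > 1:
                    worst = int(np.argmin(probs[e]))
                    del alive[e][worst]
                    probs[e] = np.delete(probs[e], worst)
                    probs[e] /= probs[e].sum()
    return [ops[0] for ops in alive]
```

Since one operation per edge disappears at each pruning step, the search ends after `(num_ops - 1) * epochs_per_prune` epochs, and later samples are drawn from an ever smaller space, which is where the savings in search cost come from.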
AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration
Diffusion models are an emerging class of expressive generative models that
require a large number of time steps (inference steps) to generate a single
image. To accelerate this tedious process, reducing steps uniformly is
generally treated as an undisputed principle of diffusion models. We argue that
this uniform assumption is not optimal in practice; that is, different
models can have different optimal time steps. Therefore, we propose
to search the optimal time steps sequence and compressed model architecture in
a unified framework to achieve effective image generation for diffusion models
without any further training. Specifically, we first design a unified search
space that consists of all possible time steps and various architectures. Then,
a two-stage evolutionary algorithm is introduced to find the optimal solution
in the designed search space. To further accelerate the search process, we
use the FID score between generated and real samples to estimate the performance
of the sampled examples. As a result, the proposed method is (i) training-free,
obtaining the optimal time steps and model architecture without any training
process; (ii) orthogonal to most advanced diffusion samplers, so it can be
integrated with them to gain better sample quality; and (iii) generalizable, as the
searched time steps and architectures can be directly applied to different
diffusion models with the same guidance scale. Experimental results show that
our method achieves excellent performance using only a few time steps, e.g.,
an FID score of 17.86 on ImageNet 64×64 with only four steps, compared to
138.66 with DDIM. The code is available at
https://github.com/lilijiangg/AutoDiffusion
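A stripped-down version of the time-step part of the search is sketched below: schedules are strictly decreasing subsets of `k` steps out of `total_steps`, the better half of the population survives each generation, and children are produced by swapping one step for an unused one. The `fitness` callback is a hypothetical stand-in for the FID evaluation (lower is better); the joint search over architectures and the two-stage structure of the paper's algorithm are not modeled.

```python
import random

def evolve_timesteps(fitness, total_steps=1000, k=4, population=20, generations=30, seed=0):
    """Minimal evolutionary search over k-step schedules (lower fitness is better, as with FID).

    fitness(steps) is a hypothetical stand-in for the FID of images sampled with those steps.
    """
    rng = random.Random(seed)

    def random_schedule():
        return sorted(rng.sample(range(total_steps), k), reverse=True)

    def mutate(steps):
        # Replace one time step with an unused one, keeping the schedule strictly decreasing.
        child = list(steps)
        unused = sorted(set(range(total_steps)) - set(child))
        child[rng.randrange(k)] = rng.choice(unused)
        return sorted(child, reverse=True)

    pop = [random_schedule() for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: population // 2]  # keep the better half
        children = [mutate(rng.choice(parents)) for _ in range(population - len(parents))]
        pop = parents + children
    return min(pop, key=fitness)
```

A toy `fitness`, such as the distance to a fixed schedule, is enough to exercise the loop without generating any images; in the setting described above each call would instead run the sampler and compute FID, which is why the number of evaluations dominates the search cost.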