16 research outputs found
Revisiting Vision Transformer from the View of Path Ensemble
Vision Transformers (ViTs) are normally regarded as a stack of transformer
layers. In this work, we propose a novel view of ViTs showing that they can be
seen as ensemble networks containing multiple parallel paths with different
lengths. Specifically, we equivalently transform the traditional cascade of
multi-head self-attention (MSA) and feed-forward network (FFN) into three
parallel paths in each transformer layer. Then, we utilize the identity
connection in our new transformer form and further transform the ViT into an
explicit multi-path ensemble network. From the new perspective, these paths
perform two functions: the first is to provide the feature for the classifier
directly, and the second is to provide the lower-level feature representation
for subsequent longer paths. We investigate the influence of each path on the
final prediction and discover that some paths even degrade performance.
Therefore, we propose path pruning and EnsembleScale techniques for
improvement, which cut out the underperforming paths and re-weight the ensemble
components, respectively, to optimize the path combination and make the short
paths focus on providing high-quality representation for subsequent paths. We
also demonstrate that our path combination strategies can help ViTs go deeper
and act as high-pass filters to filter out partial low-frequency signals. To
further enhance the representation of the paths that serve subsequent paths,
self-distillation is applied to transfer knowledge from the long paths to the
short paths. This work calls for more future research to explain and design
ViTs from new perspectives. Comment: Accepted by ICCV 2023, oral presentation
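The three-path view can be illustrated with a minimal sketch. It assumes a plain residual layer in which any normalization is folded into the sub-blocks; `msa` and `ffn` are hypothetical toy scalar stand-ins, not the authors' modules. Under these assumptions the usual cascade and the parallel-path form produce the same output:

```python
def msa(x):
    # Hypothetical stand-in for multi-head self-attention (toy scalar map).
    return 0.5 * x + 1.0


def ffn(x):
    # Hypothetical stand-in for the feed-forward network (toy nonlinearity).
    return max(0.0, 2.0 * x - 1.0)


def cascade_layer(x):
    # Standard residual cascade: MSA block followed by FFN block.
    h = x + msa(x)
    return h + ffn(h)


def three_path_layer(x):
    # Parallel form: identity path + MSA path + FFN path, where the
    # FFN path takes the sum of the other two paths as its input.
    identity_path = x
    msa_path = msa(x)
    ffn_path = ffn(identity_path + msa_path)
    return identity_path + msa_path + ffn_path
```

Stacking such layers and expanding the identity terms is what yields paths of different lengths; the sketch covers a single layer only.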
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
Public large-scale text-to-image diffusion models, such as Stable Diffusion,
have gained significant attention from the community. These models can be
easily customized for new concepts using low-rank adaptations (LoRAs). However,
the utilization of multiple concept LoRAs to jointly support multiple
customized concepts presents a challenge. We refer to this scenario as
decentralized multi-concept customization, which involves single-client concept
tuning and center-node concept fusion. In this paper, we propose a new
framework called Mix-of-Show that addresses the challenges of decentralized
multi-concept customization, including concept conflicts resulting from
existing single-client LoRA tuning and identity loss during model fusion.
Mix-of-Show adopts an embedding-decomposed LoRA (ED-LoRA) for single-client
tuning and gradient fusion for the center node to preserve the in-domain
essence of single concepts and support theoretically limitless concept fusion.
Additionally, we introduce regionally controllable sampling, which extends
spatially controllable sampling (e.g., ControlNet and T2I-Adapter) to address
attribute binding and missing object problems in multi-concept sampling.
Extensive experiments demonstrate that Mix-of-Show is capable of composing
multiple customized concepts with high fidelity, including characters, objects,
and scenes.
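For readers unfamiliar with low-rank adaptation, the following is a minimal sketch of a generic LoRA weight merge, W' = W + scale * (B A). It does not implement ED-LoRA or gradient fusion; the helper names `matmul` and `merge_lora`, the `scale` parameter, and the toy matrices are assumptions made for illustration.

```python
def matmul(a, b):
    # Plain nested-list matrix product (rows of a times columns of b).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def merge_lora(weight, lora_b, lora_a, scale=1.0):
    # Merged weight W' = W + scale * (B @ A); B is d x r and A is r x k,
    # so the update has rank at most r.
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy example: a 3 x 3 identity weight with a rank-1 update.
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lora_b = [[1.0], [2.0], [3.0]]
lora_a = [[0.5, 0.0, -0.5]]
merged = merge_lora(weight, lora_b, lora_a)
```

Because each concept contributes its own low-rank update to the same base weight, combining several of them is where the conflicts described in the abstract can arise.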
Synergistic melanoma cell death mediated by inhibition of both MCL1 and BCL2 in high-risk tumors driven by NF1/PTEN loss
SRSF5‐Mediated Alternative Splicing of M Gene is Essential for Influenza A Virus Replication: A Host‐Directed Target Against Influenza Virus
Abstract: Splicing of influenza A virus (IAV) RNA is an essential process in the viral life cycle that involves the co‐opting of host factors. Here, it is demonstrated that induction of host serine and arginine‐rich splicing factor 5 (SRSF5) by IAV facilitates viral replication by enhancing viral M mRNA splicing. Mechanistically, SRSF5, through its RRM2 domain, directly binds M mRNA at conserved sites (M mRNA positions 163, 709, and 712) and interacts with U1 small nuclear ribonucleoprotein (snRNP) to promote M mRNA splicing and M2 production. Mutations introduced at the three binding sites, without changing the amino acid code, significantly attenuate virus replication and pathogenesis in vivo. Likewise, SRSF5 conditional knockout in the lung protects mice against lethal IAV challenge. Furthermore, anidulafungin, an approved antifungal drug, is identified as an inhibitor of SRSF5 that effectively blocks IAV replication in vitro and in vivo. In conclusion, SRSF5, as an activator of M mRNA splicing, promotes IAV replication and is a host‐derived antiviral target.
Nonlinear Deblurring for Low-Light Saturated Image
Single image deblurring has achieved significant progress for natural daytime images. Saturation is a common phenomenon in blurry images, caused by low-light conditions and long exposure times. Conventional linear deblurring methods usually handle natural blurry images well but produce severe ringing artifacts when recovering low-light saturated blurry images. To solve this problem, we formulate saturation deblurring as a nonlinear model in which all saturated and unsaturated pixels are modeled adaptively. Specifically, we add a nonlinear function to the convolution operator to account for saturation in the presence of blurring. The proposed method has two advantages over previous methods. On the one hand, it restores natural images with the same high quality as conventional deblurring methods, while also reducing estimation errors in saturated areas and suppressing ringing artifacts. On the other hand, compared with recent saturation-based deblurring methods, it captures the formation of unsaturated and saturated degradations directly rather than through cumbersome and error-prone detection steps. Note that this nonlinear degradation model can be naturally formulated in a maximum a posteriori framework and efficiently decoupled into several solvable sub-problems via the alternating direction method of multipliers (ADMM). Experimental results on both synthetic and real-world images demonstrate that the proposed deblurring algorithm outperforms state-of-the-art low-light saturation-based deblurring methods.
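To illustrate why a linear blur model breaks down under saturation, here is a minimal 1D sketch of a degradation of the form "nonlinearity applied to a convolution". The hard clip at 1.0 is an assumption chosen for simplicity (the abstract does not specify the nonlinear function), and `blur_1d`, `saturate`, and the toy signal are hypothetical.

```python
def blur_1d(signal, kernel):
    # 'Valid' sliding-window blur; the kernel is assumed symmetric,
    # so correlation and convolution coincide.
    n, m = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(m))
            for i in range(n - m + 1)]


def saturate(values, max_value=1.0):
    # Assumed sensor nonlinearity: intensities above max_value are clipped.
    return [min(v, max_value) for v in values]


# A bright light source (6.0) next to a dim region (0.3).
signal = [0.0, 0.0, 6.0, 0.0, 0.0, 0.3, 0.3]
kernel = [0.25, 0.5, 0.25]

linear = blur_1d(signal, kernel)   # what a linear model predicts
observed = saturate(linear)        # what a clipped sensor would record
```

Around the light source the linear prediction exceeds the sensor range while the observation is flat at 1.0, so a method that assumes `observed == linear` misfits exactly those pixels; the dim pixels are unaffected.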
KVT: k-NN Attention for Boosting Vision Transformers
Convolutional Neural Networks (CNNs) have dominated computer vision for
years, owing to their ability to capture locality and translation invariance.
Recently, many vision transformer architectures have been proposed and they
show promising performance. A key component in vision transformers is the
fully-connected self-attention which is more powerful than CNNs in modelling
long range dependencies. However, since the current dense self-attention uses
all image patches (tokens) to compute the attention matrix, it may neglect the
locality of image patches and involve noisy tokens (e.g., cluttered background
and occlusion), leading to a slow training process and potential degradation of
performance. To address these problems, we propose k-NN attention for
boosting vision transformers. Specifically, instead of involving all the tokens
in the attention matrix calculation, we select only the top-k most similar
tokens from the keys for each query to compute the attention map. The proposed
k-NN attention naturally inherits the local bias of CNNs without introducing
convolutional operations, as nearby tokens tend to be more similar than others.
In addition, k-NN attention allows for the exploration of long range
correlation and at the same time filters out irrelevant tokens by choosing the
most similar tokens from the entire image. Despite its simplicity, we verify,
both theoretically and empirically, that k-NN attention is powerful in
speeding up training and distilling noise from input tokens. Extensive
experiments are conducted using 11 different vision transformer
architectures to verify that the proposed k-NN attention can work with any
existing transformer architecture to improve its prediction performance. The
code is available at https://github.com/damo-cv/KVT. Comment: Accepted by ECCV 202
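The top-k selection described above can be sketched as follows. This is a single-head, pure-Python illustration under stated assumptions (scaled dot-product similarity, softmax restricted to the selected keys); the function name `knn_attention` and the toy inputs are hypothetical, and this is not the code from the linked repository.

```python
import math


def knn_attention(queries, keys, values, k):
    # For each query, attend only to the k keys with the highest scores.
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, key)) for key in keys]
        # Indices of the top-k most similar keys for this query.
        top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        # Softmax over the selected keys only; all other weights are zero.
        peak = max(scores[i] for i in top)
        weights = {i: math.exp(scores[i] - peak) for i in top}
        total = sum(weights.values())
        outputs.append([
            sum(weights[i] / total * values[i][d] for i in top)
            for d in range(len(values[0]))
        ])
    return outputs


queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
nearest_only = knn_attention(queries, keys, values, k=1)
two_nearest = knn_attention(queries, keys, values, k=2)
```

With `k` equal to the number of keys the function reduces to ordinary dense attention, so `k` acts as the knob that trades global context against filtering of dissimilar tokens.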
A Simple and Unified Tagging Model with Priming for Relational Structure Predictions
Relational structure extraction covers a wide range of tasks and plays an
important role in natural language processing. Recently, many approaches tend
to design sophisticated graphical models to capture the complex relations
between objects that are described in a sentence. In this work, we demonstrate
that simple tagging models can surprisingly achieve competitive performances
with a small trick -- priming. Tagging models with priming append information
about the operated objects to the input sequence of a pretrained language model.
Making use of the contextualized nature of pretrained language models, the
priming approach helps the contextualized representation of the sentence better
embed the information about the operated objects, which makes it more suitable
for addressing relational structure extraction. We conduct extensive
experiments on three different tasks that span ten datasets across five
different languages, and show that our model is a general and effective model,
despite its simplicity. We further carry out comprehensive analysis to
understand our model and propose an efficient approximation to our method,
which achieves almost the same performance with faster inference speed.
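A minimal sketch of the priming idea, assuming the operated object is a token span and that it is appended after a separator token. The function name `prime_input`, the `[SEP]` separator, and the example sentence are illustrative assumptions; the exact input format used by the authors is not given in the abstract.

```python
def prime_input(tokens, span, sep="[SEP]"):
    # span is a half-open (start, end) token range marking the operated object.
    start, end = span
    # Append the object's tokens to the sentence so the encoder can
    # condition every token representation on that object.
    return tokens + [sep] + tokens[start:end]


sentence = "Alice founded Acme in Paris".split()
primed = prime_input(sentence, (0, 1))
```

A tagger run on `primed` would then label the original sentence tokens with respect to the primed object (here "Alice"), one object per forward pass.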
Dual-mode imaging and therapeutic effects of drug-loaded phase-transition nanoparticles combined with near-infrared laser and low-intensity ultrasound on ovarian cancer
Chemotherapy and photo-sonodynamic therapy (PSDT) can be combined through drug delivery nano-platforms to enhance anti-tumor efficacy; however, this combination is limited by tumor hypoxia, which causes chemotherapy resistance. Perfluoropentane (PFP) can carry oxygen and enhance ultrasound or photoacoustic (PA) imaging after vaporization. Herein, we constructed nanoparticles (PTX/ICG and oxygen loaded PLGA nanoparticles (PIO_NPs)) with an oxygen-carrying PFP core and a PLGA shell loaded with indocyanine green (ICG) and paclitaxel (PTX). PIO_NPs showed good optical stability and the ability to undergo phase transition. Moreover, they could rapidly release PTX and generate ROS under near-infrared laser and low-intensity ultrasound. The PIO_NPs enhanced the contrast of ultrasound and PA imaging. In particular, PIO_NPs may be used to monitor and guide treatment, because their accumulation at the tumor site can be observed by PA imaging. Compared with PTX or other nanoparticles, PIO_NPs combined with laser and ultrasound (L.U) significantly induced apoptosis of SKOV3 cells and inhibited SKOV3 tumor growth. Therefore, PIO_NPs have great potential in cancer imaging and therapy.
A DAAM1 3′-UTR SNP mutation regulates breast cancer metastasis through affecting miR-208a-5p-DAAM1-RhoA axis
Abstract. Background: Dishevelled-associated activator of morphogenesis 1 (DAAM1) is a member of the microfilament-related formins and mediates cell motility in breast cancer (BrCa). However, the genetic mutation status of DAAM1 mRNA and its correlation with pathological characteristics are still unclear. Methods: A patient cohort and BrCa cells were used to demonstrate the role of a functional SNP in the microRNA-208a-5p binding site of the DAAM1 3′-UTR and the underlying mechanism in BrCa metastasis. Results: The expression and activation of DAAM1 increased markedly in lymph node metastatic tissues. A genetic variant (rs79036859 A/G) was validated in the miR-208a-5p binding site of the DAAM1 3′-UTR. The G genotype (AG/GG) was a risk genotype for BrCa metastasis, as it reduced the binding affinity of miR-208a-5p for the DAAM1 3′-UTR. Furthermore, the miR-208a-5p expression level was significantly suppressed in lymph node metastatic tissues compared with that in non-lymph node metastatic tissues. Overexpression of miR-208a-5p inhibited the DAAM1/RhoA signaling pathway, thereby decreasing migratory ability. Conclusion: Overall, the rs79036859 G variant of the DAAM1 3′-UTR was identified as playing a relevant role in BrCa metastasis by altering miR-208a-5p binding affinity.