203 research outputs found
Exploiting Category Names for Few-Shot Classification with Vision-Language Models
Vision-language foundation models pretrained on large-scale data provide a
powerful tool for many visual understanding tasks. Notably, many
vision-language models build two encoders (visual and textual) that can map two
modalities into the same embedding space. As a result, the learned
representations achieve good zero-shot performance on tasks like image
classification. However, when there are only a few examples per category, the
potential of large vision-language models is often not fully realized, mainly due
to the gap between the large number of parameters and the relatively small amount
of training data. This paper shows that we can significantly improve the
performance of few-shot classification by using the category names to
initialize the classification head. With the proposed category name
initialization method, our model obtains state-of-the-art performance on a
number of few-shot image classification benchmarks (e.g., 87.37% on ImageNet
and 96.08% on Stanford Cars, both using five-shot learning).
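
The abstract above does not say how the category names are turned into head weights. The following is a minimal sketch, assuming the classification head is a linear layer whose rows are set to L2-normalized text embeddings of the category names. The toy vectors stand in for text-encoder and image-encoder outputs, and the function names are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def init_head_from_names(name_embeddings):
    """Assumed scheme: one head row per category, set to the normalized name embedding."""
    return {name: l2_normalize(emb) for name, emb in name_embeddings.items()}

def classify(head, image_embedding):
    """Score an image embedding against each head row by cosine similarity."""
    img = l2_normalize(image_embedding)
    scores = {name: sum(w * x for w, x in zip(row, img)) for name, row in head.items()}
    return max(scores, key=scores.get), scores

# Made-up 3-d embeddings; a real model would produce these with its text encoder.
name_embeddings = {"cat": [0.9, 0.1, 0.0], "car": [0.0, 0.2, 0.9]}
head = init_head_from_names(name_embeddings)

# Made-up image embedding that lies close to the "cat" name embedding.
label, scores = classify(head, [0.8, 0.3, 0.1])
print(label, scores)
```

In this sketch the head already separates the classes before any few-shot update, which is the intuition behind starting fine-tuning from the names rather than from random weights.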
IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
Generative training has been demonstrated to be powerful for building
visual-language models. However, on zero-shot discriminative benchmarks, there
is still a performance gap between models trained with generative and
discriminative objectives. In this paper, we aim to narrow this gap by
improving the efficacy of generative training on classification tasks, without
any finetuning processes or additional modules.
Specifically, we focus on narrowing the gap between the generative captioner
and the CLIP classifier. We begin by analysing the predictions made by the
captioner and classifier and observe that the caption generation inherits the
distribution bias from the language model trained with pure text modality,
making it less grounded on the visual signal. To tackle this problem, we
redesign the scoring objective for the captioner to alleviate the
distributional bias and focus on measuring the gain of information brought by
the visual inputs. We further design a generative training objective to match
the evaluation objective. We call the model trained and evaluated with these
novel procedures the Information Gain (IG) captioner. We pretrain the models on
the public LAION-5B dataset and perform a series of discriminative evaluations.
For zero-shot classification on ImageNet, the IG captioner achieves
improvements over the standard captioner, reaching performance comparable to
that of the CLIP classifier. The IG captioner also demonstrates strong performance on
zero-shot image-text retrieval tasks on MSCOCO and Flickr30K. We hope this
paper inspires further research towards unifying generative and discriminative
training procedures for visual-language models.
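
The abstract above describes a scoring objective that measures the information gained from the visual input but does not give its formula. The following is a minimal sketch, assuming the score of a caption is its log-likelihood given the image minus its log-likelihood under a text-only prior. The probabilities are made up and the function name is hypothetical.

```python
import math

def information_gain_scores(cond_logp, prior_logp):
    """Assumed score: log p(caption | image) - log p(caption)."""
    return {c: cond_logp[c] - prior_logp[c] for c in cond_logp}

# Made-up likelihoods: the language prior strongly favors the common caption.
cond_logp = {
    "a photo of a dog": math.log(0.30),
    "a photo of a quokka": math.log(0.20),
}
prior_logp = {
    "a photo of a dog": math.log(0.25),
    "a photo of a quokka": math.log(0.01),
}

# Plain captioner scoring picks the caption with the highest conditional likelihood.
plain_choice = max(cond_logp, key=cond_logp.get)

# Information-gain scoring picks the caption whose likelihood the image raised the most.
ig = information_gain_scores(cond_logp, prior_logp)
ig_choice = max(ig, key=ig.get)
print(plain_choice, ig_choice)
```

With these toy numbers the two rules disagree: the image raises the likelihood of the rare caption twentyfold, so subtracting the prior removes the advantage the common caption gets from the language model alone.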
Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
This work explores an efficient approach to establish a foundational
video-text model for tasks including open-vocabulary video classification,
text-to-video retrieval, video captioning and video question-answering. We
present VideoCoCa, which reuses a pretrained image-text contrastive captioner
(CoCa) model and adapts it to video-text tasks with minimal extra training.
While previous works adapt image-text models with various cross-frame fusion
modules (for example, cross-frame attention layer or perceiver resampler) and
finetune the modified architecture on video-text data, we surprisingly find
that the generative attentional pooling and contrastive attentional pooling
layers in the image-text CoCa design are instantly adaptable to "flattened
frame embeddings", yielding a strong zero-shot transfer baseline for many
video-text tasks. Specifically, the frozen image encoder of a pretrained
image-text CoCa takes each video frame as input and generates token
embeddings per frame for all video frames. We flatten the
token embeddings into one long sequence as a frozen video representation and apply
CoCa's generative attentional pooling and contrastive attentional pooling on
top. All model weights including pooling layers are directly loaded from an
image-text CoCa pretrained model. Without any video or video-text data,
VideoCoCa's zero-shot transfer baseline already achieves state-of-the-art
results on zero-shot video classification on Kinetics 400/600/700, UCF101,
HMDB51, and Charades, as well as zero-shot text-to-video retrieval on MSR-VTT
and ActivityNet Captions. We also explore lightweight finetuning on top of
VideoCoCa, and achieve strong results on video question-answering (iVQA,
MSRVTT-QA, MSVD-QA) and video captioning (MSR-VTT, ActivityNet, Youcook2). Our
approach establishes a simple and effective video-text baseline for future
research. Comment: Technical report
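
The flattening step described in the abstract above can be illustrated with a minimal sketch. It assumes a single learned query and plain dot-product attention with no projection matrices, which is simpler than CoCa's actual pooling layers. The toy token values and function names are hypothetical.

```python
import math

def flatten_frames(frame_tokens):
    """Concatenate per-frame token lists into one long token sequence."""
    return [tok for frame in frame_tokens for tok in frame]

def attention_pool(query, tokens):
    """Single-query softmax attention over tokens (simplified: no projections, one head)."""
    d = len(query)
    logits = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]  # subtract the max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
    return pooled, weights

# Two frames, three 2-d tokens each; a frozen image encoder would produce these.
frames = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]],
]
flat = flatten_frames(frames)
pooled, weights = attention_pool([1.0, 0.0], flat)
print(len(flat), pooled)
```

Because the pooling attends over a token sequence of any length, the same layer that pooled one image's tokens can pool the tokens of all frames at once, which is why no new cross-frame module is needed in this simplified picture.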
Coronin 1B Controls Endothelial Actin Dynamics at Cell-Cell Junctions and Is Required for Endothelial Network Assembly
Development and homeostasis of blood vessels critically depend on the regulation of endothelial cell-cell junctions. VE-cadherin (VEcad)-based cell-cell junctions are connected to the actin cytoskeleton and regulated by actin-binding proteins. Coronin 1B (Coro1B) is an actin-binding protein that controls actin networks at classical lamellipodia. The role of Coro1B in endothelial cells (ECs) is not fully understood and is investigated in this study. Here, we demonstrate that Coro1B is a novel component and regulator of cell-cell junctions in ECs. Immunofluorescence studies show that Coro1B colocalizes with VEcad at cell-cell junctions in monolayers of ECs. Live-cell imaging reveals that Coro1B is recruited to, and operates at, actin-driven membrane protrusions at cell-cell junctions. Coro1B is recruited to cell-cell junctions via a mechanism that requires the relaxation of the actomyosin cytoskeleton. By analyzing the Coro1B interactome, we identify integrin-linked kinase (ILK) as a new Coro1B-associated protein. Coro1B colocalizes with α-parvin, an interactor of ILK, at the leading edge of lamellipodial protrusions. Functional experiments reveal that depletion of Coro1B causes defects in the actin cytoskeleton and cell-cell junctions. Finally, in Matrigel tube network assays, depletion of Coro1B results in reduced network complexity, tube number and tube length. Together, our findings point toward a critical role for Coro1B in the dynamic remodeling of endothelial cell-cell junctions and the assembly of endothelial networks.
Enhancing heat stress tolerance in Lanzhou lily (Lilium davidii var. unicolor) with Trichokonins isolated from Trichoderma longibrachiatum SMF2
Lanzhou lily (Lilium davidii var. unicolor) is a renowned edible crop produced in China and is relatively sensitive to high temperature (HT). Trichokonins (TKs) are antimicrobial peptaibols secreted from Trichoderma longibrachiatum strain SMF2. Here, we report that TKs application improves the thermotolerance of Lanzhou lily. The activity of the antioxidant enzyme system (SOD, CAT, and POD), the level of heat-resistance-associated phytohormones (ABA, SA, and JA), the relative water content (RWC), the content of chlorophyll (Chl), and the net photosynthetic rate (Pn) were promoted by TKs treatment in Lanzhou lily plants subjected to heat stress (HS). TKs treatment also mitigated cell injury, as shown by a lower accumulation of malondialdehyde (MDA) and lower relative electrolyte leakage (REL) under HS conditions. RNA-seq data analysis showed that more than 4.5 times as many differentially expressed genes (DEGs) responded to TKs treatment under HS as under non-HS conditions, and that TKs treatment reduced protein folding and enhanced cellular repair function under HS conditions. The analyses of DEGs involved in hormone (ABA, SA and JA) synthesis and signaling pathways suggested that TKs might improve Lanzhou lily heat tolerance by promoting ABA synthesis and signal transduction. TKs highly induced DEGs of the HSF-HSP pathway under HS, in which HSFA2 accounted for most of the HSF family. Furthermore, TKs treatment resulted in the upregulation of the heat-protective genes LzDREB2B, LzHsfA2a, LzMBF1c, LzHsp90, and LzHsp70, which are involved in the HSF-HSP signaling pathway, after long-term HS. LzHsfA2a-1 likely plays a key role in the acquisition of TKs-induced thermotolerance in Lanzhou lily, as evidenced by its sustained response to HS, its enhanced response to TKs treatment under long-term HS, and its high sequence similarity to LlHsfA2a, which is a key regulator for the improvement of heat tolerance in Lilium longiflorum.
Our results reveal the underlying mechanisms of TKs-mediated thermotolerance in Lanzhou lily and highlight an attractive approach to protecting crop plants from damage caused by HS in a global warming future.
Effects of habitat differences on the scatter-hoarding behaviour of rodents (Mammalia, Rodentia) in temperate forests
To discover the differences in hoarding strategies of rodents for different seeds in different habitats, we labelled and released three different types of seeds, namely Pinus koraiensis, Corylus mandshurica, and Quercus mongolica, in temperate forests of northeastern China and investigated the fate of the seeds in four different habitats: a broad-leaved forest, a mixed-forest edge, a mixed forest, and an artificial larch forest. Our research showed that the hoarding strategy of rodents varied substantially among habitats. The survival curves of seeds from different habitats showed the same trend, but the rates of consumption in different habitats varied. More than 50% of the seeds in the four habitats were consumed by the tenth day. It took 20 days to consume more than 70% of the seeds. The rate of consumption of P. koraiensis seeds reached 96.70%; 99.09% of the C. mandshurica seeds were consumed, and 93.07% of the Q. mongolica seeds were consumed. The seeds were consumed most quickly in the artificial larch forest. In general, most of the seeds were consumed quickly in the early stage. After day 20, the consumption gradually decreased. Rodents found the seeds in the artificial larch forest in a shorter average time than in the other types of forests. The average earliest discovery time there was 1.4 ± 0.9 d (1–3 d). The average earliest discovery time in each of the other three habitats exceeded 7 d. The median removal time (MRT) of the seeds was 14.24 ± 10.53 d (1–60 d). There were significant differences in the MRT among different habitats. It was shortest in the artificial larch forest at 7.67 ± 6.80 d (1–28 d). In contrast, the MRT in the broad-leaved forest was the longest at 17.52 ± 12.91 d (4–60 d). There were significant differences in the MRT between the artificial larch forest and the other habitats. There was less predation of the three types of seeds at the mixed-forest edge, and the most seeds were dispersed there.
The rates of predation of the P. koraiensis, C. mandshurica, and Q. mongolica seeds were 28.33%, 15.83%, and 44.0%, respectively, and 59.17%, 84.17%, and 48.0% of the seeds, respectively, were dispersed. The average dispersal distances of all the seeds were less than 6 m, and the longest distance recorded was 18.66 m. The dispersal distances and burial depths differed significantly among the four types of habitats. Seed dispersal distances fell primarily within 1–6 m.