203 research outputs found

    Exploiting Category Names for Few-Shot Classification with Vision-Language Models

    Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks. Notably, many vision-language models build two encoders (visual and textual) that map the two modalities into the same embedding space. As a result, the learned representations achieve good zero-shot performance on tasks like image classification. However, when there are only a few examples per category, the potential of large vision-language models is often not fully realized, mainly due to the gap between the large number of parameters and the relatively small amount of training data. This paper shows that we can significantly improve the performance of few-shot classification by using the category names to initialize the classification head. With the proposed category name initialization method, our model obtains state-of-the-art performance on a number of few-shot image classification benchmarks (e.g., 87.37% on ImageNet and 96.08% on Stanford Cars, both using five-shot learning).
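
The idea of initializing a classification head from category names can be illustrated with a minimal sketch. It assumes the text encoder has already produced one embedding per category name; the tiny 3-dimensional vectors and the helper names (`init_head_from_names`, `logits`) are hypothetical, and the abstract does not specify the paper's actual procedure beyond this idea.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def init_head_from_names(name_embeddings):
    """Build head weights: one row per category, copied from the
    normalized text embedding of that category's name (assumed given)."""
    return [l2_normalize(e) for e in name_embeddings]

def logits(head, image_embedding):
    """Cosine-similarity logits between an image embedding and each row."""
    x = l2_normalize(image_embedding)
    return [sum(w * xi for w, xi in zip(row, x)) for row in head]

# Hypothetical text embeddings for two category names.
name_embeddings = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
head = init_head_from_names(name_embeddings)

# Before any few-shot training, the head already separates the classes.
scores = logits(head, [0.9, 0.1, 0.0])
predicted = max(range(len(scores)), key=scores.__getitem__)
```

Few-shot finetuning would then start from `head` instead of from random weights.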

    IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers

    Generative training has been demonstrated to be powerful for building visual-language models. However, on zero-shot discriminative benchmarks, there is still a performance gap between models trained with generative and discriminative objectives. In this paper, we aim to narrow this gap by improving the efficacy of generative training on classification tasks, without any finetuning processes or additional modules. Specifically, we focus on narrowing the gap between the generative captioner and the CLIP classifier. We begin by analysing the predictions made by the captioner and classifier and observe that the caption generation inherits the distribution bias from the language model trained with pure text modality, making it less grounded on the visual signal. To tackle this problem, we redesign the scoring objective for the captioner to alleviate the distributional bias and focus on measuring the gain of information brought by the visual inputs. We further design a generative training objective to match the evaluation objective. We name the model trained and evaluated with these procedures the Information Gain (IG) captioner. We pretrain the models on the public Laion-5B dataset and perform a series of discriminative evaluations. For zero-shot classification on ImageNet, the IG captioner achieves >18% improvement over the standard captioner, reaching performance comparable to the CLIP classifier. The IG captioner also demonstrates strong performance on zero-shot image-text retrieval tasks on MSCOCO and Flickr30K. We hope this paper inspires further research towards unifying generative and discriminative training procedures for visual-language models.
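
A rough sketch of scoring by information gain follows. The abstract does not give the exact objective, so this uses the plain pointwise log-ratio log p(caption | image) − log p(caption) as an assumption; the function names and the log-probabilities are made up for illustration.

```python
def information_gain(logp_caption_given_image, logp_caption):
    """Assumed score: how much the image raises the caption's log-likelihood
    over a text-only prior."""
    return logp_caption_given_image - logp_caption

def classify(cond_logps, prior_logps):
    """Pick the label whose caption gains the most from the visual input."""
    scores = {
        label: information_gain(cond_logps[label], prior_logps[label])
        for label in cond_logps
    }
    return max(scores, key=scores.get), scores

# Hypothetical log-probabilities for two candidate captions.
cond_logps = {"dog": -2.0, "axolotl": -2.5}   # log p(caption | image)
prior_logps = {"dog": -1.0, "axolotl": -4.0}  # log p(caption), text only

# Scoring by conditional likelihood alone favours the frequent word.
likelihood_choice = max(cond_logps, key=cond_logps.get)
# Subtracting the text prior removes that language-model bias.
ig_choice, ig_scores = classify(cond_logps, prior_logps)
```

In this toy case the likelihood-only rule picks "dog" because the word is common in text, while the information-gain rule picks "axolotl", the caption the image actually supports.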

    Human-System Integration


    Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners

    This work explores an efficient approach to establish a foundational video-text model for tasks including open-vocabulary video classification, text-to-video retrieval, video captioning and video question-answering. We present VideoCoCa, which reuses a pretrained image-text contrastive captioner (CoCa) model and adapts it to video-text tasks with minimal extra training. While previous works adapt image-text models with various cross-frame fusion modules (for example, a cross-frame attention layer or perceiver resampler) and finetune the modified architecture on video-text data, we surprisingly find that the generative attentional pooling and contrastive attentional pooling layers in the image-text CoCa design are instantly adaptable to "flattened frame embeddings", yielding a strong zero-shot transfer baseline for many video-text tasks. Specifically, the frozen image encoder of a pretrained image-text CoCa takes each video frame as input and generates N token embeddings per frame, for a total of T video frames. We flatten the N × T token embeddings into one long sequence as a frozen video representation and apply CoCa's generative attentional pooling and contrastive attentional pooling on top. All model weights, including the pooling layers, are loaded directly from a pretrained image-text CoCa model. Without any video or video-text data, VideoCoCa's zero-shot transfer baseline already achieves state-of-the-art results on zero-shot video classification on Kinetics 400/600/700, UCF101, HMDB51, and Charades, as well as zero-shot text-to-video retrieval on MSR-VTT and ActivityNet Captions. We also explore lightweight finetuning on top of VideoCoCa, and achieve strong results on video question-answering (iVQA, MSRVTT-QA, MSVD-QA) and video captioning (MSR-VTT, ActivityNet, Youcook2). Our approach establishes a simple and effective video-text baseline for future research. Comment: Technical report
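
The flattening step can be sketched as follows. This is a simplified illustration, not the VideoCoCa implementation: the pooling here is single-head, single-query softmax attention, whereas CoCa's poolers are learned attention layers, and the shapes and helper names are assumptions.

```python
import math

def flatten_frame_tokens(frame_tokens):
    """T frames, each a list of N token vectors -> one sequence of N*T vectors."""
    return [tok for frame in frame_tokens for tok in frame]

def attention_pool(tokens, query):
    """Single-query softmax attention pooling over a token sequence."""
    d = len(query)
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) / total for i in range(d)]

# Hypothetical sizes: T frames, N tokens per frame, D-dimensional embeddings.
T, N, D = 3, 2, 4
frames = [[[float(t + n + k) for k in range(D)] for n in range(N)] for t in range(T)]

sequence = flatten_frame_tokens(frames)              # length N * T
pooled = attention_pool(sequence, query=[0.0] * D)   # zero query -> uniform average
```

Because the pooler only sees a longer token sequence, no new cross-frame module is needed, which is the point the abstract makes about reusing the image-text weights.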

    Coronin 1B Controls Endothelial Actin Dynamics at Cell-Cell Junctions and Is Required for Endothelial Network Assembly

    Development and homeostasis of blood vessels critically depend on the regulation of endothelial cell-cell junctions. VE-cadherin (VEcad)-based cell-cell junctions are connected to the actin cytoskeleton and regulated by actin-binding proteins. Coronin 1B (Coro1B) is an actin-binding protein that controls actin networks at classical lamellipodia. The role of Coro1B in endothelial cells (ECs) is not fully understood and is investigated in this study. Here, we demonstrate that Coro1B is a novel component and regulator of cell-cell junctions in ECs. Immunofluorescence studies show that Coro1B colocalizes with VEcad at cell-cell junctions in monolayers of ECs. Live-cell imaging reveals that Coro1B is recruited to, and operates at, actin-driven membrane protrusions at cell-cell junctions. Coro1B is recruited to cell-cell junctions via a mechanism that requires the relaxation of the actomyosin cytoskeleton. By analyzing the Coro1B interactome, we identify integrin-linked kinase (ILK) as a new Coro1B-associated protein. Coro1B colocalizes with α-parvin, an interactor of ILK, at the leading edge of lamellipodia protrusions. Functional experiments reveal that depletion of Coro1B causes defects in the actin cytoskeleton and cell-cell junctions. Finally, in matrigel tube network assays, depletion of Coro1B results in reduced network complexity, tube number and tube length. Together, our findings point toward a critical role for Coro1B in the dynamic remodeling of endothelial cell-cell junctions and the assembly of endothelial networks.

    Enhancing heat stress tolerance in Lanzhou lily (Lilium davidii var. unicolor) with Trichokonins isolated from Trichoderma longibrachiatum SMF2

    Lanzhou lily (Lilium davidii var. unicolor) is a renowned edible crop produced in China and is relatively sensitive to high temperature (HT). Trichokonins (TKs) are antimicrobial peptaibols secreted from Trichoderma longibrachiatum strain SMF2. Here, we report that TKs application improves the thermotolerance of Lanzhou lily. The activity of the antioxidant enzyme system (SOD, CAT, and POD), the level of heat-resistance-associated phytohormones (ABA, SA, and JA), the relative water content (RWC), the content of chlorophyll (Chl), and the net photosynthetic rate (Pn) were promoted by TKs treatment in Lanzhou lily plants subjected to heat stress (HS). TKs treatment also mitigated cell injury, as shown by a lower accumulation of malondialdehyde (MDA) and lower relative electrolyte leakage (REL) under HS conditions. RNA-seq data analysis showed that more than 4.5 times as many differentially expressed genes (DEGs) responded to TKs treatment under HS as under non-HS conditions, and TKs treatment reduced protein folding and enhanced cellular repair function under HS conditions. The analyses of DEGs involved in hormone (ABA, SA and JA) synthesis and signaling pathways suggested that TKs might improve Lanzhou lily heat tolerance by promoting ABA synthesis and signal transduction. TKs highly induced DEGs of the HSF-HSP pathway under HS, in which HSFA2 accounted for most of the HSF family. Furthermore, TKs treatment resulted in the upregulation of the heat-protective genes LzDREB2B, LzHsfA2a, LzMBF1c, LzHsp90, and LzHsp70 involved in the HSF-HSP signaling pathway after long-term HS. LzHsfA2a-1 likely plays a key role in the acquisition of TKs-induced thermotolerance of Lanzhou lily, as evidenced by its sustained response to HS, its enhanced response to TKs treatment under long-term HS, and its high sequence similarity to LlHsfA2a, which is a key regulator for the improvement of heat tolerance in Lilium longiflorum. Our results reveal the underlying mechanisms of TKs-mediated thermotolerance in Lanzhou lily and highlight an attractive approach to protecting crop plants from damage caused by HS under future global warming.

    Effects of habitat differences on the scatter-hoarding behaviour of rodents (Mammalia, Rodentia) in temperate forests

    To discover the differences in hoarding strategies of rodents for different seeds in different habitats, we labelled and released three types of seeds (Pinus koraiensis, Corylus mandshurica, and Quercus mongolica) in temperate forests of northeastern China and investigated the fate of the seeds in four habitats: a broad-leaved forest, a mixed-forest edge, a mixed forest, and an artificial larch forest. Our research showed that the hoarding strategy of rodents varied substantially among habitats. The survival curves of seeds from different habitats showed the same trend, but the rates of consumption varied among habitats. More than 50% of the seeds in the four habitats were consumed by the tenth day. It took 20 days to consume more than 70% of the seeds. The rate of consumption of P. koraiensis seeds reached 96.70%; 99.09% of the C. mandshurica seeds were consumed, and 93.07% of the Q. mongolica seeds were consumed. The seeds were consumed most quickly in the artificial larch forest. In general, most seeds were consumed quickly in the early stage. After day 20, the consumption gradually decreased. Rodents found the seeds in the artificial larch forest in a shorter average time than those in the other types of forests. The average earliest discovery time there was 1.4 ± 0.9 d (1–3 d), whereas the average earliest discovery time in each of the other three habitats exceeded 7 d. The median removal time (MRT) of the seeds was 14.24 ± 10.53 d (1–60 d). There were significant differences in the MRT among different habitats. It was shortest in the artificial larch forest at 7.67 ± 6.80 d (1–28 d). In contrast, the MRT in the broad-leaved forest was the longest at 17.52 ± 12.91 d (4–60 d). There were significant differences in the MRT between the artificial larch forest and the other habitats. Predation of the three types of seeds was lowest at the mixed-forest edge, where the most seeds were dispersed. The rates of predation of the P. koraiensis, C. mandshurica, and Q. mongolica seeds were 28.33%, 15.83%, and 44.0%, and 59.17%, 84.17%, and 48.0% of the seeds were dispersed, respectively. The average dispersal distances of all the seeds were less than 6 m, and the longest distance recorded was 18.66 m. The dispersal distances and burial depths differed significantly among the four types of habitats. Seed dispersal distances were mainly within 1–6 m.