
    Prismer: A Vision-Language Model with An Ensemble of Experts

    Recent vision-language models have shown impressive multi-modal generation capabilities. However, typically they require training huge models on massive datasets. As a more scalable alternative, we introduce Prismer, a data- and parameter-efficient vision-language model that leverages an ensemble of domain experts. Prismer only requires training of a small number of components, with the majority of network weights inherited from readily-available, pre-trained domain experts, and kept frozen during training. By leveraging experts from a wide range of domains, we show that Prismer can efficiently pool this expert knowledge and adapt it to various vision-language reasoning tasks. In our experiments, we show that Prismer achieves fine-tuned and few-shot learning performance which is competitive with current state-of-the-art models, whilst requiring up to two orders of magnitude less training data. Code is available at https://github.com/NVlabs/prismer.
    Comment: Tech Report. Project Page: https://shikun.io/projects/prismer Code: https://github.com/NVlabs/prismer v2: fixed incorrect training cost estimate and zero-shot NoCaps performance of SimVL
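    The frozen-experts idea can be illustrated with a toy sketch (an assumed simplification for illustration, not the authors' implementation): several pre-trained "experts" are kept fixed, and only a small set of fusion weights would ever be updated by training.

    ```python
    def make_expert(scale):
        """Stand-in for a frozen pre-trained domain expert (hypothetical)."""
        return lambda x: [scale * v for v in x]

    # Frozen experts: their internal parameters (here, `scale`) are never updated.
    experts = [make_expert(0.5), make_expert(2.0), make_expert(-1.0)]

    # The only trainable parameters in this sketch: one fusion weight per expert.
    fusion_weights = [1.0 / len(experts)] * len(experts)

    def pooled_features(x):
        """Weighted sum of frozen expert outputs; only the weights would train."""
        outs = [e(x) for e in experts]
        return [sum(w * o[i] for w, o in zip(fusion_weights, outs))
                for i in range(len(x))]

    features = pooled_features([1.0, 2.0])
    ```

    Because the experts stay frozen, only the fusion weights contribute trainable parameters, which is the source of the data and parameter efficiency described above.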

    Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness

    Evaluating the robustness of a defense model is a challenging task in adversarial robustness research. Obfuscated gradients, a type of gradient masking, have previously been found to exist in many defense methods and cause a false signal of robustness. In this paper, we identify a more subtle situation called Imbalanced Gradients that can also cause overestimated adversarial robustness. The phenomenon of imbalanced gradients occurs when the gradient of one term of the margin loss dominates and pushes the attack towards a suboptimal direction. To exploit imbalanced gradients, we formulate a Margin Decomposition (MD) attack that decomposes a margin loss into individual terms and then explores the attackability of these terms separately via a two-stage process. We also propose a MultiTargeted version and an ensemble version of our MD attack. By investigating 17 defense models proposed since 2018, we find that 6 models are susceptible to imbalanced gradients, and our MD attack can decrease their robustness, as evaluated by the best standalone baseline attack, by another 2%. We also provide an in-depth analysis of the likely causes of imbalanced gradients and effective countermeasures.
    Comment: 19 pages, 7 figures
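    The two-stage decomposition can be sketched on a toy model (an assumed simplification of the MD attack, with a hypothetical linear two-class model): the margin loss z_y - max_{j≠y} z_j is split into its two terms, and each term is attacked in its own stage, so a dominant gradient from one term cannot mask the other.

    ```python
    def logits(x):
        """Hypothetical 2-class model with linear logits of a scalar input."""
        return [3.0 * x, -0.1 * x]      # z_0 has a large gradient, z_1 a tiny one

    def grad_logits(x):
        """d z_j / d x for the linear toy model (constant in x)."""
        return [3.0, -0.1]

    y, step = 0, 0.1   # true class and attack step size
    x = 1.0            # clean input

    # Stage 1: attack only the true-class term, minimising z_y.
    g = grad_logits(x)[y]
    x_stage1 = x - step * g

    # Stage 2: from the stage-1 point, attack only the runner-up term,
    # maximising z_j for j != y.
    j = 1 - y
    g = grad_logits(x_stage1)[j]
    x_adv = x_stage1 + step * g
    ```

    In a combined margin-loss attack on this toy model, the 3.0 gradient of z_0 would dominate the 0.1 gradient of z_1; attacking the terms in separate stages gives the weaker term its own step.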

    Demonstration of MaskSearch: Efficiently Querying Image Masks for Machine Learning Workflows

    We demonstrate MaskSearch, a system designed to accelerate queries over databases of image masks generated by machine learning models. MaskSearch formalizes and accelerates a new category of queries for retrieving images and their corresponding masks based on mask properties, which support various applications, from identifying spurious correlations learned by models to exploring discrepancies between model saliency and human attention. This demonstration makes the following contributions: (1) the introduction of MaskSearch's graphical user interface (GUI), which enables interactive exploration of image databases through mask properties, (2) hands-on opportunities for users to explore MaskSearch's capabilities and constraints within machine learning workflows, and (3) an opportunity for conference attendees to understand how MaskSearch accelerates queries over image masks.
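    The kind of query being accelerated can be sketched in a few lines (a toy example under assumed data, not MaskSearch's actual API): retrieve the images whose mask has more than a threshold fraction of set pixels inside a region of interest.

    ```python
    # Toy database of 2x2 binary masks keyed by image name (hypothetical data).
    masks = {
        "img_a": [[1, 1], [0, 0]],
        "img_b": [[0, 0], [0, 1]],
    }

    def mask_fraction(mask, region):
        """Fraction of pixels in `region` (list of (row, col)) that are set."""
        hits = sum(mask[r][c] for r, c in region)
        return hits / len(region)

    # Query: images whose mask covers more than half of the top row.
    top_row = [(0, 0), (0, 1)]
    selected = [name for name, m in masks.items()
                if mask_fraction(m, top_row) > 0.5]
    ```

    A naive scan like this touches every pixel of every mask; the point of a system such as MaskSearch is to answer such property predicates without that full scan.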

    Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

    Augmenting pretrained language models (LMs) with a vision encoder (e.g., Flamingo) has achieved state-of-the-art results in image-to-text generation. However, these models store all the knowledge within their parameters, thus often requiring an enormous number of model parameters to capture the abundant visual concepts and very rich textual descriptions. Additionally, they are inefficient in incorporating new data, requiring a computationally expensive fine-tuning process. In this work, we introduce a Retrieval-augmented Visual Language Model, Re-ViLM, built upon Flamingo, that supports retrieving relevant knowledge from an external database for zero-shot and in-context few-shot image-to-text generation. By storing certain knowledge explicitly in the external database, our approach reduces the number of model parameters and can easily accommodate new data during evaluation by simply updating the database. We also construct an interleaved image-and-text dataset that facilitates in-context few-shot learning capabilities. We demonstrate that Re-ViLM significantly boosts performance for image-to-text generation tasks, especially for zero-shot and few-shot generation in out-of-domain settings, with 4 times fewer parameters compared with baseline methods.
    Comment: Findings of EMNLP 202
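    The retrieval step can be sketched as follows (a toy illustration with made-up embeddings and captions, not Re-ViLM's actual pipeline): captions of the nearest database images are fetched by embedding similarity and handed to the generator as context, so updating the database adds knowledge without retraining any weights.

    ```python
    import math

    def cosine(a, b):
        """Cosine similarity between two equal-length vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    # External database: (image embedding, caption) pairs; editable at any time.
    database = [
        ([1.0, 0.0], "a red fox in the snow"),
        ([0.0, 1.0], "a sailboat at sunset"),
    ]

    def retrieve(query_emb, k=1):
        """Return the captions of the k most similar database entries."""
        ranked = sorted(database, key=lambda e: cosine(query_emb, e[0]),
                        reverse=True)
        return [caption for _, caption in ranked[:k]]

    context = retrieve([0.9, 0.1])
    ```

    Appending a new (embedding, caption) pair to `database` immediately makes that knowledge retrievable, which is the sense in which new data needs no fine-tuning.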

    A novel high-DPI and monodisperse droplet inkjet printhead with the piezoelectric cutter

    High dots per inch (DPI) is the core performance index of an inkjet printer, and it is hindered by satellite ink droplets. Herein, we propose a novel high-DPI and monodisperse droplet inkjet printhead with a piezoelectric cutter. The established model optimizes the printhead's structural parameters and the actuating and cutting signal waveforms. The cutter element moves the break-up point to the middle of the ink column, reducing the tail length and generating a monodisperse droplet. Additionally, the cutter consistently reduces the droplet length across different ink properties, including viscosity, density, surface tension, and contact angle, exhibiting high applicability. The results provide an in-depth study of the design of high-DPI and monodisperse inkjet printheads, offering an efficient approach to improving inkjet printhead performance.

    SRT1720 Alleviates ANIT-Induced Cholestasis in a Mouse Model

    Intrahepatic cholestasis is a clinical syndrome accompanied by hepatotoxicity, caused by intrahepatic and systemic accumulation of bile acids. Several factors drive the pathogenesis of cholestasis, including inflammation, dysregulation of bile acid transporters, and oxidative stress. SIRT1 is a class III histone deacetylase (HDAC), and a body of research indicates that it is one of the most important regulators of hepatic bile acid metabolism. SRT1720 is a SIRT1 activator roughly 1000 times more potent than resveratrol, and this paper studies its protective effect on hepatotoxicity and cholestasis induced by alpha-naphthylisothiocyanate (ANIT) in mice. The findings revealed that SRT1720 treatment increased FXR and Nrf2 gene expression, shielding against ANIT-induced hepatotoxicity and cholestasis. SRT1720 also altered the mRNA levels of hepatic bile acid transporters. Furthermore, SRT1720 enhanced the antioxidative system by increasing Nrf2, SOD, GCLc, GCLm, Nqo1, and HO-1 gene expression. In conclusion, SRT1720 protects against ANIT-induced hepatotoxicity and cholestasis, partly through FXR and Nrf2 activation. These results indicate that SIRT1 could serve as a therapeutic target for the treatment of cholestasis.