248 research outputs found

    CLIPAG: Towards Generator-Free Text-to-Image Generation

    Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and pose semantic meanings. While this phenomenon has gained significant research attention, it was solely studied in the context of unimodal vision-only architectures. In this work, we extend the study of PAG to Vision-Language architectures, which form the foundations for diverse image-text tasks and applications. Through an adversarial robustification finetuning of CLIP, we demonstrate that robust Vision-Language models exhibit PAG in contrast to their vanilla counterparts. This work reveals the merits of CLIP with PAG (CLIPAG) in several vision-language generative tasks. Notably, we show that seamlessly integrating CLIPAG in a "plug-n-play" manner leads to substantial improvements in vision-language generative applications. Furthermore, leveraging its PAG property, CLIPAG enables text-to-image generation without any generative model, which typically requires huge generators

    A Tight Competitive Ratio for Online Submodular Welfare Maximization

    In this paper we consider the online Submodular Welfare (SW) problem. In this problem we are given n bidders each equipped with a general non-negative (not necessarily monotone) submodular utility and m items that arrive online. The goal is to assign each item, once it arrives, to a bidder or discard it, while maximizing the sum of utilities. When an adversary determines the items' arrival order we present a simple randomized algorithm that achieves a tight competitive ratio of 1/4. The algorithm is a specialization of an algorithm due to [Harshaw-Kazemi-Feldman-Karbasi MOR`22], who presented the previously best known competitive ratio of $3-2\sqrt{2} \approx 0.171573$ to the problem. When the items' arrival order is uniformly random, we present a competitive ratio of $\approx 0.27493$, improving the previously known 1/4 guarantee. Our approach for the latter result is based on a better analysis of the (offline) Residual Random Greedy (RRG) algorithm of [Buchbinder-Feldman-Naor-Schwartz SODA`14], which we believe might be of independent interest

    A Tight Competitive Ratio for Online Submodular Welfare Maximization

    In this paper we consider the online Submodular Welfare (SW) problem. In this problem we are given n bidders each equipped with a general (not necessarily monotone) submodular utility and m items that arrive online. The goal is to assign each item, once it arrives, to a bidder or discard it, while maximizing the sum of utilities. When an adversary determines the items' arrival order we present a simple randomized algorithm that achieves a tight competitive ratio of 1/4. The algorithm is a specialization of an algorithm due to [Harshaw-Kazemi-Feldman-Karbasi MOR`22], who presented the previously best known competitive ratio of $3-2\sqrt{2} \approx 0.171573$ to the problem. When the items' arrival order is uniformly random, we present a competitive ratio of $\approx 0.27493$, improving the previously known 1/4 guarantee. Our approach for the latter result is based on a better analysis of the (offline) Residual Random Greedy (RRG) algorithm of [Buchbinder-Feldman-Naor-Schwartz SODA`14], which we believe might be of independent interest

    FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

    Image captioning is a central task in computer vision which has experienced substantial progress following the advent of vision-language pre-training techniques. In this paper, we highlight a frequently overlooked limitation of captioning models that often fail to capture semantically significant elements. This drawback can be traced back to the text-image datasets; while their captions typically offer a general depiction of image content, they frequently omit salient details. To mitigate this limitation, we propose FuseCap - a novel method for enriching captions with additional visual information, obtained from vision experts, such as object detectors, attribute recognizers, and Optical Character Recognizers (OCR). Our approach fuses the outputs of such vision experts with the original caption using a large language model (LLM), yielding enriched captions that present a comprehensive image description. We validate the effectiveness of the proposed caption enrichment method through both quantitative and qualitative analysis. Our method is then used to curate the training set of a captioning model based on BLIP, which surpasses current state-of-the-art approaches in generating accurate and detailed captions while using significantly fewer parameters and training data. As additional contributions, we provide a dataset comprising 12M image-enriched caption pairs and show that the proposed method largely improves image-text retrieval

    Classifier Robustness Enhancement Via Test-Time Transformation

    It has been recently discovered that adversarially trained classifiers exhibit an intriguing property, referred to as perceptually aligned gradients (PAG). PAG implies that the gradients of such classifiers possess a meaningful structure, aligned with human perception. Adversarial training is currently the best-known way to achieve classification robustness under adversarial attacks. The PAG property, however, has yet to be leveraged for further improving classifier robustness. In this work, we introduce Classifier Robustness Enhancement Via Test-Time Transformation (TETRA) -- a novel defense method that utilizes PAG, enhancing the performance of trained robust classifiers. Our method operates in two phases. First, it modifies the input image via a designated targeted adversarial attack into each of the dataset's classes. Then, it classifies the input image based on the distance to each of the modified instances, with the assumption that the shortest distance relates to the true class. We show that the proposed method achieves state-of-the-art results and validate our claim through extensive experiments on a variety of defense methods, classifier architectures, and datasets. We also empirically demonstrate that TETRA can boost the accuracy of any differentiable adversarial training classifier across a variety of attacks, including ones unseen at training. Specifically, applying TETRA leads to substantial improvement of up to +23%, +20%, and +26% on CIFAR10, CIFAR100, and ImageNet, respectively

    CLIPTER: Looking at the Bigger Picture in Scene Text Recognition

    Reading text in real-world scenarios often requires understanding the context surrounding it, especially when dealing with poor-quality text. However, current scene text recognizers are unaware of the bigger picture as they operate on cropped text images. In this study, we harness the representative capabilities of modern vision-language models, such as CLIP, to provide scene-level information to the crop-based recognizer. We achieve this by fusing a rich representation of the entire image, obtained from the vision-language model, with the recognizer word-level features via a gated cross-attention mechanism. This component gradually shifts to the context-enhanced representation, allowing for stable fine-tuning of a pretrained recognizer. We demonstrate the effectiveness of our model-agnostic framework, CLIPTER (CLIP TExt Recognition), on leading text recognition architectures and achieve state-of-the-art results across multiple benchmarks. Furthermore, our analysis highlights improved robustness to out-of-vocabulary words and enhanced generalization in low-data regimes. Comment: Accepted for publication by ICCV 202

    Use of a cancer registry is preferable to a direct-to-community approach for recruitment to a cohort study of wellbeing in women newly diagnosed with invasive breast cancer

    Background: Breast cancer (BC) mortality is declining such that the number of survivors of BC in the community is increasing. BC survivors report a range of sequelae from their cancer and its management beyond the period of their immediate treatment. Previous studies to document these have generally been small, clinic-based or commenced years after diagnosis. We have recruited a large cohort of women newly diagnosed with invasive BC from the community who will be followed for five years in order to systematically document the physical, psychological and socio-economic consequences of BC and its treatment. The aim of this manuscript is to describe the issues encountered in the recruitment of this community-based study population. Methods: Women residing in the southern Australian state of Victoria newly diagnosed with invasive BC were recruited to this cohort study using two approaches: directly from the community using an advertising campaign and contemporaneously using an invitation to participate from the Victorian Cancer Registry (VCR). Results: Over the two-and-a-half-year recruitment period, 2135 women were recruited and agreed to receive the enrollment questionnaire (EQ). Of these, 1684 women were eligible and completed an EQ, with the majority of participants having been recruited through the VCR (n = 1321). Only 16% of women contacted by the VCR actively refused participation following a letter of invitation and phone follow-up. The age distribution and tumour characteristics of participants are consistent with state-wide data and their residential postcodes include 400 of a possible 699. Recruitment through a direct community awareness program aimed at women with newly diagnosed invasive BC was difficult, labour-intensive and expensive. Barriers to the recruitment process were identified. Conclusion: Most of the women in this study were recruited through a state-based cancer registry. Limitations to recruitment occurred because we required questionnaires to be completed within 12 months of diagnosis in a setting where there is a delay of several months in notification of new cases to the Registry. Characteristics of the cohort suggest that it is generally representative of women in the state of Victoria newly diagnosed with BC.

    Insight into the Mechanisms of Adenovirus Capsid Disassembly from Studies of Defensin Neutralization

    Defensins are effectors of the innate immune response with potent antibacterial activity. Their role in antiviral immunity, particularly for non-enveloped viruses, is poorly understood. We recently found that human alpha-defensins inhibit human adenovirus (HAdV) by preventing virus uncoating and release of the endosomalytic protein VI during cell entry. Consequently, AdV remains trapped in the endosomal/lysosomal pathway rather than trafficking to the nucleus. To gain insight into the mechanism of defensin-mediated neutralization, we analyzed the specificity of the AdV-defensin interaction. Sensitivity to alpha-defensin neutralization is a common feature of HAdV species A, B1, B2, C, and E, whereas species D and F are resistant. Thousands of defensin molecules bind with low micromolar affinity to a sensitive serotype, but only a low level of binding is observed to resistant serotypes. Neutralization is dependent upon a correctly folded defensin molecule, suggesting that specific molecular interactions occur with the virion. CryoEM structural studies and protein sequence analysis led to a hypothesis that neutralization determinants are located in a region spanning the fiber and penton base proteins. This model was supported by infectivity studies using virus chimeras comprised of capsid proteins from sensitive and resistant serotypes. These findings suggest a mechanism in which defensin binding to critical sites on the AdV capsid prevents vertex removal and thereby blocks subsequent steps in uncoating that are required for release of protein VI and endosomalysis during infection. In addition to informing the mechanism of defensin-mediated neutralization of a non-enveloped virus, these studies provide insight into the mechanism of AdV uncoating and suggest new strategies to disrupt this process and inhibit infection