CLIPAG: Towards Generator-Free Text-to-Image Generation
Perceptually Aligned Gradients (PAG) refer to an intriguing property observed
in robust image classification models, wherein their input gradients align with
human perception and carry semantic meaning. While this phenomenon has gained
significant research attention, it was solely studied in the context of
unimodal vision-only architectures. In this work, we extend the study of PAG to
Vision-Language architectures, which form the foundations for diverse
image-text tasks and applications. Through an adversarial robustification
finetuning of CLIP, we demonstrate that robust Vision-Language models exhibit
PAG in contrast to their vanilla counterparts. This work reveals the merits of
CLIP with PAG (CLIPAG) in several vision-language generative tasks. Notably, we
show that seamlessly integrating CLIPAG in a "plug-n-play" manner leads to
substantial improvements in vision-language generative applications.
Furthermore, leveraging its PAG property, CLIPAG enables text-to-image
generation without any generative model, a task that typically requires huge
dedicated generators.
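The generator-free generation that PAG enables amounts to gradient ascent on the image-text similarity directly in pixel space. The toy numpy sketch below illustrates only that optimization loop: a fixed random vector stands in for the CLIP text embedding and the "image" is optimized directly, with the analytic gradient of cosine similarity; the real CLIPAG procedure backpropagates through a robustified CLIP image encoder.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ascend_similarity(x, t, steps=500, lr=1.0):
    """Gradient ascent on cos(x, t) with respect to the 'image' x.

    Stand-in for maximizing CLIP image-text similarity over pixels."""
    for _ in range(steps):
        nx, nt = np.linalg.norm(x), np.linalg.norm(t)
        # Analytic gradient of cosine similarity w.r.t. x.
        grad = t / (nx * nt) - (x @ t) * x / (nx**3 * nt)
        x = x + lr * grad
    return x

rng = np.random.default_rng(0)
t = rng.normal(size=64)       # stand-in for a text embedding
x0 = rng.normal(size=64)      # random "image" initialization
x1 = ascend_similarity(x0.copy(), t)
print(cosine(x0, t), cosine(x1, t))
```

With a perceptually aligned encoder in place of the identity map used here, the same loop moves the image toward semantically meaningful content rather than adversarial noise.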
A Tight Competitive Ratio for Online Submodular Welfare Maximization
In this paper we consider the online Submodular Welfare (SW) problem. In this
problem we are given n bidders, each equipped with a general non-negative (not
necessarily monotone) submodular utility, and m items that arrive online. The
goal is to assign each item, once it arrives, to a bidder or discard it, while
maximizing the sum of utilities. When an adversary determines the items'
arrival order, we present a simple randomized algorithm that achieves a tight
competitive ratio of 1/4. The algorithm is a specialization of an algorithm due
to [Harshaw-Kazemi-Feldman-Karbasi MOR`22], who presented the previously best
known competitive ratio of 3 - 2√2 ≈ 0.171573 for the problem. When the items'
arrival order is uniformly random, we present a competitive ratio of ≈ 0.27493,
improving the previously known 1/4 guarantee.
Our approach for the latter result is based on a better analysis of the
(offline) Residual Random Greedy (RRG) algorithm of
[Buchbinder-Feldman-Naor-Schwartz SODA`14], which we believe might be of
independent interest.
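To make the online setting concrete, here is an illustrative randomized greedy assignment rule, not the paper's exact algorithm: each arriving item is, with probability 1/2, assigned to the bidder with the largest positive marginal gain, and otherwise discarded (random discarding is a standard device for handling non-monotone utilities). Utilities are coverage functions, a simple submodular family.

```python
import random

def coverage_utility(covered_sets):
    """f(S) = size of the union of the sets assigned to this bidder."""
    u = set()
    for s in covered_sets:
        u |= s
    return len(u)

def online_assign(items, n_bidders, seed=0):
    rng = random.Random(seed)
    bundles = [[] for _ in range(n_bidders)]
    for item in items:                  # items arrive one at a time
        if rng.random() < 0.5:          # random discard step
            continue
        base = [coverage_utility(b) for b in bundles]
        gains = [coverage_utility(bundles[i] + [item]) - base[i]
                 for i in range(n_bidders)]
        i = max(range(n_bidders), key=lambda j: gains[j])
        if gains[i] > 0:                # assign only on positive marginal gain
            bundles[i].append(item)
    return bundles

items = [{1, 2}, {2, 3}, {4}, {1, 4, 5}]
bundles = online_assign(items, n_bidders=2)
welfare = sum(coverage_utility(b) for b in bundles)
print(welfare)
```

The key constraint the sketch respects is irrevocability: once an item is assigned or discarded, the decision is never revisited, which is what separates the online problem from the offline RRG analysis.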
FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions
Image captioning is a central task in computer vision which has experienced
substantial progress following the advent of vision-language pre-training
techniques. In this paper, we highlight a frequently overlooked limitation of
captioning models that often fail to capture semantically significant elements.
This drawback can be traced back to the text-image datasets; while their
captions typically offer a general depiction of image content, they frequently
omit salient details. To mitigate this limitation, we propose FuseCap - a novel
method for enriching captions with additional visual information, obtained from
vision experts, such as object detectors, attribute recognizers, and Optical
Character Recognizers (OCR). Our approach fuses the outputs of such vision
experts with the original caption using a large language model (LLM), yielding
enriched captions that present a comprehensive image description. We validate
the effectiveness of the proposed caption enrichment method through both
quantitative and qualitative analysis. Our method is then used to curate the
training set of a BLIP-based captioning model, which surpasses current
state-of-the-art approaches in generating accurate and detailed captions while
using significantly fewer parameters and training data. As additional
contributions, we provide a dataset comprising 12M pairs of images and enriched
captions, and show that the proposed method substantially improves image-text
retrieval.
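The fusion step can be pictured as assembling the expert outputs into a single LLM instruction. The prompt format below is purely a hypothetical illustration; the abstract does not specify FuseCap's actual prompt or LLM.

```python
def build_fusion_prompt(caption, detections, attributes, ocr_text):
    """Combine an original caption with vision-expert outputs into one
    instruction asking an LLM to produce a single enriched caption."""
    lines = [
        "Fuse the following into one detailed image caption.",
        f"Original caption: {caption}",
        f"Detected objects: {', '.join(detections)}",
        f"Attributes: {', '.join(attributes)}",
    ]
    if ocr_text:
        lines.append(f"Text in image (OCR): {ocr_text}")
    return "\n".join(lines)

prompt = build_fusion_prompt(
    caption="a man near a bus",
    detections=["man", "bus", "backpack"],
    attributes=["red bus", "young man"],
    ocr_text="Route 42",
)
print(prompt)
```

The design point is that each expert contributes structured facts (objects, attributes, scene text) that generic alt-text captions tend to omit, and the LLM's job is only to merge them fluently.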
Classifier Robustness Enhancement Via Test-Time Transformation
It has been recently discovered that adversarially trained classifiers
exhibit an intriguing property, referred to as perceptually aligned gradients
(PAG). PAG implies that the gradients of such classifiers possess a meaningful
structure, aligned with human perception. Adversarial training is currently the
best-known way to achieve classification robustness under adversarial attacks.
The PAG property, however, has yet to be leveraged for further improving
classifier robustness. In this work, we introduce Classifier Robustness
Enhancement Via Test-Time Transformation (TETRA) -- a novel defense method that
utilizes PAG, enhancing the performance of trained robust classifiers. Our
method operates in two phases. First, it modifies the input image via a
designated targeted adversarial attack into each of the dataset's classes.
Then, it classifies the input image based on the distance to each of the
modified instances, with the assumption that the shortest distance relates to
the true class. We show that the proposed method achieves state-of-the-art
results and validate our claim through extensive experiments on a variety of
defense methods, classifier architectures, and datasets. We also empirically
demonstrate that TETRA can boost the accuracy of any differentiable adversarial
training classifier across a variety of attacks, including ones unseen at
training. Specifically, applying TETRA leads to substantial accuracy
improvements on CIFAR10, CIFAR100, and ImageNet.
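The two-phase procedure can be sketched on a toy linear stand-in classifier (logits = W @ x): phase one nudges the input toward each class with targeted gradient steps, phase two predicts the class whose nudged image stays closest to the original. This is a minimal numpy illustration of the scheme, not TETRA itself, which uses a robust deep network and a full targeted adversarial attack.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tetra_predict(W, x, steps=20, lr=0.1):
    """Phase 1: targeted 'attack' toward every class.
    Phase 2: predict the class reached with the smallest perturbation."""
    dists = []
    for c in range(W.shape[0]):
        xc = x.copy()
        for _ in range(steps):
            p = softmax(W @ xc)
            # Gradient ascent on log p(c | xc): pull the input toward class c.
            xc = xc + lr * (W[c] - p @ W)
        dists.append(np.linalg.norm(xc - x))
    return int(np.argmin(dists))

W = 5.0 * np.eye(3)                 # toy 3-class linear classifier
x = np.array([3.0, 0.1, 0.1])       # input lying near class 0
pred = tetra_predict(W, x)
print(pred)
```

The intuition from PAG is that for a robust model the shortest perturbation path points toward the true class, so the minimum-distance rule recovers the label.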
CLIPTER: Looking at the Bigger Picture in Scene Text Recognition
Reading text in real-world scenarios often requires understanding the context
surrounding it, especially when dealing with poor-quality text. However,
current scene text recognizers are unaware of the bigger picture as they
operate on cropped text images. In this study, we harness the representative
capabilities of modern vision-language models, such as CLIP, to provide
scene-level information to the crop-based recognizer. We achieve this by fusing
a rich representation of the entire image, obtained from the vision-language
model, with the recognizer's word-level features via a gated cross-attention
mechanism. This component gradually shifts to the context-enhanced
representation, allowing for stable fine-tuning of a pretrained recognizer. We
demonstrate the effectiveness of our model-agnostic framework, CLIPTER (CLIP
TExt Recognition), on leading text recognition architectures and achieve
state-of-the-art results across multiple benchmarks. Furthermore, our analysis
highlights improved robustness to out-of-vocabulary words and enhanced
generalization in low-data regimes.
Comment: Accepted for publication by ICCV 202
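A single-head gated cross-attention layer of the kind described can be sketched as follows; shapes and the scalar tanh gate are illustrative assumptions, and CLIPTER's exact module may differ. The gate is initialized at zero so the pretrained recognizer's features pass through unchanged at the start of fine-tuning, and scene-level context is admitted gradually as the gate is learned.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(words, image, Wq, Wk, Wv, gate):
    """words: (T, d) recognizer word-level features;
    image: (S, d) scene-level features from the vision-language model."""
    q, k, v = words @ Wq, image @ Wk, image @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # (T, S)
    # Zero-initialized tanh gate => identity mapping at the start of training.
    return words + np.tanh(gate) * (attn @ v)

rng = np.random.default_rng(0)
d, T, S = 8, 4, 6
words = rng.normal(size=(T, d))
image = rng.normal(size=(S, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out0 = gated_cross_attention(words, image, Wq, Wk, Wv, gate=0.0)
```

With `gate=0.0` the output equals the input word features exactly, which is what makes fine-tuning of the pretrained recognizer stable.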
Use of a cancer registry is preferable to a direct-to-community approach for recruitment to a cohort study of wellbeing in women newly diagnosed with invasive breast cancer
Background: Breast cancer (BC) mortality is declining, such that the number of BC survivors in the community is increasing. BC survivors report a range of sequelae from their cancer and its management beyond the period of their immediate treatment. Previous studies documenting these have generally been small, clinic-based, or commenced years after diagnosis. We have recruited a large cohort of women newly diagnosed with invasive BC from the community who will be followed for five years in order to systematically document the physical, psychological, and socio-economic consequences of BC and its treatment. The aim of this manuscript is to describe the issues encountered in the recruitment of this community-based study population.
Methods: Women residing in the southern Australian state of Victoria newly diagnosed with invasive BC were recruited to this cohort study using two approaches: directly from the community using an advertising campaign, and contemporaneously using an invitation to participate from the Victorian Cancer Registry (VCR).
Results: Over the two-and-a-half-year recruitment period, 2135 women were recruited and agreed to receive the enrollment questionnaire (EQ). Of these, 1684 women were eligible and completed an EQ, with the majority of participants having been recruited through the VCR (n = 1321). Only 16% of women contacted by the VCR actively refused participation following a letter of invitation and phone follow-up. The age distribution and tumour characteristics of participants are consistent with state-wide data, and their residential postcodes include 400 of a possible 699. Recruitment through a direct community awareness program aimed at women with newly diagnosed invasive BC was difficult, labour-intensive, and expensive. Barriers to the recruitment process were identified.
Conclusion: Most of the women in this study were recruited through a state-based cancer registry. Limitations to recruitment occurred because we required questionnaires to be completed within 12 months of diagnosis, in a setting where there is a delay of several months in the notification of new cases to the Registry. Characteristics of the cohort suggest that it is generally representative of women in the state of Victoria newly diagnosed with BC.
Insight into the Mechanisms of Adenovirus Capsid Disassembly from Studies of Defensin Neutralization
Defensins are effectors of the innate immune response with potent antibacterial activity. Their role in antiviral immunity, particularly against non-enveloped viruses, is poorly understood. We recently found that human alpha-defensins inhibit human adenovirus (HAdV) by preventing virus uncoating and release of the endosomolytic protein VI during cell entry. Consequently, AdV remains trapped in the endosomal/lysosomal pathway rather than trafficking to the nucleus. To gain insight into the mechanism of defensin-mediated neutralization, we analyzed the specificity of the AdV-defensin interaction. Sensitivity to alpha-defensin neutralization is a common feature of HAdV species A, B1, B2, C, and E, whereas species D and F are resistant. Thousands of defensin molecules bind with low micromolar affinity to a sensitive serotype, but only a low level of binding is observed to resistant serotypes. Neutralization is dependent upon a correctly folded defensin molecule, suggesting that specific molecular interactions occur with the virion. Cryo-EM structural studies and protein sequence analysis led to a hypothesis that neutralization determinants are located in a region spanning the fiber and penton base proteins. This model was supported by infectivity studies using virus chimeras composed of capsid proteins from sensitive and resistant serotypes. These findings suggest a mechanism in which defensin binding to critical sites on the AdV capsid prevents vertex removal and thereby blocks subsequent steps in uncoating that are required for release of protein VI and endosomolysis during infection. In addition to informing the mechanism of defensin-mediated neutralization of a non-enveloped virus, these studies provide insight into the mechanism of AdV uncoating and suggest new strategies to disrupt this process and inhibit infection.