Chatting Makes Perfect -- Chat-based Image Retrieval
Chats have emerged as an effective, user-friendly approach to information
retrieval and are successfully employed in many domains, such as customer
service, healthcare, and finance. However, existing image retrieval approaches
typically address the case of a single query-to-image round, and the use of
chats for image retrieval has been mostly overlooked. In this work, we
introduce ChatIR: a chat-based image retrieval system that engages in a
conversation with the user to elicit information, in addition to an initial
query, in order to clarify the user's search intent. Motivated by the
capabilities of today's foundation models, we leverage Large Language Models to
generate follow-up questions to an initial image description. These questions
form a dialog with the user in order to retrieve the desired image from a large
corpus. In this study, we explore the capabilities of such a system tested on a
large dataset and reveal that engaging in a dialog yields significant gains in
image retrieval. We start by building an evaluation pipeline from an existing
manually generated dataset and explore different modules and training
strategies for ChatIR. Our comparison includes strong baselines derived from
related applications trained with Reinforcement Learning. Our system is capable
of retrieving the target image from a pool of 50K images with a success rate
of over 78% after 5 dialog rounds, compared to 75% when questions are asked by
humans and 64% for single-shot text-to-image retrieval. Extensive
evaluations reveal the strong capabilities of ChatIR and examine its limitations
under different settings.
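To make the retrieval step concrete, here is a minimal sketch under stated assumptions, not the ChatIR implementation. It assumes the initial description plus the dialog so far has already been embedded into a single vector by some text encoder (not shown), and that gallery images live in the same embedding space. After each round, the target's rank under cosine similarity is recomputed, and a success rate is taken as the fraction of searches whose target lands in the top K. The function names and the toy 3-dimensional vectors are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def target_rank(query: Sequence[float], gallery: List[Sequence[float]], target: int) -> int:
    """1-based rank of the target image when the gallery is sorted by similarity to the query."""
    scores = [cosine(query, image) for image in gallery]
    # Count gallery images scoring strictly higher than the target (ties favor the target).
    return 1 + sum(1 for s in scores if s > scores[target])


def hits_at_k(ranks: List[int], k: int) -> float:
    """Fraction of searches whose target appears in the top-k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy gallery of 3 image embeddings (hypothetical); image 1 is the user's target.
gallery = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
round0 = [1.0, 0.0, 0.0]  # embedding of the initial description only
round1 = [0.9, 0.1, 0.0]  # embedding after one question/answer pair
ranks = [target_rank(round0, gallery, 1), target_rank(round1, gallery, 1)]
print(ranks)                # [2, 1]
print(hits_at_k(ranks, 1))  # 0.5
```

In this toy case the ambiguous initial query ranks the target second, while the query enriched by one answered question ranks it first, which is the kind of per-round gain the abstract reports at scale.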
It is all about where you start: Text-to-image generation with seed selection
Text-to-image diffusion models can synthesize a large variety of concepts in
new compositions and scenarios. However, they still struggle to generate
uncommon concepts, rare or unusual combinations, or structured concepts such as hand
palms. This limitation is partly due to the long-tail nature of their training
data: web-crawled datasets are strongly unbalanced, causing models to
under-represent concepts from the tail of the distribution. Here we
characterize the effect of unbalanced training data on text-to-image models and
offer a remedy. We show that rare concepts can be correctly generated by
carefully selecting suitable generation seeds in the noise space, a technique
that we call SeedSelect. SeedSelect is efficient and does not require
retraining the diffusion model. We evaluate the benefit of SeedSelect on a
series of problems. First, we apply it to few-shot semantic data augmentation,
generating semantically correct images for few-shot and long-tail benchmarks. We
show classification improvements on all classes, from both the head and the tail of
the training data of diffusion models. We further evaluate SeedSelect on
correcting images of hands, a well-known pitfall of current diffusion models,
and show that it improves hand generation substantially.
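The sketch below only illustrates the premise that the starting noise seed matters; it is a plain best-of-N search over random seeds, not SeedSelect itself, whose exact selection procedure is not described in this abstract. The `score` callable is a hypothetical stand-in: in practice it would run the diffusion model from the given noise and compare the generated image with a few reference images of the rare concept, whereas here it is a toy distance to a made-up reference vector.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def sample_noise(seed: int, dim: int) -> List[float]:
    """Deterministic standard-Gaussian noise vector for an integer seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def best_of_n_seed(seeds: Sequence[int], dim: int,
                   score: Callable[[List[float]], float]) -> Tuple[Optional[int], float]:
    """Return the seed whose initial noise receives the highest score (best-of-N search)."""
    best_seed: Optional[int] = None
    best_score = float("-inf")
    for seed in seeds:
        s = score(sample_noise(seed, dim))
        if s > best_score:
            best_seed, best_score = seed, s
    return best_seed, best_score


# Hypothetical stand-in for "generate from this noise, then compare with reference images":
# negative squared distance between the noise and a made-up reference vector.
reference = [0.5, -1.0, 0.0, 1.5]


def toy_score(noise: List[float]) -> float:
    return -sum((x - r) ** 2 for x, r in zip(noise, reference))


seed, value = best_of_n_seed(range(100), dim=4, score=toy_score)
print(seed, value)
```

Because each seed maps deterministically to one noise vector, the chosen seed can be reused to regenerate the same starting point without retraining the model, which is the property the abstract relies on.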
Watch Your Pose: Unsupervised Domain Adaption with Pose based Triplet Selection for Gait Recognition
Gait Recognition is a computer vision task aiming to identify people by their
walking patterns. Existing methods show impressive results on individual
datasets but lack the ability to generalize to unseen scenarios. Unsupervised
Domain Adaptation (UDA) tries to adapt a model, pre-trained in a supervised
manner on a source domain, to an unlabelled target domain. UDA for Gait
Recognition is still in its infancy, and existing works propose solutions only for
limited scenarios. In this paper, we reveal a fundamental phenomenon in the
adaptation of gait recognition models, in which the target domain is biased toward
pose-based features rather than identity features, causing a significant
performance drop in the identification task. We propose a Gait Orientation-based
method for Unsupervised Domain Adaptation (GOUDA) to reduce this bias. To this
end, we present a novel Triplet Selection algorithm with a curriculum learning
framework, aiming to adapt the embedding space by pushing away samples of
similar poses and bringing closer samples of different poses. We provide
extensive experiments on four widely used gait datasets, CASIA-B, OU-MVLP,
GREW, and Gait3D, and on three backbones, GaitSet, GaitPart, and GaitGL,
showing the superiority of our proposed method over prior works.
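One plausible reading of the triplet rule described above, sketched under assumptions: for an unlabeled anchor, the positive is its nearest neighbor in embedding space among samples with a different pose label, and the negative is its nearest neighbor among samples with the same pose label, so a standard triplet loss pulls different-pose samples together and pushes same-pose samples apart. The pose labels, margin value, and function names are hypothetical, and the GOUDA curriculum schedule is not reproduced.

```python
import math
from typing import Optional, Sequence, Tuple

Triplet = Tuple[int, int, int]


def select_triplet(anchor: int, emb: Sequence[Sequence[float]],
                   pose: Sequence[int]) -> Optional[Triplet]:
    """Pick (anchor, positive, negative) using pose labels instead of identity labels.

    Positive: nearest sample whose pose differs from the anchor's.
    Negative: nearest sample whose pose matches the anchor's.
    """
    others = [j for j in range(len(emb)) if j != anchor]
    diff_pose = [j for j in others if pose[j] != pose[anchor]]
    same_pose = [j for j in others if pose[j] == pose[anchor]]
    if not diff_pose or not same_pose:
        return None  # no valid triplet for this anchor
    positive = min(diff_pose, key=lambda j: math.dist(emb[anchor], emb[j]))
    negative = min(same_pose, key=lambda j: math.dist(emb[anchor], emb[j]))
    return anchor, positive, negative


def triplet_loss(emb: Sequence[Sequence[float]], triplet: Triplet,
                 margin: float = 0.2) -> float:
    """Standard hinge triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    a, p, n = triplet
    return max(0.0, math.dist(emb[a], emb[p]) - math.dist(emb[a], emb[n]) + margin)


# Toy target-domain embeddings with hypothetical pose (orientation) labels.
emb = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.0, 2.0]]
pose = [0, 0, 1, 1]
t = select_triplet(0, emb, pose)
print(t)                     # (0, 2, 1)
print(triplet_loss(emb, t))  # approximately 1.1
```

In the toy data the anchor sits closer to a same-pose sample (distance 0.1) than to its nearest different-pose sample (distance 1.0), so the loss is positive; minimizing it works against the pose bias the abstract describes.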
Treatment of Higher-Risk Patients With an Indication for Revascularization: Evolution Within the Field of Contemporary Percutaneous Coronary Intervention
Patients with severe coronary artery disease with a clinical indication for revascularization but who are at high procedural risk because of patient comorbidities, complexity of coronary anatomy, and/or poor hemodynamics represent an understudied and potentially underserved patient population. Through advances in percutaneous interventional techniques and technologies and improvements in patient selection, current percutaneous coronary intervention may allow appropriate patients to benefit safely from revascularization procedures that might not have been offered in the past. The burgeoning interest in these procedures in some respects reflects an evolutionary step within the field of percutaneous coronary intervention. However, because of the clinical complexity of many of these patients and procedures, it is critical to develop dedicated specialists within interventional cardiology who are trained with the cognitive and technical skills to select these patients appropriately and to perform these procedures safely. Preprocedural issues such as multidisciplinary risk and treatment assessments are highly relevant to the successful treatment of these patients, and knowledge gaps and future directions to improve outcomes in this emerging area are discussed. Ultimately, an evolution of contemporary interventional cardiology is necessary to treat the increasingly higher-risk patients with whom we are confronted.