
    Chatting Makes Perfect -- Chat-based Image Retrieval

    Chats have emerged as an effective, user-friendly approach to information retrieval and are successfully employed in many domains, such as customer service, healthcare, and finance. However, existing image retrieval approaches typically address the case of a single query-to-image round, and the use of chats for image retrieval has been mostly overlooked. In this work, we introduce ChatIR: a chat-based image retrieval system that engages in a conversation with the user to elicit information beyond the initial query in order to clarify the user's search intent. Motivated by the capabilities of today's foundation models, we leverage Large Language Models to generate follow-up questions to an initial image description. These questions form a dialog with the user in order to retrieve the desired image from a large corpus. In this study, we explore the capabilities of such a system on a large dataset and show that engaging in a dialog yields significant gains in image retrieval. We start by building an evaluation pipeline from an existing manually generated dataset and explore different modules and training strategies for ChatIR. Our comparison includes strong baselines derived from related applications trained with Reinforcement Learning. Our system retrieves the target image from a pool of 50K images with a success rate of over 78% after 5 dialog rounds, compared to 75% when the questions are asked by humans and 64% for single-shot text-to-image retrieval. Extensive evaluations reveal the strong capabilities, and examine the limitations, of ChatIR under different settings.
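
    The retrieval loop described in the abstract (retrieve with the dialog so far, ask an LLM-generated follow-up question, fold the user's answer back into the query) could be sketched roughly as below. The text encoder, question generator, and answering function are hypothetical placeholders, not the paper's actual components.

        # Minimal sketch of a chat-based image retrieval loop, assuming a CLIP-style
        # text encoder and precomputed, L2-normalized image embeddings. All callables
        # passed in are illustrative stand-ins.
        import numpy as np

        def retrieve(dialog_text, image_embeddings, text_encoder, top_k=10):
            """Rank the image corpus by cosine similarity to the dialog so far."""
            q = text_encoder(dialog_text)              # shape: (d,)
            q = q / np.linalg.norm(q)
            sims = image_embeddings @ q                # embeddings assumed pre-normalized
            return np.argsort(-sims)[:top_k]

        def chat_image_retrieval(initial_caption, image_embeddings, text_encoder,
                                 ask_question, answer_question, rounds=5):
            """Alternate retrieval with LLM follow-up questions and user answers."""
            dialog = initial_caption
            for _ in range(rounds):
                ranked = retrieve(dialog, image_embeddings, text_encoder)
                question = ask_question(dialog)        # LLM proposes a follow-up question
                answer = answer_question(question)     # user (or simulator) replies
                dialog = f"{dialog} Q: {question} A: {answer}"
            return retrieve(dialog, image_embeddings, text_encoder)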

    It is all about where you start: Text-to-image generation with seed selection

    Text-to-image diffusion models can synthesize a large variety of concepts in new compositions and scenarios. However, they still struggle with generating uncommon concepts, rare or unusual combinations, or structured concepts such as hand palms. Their limitation is partly due to the long-tail nature of their training data: web-crawled datasets are strongly unbalanced, causing models to under-represent concepts from the tail of the distribution. Here we characterize the effect of unbalanced training data on text-to-image models and offer a remedy. We show that rare concepts can be correctly generated by carefully selecting suitable generation seeds in the noise space, a technique that we call SeedSelect. SeedSelect is efficient and does not require retraining the diffusion model. We evaluate the benefit of SeedSelect on a series of problems. First, in few-shot semantic data augmentation, we generate semantically correct images for few-shot and long-tail benchmarks and show classification improvement on all classes, both from the head and from the tail of the diffusion model's training data. We further evaluate SeedSelect on correcting images of hands, a well-known pitfall of current diffusion models, and show that it improves hand generation substantially.
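
    As a rough illustration of choosing a generation seed rather than retraining the model, the sketch below scores candidate initial-noise latents by how well the frozen model's output matches a few reference images of the rare concept and keeps the best one. The `generate` and `clip_similarity` helpers are assumptions for illustration; this is not the SeedSelect algorithm itself.

        # Illustrative seed selection for a rare concept: sample candidate initial
        # latents, generate with the frozen diffusion model, and keep the seed whose
        # image is closest to the few-shot reference features.
        import torch

        def select_seed(prompt, reference_features, generate, clip_similarity,
                        n_candidates=64, latent_shape=(4, 64, 64), device="cuda"):
            """Return the initial latent whose generation best matches the references."""
            best_score, best_latent = float("-inf"), None
            for _ in range(n_candidates):
                latent = torch.randn(1, *latent_shape, device=device)  # candidate seed
                image = generate(prompt, latent)                       # frozen diffusion model
                score = clip_similarity(image, reference_features)     # closeness to references
                if score > best_score:
                    best_score, best_latent = score, latent
            return best_latent, best_score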

    Watch Your Pose: Unsupervised Domain Adaption with Pose based Triplet Selection for Gait Recognition

    Gait Recognition is a computer vision task that aims to identify people by their walking patterns. Existing methods show impressive results on individual datasets but lack the ability to generalize to unseen scenarios. Unsupervised Domain Adaptation (UDA) tries to adapt a model, pre-trained in a supervised manner on a source domain, to an unlabelled target domain. UDA for Gait Recognition is still in its infancy, and existing works have proposed solutions only for limited scenarios. In this paper, we reveal a fundamental phenomenon in the adaptation of gait recognition models: the adapted model becomes biased toward pose-based features rather than identity features in the target domain, causing a significant performance drop in the identification task. We propose a Gait Orientation-based method for Unsupervised Domain Adaptation (GOUDA) to reduce this bias. To this end, we present a novel Triplet Selection algorithm within a curriculum learning framework, which adapts the embedding space by pushing apart samples with similar poses and bringing closer samples with different poses. We provide extensive experiments on four widely used gait datasets (CASIA-B, OU-MVLP, GREW, and Gait3D) and three backbones (GaitSet, GaitPart, and GaitGL), showing the superiority of our proposed method over prior works.
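
    The triplet-selection idea, pushing apart target-domain samples with similar poses and pulling together samples with different poses, could look roughly like the sketch below. The view-angle estimates, angle threshold, and selection rule are illustrative assumptions rather than the paper's exact method, and angle wraparound is ignored for simplicity.

        # Rough sketch of orientation-aware triplet selection: for each anchor in the
        # unlabelled target domain, take a nearby sample with a *different* estimated
        # view angle as the positive and a nearby sample with a *similar* angle as the
        # negative, so training counteracts the pose bias.
        import torch
        import torch.nn.functional as F

        def select_triplets(embeddings, view_angles, angle_thresh=30.0):
            """embeddings: (N, d) target-domain features; view_angles: (N,) degrees."""
            emb = F.normalize(embeddings, dim=1)
            sims = emb @ emb.T                                   # cosine similarity matrix
            angle_diff = (view_angles[:, None] - view_angles[None, :]).abs()
            triplets = []
            for a in range(len(emb)):
                similar_pose = (angle_diff[a] <= angle_thresh).clone()
                different_pose = ~similar_pose
                similar_pose[a] = False                          # exclude the anchor itself
                if similar_pose.any() and different_pose.any():
                    neg = sims[a].masked_fill(~similar_pose, -2.0).argmax()
                    pos = sims[a].masked_fill(~different_pose, -2.0).argmax()
                    triplets.append((a, pos.item(), neg.item()))
            return triplets  # feed into a standard triplet margin loss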

    Treatment of Higher-Risk Patients With an Indication for Revascularization: Evolution Within the Field of Contemporary Percutaneous Coronary Intervention

    Patients with severe coronary artery disease with a clinical indication for revascularization but who are at high procedural risk because of patient comorbidities, complexity of coronary anatomy, and/or poor hemodynamics represent an understudied and potentially underserved patient population. Through advances in percutaneous interventional techniques and technologies and improvements in patient selection, current percutaneous coronary intervention may allow appropriate patients to benefit safely from revascularization procedures that might not have been offered in the past. The burgeoning interest in these procedures in some respects reflects an evolutionary step within the field of percutaneous coronary intervention. However, because of the clinical complexity of many of these patients and procedures, it is critical to develop dedicated specialists within interventional cardiology who are trained with the cognitive and technical skills to select these patients appropriately and to perform these procedures safely. Preprocedural issues such as multidisciplinary risk and treatment assessments are highly relevant to the successful treatment of these patients, and knowledge gaps and future directions to improve outcomes in this emerging area are discussed. Ultimately, an evolution of contemporary interventional cardiology is necessary to treat the increasingly higher-risk patients with whom we are confronted.