
    Ancient Coin Classification Using Graph Transduction Games

    Recognizing the type of an ancient coin requires theoretical expertise and years of experience in the field of numismatics. Our goal in this work is to automate this time-consuming and demanding task with a visual classification framework. Specifically, we propose to model ancient coin image classification using Graph Transduction Games (GTG). GTG casts the classification problem as a non-cooperative game in which the players (the coin images) choose their strategies (class labels) according to the choices made by the others, resulting in a global consensus at the final labeling. Experiments are conducted on the only publicly available dataset, which is composed of 180 images of 60 types of Roman coins. We demonstrate that our approach outperforms prior work on the same dataset, with classification accuracies of 73.6% and 87.3% when there are one and two images per class in the training set, respectively.
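
    As a rough illustration of the game-theoretic transduction described above, the sketch below propagates labels over a similarity graph with replicator dynamics: each player (image) holds a distribution over class labels, labeled players are fixed to pure strategies, and unlabeled players repeatedly update toward the labels favored by their similar neighbors. The toy data, function names, and parameter values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of label propagation via replicator dynamics, in the spirit of
# Graph Transduction Games (GTG). Toy data and names are assumptions.
import numpy as np

def gtg_propagate(W, labels, n_classes, n_iter=100, eps=1e-8):
    """W: (n, n) pairwise similarity between images (players).
    labels: length-n array with class index for labeled items, -1 for unlabeled."""
    n = W.shape[0]
    X = np.full((n, n_classes), 1.0 / n_classes)        # mixed strategies (label distributions)
    labeled = labels >= 0
    X[labeled] = np.eye(n_classes)[labels[labeled]]      # labeled players commit to pure strategies
    for _ in range(n_iter):
        payoff = W @ X                                   # expected payoff of each label per player
        X = X * payoff                                   # replicator-dynamics update
        X /= X.sum(axis=1, keepdims=True) + eps          # renormalize to the probability simplex
        X[labeled] = np.eye(n_classes)[labels[labeled]]  # keep labeled strategies fixed
    return X.argmax(axis=1)

# Toy usage: 6 images in 2 classes, one labeled exemplar per class.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 0.1, (3, 8)), rng.normal(1, 0.1, (3, 8))])
W = np.exp(-np.linalg.norm(feats[:, None] - feats[None], axis=-1) ** 2)
np.fill_diagonal(W, 0)
labels = np.array([0, -1, -1, 1, -1, -1])
print(gtg_propagate(W, labels, n_classes=2))  # expected: [0 0 0 1 1 1]
```

    Fixing labeled players to pure strategies is one common way to realize the transduction; the full GTG formulation in the paper may differ in details such as payoff normalization and convergence criteria.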

    TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification

    Vision and Language Models (VLMs), such as CLIP, have enabled visual recognition of a potentially unlimited set of categories described by text prompts. However, for the best visual recognition performance, these models still require tuning to better fit the data distributions of the downstream tasks, in order to overcome the domain shift from the web-based pre-training data. Recently, it has been shown that VLMs can be effectively tuned without any paired data, and in particular that their visual recognition performance can be improved using text-only training data generated by Large Language Models (LLMs). In this paper, we dive deeper into this text-only VLM training approach and explore how it can be further improved by taking the specifics of the downstream task into account when sampling text data from LLMs. Compared to the SOTA text-only VLM training approach, we demonstrate up to 8.4% performance improvement in (cross) domain-specific adaptation and up to 8.7% improvement in fine-grained recognition, as well as a 3.1% overall average improvement in zero-shot classification over strong baselines.
    Comment: Code is available at: https://github.com/jmiemirza/TA
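
    To make the text-only training idea concrete, the sketch below fits a lightweight classifier on CLIP text embeddings of LLM-generated, task-targeted sentences and then reuses it on image embeddings at test time, relying on the shared image/text embedding space. The embeddings here are random stand-ins, and the function names and hyperparameters are illustrative assumptions rather than the paper's actual TAP pipeline; in practice the sentences would come from an LLM and be encoded with a real CLIP text encoder (e.g. via open_clip).

```python
# Hedged sketch of text-only VLM adaptation: train on text embeddings, test on
# image embeddings in the same CLIP space. Data and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit_text_only_classifier(text_emb, text_labels, n_classes, epochs=200, lr=1e-2):
    """text_emb: (N, D) L2-normalized text embeddings of LLM-generated sentences.
    text_labels: (N,) index of the class each sentence was generated for."""
    clf = nn.Linear(text_emb.shape[1], n_classes)
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.cross_entropy(clf(text_emb), text_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return clf

@torch.no_grad()
def classify_images(clf, image_emb):
    """image_emb: (M, D) L2-normalized image embeddings from the same VLM."""
    return clf(image_emb).argmax(dim=1)

# Toy usage with random stand-ins for the real CLIP embeddings (4 classes,
# 16 generated sentences per class).
D, C = 512, 4
prototypes = F.normalize(torch.randn(C, D), dim=1)
text_emb = F.normalize(prototypes.repeat_interleave(16, 0) + 0.1 * torch.randn(C * 16, D), dim=1)
text_labels = torch.arange(C).repeat_interleave(16)
image_emb = F.normalize(prototypes + 0.1 * torch.randn(C, D), dim=1)
clf = fit_text_only_classifier(text_emb, text_labels, n_classes=C)
print(classify_images(clf, image_emb))  # likely tensor([0, 1, 2, 3])
```

    The key design point this sketch captures is that no paired image-text data is needed: supervision comes entirely from which class prompt each generated sentence belongs to, and transfer to images is free because both modalities are embedded in the same space.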