Ancient Coin Classification Using Graph Transduction Games
Recognizing the type of an ancient coin requires theoretical expertise and
years of experience in the field of numismatics. Our goal in this work is to
automate this time-consuming and demanding task with a visual classification
framework. Specifically, we propose to model ancient coin image classification
using Graph Transduction Games (GTG). GTG casts the classification problem as a
non-cooperative game in which the players (the coin images) choose their
strategies (class labels) according to the choices made by the others, which
results in a global consensus on the final labeling. Experiments are
conducted on the only publicly available dataset, which is composed of 180
images of 60 types of Roman coins. We demonstrate that our approach outperforms
prior work on the same dataset, with classification accuracies of
73.6% and 87.3% when there are one and two images per class in the training
set, respectively.
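The labeling game described above can be sketched with replicator dynamics on a
similarity graph: each player (image) holds a mixed strategy over class labels,
and its strategy is repeatedly reweighted by how well each label agrees with
its neighbors. The snippet below is a minimal illustration under assumed
choices (uniform priors, a hand-built similarity matrix, hard clamping of
labeled players), not the authors' implementation.

```python
def gtg_label(W, n_classes, labeled, iters=50):
    """Graph transduction via replicator dynamics (sketch).

    W        : n x n nonnegative similarity matrix between players (images)
    labeled  : dict {player index: known class label}
    Returns the class label each player converges to.
    """
    n = len(W)
    # Mixed strategies: uniform priors for unlabeled players (an assumption),
    # one-hot (pure) strategies for labeled players.
    X = [[1.0 / n_classes] * n_classes for _ in range(n)]
    for i, y in labeled.items():
        X[i] = [1.0 if c == y else 0.0 for c in range(n_classes)]
    for _ in range(iters):
        # Expected payoff of each pure strategy: similarity-weighted
        # agreement with the other players' current strategies.
        payoff = [[sum(W[i][j] * X[j][c] for j in range(n))
                   for c in range(n_classes)] for i in range(n)]
        new_X = []
        for i in range(n):
            row = [X[i][c] * payoff[i][c] for c in range(n_classes)]  # replicator update
            s = sum(row)
            new_X.append([v / s for v in row])  # renormalize onto the simplex
        X = new_X
        for i, y in labeled.items():  # keep labeled players clamped
            X[i] = [1.0 if c == y else 0.0 for c in range(n_classes)]
    return [max(range(n_classes), key=lambda c: X[i][c]) for i in range(n)]

# Toy example: two clusters of "coin images", one labeled player per class.
W = [[0.0, 1.0, 0.1, 0.1],
     [1.0, 0.0, 0.1, 0.1],
     [0.1, 0.1, 0.0, 1.0],
     [0.1, 0.1, 1.0, 0.0]]
print(gtg_label(W, n_classes=2, labeled={0: 0, 2: 1}))  # → [0, 0, 1, 1]
```

The unlabeled players inherit the label of their cluster's labeled neighbor,
which is the "global consensus" behavior the abstract refers to.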
TAP: Targeted Prompting for Task Adaptive Generation of Textual Training Instances for Visual Classification
Vision and Language Models (VLMs), such as CLIP, have enabled visual
recognition of a potentially unlimited set of categories described by text
prompts. However, for the best visual recognition performance, these models
still require tuning to better fit the data distributions of the downstream
tasks, in order to overcome the domain shift from the web-based pre-training
data. Recently, it has been shown that it is possible to effectively tune VLMs
without any paired data, and in particular to improve VLMs' visual
recognition performance using text-only training data generated by Large
Language Models (LLMs). In this paper, we dive deeper into this exciting
text-only VLM training approach and explore how it can be significantly
further improved by taking the specifics of the downstream task into account
when sampling text data from LLMs. In particular, compared to the SOTA
text-only VLM training approach, we demonstrate up to 8.4% performance
improvement in (cross) domain-specific adaptation, up to 8.7% improvement in
fine-grained recognition, and a 3.1% overall average improvement in zero-shot
classification over strong baselines.

Comment: Code is available at: https://github.com/jmiemirza/TA
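The core idea of targeted prompting can be sketched as conditioning the LLM
query on the downstream task rather than asking for generic category
descriptions. Everything below (the template wording, the function name, the
parameters) is a hypothetical illustration of that idea, not the authors'
released code.

```python
def targeted_prompts(class_name, domain=None, parent_category=None):
    """Build LLM queries whose answers serve as text-only training captions.

    Hypothetical templates illustrating task-adaptive prompting:
    - domain          : target visual domain (e.g. "sketch"), for domain adaptation
    - parent_category : coarse class (e.g. "dog"), for fine-grained recognition
    """
    prompts = [f"Describe a photo of a {class_name}."]  # generic baseline query
    if domain:
        # Domain-specific adaptation: steer captions toward the target domain.
        prompts.append(f"Describe how a {class_name} appears in a {domain} image.")
    if parent_category:
        # Fine-grained recognition: ask for features that contrast the class
        # against its siblings within the parent category.
        prompts.append(
            f"Describe the visual features that distinguish a {class_name} "
            f"from other kinds of {parent_category}."
        )
    return prompts

for p in targeted_prompts("golden retriever", domain="sketch", parent_category="dog"):
    print(p)
```

The sentences an LLM returns for such queries would then be embedded with the
VLM's text encoder and used as training data in place of paired image captions.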