Detecting People in Artwork with CNNs
CNNs have massively improved performance in object detection in photographs.
However research into object detection in artwork remains limited. We show
state-of-the-art performance on a challenging dataset, People-Art, which
contains people from photos, cartoons and 41 different artwork movements. We
achieve this high performance by fine-tuning a CNN for this task, thus also
demonstrating that training CNNs on photographs leads to overfitting to the
photographic domain: only the first three or four layers transfer from photos
to artwork. Although the CNN's performance is the highest reported so far, it
remains below 60% AP,
suggesting further work is needed for the cross-depiction problem. The final
publication is available at Springer via
http://dx.doi.org/10.1007/978-3-319-46604-0_57
Comment: 14 pages, plus 3 pages of references; 7 figures; in ECCV 2016 Workshop
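The layer-transfer finding above is typically exploited by freezing the early, domain-general layers and fine-tuning only the rest. A minimal framework-agnostic sketch (the layer names and the `freeze_early_layers` helper are illustrative, not from the paper):

```python
# Sketch: keep the first k layers of a pretrained CNN fixed during
# fine-tuning, since only early layers transfer across depiction styles.

def freeze_early_layers(layer_names, num_frozen):
    """Split an ordered list of layer names into (frozen, trainable)."""
    frozen = layer_names[:num_frozen]
    trainable = layer_names[num_frozen:]
    return frozen, trainable

layers = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "detector"]
frozen, trainable = freeze_early_layers(layers, num_frozen=3)
print(frozen)     # early, domain-general features stay fixed
print(trainable)  # later, photo-specific layers are re-trained on artwork
```

In a real framework this split would drive which parameter groups receive gradient updates.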
Improving Performance of Object Detection using the Mechanisms of Visual Recognition in Humans
Object recognition systems are usually trained and evaluated on high
resolution images. However, in real world applications, it is common that the
images have low resolution or small size. In this study, we first track
the performance of the state-of-the-art deep object recognition network,
Faster R-CNN, as a function of image resolution. The results reveal the
negative effects of low-resolution images on recognition performance. They
also show that different spatial frequencies convey different information
about the objects during recognition, which means a multi-resolution
recognition system can provide better insight into the optimal selection
of features and thus better recognition of objects. This is similar to the
mechanisms of the human visual system, which is able to maintain a
multi-scale representation of a visual scene simultaneously. Then, we
propose a multi-resolution object
recognition framework rather than a single-resolution network. The proposed
framework is evaluated on the PASCAL VOC2007 database. The experimental results
show the performance of our adapted multi-resolution Faster-RCNN framework
outperforms the single-resolution Faster-RCNN on input images with various
resolutions with an increase in the mean Average Precision (mAP) of 9.14%
across all resolutions and 1.2% on the full-spectrum images. Furthermore,
the proposed model remains robust over a wide range of spatial frequencies.
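A multi-resolution detector along these lines could pool detections produced at several input resolutions and merge them with non-maximum suppression. A sketch in plain Python, assuming boxes have already been mapped back to original-image coordinates (the merging scheme is an illustration, not the paper's exact framework):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def merge_multiscale(detections, iou_thresh=0.5):
    """Greedy NMS over detections pooled from all resolutions.
    Each detection is (score, box), with boxes already mapped back
    to original-image coordinates."""
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, k[1]) < iou_thresh for k in kept):
            kept.append((score, box))
    return kept

dets = [(0.9, (10, 10, 50, 50)),    # from the full-resolution pass
        (0.7, (12, 11, 49, 52)),    # same object, seen at low resolution
        (0.6, (80, 80, 120, 120))]  # a distinct object
print(merge_multiscale(dets))  # duplicates collapse; two boxes survive
```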
Linking Art through Human Poses
We address the discovery of composition transfer in artworks based on their
visual content. Automated analysis of large art collections, which are growing
as a result of art digitization among museums and galleries, is an important
tool for art history and assists cultural heritage preservation. Modern image
retrieval systems offer good performance on visually similar artworks, but fail
in the cases of more abstract composition transfer. The proposed approach links
artworks through a pose similarity of human figures depicted in images. Human
figures are the subject of a large fraction of visual art from the Middle
Ages to modernity, and their distinctive poses were often a source of
inspiration among
artists. The method consists of two steps -- fast pose matching and robust
spatial verification. We experimentally show that explicit human pose matching
is superior to standard content-based image retrieval methods on a manually
annotated art composition transfer dataset.
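The fast pose-matching step presumably needs a pose distance that ignores where and how large a figure is on the canvas. A hedged sketch of such a normalized keypoint distance (the normalization choice is an assumption, not necessarily the authors' exact method):

```python
import math

def normalize(pose):
    """Translate a pose (list of (x, y) keypoints) to its centroid and
    scale it to unit RMS size, so matching ignores position and scale."""
    n = len(pose)
    cx = sum(p[0] for p in pose) / n
    cy = sum(p[1] for p in pose) / n
    centered = [(x - cx, y - cy) for x, y in pose]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / scale, y / scale) for x, y in centered]

def pose_distance(pose_a, pose_b):
    """Mean Euclidean distance between corresponding normalized keypoints."""
    a, b = normalize(pose_a), normalize(pose_b)
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

pose = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
shifted = [(10 + 2 * x, 5 + 2 * y) for x, y in pose]
print(pose_distance(pose, shifted))  # ~0: same pose, moved and rescaled
```

A spatial-verification stage, as in the paper, would then re-check candidate matches geometrically.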
DEArt: Dataset of European Art
Large datasets that were made publicly available to the research community
over the last 20 years have been a key enabling factor for the advances in deep
learning algorithms for NLP or computer vision. These datasets are generally
pairs of aligned image / manually annotated metadata, where images are
photographs of everyday life. Scholarly and historical content, on the other
hand, treats subjects that are not necessarily familiar to a general audience;
it may not always comprise a large number of data points, and new data may be
difficult or impossible to collect. Some exceptions do exist, for instance,
scientific or health data, but this is not the case for cultural heritage (CH).
The poor performance of the best models in computer vision - when tested over
artworks - coupled with the lack of extensively annotated datasets for CH, and
the fact that artwork images depict objects and actions not captured by
photographs, indicate that a CH-specific dataset would be highly valuable for
this community. We propose DEArt, at this point primarily an object detection
and pose classification dataset meant to be a reference for paintings between
the XIIth and the XVIIIth centuries. It contains more than 15000 images, about
80% non-iconic, aligned with manual annotations for the bounding boxes
identifying all instances of 69 classes as well as 12 possible poses for boxes
identifying human-like objects. Of these, more than 50 classes are CH-specific
and thus do not appear in other datasets; these reflect imaginary beings,
symbolic entities and other categories related to art. Additionally, existing
datasets do not include pose annotations. Our results show that object
detectors for the cultural heritage domain can achieve a level of precision
comparable to state-of-the-art models for generic images via transfer learning.
Comment: VISART VI Workshop at the European Conference on Computer Vision (ECCV)
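When transferring a detector pretrained on everyday photographs to DEArt's vocabulary, the 50+ CH-specific classes have no pretrained counterparts, so only the shared classes can reuse classifier-head weights. A toy sketch of that split (the class names and the `split_transferable` helper are illustrative):

```python
# Sketch: decide which detector output classes can start from pretrained
# weights and which need freshly initialized head units.

def split_transferable(source_classes, target_classes):
    """Return (reusable, fresh) target classes relative to a source vocabulary."""
    source = set(source_classes)
    reuse = [c for c in target_classes if c in source]
    fresh = [c for c in target_classes if c not in source]
    return reuse, fresh

photo_vocab = ["person", "horse", "dog", "boat"]
art_vocab = ["person", "horse", "angel", "devil", "halo", "boat"]
reuse, fresh = split_transferable(photo_vocab, art_vocab)
print(reuse)  # head weights can be transferred
print(fresh)  # CH-specific classes need new output units
```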
Robustness of SAM: Segment Anything Under Corruptions and Beyond
Segment anything model (SAM), as the name suggests, is claimed to be capable
of cutting out any object. SAM is a vision foundation model which demonstrates
impressive zero-shot transfer performance with the guidance of a prompt.
However, there is currently a lack of comprehensive evaluation of its
robustness performance under various types of corruptions. Prior works show
that SAM is biased towards texture (style) rather than shape, motivated by
which we start by investigating SAM's robustness against style transfer, which
is a synthetic corruption. With the effect of corruptions interpreted as a style
change, we further evaluate its robustness on 15 common corruptions with 5
severity levels for each real-world corruption. Beyond the corruptions, we
further evaluate the SAM robustness on local occlusion and adversarial
perturbations. Overall, this work provides a comprehensive empirical study on
the robustness of SAM under corruptions and beyond.
Comment: 16 pages
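A robustness protocol like the one described boils down to: corrupt the input at increasing severity, re-run the model, and score the output against the clean reference. A toy stand-in using binary masks and a pixel-flip corruption (both are simplifications; SAM itself and the 15 real corruption types are not modeled here):

```python
import random

def mask_iou(a, b):
    """IoU between two binary masks given as flat 0/1 lists."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 1.0

def corrupt(mask, severity, rng):
    """Toy corruption: flip each pixel with probability 0.1 * severity."""
    return [p ^ (rng.random() < 0.1 * severity) for p in mask]

clean = [1] * 50 + [0] * 50
rng = random.Random(0)  # seeded for reproducibility
for severity in range(6):
    pred = corrupt(clean, severity, rng)
    print(severity, round(mask_iou(clean, pred), 2))  # IoU degrades with severity
```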
An analysis of the transfer learning of convolutional neural networks for artistic images
Transfer learning from huge natural image datasets, fine-tuning of deep
neural networks and the use of the corresponding pre-trained networks have
become de facto the core of art analysis applications. Nevertheless, the
effects of transfer learning are still poorly understood. In this paper, we
first use techniques for visualizing the network internal representations in
order to provide clues to the understanding of what the network has learned on
artistic images. Then, we provide a quantitative analysis of the changes
introduced by the learning process thanks to metrics in both the feature and
parameter spaces, as well as metrics computed on the set of maximal activation
images. These analyses are performed on several variations of the transfer
learning procedure. In particular, we observed that the network could
specialize some pre-trained filters to the new image modality and also that
higher layers tend to concentrate classes. Finally, we have shown that a double
fine-tuning involving a medium-size artistic dataset can improve the
classification on smaller datasets, even when the task changes.Comment: Accepted at Workshop on Fine Art Pattern Extraction and Recognition
(FAPER), ICPR, 202
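One simple parameter-space metric for such an analysis is how far each layer's weights drift from their pretrained values during fine-tuning, e.g. via cosine similarity. A sketch with made-up weight vectors (the metric choice is an assumption, not necessarily the paper's exact one):

```python
import math

def cosine(u, v):
    """Cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def layer_drift(pretrained, finetuned):
    """Per-layer 1 - cosine similarity: 0 means the layer was left
    untouched, larger values mean more specialization to the new data."""
    return {name: 1 - cosine(pretrained[name], finetuned[name])
            for name in pretrained}

pre = {"conv1": [1.0, 0.0, 1.0], "fc7": [1.0, 1.0, 0.0]}
post = {"conv1": [1.0, 0.0, 1.0], "fc7": [0.0, 1.0, 1.0]}
drift = layer_drift(pre, post)
print(drift)  # conv1 ~0 (unchanged), fc7 drifted
```

The pattern of early layers drifting little and late layers drifting most is consistent with the layer-transfer findings above.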
A Data Set and a Convolutional Model for Iconography Classification in Paintings
Iconography in art is the discipline that studies the visual content of
artworks to determine their motifs and themes and to characterize the way these
are represented. It is a subject of active research for a variety of purposes,
including the interpretation of meaning, the investigation of the origin and
diffusion in time and space of representations, and the study of influences
across artists and artworks. With the proliferation of digital archives of art
images, the possibility arises of applying Computer Vision techniques to the
analysis of art images at an unprecedented scale, which may support iconography
research and education. In this paper we introduce a novel paintings data set
for iconography classification and present the quantitative and qualitative
results of applying a Convolutional Neural Network (CNN) classifier to the
recognition of the iconography of artworks. The proposed classifier achieves
good performance (71.17% Precision, 70.89% Recall, 70.25% F1-Score and 72.73%
Average Precision) in the task of identifying saints in Christian religious
paintings, a task made difficult by the presence of classes with very similar
visual features. Qualitative analysis of the results shows that the CNN focuses
on the traditional iconic motifs that characterize the representation of each
saint and exploits such hints to attain correct identification. The ultimate
goal of our work is to enable the automatic extraction, decomposition, and
comparison of iconography elements to support iconographic studies and
automatic artwork annotation.
Comment: Published in ACM Journal on Computing and Cultural Heritage (JOCCH),
https://doi.org/10.1145/345888
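For reference, the Precision, Recall and F1 figures quoted above follow the standard per-class definitions, which can be sketched as (the counts are invented for illustration):

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive
    and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = prf1(tp=70, fp=30, fn=30)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.7 0.7 0.7
```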