60 research outputs found
Deep learning approaches to pattern extraction and recognition in paintings and drawings: an overview
This paper provides an overview of some of the most relevant deep learning approaches to pattern extraction and recognition in visual arts, particularly painting and drawing. Recent advances in deep learning and computer vision, coupled with the growing availability of large digitized visual art collections, have opened new opportunities for computer science researchers to assist the art community with automatic tools to analyse and further understand visual arts. Among other benefits, a deeper understanding of visual arts has the potential to make them more accessible to a wider population, ultimately supporting the spread of culture.
Automatic Image Captioning with Style
This thesis connects two core topics in machine learning: vision and language. The problem of choice is image caption generation: automatically constructing natural language descriptions of image content. Previous research into image caption generation has focused on generating purely descriptive captions; I focus on generating visually relevant captions with a distinct linguistic style. Captions with style have the potential to ease communication and add a new layer of personalisation.
First, I consider naming variations in image captions and propose a method for predicting context-dependent names that takes into account visual and linguistic information. This method makes use of a large-scale image caption dataset, which I also use to explore naming conventions and report naming conventions for hundreds of animal classes. Next, I propose the SentiCap model, which relies on recent advances in artificial neural networks to generate visually relevant image captions with positive or negative sentiment. To balance descriptiveness and sentiment, the SentiCap model dynamically switches between two recurrent neural networks: one tuned for descriptive words and one for sentiment words. As the first published model for generating captions with sentiment, SentiCap has influenced a number of subsequent works. I then investigate the sub-task of modelling styled sentences without images. The specific task chosen is sentence simplification: rewriting news article sentences to make them easier to understand. For this task I design a neural sequence-to-sequence model that can work with limited training data, using novel adaptations for word copying and sharing word embeddings. Finally, I present SemStyle, a system for generating visually relevant image captions in the style of an arbitrary text corpus. A shared term space allows a neural network for vision and content planning to communicate with a network for styled language generation. SemStyle achieves competitive results in human and automatic evaluations of descriptiveness and style.
As a whole, this thesis presents two complete systems for styled caption generation that are the first of their kind and demonstrate, for the first time, that automatic style transfer for image captions is achievable. Contributions also include novel ideas for object naming and sentence simplification. This thesis opens up inquiries into highly personalised image captions; large-scale, visually grounded concept naming; and, more generally, styled text generation with content control.
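The descriptive/sentiment switching idea can be illustrated with a minimal sketch: at each decoding step, a gate mixes the word distributions of two decoders. All names, the vocabulary, the logits, and the gate value here are hypothetical stand-ins, not the thesis's actual model.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
vocab = ["a", "sunny", "beach", "beautiful", "day"]

# Hypothetical per-step logits from two decoders: one tuned for
# descriptive words, one for sentiment words.
desc_logits = rng.normal(size=len(vocab))
sent_logits = rng.normal(size=len(vocab))

# A (here fixed, in practice learned) switch probability g decides, per
# time step, which decoder dominates; the output is their mixture.
g = 0.7  # weight of the descriptive decoder at this step
p = g * softmax(desc_logits) + (1 - g) * softmax(sent_logits)

next_word = vocab[int(np.argmax(p))]
```

Because the mixture of two valid distributions is itself a valid distribution, the model can move smoothly between descriptive and sentiment-heavy wording as `g` varies.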
Prediction of emotion distribution of images based on weighted K-nearest neighbor-attention mechanism
Existing methods for classifying image emotions often overlook the subjective impact emotions have on observers, focusing primarily on emotion categories. This approach falls short of practical needs because it neglects the nuanced emotional responses an image evokes. This study proposes a novel approach that employs a weighted K-nearest neighbor algorithm to predict the discrete distribution of emotions in abstract paintings. First, emotional features are extracted from the images and assigned varying K-values. An encoder-decoder architecture is then used to derive sentiment features from the abstract paintings, augmented by a pre-trained model to improve the classification model's generalization and convergence speed. By incorporating an attention mechanism into the decoder and integrating it with the encoder's output sequence, the semantics of abstract painting images are learned, facilitating precise and sensible emotional understanding. Experimental results demonstrate that the attention-based classification algorithm achieves a higher accuracy (80.7%) than current methods. This approach successfully addresses the intricate challenge of discerning emotions in abstract paintings, underscoring the significance of considering subjective emotional responses in image classification. Integrating techniques such as the weighted K-nearest neighbor algorithm and attention mechanisms holds promise for enhancing the comprehension and classification of emotional content in visual art.
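The core idea of predicting an emotion distribution with a weighted nearest-neighbor rule can be sketched as follows; the function name, the toy features, and the inverse-distance weighting scheme are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def weighted_knn_distribution(X_train, D_train, x_query, k=3, eps=1e-8):
    """Predict an emotion distribution as the inverse-distance-weighted
    average of the K nearest training images' distributions."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dists)[:k]
    w = 1.0 / (dists[idx] + eps)   # closer neighbours weigh more
    w /= w.sum()
    return w @ D_train[idx]        # convex mixture of neighbour distributions

# Toy data: 4 images with 2-D features and 3 emotion categories each.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
D = np.array([[0.8, 0.1, 0.1],
              [0.6, 0.3, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.1, 0.8]])
pred = weighted_knn_distribution(X, D, np.array([0.1, 0.1]), k=3)
```

Since the weights form a convex combination, the prediction is itself a valid probability distribution over emotion categories.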
Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics
This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge of the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, used to propose regions of interest where objects may be found, and recursive Bayesian filtering, used to integrate observations over time. The proposal is evaluated on six virtual indoor environments, accounting for the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves the recall and the F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) compared to a two-stage video object detection method used as baseline, at the cost of a small time overhead (120 ms) and a precision loss (0.92).
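The homography-based propagation step can be sketched in a few lines: warp the corners of a detection from frame t into frame t+1 and take their bounding rectangle as the new region of interest. The function name and the pure-translation homography are illustrative assumptions, not the paper's code.

```python
import numpy as np

def propagate_box(H, box):
    """Warp an axis-aligned box (x1, y1, x2, y2) from frame t to frame t+1
    with a planar homography H, returning the warped corners' bounding rect."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], float)
    ones = np.ones((4, 1))
    warped = (H @ np.hstack([corners, ones]).T).T
    warped = warped[:, :2] / warped[:, 2:3]   # back from homogeneous coords
    return (*warped.min(axis=0), *warped.max(axis=0))

# Toy homography: pure translation, 10 px right and 5 px down.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0,  5.0],
              [0.0, 0.0,  1.0]])
print(propagate_box(H, (0, 0, 20, 10)))  # → (10.0, 5.0, 30.0, 15.0)
```

In the full method, such propagated boxes would serve as priors that a recursive Bayesian filter fuses with new detector observations.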
Recognizing the artistic style of fine art paintings with deep learning for an augmented reality application
The rapid digitalization of artwork collections in libraries, museums, galleries, and art centers has resulted in growing interest in developing autonomous systems capable of understanding art concepts and categorizing fine art paintings, as it has become difficult to manage the content of these collections manually. However, automatic categorization poses significant challenges due to the subjective interpretation and perception of art elements and the reliance on accurate annotations provided by art experts. Since deep learning approaches and computer vision techniques have shown remarkable performance in automating painting classification in recent years, this research aims to develop efficient deep learning systems that automatically classify the artistic style of fine-art paintings. In this thesis, we investigate the effectiveness of seven pre-trained EfficientNet models for identifying the style of a painting and propose custom models based on pre-trained EfficientNet architectures. In addition, we analyze the impact of deep retraining of the last eight layers on the performance of the custom models. The experimental results on the standard fine art painting classification dataset, Painting-91, indicate that deep retraining of the last eight layers of the custom models yields the best performance, achieving a 5% improvement over the base models. This demonstrates the effectiveness of leveraging pre-trained EfficientNet models for automatic artistic style identification in paintings. Moreover, the study presents a framework that compares the performance of six pre-trained convolutional neural networks (Xception, ResNet50, InceptionV3, InceptionResNetV2, DenseNet121, and EfficientNet B3) for identifying artistic styles in paintings; notably, the Xception architecture is employed for this purpose for the first time. Furthermore, the impact of different optimizers (SGD, RMSprop, and Adam) and two learning rates (1e-2 and 1e-4) on model performance is studied using transfer learning. Experiments on two art classification datasets, Pandora18k and Painting-91, reveal that InceptionResNetV2 achieves the highest style classification accuracy on both datasets when trained with the Adam optimizer and a learning rate of 1e-4. Integrating deep learning algorithms and transfer learning techniques into fine art painting analysis and classification offers promising avenues for automating style identification tasks. The proposed models and findings contribute to the development of automatic methods that enable the art community to efficiently analyze and categorize the vast number of digital paintings available on the internet.
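The "unfreeze only the last eight layers" recipe can be shown abstractly; the Layer class and the 30-layer backbone below are stand-ins for a pre-trained network such as EfficientNet, not the thesis's actual code.

```python
# Hypothetical stand-in: a "model" is a list of layers, each carrying a
# trainable flag, mimicking the usual transfer-learning fine-tuning setup.
class Layer:
    def __init__(self, name):
        self.name = name
        self.trainable = False  # pre-trained backbone starts frozen

layers = [Layer(f"block{i}") for i in range(30)]

# Deep retraining: unfreeze only the last eight layers so their weights
# adapt to the painting-style task while the rest of the backbone stays fixed.
N_UNFROZEN = 8
for layer in layers[-N_UNFROZEN:]:
    layer.trainable = True

trainable = [l.name for l in layers if l.trainable]
print(len(trainable))  # → 8
```

With a real framework, the same effect is achieved by setting each layer's trainable attribute before compiling the model with the chosen optimizer and learning rate.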
Authentication of Amadeo de Souza-Cardoso Paintings and Drawings With Deep Learning
Art forgery has a long history that can be traced back to the Roman period, and it has become more rampant as the art market continues to prosper. Reports have disclosed that countless artworks circulating on the art market could be fake; even some principal art museums and galleries may be exhibiting a considerable percentage of fake artworks. It is therefore vitally important to conserve cultural heritage and to safeguard the interests of the art market and of artists, as well as the integrity of artists' legacies. As a result, art authentication has been one of the most researched and well-documented fields, driven by the ever-growing commercial art market of the past decades. Over the past years, the employment of computer science in the art world has flourished, as it continues to stimulate interest in both the art world and the artificial intelligence arena. In particular, the implementation of Artificial Intelligence, namely Deep Learning algorithms and Neural Networks, has proved significant for specialised image analysis.
This research encompassed multidisciplinary studies in chemistry, physics, art, and computer science. More specifically, the work presents a solution to the problem of authenticating heritage artwork by Amadeo de Souza-Cardoso, namely paintings, through the use of artificial intelligence algorithms. First, an authenticity estimate is obtained by processing images through a deep learning model that analyses the brushstroke features of a painting; iterative, multi-scale analysis of the images covers the entire painting and produces an overall indication of authenticity. Second, a mixed-input deep learning model is proposed to analyse the pigments in a painting. This solves the image colour segmentation and pigment classification problem using hyperspectral imagery, and the result provides an indication of authenticity based on pigment classification and correlation with chemical data obtained via XRF analysis. Further algorithms developed include a deep learning model that tackles the pigment unmixing problem from hyperspectral data, and another deep learning model that estimates hyperspectral images from sRGB images.
Based on the established algorithms and the results obtained, two applications were developed. The first is an Augmented Reality mobile application for visualising the pigments in Amadeo's artworks, targeting the general public: art enthusiasts, museum visitors, art lovers, and art experts. The second is a desktop application with multiple purposes, such as the visualisation of pigments and hyperspectral data, designed for art specialists, i.e., conservators and restorers. Owing to the special circumstances of the pandemic, trials of these applications were only performed within the Department of Conservation and Restoration
at NOVA University Lisbon, where both applications received positive feedback.
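The iterative, multi-scale brushstroke analysis can be sketched as patch-wise scoring followed by averaging over window sizes. Every name here, and the stand-in "model" (just the mean patch intensity), is a hypothetical illustration, not the thesis's actual classifier.

```python
import numpy as np

def patch_scores(image, model, patch=64, stride=32):
    """Slide a window over the image and score each patch with a
    brushstroke classifier; the mean score summarises the whole painting."""
    h, w = image.shape[:2]
    scores = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            scores.append(model(image[y:y + patch, x:x + patch]))
    return float(np.mean(scores))

def multiscale_authenticity(image, model, scales=(64, 128)):
    """Average patch scores over several window sizes (multi-scale analysis)."""
    return float(np.mean([patch_scores(image, model, patch=s, stride=s // 2)
                          for s in scales]))

# Stand-in "model": mean intensity of the patch, already in [0, 1].
fake_model = lambda p: float(p.mean())
img = np.full((256, 256), 0.5)
score = multiscale_authenticity(img, fake_model)
print(round(score, 2))  # → 0.5
```

Aggregating scores over patches and scales is what lets a local brushstroke classifier produce a single overall indication of authenticity for the whole painting.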
Pathway to Future Symbiotic Creativity
This report presents a comprehensive view of our vision of the development path of human-machine symbiotic art creation. We propose a classification of creative systems with a hierarchy of five classes, showing the pathway of creativity evolving from mimic-human artists (Turing Artists) to machine artists in their own right. We begin with an overview of the limitations of Turing Artists, then focus on the top two levels, Machine Artists, emphasising machine-human communication in art creation. In art creation, machines must understand humans' mental states, including desires, appreciation, and emotions; humans, in turn, must understand machines' creative capabilities and limitations. The rapid development of immersive environments, and their further evolution into the new concept of the metaverse, enables symbiotic art creation through unprecedented flexibility of bi-directional communication between artists and art manifestation environments. By examining the latest sensor and XR technologies, we illustrate a novel way of collecting art data to form the basis of a new form of human-machine bidirectional communication and understanding in art creation. Based on such communication and understanding mechanisms, we propose a novel framework for building future machine artists, grounded in the philosophy that a human-compatible AI system should follow the "human-in-the-loop" principle rather than the traditional "end-to-end" dogma. By proposing a new form of inverse reinforcement learning model, we outline the platform design of machine artists, demonstrate its functions, and showcase some examples of technologies we have developed. We also provide a systematic exposition of the ecosystem for an AI-based symbiotic art form and community, with an economic model built on NFT technology. Ethical issues in the development of machine artists are also discussed.
Creativity and Machine Learning: a Survey
There is growing interest in the area of machine learning and creativity. This survey presents an overview of the history and the state of the art of computational creativity theories, machine learning techniques (including generative deep learning), and the corresponding automatic evaluation methods. After presenting a critical discussion of the key contributions in this area, we outline the current research challenges and emerging opportunities in the field.
Machine Learning for handwriting text recognition in historical documents
In this thesis, we focus on the handwriting text recognition task for historical documents, which are difficult to read for anyone who is not an expert in ancient languages and writing styles. We aim to take advantage of, and improve upon, the neural network architectures and techniques that other authors have proposed for handwriting text recognition in modern handwritten documents. These models perform the task very precisely when a large amount of data is available; however, low availability of labeled data is a widespread problem with historical documents, since the type of writing is singular and it is quite expensive to hire an expert to transcribe a large number of pages.
After investigating and analyzing the state of the art, we propose the efficient application of methods such as transfer learning and data augmentation. We also contribute an algorithm for purging mislabeled samples that affect the learning of the models. Finally, we develop a variational autoencoder method for generating synthetic samples of handwritten text images for data augmentation. Experiments are performed on various historical handwritten text databases to validate the performance of the proposed algorithms. The included analyses focus on the evolution of the character and word error rates (CER and WER) as the training dataset grows.
One of the most important results is our participation in a contest for the transcription of historical handwritten text. The organizers provided a dataset of documents to train the model; then only a few labeled pages of 5 new documents were supplied to further adjust the solution. Finally, the transcription of unlabeled images was requested to evaluate the algorithm. Our method ranked second in this contest.
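The CER and WER metrics tracked in these experiments are standard edit-distance ratios; a minimal implementation can be sketched as follows (the helper names are ours, not the thesis's code).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, via dynamic programming."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    """Word error rate: same distance computed over word tokens."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

print(cer("recognition", "recogition"))  # one deleted character over 11
print(round(wer("old manuscript page", "old manuscrpt page"), 2))  # → 0.33
```

Both rates fall toward zero as transcriptions improve, which is why they are natural curves to plot against training-set size.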