60 research outputs found

    Deep learning approaches to pattern extraction and recognition in paintings and drawings: an overview

    This paper provides an overview of some of the most relevant deep learning approaches to pattern extraction and recognition in visual arts, particularly painting and drawing. Recent advances in deep learning and computer vision, coupled with the growing availability of large digitized visual art collections, have opened new opportunities for computer science researchers to assist the art community with automatic tools to analyse and further understand visual arts. Among other benefits, a deeper understanding of visual arts has the potential to make them more accessible to a wider population, ultimately supporting the spread of culture.

    Automatic Image Captioning with Style

    This thesis connects two core topics in machine learning, vision and language. The problem of choice is image caption generation: automatically constructing natural language descriptions of image content. Previous research into image caption generation has focused on generating purely descriptive captions; I focus on generating visually relevant captions with a distinct linguistic style. Captions with style have the potential to ease communication and add a new layer of personalisation. First, I consider naming variations in image captions, and propose a method for predicting context-dependent names that takes into account visual and linguistic information. This method makes use of a large-scale image caption dataset, which I also use to explore naming conventions and report naming conventions for hundreds of animal classes. Next I propose the SentiCap model, which relies on recent advances in artificial neural networks to generate visually relevant image captions with positive or negative sentiment. To balance descriptiveness and sentiment, the SentiCap model dynamically switches between two recurrent neural networks, one tuned for descriptive words and one for sentiment words. As the first published model for generating captions with sentiment, SentiCap has influenced a number of subsequent works. I then investigate the sub-task of modelling styled sentences without images. The specific task chosen is sentence simplification: rewriting news article sentences to make them easier to understand. For this task I design a neural sequence-to-sequence model that can work with limited training data, using novel adaptations for word copying and sharing word embeddings. Finally, I present SemStyle, a system for generating visually relevant image captions in the style of an arbitrary text corpus. A shared term space allows a neural network for vision and content planning to communicate with a network for styled language generation. 
SemStyle achieves competitive results in human and automatic evaluations of descriptiveness and style. As a whole, this thesis presents two complete systems for styled caption generation that are the first of their kind and demonstrate, for the first time, that automatic style transfer for image captions is achievable. Contributions also include novel ideas for object naming and sentence simplification. This thesis opens up inquiries into highly personalised image captions; large-scale visually grounded concept naming; and, more generally, styled text generation with content control.
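The SentiCap switching idea described above can be sketched as follows. This is a minimal stand-in, not the published model: the thesis uses two recurrent networks with a learned switch, whereas here the gate value, the four-word vocabulary, and both next-word distributions are hypothetical, and the hard switch is softened into a convex mixture:

```python
import numpy as np

def switched_word_distribution(p_desc, p_sent, gate):
    """Word-level switching between two generators: a gate in [0, 1]
    blends a descriptive model's next-word distribution with a
    sentiment model's (here a simple convex mixture of the two)."""
    p_desc, p_sent = np.asarray(p_desc, float), np.asarray(p_sent, float)
    mixed = gate * p_sent + (1.0 - gate) * p_desc
    return mixed / mixed.sum()   # renormalise against float error

# Toy 4-word vocabulary: gate=0.9 leans heavily on the sentiment model,
# so the sentiment model's favourite word (index 3) dominates.
p = switched_word_distribution([0.7, 0.1, 0.1, 0.1],
                               [0.05, 0.05, 0.1, 0.8], gate=0.9)
```

A gate near 0 would instead reproduce the descriptive model's preferences, which is how descriptiveness and sentiment can be balanced per word.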

    Prediction of emotion distribution of images based on weighted K-nearest neighbor-attention mechanism

    Existing methods for classifying image emotions often overlook the subjective impact emotions evoke in observers, focusing primarily on emotion categories. However, this approach falls short of meeting practical needs, as it neglects the nuanced emotional responses captured within an image. This study proposes a novel approach employing the weighted K-nearest neighbor algorithm to predict the discrete distribution of emotion in abstract paintings. Initially, emotional features are extracted from the images and assigned varying K-values. Subsequently, an encoder-decoder architecture is utilized to derive sentiment features from abstract paintings, augmented by a pre-trained model to enhance the classification model's generalization and convergence speed. By incorporating a blank attention mechanism into the decoder and integrating it with the encoder's output sequence, the semantics of abstract painting images are learned, facilitating precise and sensible emotional understanding. Experimental results demonstrate that the classification algorithm, utilizing the attention mechanism, achieves an accuracy of 80.7%, higher than current methods. This innovative approach successfully addresses the intricate challenge of discerning emotions in abstract paintings, underscoring the significance of considering subjective emotional responses in image classification. The integration of advanced techniques such as the weighted K-nearest neighbor algorithm and attention mechanisms holds promise for enhancing the comprehension and classification of emotional content in visual art.
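The distribution-prediction step can be illustrated with a minimal weighted K-nearest-neighbor sketch. The features, per-painting emotion distributions, and inverse-distance weighting below are hypothetical stand-ins for the learned components described above:

```python
import numpy as np

def predict_emotion_distribution(query, features, distributions, k=3, eps=1e-8):
    """Distance-weighted K-nearest-neighbour estimate of an emotion
    distribution: neighbours closer to the query get larger weights."""
    dists = np.linalg.norm(features - query, axis=1)
    idx = np.argsort(dists)[:k]              # indices of the k nearest
    weights = 1.0 / (dists[idx] + eps)       # inverse-distance weights
    weights /= weights.sum()
    return weights @ distributions[idx]      # weighted average, sums to 1

# Toy example: 4 paintings with 2-D features and 3 emotion classes.
feats = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
emo = np.array([[0.8, 0.1, 0.1],
                [0.1, 0.8, 0.1],
                [0.1, 0.1, 0.8],
                [0.3, 0.3, 0.4]])
pred = predict_emotion_distribution(np.array([0.1, 0.1]), feats, emo, k=3)
```

Because the query sits closest to the first painting, the predicted distribution is pulled toward that painting's dominant emotion while remaining a proper distribution.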

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, to propose regions of interest where to find objects, and recursive Bayesian filtering, to integrate observations over time. The proposal is evaluated on six virtual, indoor environments, accounting for the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves the recall and the F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and a precision loss (0.92).
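The homography-based propagation of detections can be sketched as follows. The box format, the translation-only homography `H`, and the function name are assumptions for illustration; in practice `H` would come from the known camera motion between frames:

```python
import numpy as np

def propagate_box(box, H):
    """Propagate a bounding box to the next frame via a planar homography H.
    box = (x1, y1, x2, y2); returns the axis-aligned box enclosing the
    projected corners, usable as a region of interest in the new frame."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1, 1], [x2, y1, 1],
                        [x1, y2, 1], [x2, y2, 1]], float).T
    proj = H @ corners
    proj /= proj[2]                  # back from homogeneous coordinates
    xs, ys = proj[0], proj[1]
    return (xs.min(), ys.min(), xs.max(), ys.max())

# Pure-translation homography: the camera view shifts by (10, -5) pixels.
H = np.array([[1, 0, 10],
              [0, 1, -5],
              [0, 0, 1]], float)
new_box = propagate_box((0, 0, 20, 20), H)
```

The propagated box can then seed the recursive Bayesian filter, which fuses it with the detector's observation in the new frame.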

    Recognizing the artistic style of fine art paintings with deep learning for an augmented reality application

    The rapid digitalization of artwork collections in libraries, museums, galleries, and art centers has resulted in a growing interest in developing autonomous systems capable of understanding art concepts and categorizing fine art paintings, as it has become difficult to manually manage the content of these collections. However, the task of automatic categorization comes with significant challenges due to the subjective interpretation and perception of art elements and the reliance on accurate annotations provided by art experts. Since deep learning approaches and computer vision techniques have shown remarkable performance in automating painting classification in recent years, this research aims to develop efficient deep learning systems that can automatically classify the artistic style of fine-art paintings. In this thesis, we investigate the effectiveness of seven pre-trained EfficientNet models for identifying the style of a painting and propose custom models based on pre-trained EfficientNet architectures. In addition, we analyzed the impact of deep retraining of the last eight layers on the performance of the custom models. The experimental results on the standard fine art painting classification dataset, Painting-91, indicate that deep retraining of the last eight layers of the custom models yields the best performance, achieving a 5% improvement compared to the base models. This demonstrates the effectiveness of leveraging pre-trained EfficientNet models for automatic artistic style identification in paintings. Moreover, the study presents a framework that compares the performance of six pre-trained convolutional neural networks (Xception, ResNet50, InceptionV3, InceptionResNetV2, DenseNet121, and EfficientNet B3) for identifying artistic styles in paintings. Notably, the Xception architecture is employed for this purpose for the first time.
Furthermore, the impact of different optimizers (SGD, RMSprop, and Adam) and two learning rates (1e-2 and 1e-4) on model performance is studied using transfer learning. The experiments on two different art classification datasets, Pandora18k and Painting-91, revealed that InceptionResNetV2 achieves the highest accuracy for style classification on both datasets when trained with the Adam optimizer and a learning rate of 1e-4. Integrating deep learning algorithms and transfer learning techniques into fine art painting analysis and classification offers promising avenues for automating style identification tasks. The proposed models and findings contribute to the development of automatic methods that enable the art community to efficiently analyze and categorize the vast number of digital paintings available on the internet.
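The freeze/fine-tune split behind "deep retraining of the last eight layers" can be sketched abstractly. The 20-layer backbone and layer names below are hypothetical, standing in for a pre-trained network such as an EfficientNet:

```python
def split_trainable(layers, n_retrain=8):
    """Transfer-learning split: freeze all pre-trained layers except the
    last n_retrain, which are fine-tuned on the painting-style dataset."""
    if n_retrain <= 0:
        return list(layers), []
    frozen = layers[:-n_retrain] if n_retrain < len(layers) else []
    trainable = layers[-n_retrain:]
    return frozen, trainable

# Hypothetical 20-layer backbone: retrain only the last eight layers.
backbone = [f"layer_{i}" for i in range(20)]
frozen, trainable = split_trainable(backbone, n_retrain=8)
```

In a real training loop, the frozen layers keep their pre-trained weights (no gradient updates), while only the trainable tail is optimised, which is what limits overfitting on the comparatively small art datasets.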

    Authentication of Amadeo de Souza-Cardoso Paintings and Drawings With Deep Learning

    Art forgery has a long-standing history that can be traced back to the Roman period and has become more rampant as the art market continues prospering. Reports disclosed that countless artworks circulating on the art market could be fake. Even some principal art museums and galleries could be exhibiting a good percentage of fake artworks. It is therefore vitally important to conserve cultural heritage and safeguard the interests of both the art market and the artists, as well as the integrity of artists' legacies. As a result, art authentication has been one of the most researched and well-documented fields due to the ever-growing commercial art market of the past decades. Over the past years, the employment of computer science in the art world has flourished as it continues to stimulate interest in both the art world and the artificial intelligence arena. In particular, the implementation of Artificial Intelligence, namely Deep Learning algorithms and Neural Networks, has proved to be of significance for specialised image analysis. This research encompassed multidisciplinary studies in chemistry, physics, art and computer science. More specifically, the work presents a solution to the problem of authentication of heritage artwork by Amadeo de Souza-Cardoso, namely paintings, through the use of artificial intelligence algorithms. First, an authenticity estimation is obtained based on processing of images through a deep learning model that analyses the brushstroke features of a painting. Iterative, multi-scale analysis of the images is used to cover the entire painting and produce an overall indication of authenticity. Second, a mixed-input deep learning model is proposed to analyse pigments in a painting. This solves the image colour segmentation and pigment classification problem using hyperspectral imagery.
The result is used to provide an indication of authenticity based on pigment classification and correlation with chemical data obtained via XRF analysis. Further algorithms developed include a deep learning model that tackles the pigment unmixing problem based on hyperspectral data. Another algorithm is a deep learning model that estimates hyperspectral images from sRGB images. Based on the established algorithms and results obtained, two applications were developed. First, an Augmented Reality mobile application specifically for the visualisation of pigments in the artworks by Amadeo. The mobile application targets the general public, i.e., art enthusiasts, museum visitors, art lovers or art experts. And second, a desktop application with multiple purposes, such as the visualisation of pigments and hyperspectral data. This application is designed for art specialists, i.e., conservators and restorers. Due to the special circumstances of the pandemic, trials on the usage of these applications were only performed within the Department of Conservation and Restoration at NOVA University Lisbon, where both applications received positive feedback.
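The iterative, multi-scale image analysis summarised above can be sketched as a sliding-window scorer. The patch sizes, stride, and the stand-in scoring function are assumptions for illustration (the actual model scores brushstroke features with a deep network):

```python
import numpy as np

def multiscale_patch_scores(image, score_fn, patch_sizes=(32, 64), stride_frac=0.5):
    """Slide windows of several sizes over the whole image, score each
    patch with a model, and average the scores into a single overall
    authenticity indication."""
    h, w = image.shape[:2]
    scores = []
    for p in patch_sizes:
        stride = max(1, int(p * stride_frac))
        for y in range(0, h - p + 1, stride):
            for x in range(0, w - p + 1, stride):
                scores.append(score_fn(image[y:y + p, x:x + p]))
    return float(np.mean(scores))

# Stand-in scorer: mean intensity of the patch (the real model would
# return a per-patch authenticity probability).
img = np.ones((64, 64)) * 0.7
overall = multiscale_patch_scores(img, score_fn=lambda patch: patch.mean())
```

Averaging over overlapping patches at several scales is what lets a patch-level classifier produce a judgement about the painting as a whole.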

    Pathway to Future Symbiotic Creativity

    This report presents a comprehensive view of our vision of the development path of human-machine symbiotic art creation. We propose a classification of creative systems with a hierarchy of 5 classes, showing the pathway of creativity evolving from mimic-human artists (Turing Artists) to a Machine Artist in its own right. We begin with an overview of the limitations of Turing Artists, then focus on the top two levels of systems, Machine Artists, emphasizing machine-human communication in art creation. In art creation, it is necessary for machines to understand humans' mental states, including desires, appreciation, and emotions; humans also need to understand machines' creative capabilities and limitations. The rapid development of immersive environments, and their further evolution into the new concept of the metaverse, enables symbiotic art creation through unprecedented flexibility of bi-directional communication between artists and art manifestation environments. By examining the latest sensor and XR technologies, we illustrate a novel way of art data collection to constitute the base of a new form of human-machine bidirectional communication and understanding in art creation. Based on such communication and understanding mechanisms, we propose a novel framework for building future Machine Artists, which comes with the philosophy that a human-compatible AI system should be based on the "human-in-the-loop" principle rather than the traditional "end-to-end" dogma. By proposing a new form of inverse reinforcement learning model, we outline the platform design of machine artists, demonstrate its functions and showcase some examples of technologies we have developed. We also provide a systematic exposition of the ecosystem for AI-based symbiotic art forms and communities with an economic model built on NFT technology. Ethical issues for the development of machine artists are also discussed.

    Creativity and Machine Learning: a Survey

    There is a growing interest in the area of machine learning and creativity. This survey presents an overview of the history and the state of the art of computational creativity theories, machine learning techniques, including generative deep learning, and corresponding automatic evaluation methods. After presenting a critical discussion of the key contributions in this area, we outline the current research challenges and emerging opportunities in this field. Comment: 25 pages, 3 figures, 2 tables.

    Machine Learning for handwriting text recognition in historical documents

    In this thesis, we focus on the handwriting text recognition task over historical documents that are difficult to read for anyone who is not an expert in ancient languages and writing styles. We aim to take advantage of, and improve, the neural network architectures and techniques that other authors have proposed for handwriting text recognition in modern handwritten documents. These models perform this task very precisely when a large amount of data is available. However, the low availability of labeled data is a widespread problem in historical documents. The type of writing is singular, and it is quite expensive to hire an expert to transcribe a large number of pages. After investigating and analyzing the state of the art, we propose the efficient application of methods such as transfer learning and data augmentation. We also contribute an algorithm for purging mislabeled samples that affect the learning of models. Finally, we develop a variational autoencoder method for generating synthetic samples of handwritten text images for data augmentation. Experiments are performed on various historical handwritten text databases to validate the performance of the proposed algorithms. The various included analyses focus on the evolution of the character and word error rates (CER and WER) as we increase the training dataset. One of the most important results is the participation in a contest for transcription of historical handwritten text. The organizers provided us with a dataset of documents to train the model; then just a few labeled pages from 5 new documents were handed over to further adjust the solution. Finally, the transcription of unlabeled images was requested to evaluate the algorithm. Our method ranked second in this contest.
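The CER and WER metrics tracked in the analyses above are standard edit-distance rates: the minimum number of character (or word) substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. A minimal implementation:

```python
def levenshtein(a, b):
    """Edit distance between two sequences, via the classic
    dynamic-programming recurrence with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edit distance over characters,
    normalised by the reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    """Word error rate: the same idea over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

For example, `cer("kitten", "sitting")` is 3/6 = 0.5 (two substitutions and one insertion against a six-character reference).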