Deep learning approaches to pattern extraction and recognition in paintings and drawings: an overview
This paper provides an overview of some of the most relevant deep learning approaches to pattern extraction and recognition in visual arts, particularly painting and drawing. Recent advances in deep learning and computer vision, coupled with the growing availability of large digitized visual art collections, have opened new opportunities for computer science researchers to assist the art community with automatic tools to analyse and further understand visual arts. Among other benefits, a deeper understanding of visual arts has the potential to make them more accessible to a wider population, ultimately supporting the spread of culture
Visual Analytics for the Exploratory Analysis and Labeling of Cultural Data
Cultural data can come in various forms and modalities, such as text traditions, artworks, music, crafted objects, or even as intangible heritage such as biographies of people, performing arts, cultural customs and rites.
The assignment of metadata to such cultural heritage objects is an important task that people working in galleries, libraries, archives, and museums (GLAM) do on a daily basis.
These rich metadata collections are used to categorize, structure, and study collections, but can also be used to apply computational methods.
Such computational methods are in the focus of Computational and Digital Humanities projects and research.
For the longest time, the digital humanities community has focused on textual corpora, applying text mining and other natural language processing techniques, although some disciplines of the humanities, such as art history and archaeology, have a long history of using visualizations.
In recent years, the digital humanities community has started to shift the focus to include other modalities, such as audio-visual data.
In turn, methods in machine learning and computer vision have been proposed for the specificities of such corpora.
Over the last decade, the visualization community has engaged in several collaborations with the digital humanities, often with a focus on exploratory or comparative analysis of the data at hand.
This includes both methods and systems that support classical Close Reading of the material and Distant Reading methods that give an overview of larger collections, as well as methods in between, such as Meso Reading.
Furthermore, machine learning methods are being applied more widely to cultural heritage collections, but they are rarely combined with visualizations to allow for further perspectives on the collections in a visual analytics or human-in-the-loop setting.
Visual analytics can help in the decision-making process by guiding domain experts through the collection of interest.
However, state-of-the-art supervised machine learning methods are often not applicable to the collection of interest due to missing ground truth.
One form of ground truth is class labels, e.g., of entities depicted in an image collection, assigned to the individual images.
Labeling all objects in a collection is an arduous task when performed manually, because cultural heritage collections contain a wide variety of different objects with plenty of details.
A further problem with collections curated at different institutions is that they do not always follow a common standard, so the vocabularies used can drift apart, making it difficult to combine the data from these institutions for large-scale analysis.
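When ground-truth labels are missing, a human-in-the-loop setting typically asks the domain expert to label the items the model is least sure about first. A minimal sketch of such uncertainty-based selection, with invented class probabilities standing in for a real classifier's output (this is an illustration of the general idea, not the thesis's own method):

```python
def least_confident(probabilities):
    """Return the index of the item whose top class probability
    is lowest, i.e. the item the model is least sure about."""
    return min(range(len(probabilities)),
               key=lambda i: max(probabilities[i]))

# Toy per-item class probabilities from a hypothetical classifier.
probs = [
    [0.95, 0.03, 0.02],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain -> route to the expert for labeling
    [0.80, 0.15, 0.05],
]
print(least_confident(probs))  # prints 1
```

Each expert-provided label can then be fed back to retrain the model, gradually building the ground truth the supervised methods require.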
This thesis presents a series of projects that combine machine learning methods with interactive visualizations for the exploratory analysis and labeling of cultural data.
First, we define cultural data with regard to heritage and contemporary data, then we look at the state-of-the-art of existing visualization, computer vision, and visual analytics methods and projects focusing on cultural data collections.
After this, we present the problems addressed in this thesis and their solutions, starting with a series of visualizations to explore different facets of rap lyrics and rap artists with a focus on text reuse.
Next, we engage in a more complex case of text reuse, the collation of medieval vernacular text editions.
For this, a human-in-the-loop process is presented that applies word embeddings and interactive visualizations to perform textual alignments on under-resourced languages supported by labeling of the relations between lines and the relations between words.
We then switch the focus from textual data to another modality of cultural data by presenting a Virtual Museum that combines interactive visualizations and computer vision in order to explore a collection of artworks.
With the lessons learned from the previous projects, we engage in the labeling and analysis of medieval illuminated manuscripts, combining some of the machine learning methods and visualizations used for textual data with computer vision methods.
Finally, we reflect on the interdisciplinary projects and the lessons learned, before discussing existing challenges when working with cultural heritage data from the computer science perspective to outline potential research directions for machine learning and visual analytics of cultural heritage data.
Learning to Read L'Infinito: Handwritten Text Recognition with Synthetic Training Data
Deep learning-based approaches to Handwritten Text Recognition (HTR) have shown remarkable results on publicly available large datasets, both modern and historical. However, it is often the case that historical manuscripts are preserved in small collections, most of the time with unique characteristics in terms of paper support, author handwriting style, and language. State-of-the-art HTR approaches struggle to obtain good performance on such small manuscript collections, for which few training samples are available. In this paper, we focus on HTR on small historical datasets and propose a new historical dataset, which we call Leopardi, with the typical characteristics of small manuscript collections, consisting of letters by the poet Giacomo Leopardi, and devise strategies to deal with the training data scarcity scenario. In particular, we explore the use of carefully designed but cost-effective synthetic data for pre-training HTR models to be applied to small single-author manuscripts. Extensive experiments validate the suitability of the proposed approach, and both the Leopardi dataset and synthetic data will be made available to foster further research in this direction.
Controlled Vocabulary Enhancement through Crowdsourcing: Project Andvari, Micropasts, and Public Quality Assurance
Proceedings paper published by the Society of American Archivists. Presented at the 2015 conference in Cleveland, OH (http://www2.archivists.org/proceedings/research-forum/2015/agenda#papers). Published by SAA in 2016. This paper presents an experimental approach to using crowdsourcing to test controlled vocabularies for digital collections of cultural objects. For a digital humanities initiative, Project Andvari, which is intended to create a digital portal of early medieval northern European artifacts, it was recognized that there was a need to develop a semantically structured iconographic thesaurus to describe the iconographic content of distributed artefactual collections from a variety of contributing institutions. This paper discusses the planning and development workflow for the project's controlled vocabularies and a testing process to determine both the usability of the controlled vocabularies and the feasibility of the quality assurance approach. This paper demonstrates the applicability of crowdsourcing in developing controlled vocabularies.
Sharing and reusing rich media: lessons from The Open University
OpenCourseWare and Open Educational Resources comprise many types of assets, including rich media. However, dynamic rich media offer different opportunities and challenges for learners, teachers, and higher education institutions alike than do more static items such as text. The Open University in the UK has been extensively developing and using rich media in its distance teaching programmes since it was established in 1969, often in partnership with the BBC. As new media technologies have arrived, so have the capabilities of The Open University to create rich media. This paper describes these developments and then discusses the approaches required to guide them in a way that serves both the university and the wider higher education community. It concludes that rich media are an essential part of the developing OCW/OER landscape and that openly sharing them brings defined benefits to an HEI beyond their traditional student body.
Accessible Cultural Heritage through Explainable Artificial Intelligence
Ethics Guidelines for Trustworthy AI advocate for AI technology that is, among other things, more inclusive. Explainable AI (XAI) aims at making state-of-the-art opaque models more transparent, and defends AI-based outcomes endorsed with a rationale explanation, i.e., an explanation whose target is the non-technical user. XAI and Responsible AI principles defend the idea that audience expertise should be included in the evaluation of explainable AI systems. However, AI has not yet reached all publics and audiences, some of which may need it the most. One domain whose accessibility has so far been little influenced by the latest AI advances is cultural heritage. We propose including minorities as special users and evaluators of the latest XAI techniques. In order to define catalytic scenarios for collaboration and improved user experience, we pose some challenges and research questions yet to be addressed by the latest AI models likely to be involved in such a synergy.
Towards generating and evaluating iconographic image captions of artworks
Automatically generating accurate and meaningful textual descriptions of images is an ongoing research challenge. Recently, a lot of progress has been made by adopting multimodal deep learning approaches for integrating vision and language. However, the task of developing image captioning models is most commonly addressed using datasets of natural images, while not many contributions have been made in the domain of artwork images. One of the main reasons for this is the lack of large-scale art datasets of adequate image-text pairs. Another reason is that generating accurate descriptions of artwork images is particularly challenging, because descriptions of artworks are more complex and can include multiple levels of interpretation. It is therefore also especially difficult to effectively evaluate generated captions of artwork images. The aim of this work is to address some of those challenges by utilizing a large-scale dataset of artwork images annotated with concepts from the Iconclass classification system. Using this dataset, a captioning model is developed by fine-tuning a transformer-based vision-language pretrained model. Due to the complex relations between image and text pairs in the domain of artwork images, the generated captions are evaluated using several quantitative and qualitative approaches. The performance is assessed using standard image captioning metrics and a recently introduced reference-free metric. The quality of the generated captions and the model's capacity to generalize to new data are explored by applying the model to another art dataset and comparing the relation between commonly generated captions and the genre of artworks. The overall results suggest that the model can generate meaningful captions that indicate a stronger relevance to the art historical context, particularly in comparison to captions obtained from models trained only on natural image datasets.
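Standard reference-based caption metrics compare a generated caption against human references at the n-gram level. A minimal sketch of the unigram core of such a metric (clipped 1-gram precision, the building block of BLEU), with invented captions for illustration rather than examples from the paper's dataset:

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Fraction of candidate tokens that also occur in the reference,
    with each reference token usable at most as often as it appears
    (the 'clipping' used by BLEU)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, ref[tok]) for tok, count in cand.items())
    return overlap / max(sum(cand.values()), 1)

generated = "saint holding a red book"
reference = "a saint holding a closed book"
print(round(unigram_precision(generated, reference), 2))  # prints 0.8
```

Because artwork descriptions carry multiple levels of interpretation, such surface-overlap scores can penalize captions that are iconographically correct but worded differently, which is one reason the paper also turns to a reference-free metric and qualitative evaluation.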