
    Visual Analytics for the Exploratory Analysis and Labeling of Cultural Data

    Cultural data can come in various forms and modalities, such as text traditions, artworks, music, crafted objects, or even intangible heritage such as biographies of people, performing arts, cultural customs and rites. The assignment of metadata to such cultural heritage objects is an important task that people working in galleries, libraries, archives, and museums (GLAM) perform on a daily basis. These rich metadata collections are used to categorize, structure, and study collections, but they can also be used to apply computational methods. Such computational methods are the focus of Computational and Digital Humanities projects and research. For the longest time, the digital humanities community has focused on textual corpora, using text mining and other natural language processing techniques, although some disciplines of the humanities, such as art history and archaeology, have a long history of using visualizations. In recent years, the digital humanities community has started to shift its focus to include other modalities, such as audio-visual data. In turn, methods in machine learning and computer vision have been proposed for the specificities of such corpora. Over the last decade, the visualization community has engaged in several collaborations with the digital humanities, often with a focus on exploratory or comparative analysis of the data at hand. This includes methods and systems that support classical Close Reading of the material, Distant Reading methods that give an overview of larger collections, and methods in between, such as Meso Reading. Furthermore, a wider application of machine learning methods can be observed on cultural heritage collections, but these methods are rarely applied together with visualizations to allow for further perspectives on the collections in a visual analytics or human-in-the-loop setting. Visual analytics can help in the decision-making process by guiding domain experts through the collection of interest. However, state-of-the-art supervised machine learning methods are often not applicable to the collection of interest due to missing ground truth. One form of ground truth is class labels, e.g., of entities depicted in an image collection, assigned to the individual images. Labeling all objects in a collection is an arduous task when performed manually, because cultural heritage collections contain a wide variety of different objects with plenty of details. A further problem with collections curated in different institutions is that a common standard is not always followed, so the vocabularies used can drift apart from one another, making it difficult to combine the data from these institutions for large-scale analysis.
    This thesis presents a series of projects that combine machine learning methods with interactive visualizations for the exploratory analysis and labeling of cultural data. First, we define cultural data with regard to heritage and contemporary data; then we review the state of the art of existing visualization, computer vision, and visual analytics methods and projects focusing on cultural data collections. After this, we present the problems addressed in this thesis and their solutions, starting with a series of visualizations to explore different facets of rap lyrics and rap artists with a focus on text reuse. Next, we engage in a more complex case of text reuse, the collation of medieval vernacular text editions. For this, a human-in-the-loop process is presented that applies word embeddings and interactive visualizations to perform textual alignments on under-resourced languages, supported by labeling of the relations between lines and between words. We then switch the focus from textual data to another modality of cultural data by presenting a Virtual Museum that combines interactive visualizations and computer vision in order to explore a collection of artworks. With the lessons learned from the previous projects, we engage in the labeling and analysis of medieval illuminated manuscripts, combining some of the machine learning methods and visualizations that were used for textual data with computer vision methods. Finally, we reflect on the interdisciplinary projects and the lessons learned, before discussing existing challenges when working with cultural heritage data from the computer science perspective in order to outline potential research directions for machine learning and visual analytics of cultural heritage data.
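    The alignment step mentioned above can be illustrated with a minimal sketch: assuming any pre-trained word embeddings for the language at hand (passed in here as a plain dictionary, a simplification of whatever embedding model the thesis actually uses), each line of two text versions is represented by the average of its word vectors, and candidate line pairs are proposed by cosine similarity for an expert to confirm or reject. The function names and the 0.5 threshold are illustrative choices, not taken from the thesis.

    import numpy as np

    def line_vector(line, embeddings, dim):
        # Average the embeddings of the words we have vectors for; zeros if none are known.
        vecs = [embeddings[w] for w in line.lower().split() if w in embeddings]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def cosine(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return float(a @ b / (na * nb)) if na and nb else 0.0

    def candidate_alignments(version_a, version_b, embeddings, dim, threshold=0.5):
        # Propose, for each line of version_a, the most similar line of version_b
        # as a candidate pair for expert review; weak matches are left unaligned.
        vecs_b = [line_vector(l, embeddings, dim) for l in version_b]
        pairs = []
        for i, line in enumerate(version_a):
            va = line_vector(line, embeddings, dim)
            scores = [cosine(va, vb) for vb in vecs_b]
            j = int(np.argmax(scores))
            if scores[j] >= threshold:
                pairs.append((i, j, round(scores[j], 3)))
        return pairs

    In a human-in-the-loop setting such as the one described, the returned candidate pairs would be surfaced in the visualization for relabeling rather than accepted automatically.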

    The Prime Machine: a user-friendly corpus tool for English language teaching and self-tutoring based on the Lexical Priming theory of language

    This thesis presents the design and evaluation of a new concordancer called The Prime Machine, which has been developed as an English language learning and teaching tool. The software has been designed to provide learners with a multitude of examples from corpus texts and additional information about the contextual environment in which words and combinations of words tend to occur. The prevailing view of how language operates has been that grammar and lexis are separate systems and that sentences can be constructed merely by choosing any syntactic structure and slotting in vocabulary. Over the last few decades, however, corpus linguistics has presented challenges to this view of language, drawing on evidence which can be found in the patterning of language choices in texts. Nevertheless, despite some reports of success from researchers in this area, only a limited number of teachers and learners of a second language seem to make direct use of corpus software tools. The desire to develop a new corpus tool grew out of professional experience as an English language teacher and manager in China. This thesis begins by introducing some background information about the role of English in international higher education and the language learning context in China, and then goes on to describe the software architecture and the process by which corpus texts are transformed from their raw state into rows of data in a sophisticated database to be accessed by the concordancer. It then introduces innovations including several aspects of the search screen interface, the concordance line display and the use of collocation data. The software provides a rich learning platform for language learners to independently look up and compare similar words, different word forms, different collocations and the same words across two corpora. Underpinning the design is a view of language which draws on Michael Hoey's theory of Lexical Priming. The software is designed to make it possible to see tendencies of words and phrases which are not usually apparent in either dictionary examples or the output from other concordancing software. The design features are considered from a pedagogical perspective, focusing on English for Academic Purposes and including important software design principles from Computer Aided Language Learning. Through a small evaluation involving undergraduate students, the software has been shown to have great potential as a tool for the writing process. It is believed that The Prime Machine will be a very useful corpus tool which, while simple to operate, provides a wealth of information for English language teaching and self-tutoring.
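    As a rough illustration of what a concordancer does with an indexed corpus (not The Prime Machine's actual database design, which the thesis describes in detail), the sketch below produces keyword-in-context (KWIC) lines and counts simple collocates within a window; the window size and the naive tokenization are simplifying assumptions.

    from collections import Counter

    def kwic(sentences, query, window=4):
        # Return keyword-in-context lines: the node word with a few words of co-text on each side.
        lines = []
        for sent in sentences:
            tokens = sent.split()
            for i, tok in enumerate(tokens):
                if tok.lower().strip('.,;:!?"') == query.lower():
                    left = " ".join(tokens[max(0, i - window):i])
                    right = " ".join(tokens[i + 1:i + 1 + window])
                    lines.append(f"{left:>35}  [{tokens[i]}]  {right}")
        return lines

    def collocates(sentences, query, window=4):
        # Count words co-occurring with the query inside the window (a crude collocation measure).
        counts = Counter()
        for sent in sentences:
            tokens = [t.lower().strip('.,;:!?"') for t in sent.split()]
            for i, tok in enumerate(tokens):
                if tok == query.lower():
                    neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                    counts.update(w for w in neighbours if w and w != tok)
        return counts

    corpus = ["Lexical priming suggests that words are primed for use with particular collocations.",
              "Collocations and colligations emerge from repeated encounters with words in context."]
    print("\n".join(kwic(corpus, "collocations")))
    print(collocates(corpus, "collocations").most_common(5))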

    From Index Locorum to Citation Network: an Approach to the Automatic Extraction of Canonical References and its Applications to the Study of Classical Texts

    My research focusses on the automatic extraction of canonical references from publications in Classics. Such references are the standard way of citing classical texts and are found in great numbers throughout monographs, journal articles and commentaries. In chapters 1 and 2 I argue for the importance of canonical citations and for the need to capture them automatically. Their importance and function lie in signalling text passages that are studied and discussed, often in relation to one another, as can be seen in the parallel passages found in modern commentaries. Scholars in the field have long been exploiting this kind of information by manually creating indexes of cited passages, the so-called indices locorum. However, the challenge we now face is to find new ways of indexing and retrieving information contained in the growing volume of digital archives and libraries. Chapters 3 and 4 look at how this problem can be tackled by translating the extraction of canonical citations into a computationally solvable problem. The approach I developed consists of treating the extraction of such citations as a problem of named entity extraction, which can be solved with some degree of accuracy by applying and adapting methods of Natural Language Processing. In this part of the dissertation I discuss the implementation of this approach as a working prototype and an evaluation of its performance. Once canonical references have been extracted from texts, the web of relations between documents that they create can be represented as a network. This network can then be searched, manipulated, visualised and analysed in various ways. In chapter 5 I focus specifically on how this network can be leveraged to search through bodies of secondary literature. Finally, in chapter 6 I discuss how my work opens up new research perspectives in terms of visualisation, analysis and the application of such automatically extracted citation networks.
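    The thesis treats canonical references as named entities and extracts them with adapted NLP methods; as a much simpler stand-in, the sketch below uses a hypothetical regular expression for abbreviated references (e.g. "Hom. Il. 1.1-10") and links passages cited on the same page into a co-citation network. The pattern and the page-level co-citation criterion are assumptions made for illustration, not the prototype's actual logic.

    import re
    from collections import defaultdict
    from itertools import combinations

    # Hypothetical pattern: abbreviated author, abbreviated work, then a locus such as 1.1-10.
    CANONICAL_REF = re.compile(r"\b([A-Z][a-z]+\.)\s+([A-Z][a-z]+\.)\s+(\d+(?:\.\d+)*(?:-\d+)?)")

    def extract_refs(text):
        # Return the canonical references found in one page of secondary literature.
        return [" ".join(m.groups()) for m in CANONICAL_REF.finditer(text)]

    def cocitation_network(pages):
        # Weight an edge between two passages by how often they are cited on the same page.
        edges = defaultdict(int)
        for page in pages:
            for a, b in combinations(sorted(set(extract_refs(page))), 2):
                edges[(a, b)] += 1
        return edges

    pages = ["The proem is discussed alongside Hom. Od. 1.1 and Verg. Aen. 1.1-7.",
             "Compare Hom. Od. 1.1 with Hom. Il. 1.1-10 on invocation."]
    print(cocitation_network(pages))

    The resulting weighted edges are what chapter 5 would then expose to search and visualisation over larger bodies of secondary literature.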

    Design and Instantiation of an Interactive Multidimensional Ontology for Game Design Elements – a Design and Behavioral Approach

    While games and play are commonly perceived as leisure tools, focus on the strategic implementation of isolated gameful elements outside of games has risen in recent years under the term gamification. Given their ease of implementation and impact in competitive games, a small set of game design elements, namely points, badges, and leaderboards, initially dominated research and practice. However, these elements reflect only a small group of the components that game designers use to achieve positive outcomes in their systems. Current research has shifted towards focusing on the game design process instead of the isolated implementation of single elements, under the term gameful design. But the problem of a tendency toward a monocultural selection of prominent design elements persists in game and gameful design, preventing the method from reaching its full potential. This dissertation addresses this problem by designing and developing a digital, interactive game design element ontology that scholars and practitioners can use to make more informed and inspired decisions in creating gameful solutions to their problems. The first part of this work is concerned with the collation and development of the digital ontology. First, two datasets were collated from game design and gamification literature (game design elements and playing motivations). Next, four explorative studies were conducted to add user-relevant metadata and connect their items into an ontological structure. The first two studies use card sorting to assess game theory frameworks regarding their suitability as foundational categories for the game design element dataset and to gain an overview of the different viewpoints from which categorizations can be derived. The second set of studies builds on an explorative method of matching dataset entries via their descriptive keywords to arrive at a connected graph. The first of these studies connects items of the playing motivations dataset with themselves, while the second connects them with an additional dataset of human needs. The first part closes with the documentation of the design and development of the tool Kubun, reporting on the outcome of its evaluation via iterative expert interviews and a field study. The results suggest that the tool serves its preset goals of affording intuitive browsing for dedicated searches and serendipitous findings. While the first part of this work reports on the top-down development process of the ontology and the related navigation tool, the second part presents in-depth research on specific learning-oriented game design elements to complement the overall research goal through a complementary bottom-up approach. Therein, two studies on learning-oriented game design elements are reported regarding their effect on performance, long-term learning outcome, and knowledge transfer. The studies are conducted with a game dedicated to teaching correct waste sorting. The first study focuses on a reward-based game design element in terms of its motivational effect on perfect play. The second study evaluates two learning-enhancing game design elements, repeat and look-up, in terms of their contribution to a long-term learning outcome. The comprehensive insights gained through the in-depth research manifest in the design of a module dedicated to reporting research outcomes in the ontology. The dissertation concludes with a discussion of the studies’ varying limitations and an outlook on pathways for future research.
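    The keyword-matching step used to connect dataset entries into a graph can be sketched as follows; the toy entries, their keywords, and the shared-keyword edge criterion are illustrative assumptions, not the ontology's actual data or matching rules.

    from itertools import combinations

    def keyword_graph(entries):
        # Connect two items whenever they share at least one descriptive keyword;
        # the edge carries the shared keywords so the link can be inspected.
        edges = {}
        for (a, kw_a), (b, kw_b) in combinations(entries.items(), 2):
            shared = kw_a & kw_b
            if shared:
                edges[(a, b)] = sorted(shared)
        return edges

    # Hypothetical toy data mixing game design elements and playing motivations.
    entries = {
        "leaderboard": {"competition", "comparison", "status"},
        "badge": {"achievement", "collection", "status"},
        "narrative": {"curiosity", "immersion"},
        "exploration": {"curiosity", "discovery"},
    }
    print(keyword_graph(entries))

    A navigation tool such as Kubun could then let users browse a graph of this shape by category or by the keywords attached to its edges.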

    Community-driven & Work-integrated Creation, Use and Evolution of Ontological Knowledge Structures


    K + K = 120 : Papers dedicated to László Kálmán and András Kornai on the occasion of their 60th birthdays


    From social tagging to polyrepresentation: a study of expert annotating behavior of moving images

    This thesis investigates “nichesourcing” (De Boer, Hildebrand, et al., 2012), an emergent initiative of cultural heritage crowdsourcing in which niches of experts are involved in the annotating tasks. This initiative is studied in relation to moving image annotation, in the context of audiovisual heritage and, more specifically, within the sector of film archives. The work presents a case study of film and media scholars to investigate the types of annotations and attribute descriptions that they could eventually contribute, as well as the information needs and the seeking and searching behaviors of this group, in order to determine what the role of the different types of annotations in supporting their expert tasks would be. The study is composed of three independent but interconnected studies using a mixed methodology and an interpretive approach. It uses concepts from the information behavior discipline and the "Integrated Information Seeking and Retrieval" (IS&R) framework (Ingwersen and Järvelin, 2005) as guidance for the investigation. The findings show that there are several types of annotations that moving image experts could contribute to a nichesourcing initiative, of which time-based tags are only one of the possibilities. The findings also indicate that, across the different foci in film and media research, in-depth indexing at the content level is only needed for supporting a specific research focus, for supporting research in other domains, or for engaging broader audiences. The main implications at the level of information infrastructure are the requirement for more varied annotating support, more interoperability among existing metadata standards and frameworks, and the need for guidelines about crowdsourcing and nichesourcing implementation in the audiovisual heritage sector. This research contributes to studies of social tagging applied to moving images; to the discipline of information behavior, by proposing new concepts related to the area of use behavior; and to the concept of “polyrepresentation” (Ingwersen, 1992, 1996) as applied to the humanities domain.
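    As a small illustration of what a time-based tag (one of the annotation types identified in the study) might look like as a data record, the sketch below defines a hypothetical structure; the field names and example values are assumptions, not a schema from the thesis or from any archive's metadata standard.

    from dataclasses import dataclass

    @dataclass
    class TimeBasedTag:
        # Hypothetical record for an expert annotation anchored to a time span in a film.
        film_id: str           # archive identifier of the moving image
        start_seconds: float   # beginning of the tagged segment
        end_seconds: float     # end of the tagged segment
        label: str             # free-text or controlled-vocabulary term
        annotator: str         # expert (e.g. a film scholar) contributing the tag
        vocabulary: str = ""   # optional: the standard or glossary the label comes from

    tag = TimeBasedTag("film-0042", 63.0, 71.5, "tracking shot", "scholar-17")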

    B!SON: A Tool for Open Access Journal Recommendation

    Finding a suitable open access journal in which to publish scientific work is a complex task: researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders’ conditions and the risk of predatory publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. It is developed based on a systematic requirements analysis, built on open data, gives publisher-independent recommendations and works across domains. It suggests open access journals based on the title, abstract and references provided by the user. The recommendation quality has been evaluated using a large test set of 10,000 articles. Development by two German scientific libraries ensures the longevity of the project.
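    B!SON's actual recommendation method is not detailed in this abstract; as a generic content-based stand-in, the sketch below ranks journals by TF-IDF cosine similarity between the submitted title and abstract and a hypothetical per-journal text corpus. The journal_texts mapping, the use of scikit-learn, and the ranking criterion are all assumptions made for illustration.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def recommend_journals(title, abstract, journal_texts, top_k=3):
        # journal_texts: hypothetical mapping of journal name -> concatenated article titles/abstracts.
        names = list(journal_texts)
        vectorizer = TfidfVectorizer(stop_words="english")
        journal_matrix = vectorizer.fit_transform(journal_texts[n] for n in names)
        query = vectorizer.transform([f"{title} {abstract}"])
        scores = cosine_similarity(query, journal_matrix)[0]
        ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
        return ranked[:top_k]

    A production system of this kind would additionally need filters for open access status, fees and publisher agreements, which this sketch leaves out.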