21 research outputs found

    VideoTag: Encouraging the Effective Tagging of Internet Videos Through Tagging Games

    A thesis submitted in partial fulfillment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Abstract: The tags and descriptions entered by video owners in video sharing sites are typically inadequate for retrieval purposes, yet the majority of video search still uses this text. This problem is escalating due to the ease with which users can self-publish videos, generating masses that are poorly labelled and poorly described. This thesis investigates how users tag videos and whether video tagging games can solve this problem by generating useful sets of tags. A preliminary study investigated tags in two social video sharing sites, YouTube and Viddler. YouTube contained many irrelevant tags because the system does not encourage users to tag their videos and does not promote tags as useful. In contrast, using tags as the sole means of categorisation in Viddler motivated users to enter a higher proportion of relevant tags. Poor tags were found in both systems, however, highlighting the need to improve video tagging. In order to give users incentives to tag videos, the VideoTag project in this thesis developed two tagging games, Golden Tag and Top Tag, and one non-game tagging system, Simply Tag, and conducted two experiments with them. In the first experiment VideoTag was a portal to play video tagging games whereas in the second experiment it was a portal to curate collections of special interest videos. Users preferred to tag videos using games, generating tags that were relevant to the videos and that covered a range of tag types that were descriptive of the video content at a predominately specific, objective level. Users were motivated by interest in the content rather than by game elements, and content had an effect on the tag types used. In each experiment, users predominately tagged videos using objective language, with a tendency to use specific rather than basic tags.
There was a significant difference between the types of tags entered in the games and in Simply Tag, with more basic, objective vocabulary entered into the games and more specific, objective language entered into the non-game system. Subjective tags were rare but were more frequent in Simply Tag. Gameplay also had an influence on the types of tags entered; Top Tag generated more basic tags and Golden Tag generated more specific and subjective tags. Users were not attracted to use VideoTag by the games alone. Game mechanics had little impact on motivations to use the system. VideoTag used YouTube videos, but could not upload the tags to YouTube and so users could see no benefit for the tags they entered, reducing participation. Specific interest content was more of a motivator for use than games or tagging, and this warrants further research. In the current game-saturated climate, gamification of a video tagging system may therefore be most successful for collections of videos that already have a committed user base. University of Wolverhampto

    Towards Inclusion in Museums: Multisensory and Cross-Modal Translations/Interpretations of Visual Artworks

    Access to art and cultural works is a fundamental human right, irrespective of abilities and human differences. However, traditional museum experiences rely heavily on visual perception, which creates barriers for visitors, especially for those who are unable to access art through sight. How can visual art be “translated” into other modalities, and what might be the affordances, limitations, and impact of such translations? This qualitative investigation focused on a graduate course on multisensory museum experiences embedded within a unique partnership between the Art Gallery of Ontario and OCAD University. Observations and interviews with students, instructors, museum visitors, and stakeholders (including community members with vision impairments and museum professionals) revealed: a range of translation/interpretation strategies, from “literal” (mapping visually perceived spatial properties of artworks to non-visual perceptual modalities) to “constructivist” (non-literal mappings that aim to engender audience memories that are akin to what might have inspired the original artwork); transformative student journeys, such as building meaningful connections with art; and significant impact on diverse audiences and students. This study revealed promising directions for inclusive museums, a preliminary technical language to support the design of translations/interpretations, and a need for theoretically informed and tested standards to guide these designs and practices.

    Participatory Sensing and Crowdsourcing in Urban Environment

    With an increasing number of people living in cities, urban mobility has become one of the most important research fields in so-called smart city environments. Urban mobility can be defined as the ability of people to move around the city, living and interacting with the space. For these reasons, urban accessibility is a primary factor to take into account for social inclusion and for the effective exercise of citizenship. In this thesis, we researched how to use crowdsourcing and participatory sensing to effectively and efficiently collect data about aPOIs (accessible Points of Interest), with the aim of obtaining an updated, trusted, and complete accessible map of the urban environment. The data gathered in this way was integrated with data retrieved from external open datasets and used in computing personalized accessible urban paths. In order to investigate the issues related to this research in depth, we designed and prototyped mPASS, a context-aware and location-based accessible way-finding system.

    An interdisciplinary concept for human-centered explainable artificial intelligence - Investigating the impact of explainable AI on end-users

    Since the 1950s, Artificial Intelligence (AI) applications have captivated people. However, this fascination has always been accompanied by disillusionment about the limitations of this technology. Today, machine learning methods such as Deep Neural Networks (DNN) are successfully used in various tasks. However, these methods also have limitations: their complexity makes their decisions no longer comprehensible to humans; they are black boxes. The research branch of Explainable AI (XAI) has addressed this problem by investigating how to make AI decisions comprehensible. This desire is not new. In the 1970s, developers of intrinsically explainable AI approaches, so-called white boxes (e.g., rule-based systems), were dealing with AI explanations. Nowadays, with the increased use of AI systems in all areas of life, the design of comprehensible systems has become increasingly important. Developing such systems is part of Human-Centred AI (HCAI) research, which integrates human needs and abilities in the design of AI interfaces. For this, an understanding is needed of how humans perceive XAI and how AI explanations influence the interaction between humans and AI. One of the open questions concerns the investigation of XAI for end-users, i.e., people who have no expertise in AI but interact with such systems or are impacted by the system's decisions. This dissertation investigates the impact of different levels of interactive XAI of white- and black-box AI systems on end-users' perceptions. Based on an interdisciplinary concept presented in this work, it is examined how the content, type, and interface of explanations of DNN (black box) and rule-based systems (white box) are perceived by end-users. How XAI influences end-users' mental models, trust, self-efficacy, cognitive workload, and emotional state regarding the AI system is the centre of the investigation.
At the beginning of the dissertation, general concepts regarding AI, explanations, and psychological constructs of mental models, trust, self-efficacy, cognitive load, and emotions are introduced. Subsequently, related work regarding the design and investigation of XAI for users is presented. This serves as a basis for the concept of a Human-Centered Explainable AI (HC-XAI) presented in this dissertation, which combines an XAI design approach with user evaluations. The author pursues an interdisciplinary approach that integrates knowledge from the research areas of (X)AI, Human-Computer Interaction, and Psychology. Based on this interdisciplinary concept, a five-step approach is derived and applied to illustrative surveys and experiments in the empirical part of this dissertation. To illustrate the first two steps, a persona approach for HC-XAI is presented, and based on that, a template for designing personas is provided. To illustrate the usage of the template, three surveys are presented that ask end-users about their attitudes and expectations towards AI and XAI. The personas generated from the survey data indicate that end-users often lack knowledge of XAI and that their perception of it depends on demographic and personality-related characteristics. Steps three to five deal with the design of XAI for concrete applications. To this end, different levels of interactive XAI are presented and investigated in experiments with end-users, using two rule-based systems (i.e., white-box) and four systems based on DNN (i.e., black-box). These are applied for three purposes: cooperation & collaboration, education, and medical decision support. Six user studies were conducted, which differed in the interactivity of the XAI system used. The results show that end-users' trust and mental models of AI depend strongly on the context of use and the design of the explanation itself.
For example, explanations mediated by a virtual agent are shown to promote trust. The content and type of explanations are also perceived differently by users. The studies also show that end-users in different application contexts of XAI express a desire for interactive explanations. The dissertation concludes with a summary of the scientific contribution, points out limitations of the presented work, and gives an outlook on possible future research topics to integrate explanations into everyday AI systems and thus enable the comprehensible handling of AI for all people.

    Crowd-supervised training of spoken language systems

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 155-166). Spoken language systems are often deployed with static speech recognizers. Only rarely are parameters in the underlying language, lexical, or acoustic models updated on-the-fly. In the few instances where parameters are learned in an online fashion, developers traditionally resort to unsupervised training techniques, which are known to be inferior to their supervised counterparts. These realities make the development of spoken language interfaces a difficult and somewhat ad-hoc engineering task, since models for each new domain must be built from scratch or adapted from a previous domain. This thesis explores an alternative approach that makes use of human computation to provide crowd-supervised training for spoken language systems. We explore human-in-the-loop algorithms that leverage the collective intelligence of crowds of non-expert individuals to provide valuable training data at a very low cost for actively deployed spoken language systems. We also show that in some domains the crowd can be incentivized to provide training data for free, as a byproduct of interacting with the system itself. Through the automation of crowdsourcing tasks, we construct and demonstrate organic spoken language systems that grow and improve without the aid of an expert. Techniques that rely on collecting data remotely from non-expert users, however, are subject to the problem of noise. This noise can sometimes be heard in audio collected from poor microphones or muddled acoustic environments. Alternatively, noise can take the form of corrupt data from a worker trying to game the system - for example, a paid worker tasked with transcribing audio may leave transcripts blank in hopes of receiving a speedy payment.
We develop strategies to mitigate the effects of noise in crowd-collected data and analyze their efficacy. This research spans a number of different application domains of widely-deployed spoken language interfaces, but maintains the common thread of improving the speech recognizer's underlying models with crowd-supervised training algorithms. We experiment with three central components of a speech recognizer: the language model, the lexicon, and the acoustic model. For each component, we demonstrate the utility of a crowd-supervised training framework. For the language model and lexicon, we explicitly show that this framework can be used hands-free, in two organic spoken language systems. by Ian C. McGraw. Ph.D

    From social tagging to polyrepresentation: a study of expert annotating behavior of moving images

    Doctoral degree with International Mention (Mención Internacional en el título de doctor). This thesis investigates “nichesourcing” (De Boer, Hildebrand, et al., 2012), an emergent initiative of cultural heritage crowdsourcing in which niches of experts are involved in the annotating tasks. This initiative is studied in relation to moving image annotation, and in the context of audiovisual heritage, more specifically, within the sector of film archives. The work presents a case study of film and media scholars to investigate the types of annotations and attribute descriptions that they could eventually contribute, as well as the information needs, and seeking and searching behaviors of this group, in order to determine what the role of the different types of annotations in supporting their expert tasks would be. The study is composed of three independent but interconnected studies using a mixed methodology and an interpretive approach. It uses concepts from the information behavior discipline, and the "Integrated Information Seeking and Retrieval Framework" (IS&R) (Ingwersen and Järvelin, 2005) as guidance for the investigation. The findings show that there are several types of annotations that moving image experts could contribute to a nichesourcing initiative, of which time-based tags are only one of the possibilities. The findings also indicate that for the different foci in film and media research, in-depth indexing at the content level is only needed for supporting a specific research focus, for supporting research in other domains, or for engaging broader audiences. The main implications at the level of information infrastructure are the requirement for more varied annotating support, more interoperability among existing metadata standards and frameworks, and the need for guidelines about crowdsourcing and nichesourcing implementation in the audiovisual heritage sector.
This research presents contributions to the studies of social tagging applied to moving images, to the discipline of information behavior (by proposing new concepts related to the area of use behavior), and to the concept of “polyrepresentation” (Ingwersen, 1992, 1996) applied to the humanities domain. Doctoral programme: Programa Oficial de Doctorado en Documentación: Archivos y Bibliotecas en el Entorno Digital. Committee: President: Peter Emil Rerup Ingwersen; Secretary: Antonio Hernández Pérez; Member: Nils Phar

    MediaSync: Handbook on Multimedia Synchronization

    This book provides an approachable overview of the most recent advances in the fascinating field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device, and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models, highlights ongoing research efforts, like hybrid broadband broadcast (HBB) delivery and users' perception modeling (i.e., Quality of Experience or QoE), and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many advances around mediasync have been devised and deployed, this area of research is getting renewed attention to overcome remaining challenges in the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this research area, its current relevance and the multiple disciplines it involves, the availability of a reference book on mediasync becomes necessary. This book fills the gap in this context. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space, from different perspectives. Mediasync: Handbook on Multimedia Synchronization is the perfect companion for scholars and practitioners who want to acquire strong knowledge about this research area, and also approach the challenges behind ensuring the best mediated experiences, by providing adequate synchronization between the media elements that constitute these experiences.