
    Towards ethical multimodal systems

    Generative AI systems (ChatGPT, DALL-E, etc.) are expanding into multiple areas of our lives, from art [Rombach et al., 2021] to mental health [Morris and Kouddous, 2022]; their rapidly growing societal impact opens new opportunities but also raises ethical concerns. The emerging field of AI alignment aims to make AI systems reflect human values. This paper focuses on evaluating the ethics of multimodal AI systems involving both text and images, a relatively under-explored area, as most alignment work currently focuses on language models. We first create a multimodal ethical database from human feedback on ethicality. Then, using this database, we develop algorithms, including a RoBERTa-large classifier and a multilayer perceptron, to automatically assess the ethicality of system responses.
    Comment: 5 pages, multimodal ethical dataset building; accepted at the NeurIPS 2023 MP2 workshop.
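    As a hedged illustration of the kind of classifier this abstract describes, the sketch below scores a text response with RoBERTa-large via Hugging Face Transformers. The two-label setup, the label meanings, and the example input are assumptions, and the classification head would still need fine-tuning on the paper's ethical database before its scores mean anything.

```python
# Hypothetical sketch: scoring the ethicality of a system's text response
# with a RoBERTa-large classifier. The label scheme is an assumption
# (0 = unethical, 1 = ethical), and the untrained head must be fine-tuned
# on an ethicality dataset before use.
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-large", num_labels=2
)
model.eval()

def ethicality_score(response: str) -> float:
    """Return the probability that a response is ethical (assumed label 1)."""
    inputs = tokenizer(response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

print(ethicality_score("Here is a safe, helpful description of the image."))
```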

    COMM Notation for Specifying Collaborative and MultiModal Interactive Systems

    Multi-user multimodal interactive systems involve multiple users who can each use multiple interaction modalities. Although such systems are becoming more prevalent (especially multimodal systems built around multitouch surfaces), their design remains ad hoc, without proper tracking of the design process. To address this lack of design tools for multi-user multimodal systems, we present the COMM (Collaborative and MultiModal) notation and its online editor for specifying multi-user multimodal interactive systems. Extending the CTT notation, the salient features of the COMM notation include the concepts of interactive role and modal task, as well as a refinement of the temporal operators applied to tasks using the Allen relationships. A multimodal military command post for the control of unmanned aerial vehicles (UAVs) by two operators illustrates the discussion.
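    COMM itself is a graphical notation rather than code, but the Allen relationships it uses to refine temporal operators between tasks are easy to illustrate. The sketch below encodes a few of those relations; the Task type, the chosen subset of relations, and the UAV-scenario task names are illustrative assumptions, not part of the notation.

```python
# Illustrative sketch only: a handful of Allen interval relations over tasks,
# of the kind COMM uses to refine its temporal operators. Task and the
# relation subset are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    start: float
    end: float

def before(a: Task, b: Task) -> bool:
    return a.end < b.start          # a finishes strictly before b starts

def meets(a: Task, b: Task) -> bool:
    return a.end == b.start         # a ends exactly when b begins

def overlaps(a: Task, b: Task) -> bool:
    return a.start < b.start < a.end < b.end

def during(a: Task, b: Task) -> bool:
    return b.start < a.start and a.end < b.end

# Hypothetical tasks from the two-operator UAV scenario.
select_uav = Task("operator1.select_uav", 0.0, 2.0)
issue_order = Task("operator2.issue_order", 2.0, 5.0)
print(meets(select_uav, issue_order))  # True: strictly sequential tasks
```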

    Interactive-predictive neural multimodal systems

    Despite the advances achieved by neural models in sequence-to-sequence learning, exploited in a variety of tasks, they still make errors. In many use cases, these errors are corrected by a human expert in a posterior revision process. The interactive-predictive framework aims to minimize the human effort spent on this process by incorporating partial corrections to iteratively refine the hypothesis. In this work, we generalize the interactive-predictive approach, typically applied to the machine translation field, to tackle other multimodal problems, namely image and video captioning. We study the application of this framework to multimodal neural sequence-to-sequence models. We show that, following this framework, we approximately halve the effort spent correcting the outputs generated by the automatic systems. Moreover, we deploy our systems in a publicly accessible demonstration that allows users to better understand the behavior of the interactive-predictive framework.
    The research leading to these results has received funding from MINECO under grant IDIFEDER/2018/025 "Sistemas de fabricación inteligentes para la industria 4.0", an action co-funded by the European Regional Development Fund 2014-2020 (FEDER), and from the European Commission under grant H2020, reference 825111 (DeepHealth). We also acknowledge NVIDIA Corporation for the donation of GPUs used in this work.
    Peris, Á.; Casacuberta Nolla, F. (2019). Interactive-predictive neural multimodal systems. Springer. 16-28. https://doi.org/10.1007/978-3-030-31332-6_2
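    A minimal sketch of the interactive-predictive loop this abstract describes, assuming a generic generate(prefix) hook standing in for the neural sequence-to-sequence model; the simulated user and the toy captioning model are placeholders, not the authors' implementation.

```python
# Sketch of the interactive-predictive protocol: the system proposes a
# hypothesis, a (here simulated) user corrects the first error, and the
# system regenerates constrained to the validated prefix.
from typing import Callable, List

def interactive_predictive_loop(
    generate: Callable[[List[str]], List[str]],  # placeholder for the seq2seq model
    reference: List[str],                        # what the user actually wants
) -> List[str]:
    prefix: List[str] = []
    while True:
        hypothesis = prefix + generate(prefix)
        if hypothesis == reference:
            return hypothesis                    # user accepts, loop ends
        # Simulated user: validate the longest correct prefix, then fix one token.
        i = 0
        while i < min(len(hypothesis), len(reference)) and hypothesis[i] == reference[i]:
            i += 1
        prefix = reference[: i + 1]

# Toy "model": always completes the caption the same way beyond the prefix.
toy_model = lambda prefix: ["a", "dog", "runs"][len(prefix):]
print(interactive_predictive_loop(toy_model, ["a", "cat", "runs"]))
# -> ['a', 'cat', 'runs'] after a single simulated correction
```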

    Temporal Alignment Using the Incremental Unit Framework

    We propose a method for temporal alignment, a precondition of meaningful fusion, in multimodal systems, using the incremental unit dialogue system framework. The framework gives the system flexibility in how it handles alignment: either by delaying a modality for a specified amount of time, or by revoking (i.e., backtracking) processed information so that multiple information sources can be processed jointly. We evaluate our approach in an offline experiment with multimodal data and find that the incremental framework is flexible and shows promise as a solution to the problem of temporal alignment in multimodal systems.
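    The two alignment strategies the abstract names (delaying a modality, or revoking already-processed information) can be sketched as a small buffer over incremental units. The IncrementalUnit and AlignmentBuffer types below are illustrative stand-ins, not the IU framework's actual API.

```python
# Hedged sketch of the two alignment strategies: a hold-back delay for the
# faster modality, and revocation (backtracking) of units processed too early.
from dataclasses import dataclass, field
from typing import List

@dataclass
class IncrementalUnit:
    modality: str        # e.g. "speech" or "gesture"
    payload: str
    timestamp: float     # arrival time in seconds
    revoked: bool = False

@dataclass
class AlignmentBuffer:
    delay: float         # hold-back window applied to incoming units
    units: List[IncrementalUnit] = field(default_factory=list)

    def add(self, iu: IncrementalUnit) -> None:
        self.units.append(iu)

    def revoke_after(self, t: float) -> None:
        # Revoke strategy: backtrack by marking units added after time t.
        for iu in self.units:
            if iu.timestamp > t:
                iu.revoked = True

    def ready(self, now: float) -> List[IncrementalUnit]:
        # Delay strategy: release units only once the delay window has passed.
        return [iu for iu in self.units
                if not iu.revoked and now - iu.timestamp >= self.delay]

buf = AlignmentBuffer(delay=0.5)
buf.add(IncrementalUnit("speech", "put that", timestamp=0.0))
buf.add(IncrementalUnit("gesture", "point@(3,4)", timestamp=0.2))
print([iu.payload for iu in buf.ready(now=0.6)])  # only the speech IU is released
```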

    Natural language in multimedia / multimodal systems
