Towards ethical multimodal systems
Generative AI systems (ChatGPT, DALL-E, etc.) are expanding into many areas of our lives, from art [Rombach et al., 2021] to mental health [Morris and Kouddous, 2022]; their rapidly growing societal impact opens new opportunities, but also raises ethical concerns. The emerging field of AI alignment aims to make AI systems reflect human values. This paper focuses on evaluating the ethics of multimodal AI systems involving both text and images, a relatively under-explored area, as most alignment work currently focuses on language models. We first create a multimodal ethical database from human feedback on ethicality. Then, using this database, we develop algorithms, including a RoBERTa-large classifier and a multilayer perceptron, to automatically assess the ethicality of system responses.

Comment: 5 pages, multimodal ethical dataset building, accepted at the NeurIPS 2023 MP2 workshop.
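The scoring pipeline described above can be pictured with a minimal NumPy sketch: a small multilayer perceptron (a stand-in for the paper's MLP component) maps a fused text+image embedding to an ethicality probability. The encoder is assumed to have already produced the embedding, and all weights here are random and untrained; class and variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class EthicalityMLP:
    """Minimal MLP mapping a fused text+image embedding to an
    ethicality probability (sigmoid output). Untrained sketch."""
    def __init__(self, dim, hidden=64):
        self.W1 = rng.normal(0, 0.1, (dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, x):
        h = np.maximum(0, x @ self.W1 + self.b1)   # ReLU hidden layer
        z = h @ self.W2 + self.b2
        return 1.0 / (1.0 + np.exp(-z))            # P("ethical response")

# hypothetical fused embedding (e.g. concatenated text and image features)
embedding = rng.normal(size=(1, 128))
score = EthicalityMLP(128).forward(embedding)
print(score.shape)  # (1, 1): one probability per response
```

In the paper's setup, such a head would be trained on the human-feedback ethicality labels; here the forward pass only illustrates the shape of the computation.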
COMM Notation for Specifying Collaborative and MultiModal Interactive Systems
Multi-user multimodal interactive systems involve multiple users who can use multiple interaction modalities. Although multi-user multimodal systems are becoming more prevalent (especially those involving multitouch surfaces), their design is still ad hoc, without properly keeping track of the design process. To address this lack of design tools for multi-user multimodal systems, we present the COMM (Collaborative and MultiModal) notation and its online editor for specifying multi-user multimodal interactive systems. Extending the CTT notation, the salient features of the COMM notation include the concepts of interactive role and modal task, as well as a refinement of the temporal operators applied to tasks using the Allen relationships. A multimodal military command post for the control of unmanned aerial vehicles (UAVs) by two operators is used to illustrate the discussion.
Interactive-predictive neural multimodal systems
Despite the advances achieved by neural models in sequence-to-sequence learning, exploited in a variety of tasks, they still make errors. In many use cases, these are corrected by a human expert in a posterior revision process. The interactive-predictive framework aims to minimize the human effort spent on this process by considering partial corrections for iteratively refining the hypothesis. In this work, we generalize the interactive-predictive approach, typically applied in the machine translation field, to tackle other multimodal problems, namely image and video captioning. We study the application of this framework to multimodal neural sequence-to-sequence models. We show that, following this framework, we approximately halve the effort spent correcting the outputs generated by the automatic systems. Moreover, we deploy our systems in a publicly accessible demonstration that allows users to better understand the behavior of the interactive-predictive framework.

The research leading to these results has received funding from MINECO under grant IDIFEDER/2018/025 "Sistemas de fabricación inteligentes para la industria 4.0", an action co-funded by the European Regional Development Fund 2014-2020 (FEDER), and from the European Commission under grant H2020, reference 825111 (DeepHealth). We also acknowledge NVIDIA Corporation for the donation of GPUs used in this work.

Peris, Á.; Casacuberta Nolla, F. (2019). Interactive-predictive neural multimodal systems. Springer. 16-28. https://doi.org/978-3-030-31332-6_2S162
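The prefix-based refinement loop at the core of the interactive-predictive framework can be sketched as follows. Here `toy_model` is a hypothetical stand-in for the neural sequence-to-sequence model, and the token-level correction policy (user validates the longest correct prefix and supplies the next token) is an assumption, not the authors' exact protocol.

```python
def correct_hypothesis(generate, reference, max_rounds=100):
    """Interactive-predictive loop (sketch): the user validates the
    longest correct prefix of each hypothesis and corrects the first
    wrong token; the system re-generates conditioned on that prefix.
    `generate(prefix)` stands in for the neural model."""
    hypothesis = generate([])
    corrections = 0
    for _ in range(max_rounds):
        # longest prefix of the hypothesis matching the user's intent
        k = 0
        while (k < len(hypothesis) and k < len(reference)
               and hypothesis[k] == reference[k]):
            k += 1
        if k == len(reference) and k == len(hypothesis):
            break  # user accepts the hypothesis
        # user corrects token k; system keeps validated prefix + correction
        prefix = reference[:k + 1]
        corrections += 1
        hypothesis = generate(prefix)
    return hypothesis, corrections

# toy "model": completes any prefix with a fixed (imperfect) continuation
def toy_model(prefix):
    canned = ["a", "dog", "runs", "on", "grass"]
    return list(prefix) + canned[len(prefix):]

ref = ["a", "cat", "runs", "on", "sand"]
hyp, n = correct_hypothesis(toy_model, ref)
print(hyp, n)  # reaches the reference after 2 corrections
```

The effort metric reported in the paper corresponds roughly to `corrections` relative to output length: each re-generation reuses the validated prefix, which is what halves the correction effort compared with post-editing from scratch.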
A multimodal restaurant finder for semantic web
Multimodal dialogue systems provide multiple modalities, in the form of speech, mouse clicking, drawing, or touch, that can enhance human-computer interaction. However, one drawback of existing multimodal systems is that they are highly domain-specific and do not allow information to be shared across different providers. In this paper, we propose a semantic multimodal system for the Semantic Web, called Semantic Restaurant Finder, in which restaurant information for different cities, countries, and languages is constructed as ontologies so that the information can be shared. Using the Semantic Restaurant Finder, users can draw on semantic restaurant knowledge distributed across different locations on the Internet to find the desired restaurants.
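The idea of sharing restaurant knowledge across providers can be pictured with a toy triple store; restaurant names, predicates, and the merge step below are hypothetical illustrations, whereas the actual system uses Semantic Web ontologies and reasoning.

```python
# toy triple stores from two hypothetical providers
triples_paris = [
    ("LeBistro", "servesCuisine", "French"),
    ("LeBistro", "locatedIn", "Paris"),
]
triples_tokyo = [
    ("SushiYa", "servesCuisine", "Japanese"),
    ("SushiYa", "locatedIn", "Tokyo"),
]

def find(store, predicate, obj):
    """Return subjects matching the pattern (?, predicate, obj)."""
    return [s for s, p, o in store if p == predicate and o == obj]

# because both providers expose the same vocabulary, their knowledge
# can simply be merged and queried uniformly
shared = triples_paris + triples_tokyo
print(find(shared, "servesCuisine", "Japanese"))  # ['SushiYa']
```

A shared ontology plays the role of the common vocabulary here: it is what lets independently published restaurant data be merged without per-provider integration code.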
Temporal Alignment Using the Incremental Unit Framework
We propose a method for temporal alignment (a precondition of meaningful fusion) in multimodal systems, using the incremental unit dialogue system framework, which gives the system flexibility in how it handles alignment: either by delaying a modality for a specified amount of time, or by revoking (i.e., backtracking) processed information so that multiple information sources can be processed jointly. We evaluate our approach in an offline experiment with multimodal data and find that the incremental framework is flexible and shows promise as a solution to the problem of temporal alignment in multimodal systems.
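The delay-or-revoke choice can be illustrated with a small sketch: each modality is a queue of timestamped units, a unit is held back until the other modality produces a unit close enough in time, and a unit that can no longer be paired is discarded (analogous to revoking it). The stream contents, tolerance value, and drop-stale policy are illustrative assumptions, not the paper's incremental-unit implementation.

```python
from collections import deque

def fuse_aligned(speech_units, gesture_units, tolerance=0.2):
    """Delay-based alignment sketch: pair units from two modality
    queues whose timestamps differ by at most `tolerance` seconds;
    unpaired stale units are dropped (cf. revoking)."""
    a, b = deque(speech_units), deque(gesture_units)
    fused = []
    while a and b:
        ta, pa = a[0]
        tb, pb = b[0]
        if abs(ta - tb) <= tolerance:
            fused.append((max(ta, tb), (pa, pb)))  # joint processing
            a.popleft(); b.popleft()
        elif ta < tb:
            a.popleft()  # speech unit too old to pair: revoke it
        else:
            b.popleft()  # gesture unit too old to pair: revoke it
    return fused

speech = [(0.00, "put"), (0.50, "that"), (1.10, "there")]
gesture = [(0.55, "point:obj"), (1.15, "point:loc")]
print(fuse_aligned(speech, gesture))
```

In this toy run, "put" arrives long before any gesture and is revoked, while the deictic words pair with the pointing gestures that arrive within the tolerance window.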