Towards ethical multimodal systems
Generative AI systems (ChatGPT, DALL-E, etc.) are expanding into many areas of our lives, from art [Rombach et al., 2021] to mental health [Morris and Kouddous, 2022]; their rapidly growing societal impact opens new opportunities, but also raises ethical concerns. The emerging field of AI alignment aims to make AI systems reflect human values. This paper focuses on evaluating the ethics of multimodal AI systems involving both text and images, a relatively under-explored area, as most alignment work currently focuses on language models. We first create a multimodal ethical database from human feedback on ethicality. Then, using this database, we develop algorithms, including a RoBERTa-large classifier and a multilayer perceptron, to automatically assess the ethicality of system responses.

Comment: 5 pages, multimodal ethical dataset building, accepted at the NeurIPS 2023 MP2 workshop.
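The scoring pipeline described above can be pictured with a minimal NumPy sketch: a small multilayer perceptron (a stand-in for the paper's MLP component) maps a fused text+image embedding to an ethicality probability. The encoder is assumed to have already produced the embedding, and all weights here are random and untrained; class and variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class EthicalityMLP:
    """Minimal MLP mapping a fused text+image embedding to an
    ethicality probability (sigmoid output). Untrained sketch."""
    def __init__(self, dim, hidden=64):
        self.W1 = rng.normal(0, 0.1, (dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, x):
        h = np.maximum(0, x @ self.W1 + self.b1)   # ReLU hidden layer
        z = h @ self.W2 + self.b2
        return 1.0 / (1.0 + np.exp(-z))            # P("ethical response")

# hypothetical fused embedding (e.g. concatenated text and image features)
embedding = rng.normal(size=(1, 128))
score = EthicalityMLP(128).forward(embedding)
print(score.shape)  # (1, 1): one probability per response
```

In the paper's setup, such a head would be trained on the human-feedback ethicality labels; here the forward pass only illustrates the shape of the computation.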
COMM Notation for Specifying Collaborative and MultiModal Interactive Systems
Multi-user multimodal interactive systems involve multiple users who can use multiple interaction modalities. Although multi-user multimodal systems are becoming more prevalent (especially those involving multitouch surfaces), their design is still ad hoc, without properly keeping track of the design process. To address this lack of design tools for multi-user multimodal systems, we present the COMM (Collaborative and MultiModal) notation and its online editor for specifying multi-user multimodal interactive systems. Extending the CTT notation, the salient features of the COMM notation include the concepts of interactive role and modal task, as well as a refinement of the temporal operators applied to tasks using the Allen relationships. A multimodal military command post for the control of unmanned aerial vehicles (UAVs) by two operators is used to illustrate the discussion.
Interactive-predictive neural multimodal systems
Despite the advances achieved by neural models in sequence-to-sequence learning, exploited in a variety of tasks, they still make errors. In many use cases, these are corrected by a human expert in a posterior revision process. The interactive-predictive framework aims to minimize the human effort spent on this process by considering partial corrections for iteratively refining the hypothesis. In this work, we generalize the interactive-predictive approach, typically applied in the machine translation field, to tackle other multimodal problems, namely image and video captioning. We study the application of this framework to multimodal neural sequence-to-sequence models. We show that, following this framework, we approximately halve the effort spent correcting the outputs generated by the automatic systems. Moreover, we deploy our systems in a publicly accessible demonstration that allows users to better understand the behavior of the interactive-predictive framework.

The research leading to these results has received funding from MINECO under grant IDIFEDER/2018/025 "Sistemas de fabricación inteligentes para la industria 4.0", an action co-funded by the European Regional Development Fund 2014-2020 (FEDER), and from the European Commission under grant H2020, reference 825111 (DeepHealth). We also acknowledge NVIDIA Corporation for the donation of GPUs used in this work.

Peris, Á.; Casacuberta Nolla, F. (2019). Interactive-predictive neural multimodal systems. Springer. 16-28. https://doi.org/978-3-030-31332-6_2S162
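The prefix-based refinement loop at the core of the interactive-predictive framework can be sketched as follows. Here `toy_model` is a hypothetical stand-in for the neural sequence-to-sequence model, and the token-level correction policy (user validates the longest correct prefix and supplies the next token) is an assumption, not the authors' exact protocol.

```python
def correct_hypothesis(generate, reference, max_rounds=100):
    """Interactive-predictive loop (sketch): the user validates the
    longest correct prefix of each hypothesis and corrects the first
    wrong token; the system re-generates conditioned on that prefix.
    `generate(prefix)` stands in for the neural model."""
    hypothesis = generate([])
    corrections = 0
    for _ in range(max_rounds):
        # longest prefix of the hypothesis matching the user's intent
        k = 0
        while (k < len(hypothesis) and k < len(reference)
               and hypothesis[k] == reference[k]):
            k += 1
        if k == len(reference) and k == len(hypothesis):
            break  # user accepts the hypothesis
        # user corrects token k; system keeps validated prefix + correction
        prefix = reference[:k + 1]
        corrections += 1
        hypothesis = generate(prefix)
    return hypothesis, corrections

# toy "model": completes any prefix with a fixed (imperfect) continuation
def toy_model(prefix):
    canned = ["a", "dog", "runs", "on", "grass"]
    return list(prefix) + canned[len(prefix):]

ref = ["a", "cat", "runs", "on", "sand"]
hyp, n = correct_hypothesis(toy_model, ref)
print(hyp, n)  # reaches the reference after 2 corrections
```

The effort metric reported in the paper corresponds roughly to `corrections` relative to output length: each re-generation reuses the validated prefix, which is what halves the correction effort compared with post-editing from scratch.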
A multimodal restaurant finder for semantic web
Multimodal dialogue systems provide multiple modalities, in the form of speech, mouse clicking, drawing, or touch, that can enhance human-computer interaction. However, one drawback of existing multimodal systems is that they are highly domain-specific and do not allow information to be shared across different providers. In this paper, we propose a semantic multimodal system for the Semantic Web, called Semantic Restaurant Finder, in which restaurant information for different cities, countries, and languages is constructed as ontologies so that the information can be shared. Using the Semantic Restaurant Finder, users can draw on semantic restaurant knowledge distributed across different locations on the Internet to find the desired restaurants.
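The idea of sharing restaurant knowledge across providers can be pictured with a toy triple store; restaurant names, predicates, and the merge step below are hypothetical illustrations, whereas the actual system uses Semantic Web ontologies and reasoning.

```python
# toy triple stores from two hypothetical providers
triples_paris = [
    ("LeBistro", "servesCuisine", "French"),
    ("LeBistro", "locatedIn", "Paris"),
]
triples_tokyo = [
    ("SushiYa", "servesCuisine", "Japanese"),
    ("SushiYa", "locatedIn", "Tokyo"),
]

def find(store, predicate, obj):
    """Return subjects matching the pattern (?, predicate, obj)."""
    return [s for s, p, o in store if p == predicate and o == obj]

# because both providers expose the same vocabulary, their knowledge
# can simply be merged and queried uniformly
shared = triples_paris + triples_tokyo
print(find(shared, "servesCuisine", "Japanese"))  # ['SushiYa']
```

A shared ontology plays the role of the common vocabulary here: it is what lets independently published restaurant data be merged without per-provider integration code.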
Temporal Alignment Using the Incremental Unit Framework
We propose a method for temporal alignment (a precondition of meaningful fusion) in multimodal systems, using the incremental unit dialogue system framework, which gives the system flexibility in how it handles alignment: either by delaying a modality for a specified amount of time, or by revoking (i.e., backtracking) processed information so that multiple information sources can be processed jointly. We evaluate our approach in an offline experiment with multimodal data and find that the incremental framework is flexible and shows promise as a solution to the problem of temporal alignment in multimodal systems.
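The delay-or-revoke choice can be illustrated with a small sketch: each modality is a queue of timestamped units, a unit is held back until the other modality produces a unit close enough in time, and a unit that can no longer be paired is discarded (analogous to revoking it). The stream contents, tolerance value, and drop-stale policy are illustrative assumptions, not the paper's incremental-unit implementation.

```python
from collections import deque

def fuse_aligned(speech_units, gesture_units, tolerance=0.2):
    """Delay-based alignment sketch: pair units from two modality
    queues whose timestamps differ by at most `tolerance` seconds;
    unpaired stale units are dropped (cf. revoking)."""
    a, b = deque(speech_units), deque(gesture_units)
    fused = []
    while a and b:
        ta, pa = a[0]
        tb, pb = b[0]
        if abs(ta - tb) <= tolerance:
            fused.append((max(ta, tb), (pa, pb)))  # joint processing
            a.popleft(); b.popleft()
        elif ta < tb:
            a.popleft()  # speech unit too old to pair: revoke it
        else:
            b.popleft()  # gesture unit too old to pair: revoke it
    return fused

speech = [(0.00, "put"), (0.50, "that"), (1.10, "there")]
gesture = [(0.55, "point:obj"), (1.15, "point:loc")]
print(fuse_aligned(speech, gesture))
```

In this toy run, "put" arrives long before any gesture and is revoked, while the deictic words pair with the pointing gestures that arrive within the tolerance window.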