Investigating non-classical correlations between decision fused multi-modal documents
Correlation has been widely used to facilitate various information retrieval methods such as query expansion, relevance feedback, document clustering, and multi-modal fusion. In particular, correlation and independence are important issues when fusing the different modalities that influence a multi-modal information retrieval process. The basic idea of correlation is that one observable can help predict or enhance another. In quantum mechanics, quantum correlation, called entanglement, is a kind of correlation between observables measured on atomic-size particles, even when these particles are not collected in ensembles. In this paper, we examine a multi-modal fusion scenario similar to that encountered in physics: we first measure two observables (i.e., text-based relevance and image-based relevance) of a multi-modal document without relying on an ensemble of multi-modal documents already labeled in terms of these two variables. Then, we investigate the existence of non-classical correlations between pairs of multi-modal documents. Although there are some basic differences between entanglement and the classical correlation encountered in the macroscopic world, we test for this kind of non-classical correlation through violation of the Bell inequality. We experimentally evaluate several novel association methods in a small-scale experiment; however, we did not find any violation of the Bell inequality. Finally, we present a series of discussions that may provide theoretical and empirical insights and inspiration for the future development of this direction.
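The Bell test referred to here has a standard form, the CHSH inequality: for two-valued (±1) observables measured under two settings per side, any classical (locally realistic) correlation satisfies |S| ≤ 2, where S combines four pairwise correlations. A minimal sketch follows, with hypothetical random relevance outcomes standing in for the paper's unspecified association methods:

```python
import numpy as np

def chsh_value(E_ab, E_ab2, E_a2b, E_a2b2):
    """CHSH combination S of four correlation values.
    Classical (locally realistic) correlations satisfy |S| <= 2;
    quantum entangled states can reach up to 2*sqrt(2)."""
    return E_ab - E_ab2 + E_a2b + E_a2b2

def correlation(x, y):
    """Empirical correlation E[x*y] for +/-1-valued outcomes."""
    return float(np.mean(np.asarray(x) * np.asarray(y)))

# Hypothetical +/-1 outcomes for two "measurement settings" per modality
# (text: a, a2; image: b, b2). Independent random data, as here, will
# give S near 0; real data would come from the association methods.
rng = np.random.default_rng(0)
a, a2 = rng.choice([-1, 1], 200), rng.choice([-1, 1], 200)
b, b2 = rng.choice([-1, 1], 200), rng.choice([-1, 1], 200)

S = chsh_value(correlation(a, b), correlation(a, b2),
               correlation(a2, b), correlation(a2, b2))
print(f"S = {S:.3f}; Bell (CHSH) inequality violated: {abs(S) > 2}")
```

A violation would require |S| > 2, which, as the abstract reports, was not observed in the paper's experiment.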
A Survey of Quantum Theory Inspired Approaches to Information Retrieval
Since 2004, researchers have been using the mathematical framework of Quantum Theory (QT) in Information Retrieval (IR). QT offers a generalized probability and logic framework. Such a framework has been shown to be capable of unifying the representation, ranking and user cognitive aspects of IR, and helpful in developing more dynamic, adaptive and context-aware IR systems. Although quantum-inspired IR is still a growing area, a wide array of work on different aspects of IR has been done and has produced promising results. This paper presents a survey of the research done in this area, aiming to show the landscape of the field and draw a road-map of future directions.
Quantum Cognitively Motivated Context-Aware Multimodal Representation Learning for Human Language Analysis
A long-standing goal in the field of Artificial Intelligence (AI) is to develop systems that can perceive and understand human multimodal language. This requires both the consideration of context in the form of surrounding utterances in a conversation, i.e., context modelling, and the consideration of the impact of different modalities (e.g., linguistic, visual, acoustic), i.e., multimodal fusion. In the last few years, significant strides have been made towards the interpretation of human language due to simultaneous advances in deep learning, data gathering and computing infrastructure. AI models have been investigated to either model interactions across distinct modalities, i.e., linguistic, visual and acoustic, or model interactions across parties in a conversation, achieving unprecedented levels of performance. However, AI models are often designed with only performance as their design target, leaving aside other essential factors such as transparency, interpretability, and how humans understand and reason about cognitive states.
In line with this observation, in this dissertation we develop quantum probabilistic neural models and techniques that allow us to capture rational and irrational cognitive biases without requiring their a priori understanding and identification. First, we present a comprehensive empirical comparison of state-of-the-art (SOTA) modality fusion strategies for video sentiment analysis. The findings provide us with helpful insights into the development of more effective modality fusion models incorporating quantum-inspired components. Second, we introduce an end-to-end complex-valued neural model for video sentiment analysis, bringing quantum procedural steps, outside of physics, into the neural network modelling paradigm. Third, we investigate non-classical correlations across different modalities. In particular, we describe a methodology to model interactions between image and text in an information retrieval scenario. The results provide theoretical and empirical insights for developing a transparent end-to-end probabilistic neural model for video emotion detection in conversations, capturing non-classical correlations across distinct modalities. Fourth, we introduce a theoretical framework to model users' cognitive states underlying their multimodal decision perspectives, and propose a methodology to capture the interference of modalities in decision making.
Overall, we show that our models advance the SOTA on various affective analysis tasks, achieve high transparency owing to their mapping to quantum-physical concepts, and improve post-hoc interpretability, unearthing useful and explainable knowledge about cross-modal interactions.
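The interference of modalities mentioned in the fourth contribution can be illustrated with the standard quantum probability calculation: when a decision outcome can be reached via two modalities represented by complex amplitudes, the quantum probability differs from the classical mixture by a cross-term. A minimal sketch with purely illustrative amplitudes, not the dissertation's actual model:

```python
import numpy as np

# Complex amplitudes for reaching one decision via each of two
# modalities (magnitudes and relative phase are illustrative only).
amp_via_text  = 0.6 * np.exp(1j * 0.0)
amp_via_image = 0.5 * np.exp(1j * 2.1)

# Classical mixture: probabilities add, with no cross-term.
p_classical = abs(amp_via_text) ** 2 + abs(amp_via_image) ** 2

# Quantum superposition: amplitudes add first, producing an
# interference term 2*Re(a1 * conj(a2)) that can raise or lower
# the decision probability relative to the classical mixture.
p_quantum = abs(amp_via_text + amp_via_image) ** 2
interference = p_quantum - p_classical

print(f"classical: {p_classical:.3f}, quantum: {p_quantum:.3f}, "
      f"interference term: {interference:.3f}")
```

The sign and size of the cross-term depend on the relative phase, which is what lets such models capture deviations from the classical law of total probability.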
Challenges in Multimodal Data Fusion
In various disciplines, information about the same phenomenon can be acquired from different types of detectors, under different conditions, at different observation times, in multiple experiments or subjects, etc. We use the term "modality" to denote each such type of acquisition framework. Due to the rich characteristics of natural phenomena, as well as of the environments in which they occur, it is rare that a single modality provides complete knowledge of the phenomenon of interest. The increasing availability of several modalities at once introduces new degrees of freedom, which raise questions beyond those related to exploiting each modality separately. It is the aim of this paper to evoke and promote various challenges in multimodal data fusion at the conceptual level, without focusing on any specific model, method or application.
An Entanglement-driven Fusion Neural Network for Video Sentiment Analysis
Video data is multimodal in nature: an utterance can involve linguistic, visual and acoustic information. A key challenge for video sentiment analysis is therefore how to combine different modalities for sentiment recognition effectively. The latest neural network approaches achieve state-of-the-art performance, but they largely neglect how humans understand and reason about sentiment states. By contrast, recent advances in quantum probabilistic neural models have achieved comparable performance to the state of the art, yet with better transparency and an increased level of interpretability. However, existing quantum-inspired models treat quantum states as either a classical mixture or a separable tensor product across modalities, without triggering their interactions in such a way that they become correlated or non-separable (i.e., entangled). This means that the current models have not fully exploited the expressive power of quantum probabilities. To fill this gap, we propose a transparent quantum probabilistic neural model. The model induces different modalities to interact in such a way that they may not be separable, encoding cross-modal information in the form of non-classical correlations. Comprehensive evaluation on two benchmark datasets for video sentiment analysis shows that the model achieves significant performance improvements. We also show that the degree of non-separability between modalities optimizes post-hoc interpretability.
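For pure bipartite states, a standard way to quantify the non-separability this abstract refers to is the entanglement entropy: the von Neumann entropy of the reduced state of one subsystem, computed from the Schmidt (singular) values of the reshaped state vector. A minimal sketch with toy two-dimensional "modality" states rather than the paper's learned representations:

```python
import numpy as np

def entanglement_entropy(state, dim_a, dim_b):
    """Von Neumann entropy of subsystem A's reduced state:
    0 for a separable (product) pure state, log2(dim) when
    maximally entangled."""
    psi = state.reshape(dim_a, dim_b)
    # Schmidt coefficients via SVD; their squares are the
    # eigenvalues of the reduced density matrix.
    s = np.linalg.svd(psi, compute_uv=False)
    p = s ** 2
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

# Separable product state |0> (x) |+>: no cross-modal correlation.
plus = np.array([1.0, 1.0]) / np.sqrt(2)
product = np.kron(np.array([1.0, 0.0]), plus)

# Bell state (|00> + |11>)/sqrt(2): maximally non-separable.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

print(entanglement_entropy(product, 2, 2))  # ~0.0
print(entanglement_entropy(bell, 2, 2))     # 1.0
```

A fusion layer that only forms classical mixtures or tensor products of per-modality states stays at entropy 0 in this measure; entanglement-driven interactions are what push it above 0.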
Binding and unbinding the auditory and visual streams in the McGurk effect
Subjects presented with coherent auditory and visual streams generally fuse them into a single percept. This results in enhanced intelligibility in noise, or in visual modification of the auditory percept in the McGurk effect. It is classically considered that processing is done independently in the auditory and visual systems before interaction occurs at a certain representational stage, resulting in an integrated percept. However, some behavioral and neurophysiological data suggest the existence of a two-stage process: a first stage would involve binding together the appropriate pieces of audio and video information before fusion per se in a second stage. It should then be possible to design experiments leading to unbinding. It is shown here that if a given McGurk stimulus is preceded by an incoherent audiovisual context, the amount of McGurk effect is largely reduced. Various kinds of incoherent contexts (acoustic syllables dubbed onto video sentences, or phonetic or temporal modifications of the acoustic content of a regular sequence of audiovisual syllables) can significantly reduce the McGurk effect even when they are short (less than 4 s). The data are interpreted in the framework of a two-stage "binding and fusion" model for audiovisual speech perception.
Emotion Quantification Using Variational Quantum State Fidelity Estimation
Sentiment analysis has been instrumental in developing artificial intelligence when applied to various domains. However, most sentiments and emotions are temporal and often exist in a complex manner: several emotions can be experienced at the same time. Instead of recognizing only categorical information about emotions, there is a need to understand and quantify the intensity of emotions. The proposed research investigates a quantum-inspired approach to quantifying emotional intensities at runtime. The inspiration comes from human cognition and decision-making capabilities, which may admit a concise explanation through quantum theory. Quantum state fidelity was used to characterize states and estimate the intensities of emotions rendered by subjects from the Amsterdam Dynamic Facial Expression Set (ADFES) dataset. The quantum variational classifier technique was used to perform this experiment on the IBM Quantum Experience platform. The proposed method successfully quantifies the intensities of the joy, sadness, contempt, anger, surprise, and fear emotions of labelled subjects from the ADFES dataset.
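For pure states, the fidelity underlying this approach has the closed form F(ψ, φ) = |⟨ψ|φ⟩|². A minimal classical sketch of reading fidelities against reference emotion states as intensity scores; the vectors are illustrative stand-ins for the variational circuit states the paper prepares on IBM Quantum Experience:

```python
import numpy as np

def fidelity(psi, phi):
    """Fidelity |<psi|phi>|^2 between two pure states (unit vectors)."""
    return float(abs(np.vdot(psi, phi)) ** 2)

def normalize(v):
    v = np.asarray(v, dtype=complex)
    return v / np.linalg.norm(v)

# Hypothetical reference states for two emotions, and an observed
# state encoding a subject's expression (illustrative vectors only).
joy      = normalize([1.0, 0.0])
sadness  = normalize([0.0, 1.0])
observed = normalize([0.9, 0.3])

# The fidelity against each reference is read as an intensity score.
print(f"joy intensity:     {fidelity(observed, joy):.3f}")      # ~0.9
print(f"sadness intensity: {fidelity(observed, sadness):.3f}")  # ~0.1
```

Because the references here are orthogonal, the two scores sum to 1; with more emotions and non-orthogonal references, the fidelities give a graded, overlapping intensity profile rather than a single categorical label.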
Adaptive multimodal fusion based similarity measures in music information retrieval
Ph.D. (Doctor of Philosophy)
Multimodal sentiment analysis in real-life videos
This thesis extends the emerging field of multimodal sentiment analysis of real-life videos, taking two components into consideration: the emotion and the emotion's target.
The emotion component of media is traditionally represented as a segment-based intensity model of emotion classes. This representation is replaced here by a value- and time-continuous view. Adjacent research fields, such as affective computing, have largely neglected the linguistic information available from automatic transcripts of audio-video material. As is demonstrated here, this text modality is well-suited for time- and value-continuous prediction. Moreover, source-specific problems, such as trustworthiness, have been largely unexplored so far.
This work examines perceived trustworthiness of the source, and its quantification, in user-generated video data and presents a possible modelling path. Furthermore, the transfer between the continuous and discrete emotion representations is explored in order to summarise the emotional context at a segment level.
The other component deals with the target of the emotion, for example, the topic the speaker is addressing. Emotion targets in a video dataset can, as is shown here, be coherently extracted from automatic transcripts without a priori limiting parameters, such as the expected number of targets. Furthermore, alternatives to purely linguistic investigation in predicting targets, such as knowledge bases and multimodal systems, are investigated.
A new dataset is designed for this investigation, and, in conjunction with proposed novel deep neural networks, extensive experiments are conducted to explore the components described above.
The developed systems show robust prediction results and demonstrate the strengths of the respective modalities, feature sets, and modelling techniques. Finally, foundations are laid for cross-modal information prediction systems with applications to the correction of corrupted in-the-wild signals from real-life videos.