General Debiasing for Multimodal Sentiment Analysis
Existing work on Multimodal Sentiment Analysis (MSA) utilizes multimodal
information for prediction yet unavoidably suffers from fitting the spurious
correlations between multimodal features and sentiment labels. For example, if
most videos with a blue background have positive labels in a dataset, the model
will rely on such correlations for prediction, while "blue background" is not
a sentiment-related feature. To address this problem, we define a general
debiasing MSA task, which aims to enhance the Out-Of-Distribution (OOD)
generalization ability of MSA models by reducing their reliance on spurious
correlations. To this end, we propose a general debiasing framework based on
Inverse Probability Weighting (IPW), which adaptively assigns small weights to
the samples with larger bias (i.e., the more severe spurious correlations). The key
to this debiasing framework is to estimate the bias of each sample, which is
achieved by two steps: 1) disentangling the robust features and biased features
in each modality, and 2) utilizing the biased features to estimate the bias.
Finally, we employ IPW to reduce the effects of large-biased samples,
facilitating robust feature learning for sentiment prediction. To examine the
model's generalization ability, we keep the original testing sets on two
benchmarks and additionally construct multiple unimodal and multimodal OOD
testing sets. The empirical results demonstrate the superior generalization
ability of our proposed framework. We have released the code and data to
facilitate the reproduction
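
The IPW reweighting idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' released implementation. It assumes each sample already has a bias score in (0, 1], for instance the confidence with which biased features alone predict the label. The function names `ipw_weights` and `weighted_nll`, the `eps` floor, and the normalization to a mean weight of 1 are all illustrative assumptions.

```python
import math


def ipw_weights(bias_scores, eps=1e-6):
    """Return inverse-probability weights, rescaled to average 1.

    bias_scores: assumed per-sample bias estimates in (0, 1].
    A larger bias score yields a smaller weight.
    """
    raw = [1.0 / max(b, eps) for b in bias_scores]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]


def weighted_nll(probs_of_true_label, weights):
    """Weighted mean negative log-likelihood of the true labels."""
    total = sum(-math.log(p) * w for p, w in zip(probs_of_true_label, weights))
    return total / sum(weights)


# Hypothetical bias scores: the first sample is the most biased.
bias_scores = [0.9, 0.5, 0.1]
weights = ipw_weights(bias_scores)
loss = weighted_nll([0.5, 0.5, 0.5], weights)
```

With these hypothetical scores, the most biased sample contributes least to the loss, which is the effect the abstract attributes to IPW.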
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into, what we refer to as, sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting
and representing features such that a source classifier performs well on the
target domain, and inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research. Comment: 20 pages, 5 figures
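
The sample-based idea mentioned in this abstract, weighting source observations by their importance to the target domain, can be sketched as follows. This is a toy illustration, not a method taken from the review. It assumes one-dimensional Gaussian source and target densities that are known exactly, whereas in practice the density ratio has to be estimated from data. The names `gaussian_pdf`, `importance_weights` and `weighted_risk` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, source, target):
    """Density ratio p_target(x) / p_source(x) for each source sample.

    source and target are (mean, std) pairs, assumed known here.
    """
    return [gaussian_pdf(x, *target) / gaussian_pdf(x, *source) for x in xs]


def weighted_risk(losses, weights):
    """Self-normalized importance-weighted average of per-sample losses."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


# Toy setup: source centred at 0, target centred at 1, unit variance.
xs = [0.0, 1.0]
w = importance_weights(xs, source=(0.0, 1.0), target=(1.0, 1.0))
risk = weighted_risk([1.0, 0.0], w)
```

Source samples that are more typical of the target domain receive larger weights, so their losses dominate the estimated target risk.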
Unveiling the frontiers of deep learning: innovations shaping diverse domains
Deep learning (DL) enables the development of computer models that are
capable of learning, visualizing, optimizing, refining, and predicting data. In
recent years, DL has been applied in a range of fields, including audio-visual
data processing, agriculture, transportation prediction, natural language,
biomedicine, disaster management, bioinformatics, drug design, genomics, face
recognition, and ecology. To explore the current state of deep learning, it is
necessary to investigate the latest developments and applications of deep
learning in these disciplines. However, the literature is lacking in exploring
the applications of deep learning in all potential sectors. This paper thus
extensively investigates the potential applications of deep learning across all
major fields of study as well as the associated benefits and challenges. As
evidenced in the literature, DL exhibits accuracy in prediction and analysis,
which makes it a powerful computational tool, and has the ability to articulate
itself and optimize, making it effective in processing data with no prior
training. Given its independence from training data, deep learning necessitates
massive amounts of data for effective analysis and processing, much like data
volume. To handle the challenge of compiling huge amounts of medical,
scientific, healthcare, and environmental data for use in deep learning, gated
architectures like LSTMs and GRUs can be utilized. For multimodal learning,
shared neurons in the neural network for all activities and specialized neurons
for particular tasks are necessary. Comment: 64 pages, 3 figures, 3 tables
Six papers on computational methods for the analysis of structured and unstructured data in the economic domain
This work investigates the application of computational methods for structured and unstructured data. The domains of application are two closely connected fields with the common
goal of promoting the stability of the financial system: systemic risk and bank supervision.
The work explores different families of models and applies them to different tasks: graphical Gaussian network models to address bank interconnectivity, topic models to monitor
bank news and deep learning for text classification. New applications and variants of these
models are investigated, paying particular attention to the combined use of textual and structured data. The penultimate chapter introduces a sentiment polarity classification tool for
Italian, based on deep learning, to simplify future research relying on sentiment analysis.
The different models have proven useful for leveraging numerical (structured) and textual (unstructured) data. Graphical Gaussian Models and Topic models have been adopted
for inspection and descriptive tasks while deep learning has been applied more for predictive
(classification) problems. Overall, the integration of textual (unstructured) and numerical
(structured) information has proven useful for systemic risk and bank supervision related
analysis. The integration of textual data with numerical data has, in fact, led either to
higher predictive performance or to an enhanced capability of explaining phenomena and correlating them to other events.
Multimodal sentiment analysis in real-life videos
This thesis extends the emerging field of multimodal sentiment analysis of real-life videos, taking two components into consideration: the emotion and the emotion's target.
The emotion component of media is traditionally represented as a segment-based intensity model of emotion classes. This representation is replaced here by a value- and time-continuous view. Adjacent research fields, such as affective computing, have largely neglected the linguistic information available from automatic transcripts of audio-video material. As is demonstrated here, this text modality is well-suited for time- and value-continuous prediction. Moreover, source-specific problems, such as trustworthiness, have been largely unexplored so far.
This work examines perceived trustworthiness of the source, and its quantification, in user-generated video data and presents a possible modelling path. Furthermore, the transfer between the continuous and discrete emotion representations is explored in order to summarise the emotional context at a segment level.
The other component deals with the target of the emotion, for example, the topic the speaker is addressing. Emotion targets in a video dataset can, as is shown here, be coherently extracted based on automatic transcripts without limiting a priori parameters, such as the expected number of targets. Furthermore, alternatives to purely linguistic investigation in predicting targets, such as knowledge-bases and multimodal systems, are investigated.
A new dataset is designed for this investigation, and, in conjunction with proposed novel deep neural networks, extensive experiments are conducted to explore the components described above.
The developed systems show robust prediction results and demonstrate strengths of the respective modalities, feature sets, and modelling techniques. Finally, foundations are laid for cross-modal information prediction systems with applications to the correction of corrupted in-the-wild signals from real-life videos