Deep Co-Attention Network for Multi-View Subspace Learning
Many real-world applications involve data from multiple modalities and thus
exhibit view heterogeneity. For example, user modeling on social media
might leverage both the topology of the underlying social network and the
content of the users' posts; in the medical domain, multiple views could be
X-ray images taken at different poses. To date, various techniques, such as
canonical correlation analysis based methods, have been proposed and achieve
promising results. Meanwhile, it is critical for decision-makers to be
able to understand the prediction results from these methods. For example,
given the diagnostic result that a model provided based on the X-ray images of
a patient at different poses, the doctor needs to know why the model made such
a prediction. However, state-of-the-art techniques usually can neither
exploit the complementary information of each view nor explain
the predictions in an interpretable manner.
To address these issues, in this paper, we propose a deep co-attention
network for multi-view subspace learning, which aims to extract both the common
information and the complementary information in an adversarial setting and
provide end-users with robust interpretations of the predictions via the
co-attention mechanism. In particular, it uses a novel cross reconstruction
loss and leverages the label information to guide the construction of the
latent representation by incorporating the classifier into our model. This
improves the quality of the latent representation and accelerates
convergence. Finally, we develop an efficient iterative algorithm to find the optimal
encoders and discriminator, which are evaluated extensively on synthetic and
real-world data sets. We also conduct a case study to demonstrate how the
proposed method robustly interprets the predictions on an image data set.
Comment: Accepted by WWW2
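
The abstract names a cross reconstruction loss without defining it. The sketch below shows one common reading of the idea, in which each view is reconstructed from the latent code of the other view. It is an illustration under stated assumptions, not the paper's formulation: the linear encoders and decoders, the mean squared error, and all function names (`matvec`, `mse`, `cross_reconstruction_loss`) are hypothetical.

```python
# Hypothetical sketch of a two-view cross reconstruction loss.
# Assumptions (not from the paper): linear encoders/decoders given as
# lists of rows, and mean squared error as the reconstruction penalty.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def cross_reconstruction_loss(x1, x2, enc1, enc2, dec1, dec2):
    """Reconstruct each view from the OTHER view's latent code."""
    z1 = matvec(enc1, x1)          # latent code of view 1
    z2 = matvec(enc2, x2)          # latent code of view 2
    x2_from_z1 = matvec(dec2, z1)  # view 2 decoded from view 1's code
    x1_from_z2 = matvec(dec1, z2)  # view 1 decoded from view 2's code
    return mse(x1_from_z2, x1) + mse(x2_from_z1, x2)

identity = [[1.0, 0.0], [0.0, 1.0]]
# Two views that carry identical information: the loss vanishes.
same = cross_reconstruction_loss([1.0, 2.0], [1.0, 2.0],
                                 identity, identity, identity, identity)
# Two views that disagree: the loss is positive.
diff = cross_reconstruction_loss([1.0, 0.0], [0.0, 1.0],
                                 identity, identity, identity, identity)
```

Under this reading, the loss is small only when the latent codes capture information shared across views, which is why such a term is a plausible way to encourage a common subspace.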