6,126 research outputs found
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
Multimodal representation learning is gaining more and more interest within
the deep learning community. While bilinear models provide an interesting
framework to find subtle combination of modalities, their number of parameters
grows quadratically with the input dimensions, making their practical
implementation within classical deep learning pipelines challenging. In this
paper, we introduce BLOCK, a new multimodal fusion based on the
block-superdiagonal tensor decomposition. It leverages the notion of block-term
ranks, which generalizes both concepts of rank and mode ranks for tensors,
already used for multimodal fusion. It allows to define new ways for optimizing
the tradeoff between the expressiveness and complexity of the fusion model, and
is able to represent very fine interactions between modalities while
maintaining powerful mono-modal representations. We demonstrate the practical
interest of our fusion model by using BLOCK for two challenging tasks: Visual
Question Answering (VQA) and Visual Relationship Detection (VRD), where we
design end-to-end learnable architectures for representing relevant
interactions between modalities. Through extensive experiments, we show that
BLOCK compares favorably with respect to state-of-the-art multimodal fusion
models for both VQA and VRD tasks. Our code is available at
https://github.com/Cadene/block.bootstrap.pytorch
Detecting the community structure and activity patterns of temporal networks: a non-negative tensor factorization approach
The increasing availability of temporal network data is calling for more
research on extracting and characterizing mesoscopic structures in temporal
networks and on relating such structure to specific functions or properties of
the system. An outstanding challenge is the extension of the results achieved
for static networks to time-varying networks, where the topological structure
of the system and the temporal activity patterns of its components are
intertwined. Here we investigate the use of a latent factor decomposition
technique, non-negative tensor factorization, to extract the community-activity
structure of temporal networks. The method is intrinsically temporal and allows
to simultaneously identify communities and to track their activity over time.
We represent the time-varying adjacency matrix of a temporal network as a
three-way tensor and approximate this tensor as a sum of terms that can be
interpreted as communities of nodes with an associated activity time series. We
summarize known computational techniques for tensor decomposition and discuss
some quality metrics that can be used to tune the complexity of the factorized
representation. We subsequently apply tensor factorization to a temporal
network for which a ground truth is available for both the community structure
and the temporal activity patterns. The data we use describe the social
interactions of students in a school, the associations between students and
school classes, and the spatio-temporal trajectories of students over time. We
show that non-negative tensor factorization is capable of recovering the class
structure with high accuracy. In particular, the extracted tensor components
can be validated either as known school classes, or in terms of correlated
activity patterns, i.e., of spatial and temporal coincidences that are
determined by the known school activity schedule
Hyperspectral Image Restoration via Total Variation Regularized Low-rank Tensor Decomposition
Hyperspectral images (HSIs) are often corrupted by a mixture of several types
of noise during the acquisition process, e.g., Gaussian noise, impulse noise,
dead lines, stripes, and many others. Such complex noise could degrade the
quality of the acquired HSIs, limiting the precision of the subsequent
processing. In this paper, we present a novel tensor-based HSI restoration
approach by fully identifying the intrinsic structures of the clean HSI part
and the mixed noise part respectively. Specifically, for the clean HSI part, we
use tensor Tucker decomposition to describe the global correlation among all
bands, and an anisotropic spatial-spectral total variation (SSTV)
regularization to characterize the piecewise smooth structure in both spatial
and spectral domains. For the mixed noise part, we adopt the norm
regularization to detect the sparse noise, including stripes, impulse noise,
and dead pixels. Despite that TV regulariztion has the ability of removing
Gaussian noise, the Frobenius norm term is further used to model heavy Gaussian
noise for some real-world scenarios. Then, we develop an efficient algorithm
for solving the resulting optimization problem by using the augmented Lagrange
multiplier (ALM) method. Finally, extensive experiments on simulated and
real-world noise HSIs are carried out to demonstrate the superiority of the
proposed method over the existing state-of-the-art ones.Comment: 15 pages, 20 figure
Group Membership Prediction
The group membership prediction (GMP) problem involves predicting whether or
not a collection of instances share a certain semantic property. For instance,
in kinship verification given a collection of images, the goal is to predict
whether or not they share a {\it familial} relationship. In this context we
propose a novel probability model and introduce latent {\em view-specific} and
{\em view-shared} random variables to jointly account for the view-specific
appearance and cross-view similarities among data instances. Our model posits
that data from each view is independent conditioned on the shared variables.
This postulate leads to a parametric probability model that decomposes group
membership likelihood into a tensor product of data-independent parameters and
data-dependent factors. We propose learning the data-independent parameters in
a discriminative way with bilinear classifiers, and test our prediction
algorithm on challenging visual recognition tasks such as multi-camera person
re-identification and kinship verification. On most benchmark datasets, our
method can significantly outperform the current state-of-the-art.Comment: accepted for ICCV 201
Detection of Review Abuse via Semi-Supervised Binary Multi-Target Tensor Decomposition
Product reviews and ratings on e-commerce websites provide customers with
detailed insights about various aspects of the product such as quality,
usefulness, etc. Since they influence customers' buying decisions, product
reviews have become a fertile ground for abuse by sellers (colluding with
reviewers) to promote their own products or to tarnish the reputation of
competitor's products. In this paper, our focus is on detecting such abusive
entities (both sellers and reviewers) by applying tensor decomposition on the
product reviews data. While tensor decomposition is mostly unsupervised, we
formulate our problem as a semi-supervised binary multi-target tensor
decomposition, to take advantage of currently known abusive entities. We
empirically show that our multi-target semi-supervised model achieves higher
precision and recall in detecting abusive entities as compared to unsupervised
techniques. Finally, we show that our proposed stochastic partial natural
gradient inference for our model empirically achieves faster convergence than
stochastic gradient and Online-EM with sufficient statistics.Comment: Accepted to the 25th ACM SIGKDD Conference on Knowledge Discovery and
Data Mining, 2019. Contains supplementary material. arXiv admin note: text
overlap with arXiv:1804.0383
- …