BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
Multimodal representation learning is gaining more and more interest within
the deep learning community. While bilinear models provide an interesting
framework to find subtle combinations of modalities, their number of parameters
grows quadratically with the input dimensions, making their practical
implementation within classical deep learning pipelines challenging. In this
paper, we introduce BLOCK, a new multimodal fusion based on the
block-superdiagonal tensor decomposition. It leverages the notion of block-term
ranks, which generalizes both concepts of rank and mode ranks for tensors,
already used for multimodal fusion. It makes it possible to define new ways of optimizing
the tradeoff between the expressiveness and complexity of the fusion model, and
is able to represent very fine interactions between modalities while
maintaining powerful mono-modal representations. We demonstrate the practical
interest of our fusion model by using BLOCK for two challenging tasks: Visual
Question Answering (VQA) and Visual Relationship Detection (VRD), where we
design end-to-end learnable architectures for representing relevant
interactions between modalities. Through extensive experiments, we show that
BLOCK compares favorably with respect to state-of-the-art multimodal fusion
models for both VQA and VRD tasks. Our code is available at
https://github.com/Cadene/block.bootstrap.pytorch
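To make the parameter-count argument in the abstract above concrete, here is a minimal sketch that compares a full bilinear tensor with a block-term (block-superdiagonal) decomposition. It assumes the standard block-term form: R blocks, each with a core of size L x M x N and three factor matrices of sizes dx x L, dy x M and do x N. The function names and all dimensions are hypothetical and chosen for illustration only; they are not taken from the paper or its code.

```python
def full_bilinear_params(dx, dy, do):
    """Parameters of an unconstrained bilinear tensor of shape dx x dy x do."""
    return dx * dy * do


def block_term_params(dx, dy, do, R, L, M, N):
    """Parameters of a block-term decomposition with R blocks.

    Assumed layout (illustrative): each block has an L x M x N core plus
    factor matrices of shapes dx x L, dy x M and do x N.
    """
    cores = R * L * M * N
    factors = R * (dx * L + dy * M + do * N)
    return cores + factors


# Hypothetical dimensions, for illustration only.
dx, dy, do = 2048, 2048, 1000
full = full_bilinear_params(dx, dy, do)
block = block_term_params(dx, dy, do, R=20, L=80, M=80, N=80)

print(f"full bilinear tensor : {full:,}")
print(f"block-term (R=20)    : {block:,}")
```

Under the same assumed layout, setting L = M = N = 1 gives a rank-R (CP-style) decomposition and setting R = 1 gives a single Tucker-style block, which is the sense in which block-term ranks generalize rank and mode ranks.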
Instance Flow Based Online Multiple Object Tracking
We present a method to perform online Multiple Object Tracking (MOT) of known
object categories in monocular video data. Current Tracking-by-Detection MOT
approaches build on top of 2D bounding box detections. In contrast, we exploit
state-of-the-art instance-aware semantic segmentation techniques to compute 2D
shape representations of target objects in each frame. We predict position and
shape of segmented instances in subsequent frames by exploiting optical flow
cues. We define an affinity matrix between instances of subsequent frames which
reflects locality and visual similarity. The instance association is solved by
applying the Hungarian method. We evaluate different configurations of our
algorithm using the MOT 2D 2015 train dataset. The evaluation shows that our
tracking approach is able to track objects with high relative motions. In
addition, we provide results of our approach on the MOT 2D 2015 test set for
comparison with previous works. We achieve a MOTA score of 32.1.
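The association step described in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' method: the affinity here is plain mask overlap (intersection over union) standing in for the paper's locality and visual similarity terms, and the optimal one-to-one matching is found by exhaustive search rather than the Hungarian method, which returns the same optimum far more efficiently on large matrices. Masks are represented as sets of (row, col) pixels, and all names are hypothetical.

```python
from itertools import permutations


def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def affinity_matrix(predicted, detected):
    """Rows: flow-predicted instances; columns: instances detected in the new frame."""
    return [[iou(p, d) for d in detected] for p in predicted]


def best_assignment(affinity):
    """Exhaustive optimal matching (stand-in for the Hungarian method).

    Assumes the number of rows does not exceed the number of columns.
    Returns a list of (row, column) pairs maximizing total affinity.
    """
    n_rows, n_cols = len(affinity), len(affinity[0])
    best_cols, best_score = None, float("-inf")
    for cols in permutations(range(n_cols), n_rows):
        score = sum(affinity[r][c] for r, c in enumerate(cols))
        if score > best_score:
            best_cols, best_score = cols, score
    return list(enumerate(best_cols))


# Two instances predicted from the previous frame, two detected in the new one.
predicted = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
detected = [{(5, 5), (5, 6), (6, 6)}, {(0, 1), (1, 0), (1, 1)}]

matches = best_assignment(affinity_matrix(predicted, detected))
print(matches)
```

Exhaustive search is only practical for a handful of instances; it is used here so that the sketch depends on the standard library alone.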
Blending Learning and Inference in Structured Prediction
In this paper we derive an efficient algorithm to learn the parameters of
structured predictors in general graphical models. This algorithm blends the
learning and inference tasks, which results in a significant speedup over
traditional approaches, such as conditional random fields and structured
support vector machines. For this purpose we utilize the structures of the
predictors to describe a low dimensional structured prediction task which
encourages local consistencies within the different structures while learning
the parameters of the model. Convexity of the learning task provides the means
to enforce the consistencies between the different parts. The
inference-learning blending algorithm that we propose is guaranteed to converge
to the optimum of the low dimensional primal and dual programs. Unlike many of
the existing approaches, the inference-learning blending allows us to
efficiently learn high-order graphical models over regions of any size and with a very
large number of parameters. We demonstrate the effectiveness of our approach,
while presenting state-of-the-art results in stereo estimation, semantic
segmentation, shape reconstruction, and indoor scene understanding.
Dynamic Models and Nonlinear Filtering of Wave Propagation in Random Fields
In this paper, a general model of wireless channels is established based on
the physics of wave propagation. Then the problems of inverse scattering and
channel prediction are formulated as nonlinear filtering problems. The
solutions to the nonlinear filtering problems are given in the form of dynamic
evolution equations of the estimated quantities. Finally, examples are provided
to illustrate the practical applications of the proposed theory.
Comment: 12 pages, 1 figure
A Generic Framework for Tracking Using Particle Filter With Dynamic Shape Prior
©2007 IEEE. DOI: 10.1109/TIP.2007.894244
Tracking deforming objects involves estimating the global motion of the object and its local deformations as functions of time. Tracking algorithms using Kalman filters or particle filters (PFs) have been proposed for tracking such objects, but these have limitations due to the lack of dynamic shape information. In this paper, we propose a novel method based on employing a locally linear embedding in order to incorporate dynamic shape information into the particle filtering framework for tracking highly deformable objects in the presence of noise and clutter. The PF also models image statistics such as mean and variance of the given data, which can be useful in obtaining proper separation of object and background.
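For readers unfamiliar with the particle filtering framework that the abstract above builds on, the following is a minimal sketch of a generic bootstrap particle filter for a one-dimensional state. It does not include the locally linear embedding or the dynamic shape prior proposed in the paper. The random-walk motion model, the Gaussian observation model, and all parameter values are assumptions made for illustration.

```python
import math
import random


def gaussian_likelihood(z, x, sigma):
    """Unnormalized Gaussian likelihood of observation z given state x."""
    return math.exp(-0.5 * ((z - x) / sigma) ** 2)


def particle_filter(observations, n_particles=500, process_sigma=0.5,
                    obs_sigma=1.0, seed=0):
    """Bootstrap particle filter for a 1D random-walk state (illustrative)."""
    rng = random.Random(seed)
    # Initialize particles from a broad uniform prior (assumed range).
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: propagate each particle through the random-walk model.
        particles = [x + rng.gauss(0.0, process_sigma) for x in particles]
        # Update: weight each particle by the observation likelihood.
        weights = [gaussian_likelihood(z, x, obs_sigma) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        # Estimate: weighted mean of the particle set.
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # Resample: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Track a state that is repeatedly observed at 5.0.
estimates = particle_filter([5.0] * 30)
print(round(estimates[-1], 2))
```

In a shape-tracking setting the scalar state would be replaced by a pose and shape parameterization, and the likelihood by an image-based measure, but the predict, weight, and resample loop stays the same.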