Analyzing Modular CNN Architectures for Joint Depth Prediction and Semantic Segmentation
This paper addresses the task of designing a modular neural network
architecture that jointly solves different tasks. As an example we use the
tasks of depth estimation and semantic segmentation given a single RGB image.
The main focus of this work is to analyze the cross-modality influence between
depth and semantic prediction maps on their joint refinement. While most
previous works solely focus on measuring improvements in accuracy, we propose a
way to quantify the cross-modality influence. We show that there is a
relationship between final accuracy and cross-modality influence, although not
a simple linear one. Hence a larger cross-modality influence does not
necessarily translate into an improved accuracy. We find that a beneficial
balance between the cross-modality influences can be achieved by network
architecture and conjecture that this relationship can be utilized to
understand different network design choices. Towards this end we propose a
Convolutional Neural Network (CNN) architecture that fuses the
state-of-the-art results for depth estimation and semantic labeling. By
balancing the cross-modality influences between depth and semantic prediction,
we achieve improved results for both tasks using the NYU-Depth v2 benchmark.
Comment: Accepted to ICRA 201
End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss
Cross-modality retrieval encompasses retrieval tasks where the fetched items
are of a different type than the search query, e.g., retrieving pictures
relevant to a given text query. The state-of-the-art approach to cross-modality
retrieval relies on learning a joint embedding space of the two modalities,
where items from either modality are retrieved using nearest-neighbor search.
In this work, we introduce a neural network layer based on Canonical
Correlation Analysis (CCA) that learns better embedding spaces by analytically
computing projections that maximize correlation. In contrast to previous
approaches, the CCA Layer (CCAL) allows us to combine existing objectives for
embedding space learning, such as pairwise ranking losses, with the optimal
projections of CCA. We show the effectiveness of our approach for
cross-modality retrieval on three different scenarios (text-to-image,
audio-sheet-music and zero-shot retrieval), surpassing both Deep CCA and a
multi-view network using freely learned projections optimized by a pairwise
ranking loss, especially when little training data is available (the code for
all three methods is released at: https://github.com/CPJKU/cca_layer).
Comment: Preliminary version of a paper published in the International Journal
of Multimedia Information Retrieval
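The analytic projection step this abstract describes can be sketched in a few lines: classical CCA computes the two projection matrices in closed form from the modality covariances, which is what lets such a layer sit inside a network instead of freely learned projections. The following is a minimal NumPy sketch with a small ridge term for numerical stability; the function and variable names are illustrative and are not taken from the released CCAL code.

```python
import numpy as np

def cca_projections(X, Y, k, reg=1e-4):
    """Analytically compute CCA projections A (dx x k) and B (dy x k)
    mapping paired samples X (n x dx), Y (n x dy) into a shared space
    where corresponding projections are maximally correlated."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized covariance blocks
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Sxx_i, Syy_i = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular vectors of the whitened cross-covariance give the
    # canonical directions; singular values are the correlations.
    U, s, Vt = np.linalg.svd(Sxx_i @ Sxy @ Syy_i)
    A = Sxx_i @ U[:, :k]
    B = Syy_i @ Vt[:k].T
    return A, B, s[:k]
```

In the paper's setting these projections are recomputed on network activations, so a ranking loss can still shape the embedding while the projection itself stays optimal in the CCA sense.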
FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization
Cross-modality point cloud registration is confronted with significant
challenges due to inherent differences in modalities between different sensors.
We propose a cross-modality point cloud registration framework FF-LOGO: a
cross-modality point cloud registration method with feature filtering and
local-global optimization. The cross-modality feature correlation filtering
module extracts geometric transformation-invariant features from cross-modality
point clouds and achieves point selection by feature matching. We also
introduce a cross-modality optimization process, including a local adaptive key
region aggregation module and a global modality consistency fusion optimization
module. Experimental results demonstrate that our two-stage optimization
significantly improves the registration accuracy of the feature association and
selection module. Our method achieves a substantial increase in recall rate
compared to the current state-of-the-art methods on the 3DCSR dataset,
improving from 40.59% to 75.74%. Our code will be available at
https://github.com/wangmohan17/FFLOGO.
Comment: 7 pages, 2 figures
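The "point selection by feature matching" step that this abstract builds on can be illustrated with a generic baseline: match points by nearest-neighbor descriptor distance, then fit the rigid transform with the Kabsch algorithm. This is a standard sketch, not FF-LOGO's actual filtering or optimization modules, and all names here are hypothetical.

```python
import numpy as np

def match_and_register(src_pts, dst_pts, src_feat, dst_feat):
    """Pair each source point with the destination point whose
    descriptor is nearest, then solve for the rigid transform
    (R, t) minimizing ||R @ p + t - q|| over matched pairs."""
    # Nearest-neighbor matching in descriptor space
    d = ((src_feat[:, None, :] - dst_feat[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    P, Q = src_pts, dst_pts[idx]
    # Kabsch: SVD of the cross-covariance of centered point sets
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Reflection guard keeps R a proper rotation (det = +1)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = Q.mean(axis=0) - R @ P.mean(axis=0)
    return R, t
```

In the cross-modality setting the hard part is precisely that raw descriptors from different sensors are not directly comparable, which is what the paper's feature-filtering and two-stage optimization are meant to address.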