3D Object Detection Using Scale Invariant and Feature Reweighting Networks
3D object detection plays an important role in a large number of real-world
applications. It requires estimating the locations and orientations
of 3D objects in real scenes. In this paper, we present a new network
architecture which focuses on utilizing the front view images and frustum point
clouds to generate 3D detection results. On the one hand, a PointSIFT module is
utilized to improve the performance of 3D segmentation. It captures
information from different orientations in space and is robust to
shapes of different scales. On the other hand, our network retains the useful
features and suppresses the features with less information by a SENet module.
This module reweights channel features and estimates the 3D bounding boxes more
effectively. Our method is evaluated on both the KITTI dataset for outdoor scenes
and the SUN-RGBD dataset for indoor scenes. The experimental results show
that our method outperforms state-of-the-art methods,
especially when point clouds are highly sparse.
Comment: The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19
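The channel reweighting attributed to the SENet module above can be illustrated with a minimal NumPy sketch of a generic squeeze-and-excitation gate applied to per-point features. This is not the authors' implementation: the function name `se_reweight`, the weight matrices `w1` and `w2`, the omission of biases, and the choice to average over points are all assumptions made for illustration.

```python
import numpy as np

def se_reweight(features, w1, w2):
    """Generic squeeze-and-excitation gate (illustrative, biases omitted).

    features: (num_points, channels) per-point feature matrix.
    w1: (channels // r, channels) bottleneck weights (hypothetical).
    w2: (channels, channels // r) expansion weights (hypothetical).
    """
    # Squeeze: average each channel over all points.
    squeezed = features.mean(axis=0)
    # Excitation: bottleneck with ReLU, then a sigmoid gate per channel.
    hidden = np.maximum(w1 @ squeezed, 0.0)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))
    # Scale: informative channels keep more of their magnitude.
    return features * gates, gates

rng = np.random.default_rng(0)
feats = rng.normal(size=(128, 16))
w1 = rng.normal(size=(4, 16))
w2 = rng.normal(size=(16, 4))
reweighted, gates = se_reweight(feats, w1, w2)
```

Each gate lies strictly between 0 and 1, so a channel is attenuated rather than removed; in a trained network the weights would be learned jointly with the rest of the model.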
Second-order Democratic Aggregation
Aggregated second-order features extracted from deep convolutional networks
have been shown to be effective for texture generation, fine-grained
recognition, material classification, and scene understanding. In this paper,
we study a class of orderless aggregation functions designed to minimize
interference or equalize contributions in the context of second-order features
and we show that they can be computed just as efficiently as their first-order
counterparts and they have favorable properties over aggregation by summation.
Another line of work has shown that matrix power normalization after
aggregation can significantly improve the generalization of second-order
representations. We show that matrix power normalization implicitly equalizes
contributions during aggregation thus establishing a connection between matrix
normalization techniques and prior work on minimizing interference. Based on
the analysis we present γ-democratic aggregators that interpolate
between sum (γ=1) and democratic pooling (γ=0), outperforming both
on several classification tasks. Moreover, unlike power normalization, the
γ-democratic aggregations can be computed in a low dimensional space by
sketching that allows the use of very high-dimensional second-order features.
This results in state-of-the-art performance on several datasets.
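The interpolation between sum and democratic pooling can be sketched as follows. The abstract does not state the defining equation, so this sketch assumes one plausible formulation: per-feature weights alpha are chosen so that each feature's contribution, alpha_i * (K alpha)_i, equals (K 1)_i raised to the power gamma, where K holds the kernel values between second-order features. The damped Sinkhorn-style update, the function names, and the iteration count are likewise assumptions, not the authors' code.

```python
import numpy as np

def gamma_democratic_weights(X, gamma=0.0, iters=100):
    """Weights alpha with alpha_i * (K @ alpha)_i ~= (K @ 1)_i ** gamma.

    X: (n, d) local features. The kernel between the outer products
    x_i x_i^T and x_j x_j^T is the squared inner product (x_i . x_j)^2.
    """
    K = (X @ X.T) ** 2
    target = (K @ np.ones(len(X))) ** gamma
    alpha = np.ones(len(X))
    for _ in range(iters):
        contribution = alpha * (K @ alpha)
        # Damped (square-root) update toward the target contributions.
        alpha = alpha * np.sqrt(target / contribution)
    return alpha, K

def aggregate(X, alpha):
    """Weighted second-order pooling: sum_i alpha_i * x_i x_i^T."""
    return (X * alpha[:, None]).T @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
alpha_dem, K = gamma_democratic_weights(X, gamma=0.0)
alpha_sum, _ = gamma_democratic_weights(X, gamma=1.0)
pooled = aggregate(X, alpha_dem)
```

Under this assumed formulation, gamma=1 leaves every weight at 1 (plain sum pooling), while gamma=0 equalizes all contributions to 1.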
Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks
Deep learning models have achieved excellent recognition results on
large-scale video benchmarks. However, they perform poorly when applied to
videos with rare scenes or objects, primarily due to the bias of existing video
datasets. We tackle this problem from two different angles: algorithm and
dataset. From the perspective of algorithms, we propose Spatial-aware
Multi-Aspect Debiasing (SMAD), which incorporates both explicit debiasing with
multi-aspect adversarial training and implicit debiasing with the spatial
actionness reweighting module, to learn a more generic representation invariant
to non-action aspects. To neutralize the intrinsic dataset bias, we propose
OmniDebias to leverage web data for joint training selectively, which can
achieve higher performance with far less web data. To verify the
effectiveness, we establish evaluation protocols and perform extensive
experiments on both re-distributed splits of existing datasets and a new
evaluation dataset focusing on actions in rare scenes. We also show that
the debiased representation can generalize better when transferred to other
datasets and tasks.
Comment: ECCVW 202
SBNet: Sparse Blocks Network for Fast Inference
Conventional deep convolutional neural networks (CNNs) apply convolution
operators uniformly in space across all feature maps for hundreds of layers,
which incurs a high computational cost for real-time applications. For many
problems such as object detection and semantic segmentation, we are able to
obtain a low-cost computation mask, either from a priori problem knowledge, or
from a low-resolution segmentation network. We show that such computation masks
can be used to reduce computation in the high-resolution main network. Variants
of sparse activation CNNs have previously been explored on small-scale tasks
and showed no degradation in terms of object classification accuracy, but often
measured gains in terms of theoretical FLOPs without realizing a practical
speed-up when compared to highly optimized dense convolution implementations.
In this work, we leverage the sparsity structure of computation masks and
propose a novel tiling-based sparse convolution algorithm. We verified the
effectiveness of our sparse CNN on LiDAR-based 3D object detection, and we
report significant wall-clock speed-ups compared to dense convolution without
noticeable loss of accuracy.
Comment: 10 pages, CVPR 201