Hypercolumns for Object Segmentation and Fine-grained Localization
Recognition algorithms based on convolutional networks (CNNs) typically use
the output of the last layer as feature representation. However, the
information in this layer may be too coarse to allow precise localization. On
the contrary, earlier layers may be precise in localization but will not
capture semantics. To get the best of both worlds, we define the hypercolumn at
a pixel as the vector of activations of all CNN units above that pixel. Using
hypercolumns as pixel descriptors, we show results on three fine-grained
localization tasks: simultaneous detection and segmentation [22], where we
improve the state of the art from 49.7 [22] mean AP^r to 60.0; keypoint
localization, where we get a 3.3 point boost over [20]; and part labeling, where
we show a 6.6 point gain over a strong baseline.
Comment: CVPR camera ready
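The hypercolumn definition above can be illustrated with a small sketch. It assumes each layer's activations are available as a `(channels, height, width)` array, and it uses nearest-neighbor upsampling purely for brevity; the abstract does not specify the interpolation scheme. The function names `upsample_nearest` and `hypercolumns`, and the toy layer shapes, are hypothetical.

```python
import numpy as np

def upsample_nearest(fmap, out_h, out_w):
    """Resize a (C, h, w) feature map to (C, out_h, out_w) by nearest-neighbor lookup."""
    _, h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return fmap[:, rows[:, None], cols[None, :]]

def hypercolumns(feature_maps, out_h, out_w):
    """Stack all layers at a common resolution: result is (sum of C_i, out_h, out_w)."""
    upsampled = [upsample_nearest(f, out_h, out_w) for f in feature_maps]
    return np.concatenate(upsampled, axis=0)

# Toy stand-ins for activations from three layers of decreasing resolution.
rng = np.random.default_rng(0)
layers = [
    rng.standard_normal((4, 8, 8)),    # early layer: fine resolution, few channels
    rng.standard_normal((8, 4, 4)),    # middle layer
    rng.standard_normal((16, 2, 2)),   # late layer: coarse resolution, many channels
]
hc = hypercolumns(layers, 8, 8)
descriptor = hc[:, 3, 5]               # hypercolumn at pixel (row 3, column 5)
```

Here `descriptor` concatenates the activations of every layer at the chosen location, so a per-pixel classifier sees both fine spatial detail and coarse semantic context.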
PARTICLE: Part Discovery and Contrastive Learning for Fine-grained Recognition
We develop techniques for refining representations for fine-grained
classification and segmentation tasks in a self-supervised manner. We find that
fine-tuning methods based on instance-discriminative contrastive learning are
not as effective, and posit that recognizing part-specific variations is
crucial for fine-grained categorization. We present an iterative learning
approach that incorporates part-centric equivariance and invariance objectives.
First, pixel representations are clustered to discover parts. We analyze the
representations from convolutional and vision transformer networks that are
best suited for this task. Then, a part-centric learning step aggregates and
contrasts representations of parts within an image. We show that this improves
the performance on image classification and part segmentation tasks across
datasets. For example, under a linear-evaluation scheme, the classification
accuracy of a ResNet50 trained on ImageNet using DetCon, a self-supervised
learning approach, improves from 35.4% to 42.0% on Caltech-UCSD Birds, from
35.5% to 44.1% on FGVC Aircraft, and from 29.7% to 37.4% on Stanford
Cars. We also observe significant gains in few-shot part segmentation tasks
using the proposed technique, while instance-discriminative learning was not as
effective. Smaller, yet consistent, improvements are also observed for stronger
networks based on transformers.
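The part-discovery step, in which pixel representations are clustered, might look roughly like the sketch below. It assumes plain k-means over per-pixel feature vectors; the abstract does not name the clustering algorithm, so this choice, the function name `kmeans_parts`, and the toy feature map are all illustrative assumptions rather than the authors' procedure.

```python
import numpy as np

def kmeans_parts(features, k, iters=10, seed=0):
    """Cluster per-pixel features of shape (C, H, W) into k groups; returns (H, W) labels."""
    c, h, w = features.shape
    x = features.reshape(c, h * w).T                 # one row per pixel: (H*W, C)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # copy of k random pixels
    for _ in range(iters):
        # Squared Euclidean distance from every pixel to every center.
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):                         # leave empty clusters unchanged
                centers[j] = members.mean(axis=0)
    return labels.reshape(h, w)

# Toy feature map with two clearly different regions (left half vs right half).
feats = np.zeros((3, 4, 6))
feats[:, :, 3:] = 10.0
part_map = kmeans_parts(feats, k=2)
```

Each entry of `part_map` is a cluster index that serves as a pseudo part label for that pixel; a later part-centric step could then pool and contrast features within each labeled region.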
Superpixel-based Semantic Segmentation Trained by Statistical Process Control
Semantic segmentation, like other fields of computer vision, has seen a
remarkable advance in performance through the use of deep convolutional neural
networks. However, because neighboring pixels are heavily dependent on each
other, both training and testing of these methods involve many redundant
operations. To resolve this problem, the proposed network is trained and tested
on only 0.37% of the total pixels, selected by superpixel-based sampling, which
largely reduces the complexity of the upsampling calculation. The hypercolumn
feature maps are constructed by a pyramid module in combination with the
convolution layers of the base network. Since the proposed method uses a very
small number of sampled
pixels, the end-to-end learning of the entire network is difficult with a
common learning rate for all the layers. In order to resolve this problem, the
learning rate after sampling is controlled by statistical process control (SPC)
of gradients in each layer. The proposed method performs as well as or better
than conventional methods that use many more samples on the Pascal Context and
SUN-RGBD datasets.
Comment: Accepted in British Machine Vision Conference (BMVC), 201
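The abstract does not state the control rule used for the learning rate, so the following is only a generic Shewhart-style control chart sketch, not the authors' method. It assumes that a layer's recent gradient norms are tracked and that the learning rate is shrunk when the current norm exceeds the upper control limit (mean plus three standard deviations). The names `control_limits` and `adjust_lr`, the `shrink` factor, and the sample history are hypothetical.

```python
import statistics

def control_limits(history, n_sigma=3.0):
    """Return (lower, upper) control limits from a history of gradient norms."""
    mean = statistics.fmean(history)
    sigma = statistics.pstdev(history)
    return mean - n_sigma * sigma, mean + n_sigma * sigma

def adjust_lr(lr, grad_norm, history, shrink=0.5, n_sigma=3.0):
    """Shrink the learning rate if the gradient norm is out of control (assumed rule)."""
    _, upper = control_limits(history, n_sigma)
    if grad_norm > upper:
        return lr * shrink   # out-of-control signal: take a smaller step
    return lr                # in control: keep the current rate

# Hypothetical gradient-norm history for one layer.
history = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
lr_stable = adjust_lr(0.01, 1.0, history)   # norm within the control limits
lr_spike = adjust_lr(0.01, 5.0, history)    # norm far above the upper limit
```

Keeping a separate history per layer lets each layer's rate respond to its own gradient statistics, which is one way to avoid a single shared learning rate across the network.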