Understanding the Overfitting of the Episodic Meta-training
Despite the success of two-stage few-shot classification methods, the model
suffers from severe overfitting in the episodic meta-training stage. We
hypothesize that this is caused by over-discrimination, i.e., the model learns to
over-rely on superficial features that suit base class discrimination
while suppressing novel class generalization. To penalize
over-discrimination, we introduce knowledge distillation techniques to keep
novel generalization knowledge from the teacher model during training.
Specifically, we select the teacher model as the one with the best validation
accuracy during meta-training and restrict the symmetric Kullback-Leibler (SKL)
divergence between the output distribution of the linear classifier of the
teacher model and that of the student model. This simple approach outperforms
the standard meta-training process. We further propose the Nearest Neighbor
Symmetric Kullback-Leibler (NNSKL) divergence for meta-training to push the
limits of knowledge distillation techniques. NNSKL takes few-shot tasks as
input and penalizes the output of the nearest neighbor classifier, which
affects the relationships between query embeddings and support
centers. By combining SKL and NNSKL in meta-training, the model achieves even
better performance and surpasses state-of-the-art results on several
benchmarks.
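
The two distillation penalties described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact form of either loss, so the code assumes SKL is the sum of the two KL directions, and that the nearest neighbor classifier turns negative squared Euclidean distances to the support centers into a softmax distribution. All function names, logits, and embeddings are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def skl(p, q):
    # Assumption: symmetric KL taken as KL(p || q) + KL(q || p).
    return kl(p, q) + kl(q, p)

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nn_probs(query, centers):
    # Assumption: nearest-center classifier with negative squared
    # Euclidean distance as the logit for each support center.
    return softmax([-sq_dist(query, c) for c in centers])

# SKL-style penalty on linear-classifier outputs (hypothetical logits).
teacher_p = softmax([2.0, 0.5, -1.0])
student_p = softmax([1.0, 1.0, -0.5])
skl_penalty = skl(teacher_p, student_p)

# NNSKL-style penalty on one query of a 2-way task (hypothetical embeddings).
centers = [[0.0, 0.0], [4.0, 0.0]]
teacher_nn = nn_probs([0.5, 0.0], centers)
student_nn = nn_probs([1.5, 0.0], centers)
nnskl_penalty = skl(teacher_nn, student_nn)
```

In training, such penalties would be added to the usual classification loss with some weighting. The weighting and the way teacher and student compute support centers are not specified in the abstract and are left out here.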
Voxel or Pillar: Exploring Efficient Point Cloud Representation for 3D Object Detection
Efficient representation of point clouds is fundamental for LiDAR-based 3D
object detection. While recent grid-based detectors often encode point clouds
into either voxels or pillars, the distinctions between these approaches remain
underexplored. In this paper, we quantify the differences between the current
encoding paradigms and highlight the limited vertical learning within. To
tackle these limitations, we introduce a hybrid Voxel-Pillar Fusion network
(VPF), which synergistically combines the unique strengths of both voxels and
pillars. Specifically, we first develop a sparse voxel-pillar encoder that
encodes point clouds into voxel and pillar features through 3D and 2D sparse
convolutions respectively, and then introduce the Sparse Fusion Layer (SFL),
facilitating bidirectional interaction between sparse voxel and pillar
features. Our efficient, fully sparse method can be seamlessly integrated into
both dense and sparse detectors. Leveraging this powerful yet straightforward
framework, VPF delivers competitive performance, achieving real-time inference
speeds on the nuScenes and Waymo Open Dataset. The code will be available.
Comment: Accepted by AAAI-202
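
The difference between the two grid encodings mentioned above can be seen in a toy example. This is only a sketch of how points are grouped, not the VPF encoder or the Sparse Fusion Layer: it assumes a voxel key is the quantized (x, y, z) index and a pillar key is the quantized (x, y) index, so a pillar spans the whole vertical axis. The cell sizes and points are hypothetical, and a dictionary stands in for a sparse tensor that stores only occupied cells.

```python
from collections import defaultdict

def voxelize(points, voxel_size):
    # Group points by their 3D grid index; only occupied voxels are stored.
    vx, vy, vz = voxel_size
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (int(x // vx), int(y // vy), int(z // vz))
        voxels[key].append((x, y, z))
    return dict(voxels)

def pillarize(points, pillar_size):
    # Group points by their 2D grid index; the z coordinate is not quantized.
    px, py = pillar_size
    pillars = defaultdict(list)
    for x, y, z in points:
        key = (int(x // px), int(y // py))
        pillars[key].append((x, y, z))
    return dict(pillars)

# Hypothetical LiDAR points (x, y, z) in metres.
points = [
    (0.10, 0.10, 0.2),
    (0.20, 0.30, 1.7),   # same x-y cell as the first point, but higher up
    (0.15, 0.20, 0.3),
    (2.40, 0.10, 0.5),
]

voxels = voxelize(points, (0.5, 0.5, 0.5))
pillars = pillarize(points, (0.5, 0.5))

print(sorted(voxels))   # occupied voxel indices
print(sorted(pillars))  # occupied pillar indices
```

The two points stacked at different heights fall into separate voxels but share a single pillar, which is the vertical information the pillar grid discards.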