GraphCleaner: Detecting Mislabelled Samples in Popular Graph Learning Benchmarks
Label errors have been found to be prevalent in popular text, vision, and
audio datasets, which heavily influence the safe development and evaluation of
machine learning algorithms. Despite increasing efforts towards improving the
quality of generic data types, such as images and texts, the problem of
mislabel detection in graph data remains underexplored. To bridge the gap, we
explore mislabelling issues in popular real-world graph datasets and propose
GraphCleaner, a post-hoc method to detect and correct these mislabelled nodes
in graph datasets. GraphCleaner combines the novel ideas of 1) Synthetic
Mislabel Dataset Generation, which seeks to generate realistic mislabels; and
2) Neighborhood-Aware Mislabel Detection, where neighborhood dependency is
exploited in both labels and base classifier predictions. Empirical evaluations
on 6 datasets and 6 experimental settings demonstrate that GraphCleaner
outperforms the closest baseline, with an average improvement of 0.14 in F1
score and 0.16 in MCC. On real-data case studies, GraphCleaner detects real
and previously unknown mislabels in popular graph benchmarks: PubMed, Cora,
CiteSeer and OGB-arxiv; we find that at least 6.91% of PubMed data is
mislabelled or ambiguous, and simply removing these mislabelled data can boost
evaluation performance from 86.71% to 89.11%. Comment: ICML 202
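The neighborhood-aware idea can be illustrated with a small sketch: flag a node as suspect when its label disagrees with its neighbours' labels and the base classifier assigns it low confidence. The equal 0.5/0.5 weighting and the function name here are illustrative assumptions, not the paper's exact detector.

```python
def neighborhood_mislabel_scores(labels, neighbors, probs):
    """Score each node's chance of being mislabelled by combining
    (a) the fraction of neighbours whose label disagrees with the node's, and
    (b) the base classifier's (lack of) confidence in the given label.
    NOTE: the 0.5/0.5 mix is a hypothetical choice for illustration only."""
    scores = []
    for i, y in enumerate(labels):
        nbrs = neighbors[i]
        if nbrs:
            disagree = sum(labels[j] != y for j in nbrs) / len(nbrs)
        else:
            disagree = 0.0
        # low classifier confidence in the assigned label raises suspicion
        scores.append(0.5 * disagree + 0.5 * (1.0 - probs[i][y]))
    return scores

# A node labelled 1 whose only neighbour is labelled 0, and whose classifier
# probability for class 1 is low, receives a high suspicion score.
labels = [0, 0, 0, 1]
neighbors = [[1, 2], [0, 2], [0, 1, 3], [2]]
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.9, 0.1]]
scores = neighborhood_mislabel_scores(labels, neighbors, probs)  # node 3 scores 0.95
```

Nodes above a chosen score threshold would then be handed to the correction step.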
Deformable Convolutional Networks
Convolutional neural networks (CNNs) are inherently limited in modeling
geometric transformations due to the fixed geometric structures in their
building modules. In this work, we introduce two new modules to enhance the
transformation modeling capacity of CNNs, namely, deformable convolution and
deformable RoI pooling. Both are based on the idea of augmenting the spatial
sampling locations in the modules with additional offsets and learning the
offsets from target tasks, without additional supervision. The new modules can
readily replace their plain counterparts in existing CNNs and can be easily
trained end-to-end by standard back-propagation, giving rise to deformable
convolutional networks. Extensive experiments validate the effectiveness of our
approach on sophisticated vision tasks of object detection and semantic
segmentation. The code will be released
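The core mechanism is that each sampling tap of the convolution is shifted by a learned fractional offset and the input is read off with bilinear interpolation. A minimal sketch for a single 3x3 output position follows; in the real networks the offsets come from a separate conv branch, which is omitted here.

```python
import math

def bilinear_sample(img, y, x):
    """Sample a 2-D grid at fractional (y, x) with bilinear interpolation;
    out-of-bounds taps contribute zero."""
    h, w = len(img), len(img[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    out = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < h and 0 <= xx < w:
                out += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy][xx]
    return out

def deformable_conv_at(img, weights, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).
    offsets[k] = (dy, dx) is the learned fractional offset for tap k
    (here supplied directly; the paper predicts them with a conv branch)."""
    out, k = 0.0, 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            out += weights[k] * bilinear_sample(img, cy + ky + dy, cx + kx + dx)
            k += 1
    return out
```

With all offsets zero this reduces to a plain 3x3 convolution, which is why the modules can drop in for their standard counterparts.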
UltraLiDAR: Learning Compact Representations for LiDAR Completion and Generation
LiDAR provides accurate geometric measurements of the 3D world.
Unfortunately, dense LiDARs are very expensive and the point clouds captured by
low-beam LiDAR are often sparse. To address these issues, we present
UltraLiDAR, a data-driven framework for scene-level LiDAR completion, LiDAR
generation, and LiDAR manipulation. The crux of UltraLiDAR is a compact,
discrete representation that encodes the point cloud's geometric structure, is
robust to noise, and is easy to manipulate. We show that by aligning the
representation of a sparse point cloud to that of a dense point cloud, we can
densify the sparse point clouds as if they were captured by a real high-density
LiDAR, drastically reducing the cost. Furthermore, by learning a prior over the
discrete codebook, we can generate diverse, realistic LiDAR point clouds for
self-driving. We evaluate the effectiveness of UltraLiDAR on sparse-to-dense
LiDAR completion and LiDAR generation. Experiments show that densifying
real-world point clouds with our approach can significantly improve the
performance of downstream perception systems. Compared to prior art on LiDAR
generation, our approach generates much more realistic point clouds. In an
A/B test, over 98.5% of the time human participants prefer our results over
those of previous methods. Comment: CVPR 2023. Project page: https://waabi.ai/ultralidar
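The "compact, discrete representation" rests on a vector-quantization step: each encoded feature is replaced by the index of its nearest codebook entry. The sketch below shows only that assignment step under an assumed L2 metric; the paper's encoder, decoder, and codebook learning are not reproduced here.

```python
def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry
    (squared-L2 distance). The resulting index list is the discrete code;
    a decoder (not shown) would reconstruct the point cloud from it."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sqdist(f, codebook[k]))
            for f in features]

# Two-entry codebook: features snap to whichever entry is closer.
codebook = [[0.0, 0.0], [1.0, 1.0]]
codes = quantize([[0.1, 0.1], [0.9, 0.8]], codebook)  # -> [0, 1]
```

Learning a prior over such index sequences is what lets the model sample new, realistic LiDAR scenes, as the abstract describes.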