Large language models (LLMs) based on the generative pre-training transformer
(GPT) have demonstrated remarkable effectiveness across a diverse range of
downstream tasks. Inspired by the advancements of the GPT, we present PointGPT,
a novel approach that extends the concept of GPT to point clouds, addressing
the challenges associated with disorder properties, low information density,
and task gaps. Specifically, a point cloud auto-regressive generation task is
proposed to pre-train transformer models. Our method partitions the input point
cloud into multiple point patches and arranges them in an ordered sequence
based on their spatial proximity. Then, an extractor-generator based
transformer decoder, with a dual masking strategy, learns latent
representations conditioned on the preceding point patches, aiming to predict
the next one in an auto-regressive manner. Our scalable approach allows for
learning high-capacity models that generalize well, achieving state-of-the-art
performance on various downstream tasks. In particular, our approach achieves
classification accuracies of 94.9% on the ModelNet40 dataset and 93.4% on the
ScanObjectNN dataset, outperforming all other transformer models. Furthermore,
our method also attains new state-of-the-art accuracies on all four few-shot
learning benchmarks.Comment: 9 pages, 2 figure