Transformer based semantic segmentation in large-scale urban point clouds focusing on rare classes

Zhang, Xinlong


thesis

oai:elib.uni-stuttgart.de:11682/18136


Authors: Xinlong Zhang
Publication date: 1 January 2026

Abstract

Automated semantic interpretation of three-dimensional scenes has increasingly drawn attention in the fields of autonomous driving, building information modeling, and robotics, where reliable scene understanding plays a fundamental role in enabling intelligent perception and decision-making. Point clouds, obtained through laser scanning or stereo image matching, are widely acknowledged as the most reliable data source for representing spatial environments, owing to their ability to provide accurate and detailed 3D information. With advances in lightweight laser scanning devices and flexible acquisition platforms, the availability of massive, high-resolution 3D point clouds has increased significantly. This unprecedented scale and richness of data, however, also creates a strong demand for effective methods capable of comprehensively interpreting large-scale and fine-grained point cloud scenes. In particular, the diversity of object categories, the imbalanced class distribution, and the intricate contextual relationships in large-scale urban areas pose great challenges for 3D semantic segmentation. This research aims to address these challenges by developing novel methods and techniques that specifically focus on the 3D semantic segmentation of rare classes in large-scale urban point clouds. The work makes contributions in four major areas: (i) a hierarchical attentional framework tailored for large-scale point cloud segmentation, (ii) an embedding method for local surface features for fine-grained segmentation, (iii) a contextual joint augmentation strategy to strengthen the representation of rare classes, and (iv) target-aware learning designed to enhance the segmentation of rare classes. To achieve effective 3D semantic segmentation of urban scenes, we propose a novel framework that integrates an attention mechanism focusing on crucial regions within large-scale point clouds.
To further improve the perception of local geometric structures, the framework incorporates the embedding of surface features into the segmentation pipeline, thereby facilitating a robust representation of the fine-grained geometry inherent in complex environments. To address the imbalanced class distribution, particularly for rare classes, we designed a context-guided data augmentation strategy that selectively augments underrepresented classes under the constraint of their semantic and spatial contextual relationships. Moreover, focusing on the unique geometric cues that characterize rare objects, we developed a target-aware network that adaptively modulates attention to rare categories, thereby maintaining high sensitivity to these underrepresented classes. The proposed methods were evaluated through experiments on different large-scale benchmark datasets collected from multiple platforms, i.e., airborne, mobile, and terrestrial platforms. In our submission to the official Hessigheim 3D benchmark ranking, our framework achieved a mean F1-score of 83.8% and an overall accuracy of 90.5%. In particular, the F1-scores of the rare classes, namely vehicles and chimneys, significantly exceeded the average performance of other published methods, with improvements of 32.0% and 32.5%, respectively. Additionally, comprehensive experimental analyses on three other open benchmark datasets, namely the Paris-Lille-3D dataset, the Semantic3D dataset, and the WHU-Urban3D dataset, further demonstrate the robustness and effectiveness of our rare class segmentation methods.
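The abstract reports a mean F1-score and an overall accuracy. As a rough illustration of why the two can diverge when rare classes are present, the following minimal sketch computes both from a confusion matrix using the standard definitions. The function name `segmentation_scores` and the toy confusion matrix are hypothetical and are not taken from the thesis or from the Hessigheim 3D benchmark; the exact evaluation protocol used there is not described on this page.

```python
import numpy as np

def segmentation_scores(conf):
    """Per-class F1, mean F1, and overall accuracy from a confusion matrix.

    conf[i, j] = number of points whose true class is i and predicted class is j.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                # correctly labelled points per class
    support = conf.sum(axis=1)        # points per true class (TP + FN)
    predicted = conf.sum(axis=0)      # points per predicted class (TP + FP)
    denom = support + predicted       # equals 2*TP + FP + FN
    # F1 = 2*TP / (2*TP + FP + FN); classes that never occur get F1 = 0
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    overall_accuracy = tp.sum() / conf.sum()
    return f1, f1.mean(), overall_accuracy

# Toy data (made up): two frequent classes and one rare class (index 2).
conf = [[90, 10, 0],
        [5, 95, 0],
        [4, 0, 6]]
f1, mean_f1, oa = segmentation_scores(conf)
print("per-class F1:", np.round(f1, 3))
print("mean F1:", round(float(mean_f1), 3), "overall accuracy:", round(float(oa), 3))
```

In this toy case the rare class contributes only 10 of 210 points, so its errors barely move the overall accuracy, while the mean F1 weights every class equally and drops noticeably. This is one reason rare-class performance is usually reported per class or via mean F1 rather than overall accuracy alone.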



Last time updated on 12/05/2026

This paper was published in OPUS Online Publikationen der Universität Stuttgart.


Licence: info:eu-repo/semantics/openAccess