Currently, object detection applications in construction are almost entirely
based on pure 2D data (both images and annotations are 2D), so the resulting
artificial intelligence (AI) applications are applicable only to scenarios that
require nothing beyond 2D information. However, most advanced applications
require AI agents to perceive 3D spatial information, and this gap limits the
further development of computer vision (CV) in construction. The
lack of 3D annotated datasets for construction object detection worsens the
situation. Therefore, this study creates and releases VCVW-3D, a virtual dataset
with 3D annotations that covers 15 construction scenes and ten categories of
construction vehicles and workers. The VCVW-3D dataset is
characterized by multi-scene, multi-category, multi-randomness,
multi-viewpoint, multi-annotation, and binocular vision. Several typical 2D and
monocular 3D object detection models are then trained and evaluated on the
VCVW-3D dataset to provide a benchmark for subsequent research. The VCVW-3D
dataset is expected to deliver considerable economic and practical benefits by
reducing the costs of data construction, prototype development, and exploration
of space-awareness applications, thus promoting the development of CV in
construction, especially 3D applications.