Deep Generative Models (DGMs) are known for their superior ability to generate
realistic data. Going beyond purely data-driven approaches, recent specialized
DGMs can satisfy additional controllability requirements, such as embedding a
traffic sign in a driving scene, by manipulating patterns
\textit{implicitly} at the neuron or feature level. In this paper, we introduce
a novel method that incorporates domain knowledge \textit{explicitly} into the
generation process to achieve semantically controllable scene generation. We
categorize this knowledge into two types, consistent with the composition
of natural scenes: the first type represents the properties of individual
objects, and the second type represents the relationships among objects. We
then propose a tree-structured generative model to learn complex scene
representations, whose nodes and edges naturally correspond to the two types
of knowledge, respectively. Knowledge can then be integrated explicitly to
enable semantically controllable scene generation by imposing semantic rules
on the properties of nodes and edges in the tree structure. We construct a
synthetic example to illustrate
the controllability and explainability of our method in a clean setting. We
further extend the synthetic example to realistic autonomous vehicle driving
environments and conduct extensive experiments to show that our method
efficiently identifies adversarial traffic scenes against different
state-of-the-art 3D point cloud segmentation models while satisfying the
traffic rules specified as explicit knowledge.
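To make the tree-structured representation concrete, the following minimal
Python sketch is our own illustration, not code from the paper; all names,
such as \texttt{SceneNode} and \texttt{sign\_near\_road}, are hypothetical.
It shows a tree whose nodes carry object properties, whose edges carry
inter-object relationships, and whose contents can be checked against an
explicit semantic rule:

\begin{verbatim}
# Hypothetical sketch of a tree-structured scene representation,
# not the authors' actual model. Nodes hold object properties
# (first knowledge type); edges hold inter-object relationships
# (second knowledge type); semantic rules constrain both.
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    category: str    # e.g. "road", "vehicle", "traffic_sign"
    properties: dict # object-level attributes (pose, size, ...)
    children: list = field(default_factory=list)
    # relationship to the parent node (edge properties)
    edge_properties: dict = field(default_factory=dict)

def add_child(parent, child, **edge_properties):
    """Attach a child node; keyword args become edge properties."""
    child.edge_properties = edge_properties
    parent.children.append(child)

def satisfies_rules(node, rules):
    """Recursively check explicit semantic rules over the tree."""
    if not all(rule(node) for rule in rules):
        return False
    return all(satisfies_rules(c, rules) for c in node.children)

# Example rule (hypothetical): a traffic sign must stand within
# 5 m of its parent road segment -- a traffic-rule constraint
# imposed on an edge property.
def sign_near_road(node):
    if node.category == "traffic_sign":
        return node.edge_properties.get("lateral_offset_m", 0.0) <= 5.0
    return True

root = SceneNode("road", {"length_m": 100.0})
sign = SceneNode("traffic_sign", {"type": "stop"})
add_child(root, sign, lateral_offset_m=3.2)
assert satisfies_rules(root, [sign_near_road])
\end{verbatim}

Note that the example rule operates on an edge property (the sign's lateral
offset from its parent road segment), mirroring how the paper imposes semantic
rules on the properties of nodes and edges rather than on raw pixels or
features.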