In this paper, we develop a generic methodology to encode hierarchical
causality structure among observed variables into a neural network in order to
improve its predictive performance. The proposed methodology, called
causality-informed neural network (CINN), leverages three coherent steps to
systematically map the structural causal knowledge into the layer-to-layer
design of a neural network while strictly preserving the orientation of every
causal relationship. In the first step, CINN discovers causal relationships
from observational data via directed acyclic graph (DAG) learning, where causal
discovery is recast as a continuous optimization problem to avoid a
combinatorial search over graph structures. In the second step, the discovered
hierarchical causality structure among observed variables is systematically
encoded into a neural network through a dedicated architecture and customized
loss function. By
categorizing variables in the causal DAG as root, intermediate, and leaf nodes,
the hierarchical causal DAG is translated into CINN with a one-to-one
correspondence between nodes in the causal DAG and units in the CINN while
maintaining the relative order among these nodes. Regarding the loss function,
both intermediate and leaf nodes in the DAG are treated as target outputs
during CINN training so as to drive co-learning of causal relationships among
different types of nodes. As multiple loss components emerge in CINN, we
leverage the projection of conflicting gradients to mitigate gradient
interference among the multiple learning tasks. Computational experiments
across a broad spectrum of UCI data sets demonstrate substantial advantages of
CINN in predictive performance over other state-of-the-art methods. In
addition, an ablation study shows that integrating structural and quantitative
causal knowledge incrementally improves the neural network's predictive
performance.
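
Two of the ingredients described above can be illustrated with a minimal sketch: sorting the nodes of a causal DAG into root, intermediate, and leaf roles, and projecting conflicting gradients so that one loss component does not cancel another. The function names (categorize_nodes, project_conflicting), the adjacency-matrix convention, and the plain-list gradient representation are assumptions made for illustration and are not taken from the paper. The projection rule follows the commonly used gradient-surgery formulation, in which a task gradient g_i with a negative inner product against g_j has its component along g_j removed.

```python
import random


def categorize_nodes(adj):
    """Label each DAG node as root, intermediate, or leaf.

    Assumed convention: adj[i][j] != 0 means a directed edge i -> j.
    A node with no parents is a root (this includes isolated nodes),
    a node with parents and children is intermediate, and a node with
    parents but no children is a leaf.
    """
    n = len(adj)
    roles = {}
    for v in range(n):
        has_parent = any(adj[u][v] for u in range(n))
        has_child = any(adj[v][w] for w in range(n))
        if not has_parent:
            roles[v] = "root"
        elif has_child:
            roles[v] = "intermediate"
        else:
            roles[v] = "leaf"
    return roles


def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def project_conflicting(grads, seed=0):
    """Combine per-task gradients after removing pairwise conflicts.

    For each task gradient g_i, visit the other tasks in random order;
    whenever the running g_i has a negative inner product with g_j,
    subtract its projection onto g_j. The projected gradients are then
    summed into a single update direction.
    """
    rng = random.Random(seed)
    projected = [list(g) for g in grads]
    for i, g_i in enumerate(projected):
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)
        for j in others:
            g_j = grads[j]  # always project against the original g_j
            inner = dot(g_i, g_j)
            if inner < 0:  # the two tasks pull in conflicting directions
                scale = inner / dot(g_j, g_j)
                for k in range(len(g_i)):
                    g_i[k] -= scale * g_j[k]
    return [sum(col) for col in zip(*projected)]


# Hypothetical three-variable DAG: 0 -> 1, 0 -> 2, 1 -> 2.
example_adj = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
example_roles = categorize_nodes(example_adj)

# Two hypothetical loss gradients that conflict (inner product is -1).
example_update = project_conflicting([[1.0, 0.0], [-1.0, 1.0]])
```

In this toy setting, the intermediate and leaf nodes are the ones that would each contribute a loss component, and the projection step is applied to the gradients of those components before the parameter update.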