Deep neural networks typically outperform more traditional machine learning
models in their ability to classify complex data, and yet it is not clear how the
individual hidden layers of a deep network contribute to the overall
classification performance. We thus introduce a Generalized Discrimination
Value (GDV) that measures, in a non-invasive manner, how well different data
classes separate in any given network layer. The GDV can be used for the
automatic tuning of hyper-parameters, such as the width profile and the total
depth of a network. Moreover, the layer-dependent GDV(L) provides new insights
into the data transformations that self-organize during training: In the case
of multi-layer perceptrons trained with error backpropagation, we find that
classification of highly complex data sets requires a transient {\em reduction}
of class separability, marked by a characteristic 'energy barrier' in the
initial part of the GDV(L) curve. Even more surprisingly, for a given data set,
the GDV(L) follows a fixed 'master curve', independent of the
total number of network layers. Furthermore, applying the GDV to Deep Belief
Networks reveals that unsupervised training with the Contrastive
Divergence method can also systematically increase class separability over tens of
layers, even though the system does not 'know' the desired class labels. These
results indicate that the GDV may become a useful tool to open the black box of
deep learning.
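To make the idea concrete, the following sketch illustrates one way to compute a layer-wise class-separability score in the spirit of the GDV: activations are z-scored per feature, the mean intra-class distance is compared to the mean inter-class distance, and the result is normalized by the square root of the layer width. The function name, the 0.5 scaling factor, and the sign convention (more negative values indicating better separability) are illustrative assumptions; the precise definition of the GDV is given in the main text.
\begin{verbatim}
import numpy as np

def separability_score(activations, labels):
    """Hypothetical GDV-style score for one network layer.

    activations: array of shape (n_samples, n_features), hidden-layer outputs
    labels:      array of shape (n_samples,), integer class labels
    Returns a scalar that becomes more negative as classes separate better
    (illustrative sign convention; each class needs at least two samples).
    """
    # z-score every feature dimension so the score is invariant to scaling
    std = activations.std(axis=0)
    std[std == 0.0] = 1.0                      # guard constant dimensions
    z = 0.5 * (activations - activations.mean(axis=0)) / std

    classes = np.unique(labels)
    groups = [z[labels == c] for c in classes]

    def mean_inter(a, b):
        # mean Euclidean distance between all rows of a and all rows of b
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean()

    def mean_intra(a):
        # mean Euclidean distance over distinct pairs within one class
        d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
        return d[np.triu_indices(len(a), k=1)].mean()

    intra = np.mean([mean_intra(g) for g in groups])
    inter = np.mean([mean_inter(groups[i], groups[j])
                     for i in range(len(groups))
                     for j in range(i + 1, len(groups))])

    # normalize by sqrt(width) so layers of different size stay comparable
    return (intra - inter) / np.sqrt(activations.shape[1])
\end{verbatim}
Evaluating such a score on the recorded activations of every layer of a trained network yields a curve over the layer index L, analogous to the GDV(L) curves discussed above; since it only reads out activations, the measurement does not alter the network itself.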