Quantifying predictive uncertainty of deep semantic segmentation networks is
essential in safety-critical tasks. In applications like autonomous driving,
where video data is available, convolutional long short-term memory networks
are capable of not only providing semantic segmentations but also predicting
the segmentations of the next time steps. These models take a time series of
inputs and use cell states to carry information from earlier inputs forward when
predicting one or more steps into the future. We present a temporal
postprocessing method which estimates the prediction performance of
convolutional long short-term memory networks by either predicting the
intersection over union (IoU) of predicted and ground truth segments or
classifying whether the IoU is equal to zero or greater than zero. To
this end, we create temporal cell state-based input metrics per segment and
investigate different models for the estimation of the predictive quality based
on these metrics. We further study the influence of the number of considered
cell states for the proposed metrics.

Comment: 14 pages, 4 figures; this work is related to arXiv:1811.00648 and
arXiv:1911.0507
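
The two estimation targets named in the abstract, a regression on the intersection over union (IoU) and a classification of IoU equal to zero versus greater than zero, can be illustrated with a minimal sketch. It assumes a segment is given as a binary mask in nested lists and uses the plain mask IoU; the function names `segment_iou` and `meta_classification_label` are hypothetical, and the paper's own segment-wise IoU definition and cell state-based input metrics may differ.

```python
def segment_iou(pred_mask, gt_mask):
    """Plain IoU of two binary masks given as equally sized nested lists."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Assumption: two empty masks are assigned an IoU of 0.
    if union == 0:
        return 0.0
    return intersection / union


def meta_classification_label(iou):
    """Binary target: 0 if IoU == 0 (no overlap at all), 1 if IoU > 0."""
    return int(iou > 0.0)


# Toy example: a predicted segment that overlaps the ground truth in one pixel.
pred = [[1, 1, 0],
        [0, 0, 0]]
gt = [[0, 1, 1],
      [0, 0, 0]]

iou = segment_iou(pred, gt)             # regression target
label = meta_classification_label(iou)  # classification target
```

In this toy case the masks share one pixel out of three in their union, so the regression target is 1/3 and the classification target is 1. A model trained on per-segment input metrics would try to predict these values without access to the ground truth.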