The estimation of depth from two-dimensional images has long been a challenging
and extensively studied problem in computer vision. Recently, significant
progress has been made with the emergence of Deep Learning-based approaches,
which have proven highly successful. This paper focuses on the explainability
of monocular depth estimation methods, in terms of how humans perceive depth.
This preliminary study emphasizes one of the most significant visual cues,
relative size, which is prominent in almost all viewed images. We designed
an experiment that mimics those conducted with human subjects and tested
state-of-the-art methods to indirectly assess their explainability in the
context defined. In addition, we observed that measuring accuracy requires
particular care, and we propose a dedicated approach to this end. The results
show that a mean accuracy of around 77% is achieved across methods, with some
methods performing markedly better, thus indirectly revealing their
corresponding potential to uncover monocular depth cues, such as relative size.