We propose an encoder-decoder framework for the segmentation of blood vessels
in retinal images that relies on the extraction of large patches at
multiple image scales during training. Experiments on three fundus image
datasets demonstrate that this approach achieves state-of-the-art results and
can be implemented using a simple and efficient fully-convolutional network
with fewer than 0.8M parameters. Furthermore, we show that this
framework, called VLight, avoids overfitting to specific training images and
generalizes well across different datasets, which makes it highly suitable for
real-world applications where robustness, accuracy, and low inference time
on high-resolution fundus images are required.
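To illustrate the multi-scale patch extraction described above, the sketch below samples a fixed-size training patch from a randomly rescaled fundus image and its vessel mask. The function name, patch size, and scale set are illustrative assumptions, not values taken from the paper, and the nearest-neighbor resize is used only to keep the example dependency-free.

```python
import numpy as np

def sample_multiscale_patch(image, mask, patch_size=256,
                            scales=(0.5, 0.75, 1.0, 1.25, 1.5),
                            rng=None):
    """Sample one training patch at a random image scale.

    `patch_size` and `scales` are hypothetical defaults; the paper's
    actual values may differ.
    """
    rng = rng or np.random.default_rng()
    s = rng.choice(scales)
    h, w = image.shape[:2]
    # Rescaled size, clamped so a full patch always fits.
    nh = max(patch_size, int(h * s))
    nw = max(patch_size, int(w * s))
    # Nearest-neighbor resize of both image and segmentation mask.
    rows = np.arange(nh) * h // nh
    cols = np.arange(nw) * w // nw
    img_s = image[rows][:, cols]
    msk_s = mask[rows][:, cols]
    # Random crop of a fixed-size patch from the rescaled pair.
    y = rng.integers(0, nh - patch_size + 1)
    x = rng.integers(0, nw - patch_size + 1)
    return (img_s[y:y + patch_size, x:x + patch_size],
            msk_s[y:y + patch_size, x:x + patch_size])
```

Sampling patches this way exposes the network to vessels at varying apparent widths, which is one plausible reason such training reduces overfitting to the resolution of any single dataset.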