Nonlinear photonic delay systems present interesting implementation platforms
for machine learning models. They can be extremely fast, offer a high degree of
parallelism, and potentially consume far less power than digital processors. So
far, they have been successfully employed for signal processing using the
Reservoir Computing paradigm. In this paper we show that their range of
applicability can be greatly extended if we use gradient descent with
backpropagation through time on a model of the system to optimize its input
encoding. We perform physical experiments demonstrating that
the obtained input encodings work well in practice, and we show that optimized
systems perform significantly better than the common Reservoir Computing
approach. The results presented here demonstrate that common gradient descent
techniques from machine learning may well be applicable to physical
neuro-inspired analog computers.
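
To make the optimization procedure concrete, the following is a minimal sketch, not the authors' actual setup, of gradient descent with backpropagation through time on a simplified software model of a delay system, implemented in JAX. A trainable input mask plays the role of the input encoding. The coarse reservoir model, the sin² nonlinearity, all parameter values, and the toy prediction task are illustrative assumptions, not details taken from the paper.

```python
import jax
import jax.numpy as jnp

N = 20       # virtual nodes per delay loop (assumed value)
ALPHA = 0.8  # feedback strength (assumed value)
BETA = 0.5   # input scaling (assumed value)

def run_reservoir(mask, inputs):
    """Coarse model: each input sample is held for one full delay,
    multiplied by the input mask, and fed through a sin^2 nonlinearity
    (as in a Mach-Zehnder modulator; an assumption of this sketch)."""
    def one_delay(state, u):
        new_state = jnp.sin(ALPHA * state + BETA * mask * u) ** 2
        return new_state, new_state
    _, states = jax.lax.scan(one_delay, jnp.zeros(N), inputs)
    return states  # shape (T, N): virtual-node responses per step

def loss(params, inputs, targets):
    mask, w_out, b = params
    states = run_reservoir(mask, inputs)  # forward pass through time
    preds = states @ w_out + b            # linear readout
    return jnp.mean((preds - targets) ** 2)

# Toy task: one-step-ahead prediction of a random signal (illustrative).
u = jax.random.normal(jax.random.PRNGKey(0), (500,))
inputs, targets = u[:-1], u[1:]

# Parameters: input mask (the encoding), readout weights, readout bias.
params = (jnp.ones(N), jnp.zeros(N), jnp.zeros(()))
grad_fn = jax.jit(jax.grad(loss))  # grad through lax.scan == BPTT
lr = 0.1
for _ in range(200):
    grads = grad_fn(params, inputs, targets)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```

The key point the sketch illustrates is that only the software model needs to be differentiable: once the mask is optimized by backpropagating through the simulated dynamics, it can be uploaded to the physical device, which is consistent with the abstract's claim that the obtained encodings transfer to the real system.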