This paper describes FBK's submission to the end-to-end English-German speech
translation task at IWSLT 2018. Our system relies on a state-of-the-art model
based on LSTMs and CNNs, where the CNNs reduce the temporal dimension of the
audio input, whose sequences are generally much longer than the text inputs of
machine translation. Our model was trained only on the audio-to-text parallel
data released for the task, and fine-tuned on cleaned subsets of the original
training corpus. The addition of weight normalization and label smoothing
improved the baseline system by 1.0 BLEU point on our validation set. The final
submission also featured checkpoint averaging within a training run and
ensemble decoding of models trained during multiple runs. On test data, our
best single model obtained a BLEU score of 9.7, while the ensemble obtained a
BLEU score of 10.24.

Comment: 6 pages, 2 figures, system description at the 15th International
Workshop on Spoken Language Translation (IWSLT) 201
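The CNN front end described above shortens the time axis of the audio before it reaches the LSTMs. The sketch below only illustrates the length arithmetic of strided convolutions. It assumes a hypothetical configuration of two convolutional layers with kernel size 3, stride 2 and padding 1 along time, and a 10 ms frame shift; the actual layer settings of the submitted system are not given here.

```python
def conv_output_length(length: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Number of output steps along the time axis of one convolution layer."""
    return (length + 2 * padding - kernel) // stride + 1


def downsampled_length(num_frames: int, num_layers: int = 2, **conv_kwargs) -> int:
    """Apply the output-length formula once per (hypothetical) strided conv layer."""
    length = num_frames
    for _ in range(num_layers):
        length = conv_output_length(length, **conv_kwargs)
    return length


# 1000 input frames (about 10 s of audio at an assumed 10 ms frame shift)
print(downsampled_length(1000))  # 250 encoder time steps
```

Under these assumptions each stride-2 layer roughly halves the sequence, so two layers give the recurrent encoder about a quarter of the original frames to process.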
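Label smoothing, one of the two additions credited with the 1.0 BLEU gain, trains against a target that mixes the one-hot reference with a uniform distribution over the K output classes: q(k) = (1 - epsilon) * [k == y] + epsilon / K. A minimal per-token sketch follows; the smoothing weight epsilon = 0.1 and the function names are illustrative assumptions, not values reported in the abstract.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothed_nll(logits: Sequence[float], target: int, epsilon: float = 0.1) -> float:
    """Cross-entropy against q(k) = (1 - epsilon) * [k == target] + epsilon / K."""
    log_probs = log_softmax(logits)
    k = len(log_probs)
    smoothed = [epsilon / k] * k          # uniform share for every class
    smoothed[target] += 1.0 - epsilon     # remaining mass on the reference token
    return -sum(q * lp for q, lp in zip(smoothed, log_probs))


print(label_smoothed_nll([2.0, 0.0, -1.0], target=0))
```

With epsilon = 0 this reduces to ordinary cross-entropy; with epsilon > 0 the model is penalized for putting all probability mass on a single token, which acts as a regularizer.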
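The final submission combines two distinct techniques: checkpoint averaging, which averages the parameters of several checkpoints saved during one training run, and ensemble decoding, which combines the output distributions of models from different runs at each decoding step. The sketch below contrasts the two on toy data. Parameters are represented as plain lists keyed by name, and the ensemble averages probabilities (averaging log-probabilities is another common variant); both choices are illustrative assumptions rather than details of the submitted system.

```python
from typing import Mapping, Sequence

Params = Mapping[str, Sequence[float]]


def average_checkpoints(checkpoints: Sequence[Params]) -> dict[str, list[float]]:
    """Element-wise mean of each parameter over checkpoints from ONE training run."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(ckpt[name] for ckpt in checkpoints))]
        for name in checkpoints[0]
    }


def ensemble_step(step_probs: Sequence[Sequence[float]]) -> list[float]:
    """Average next-token probabilities of independently trained models at one step."""
    n = len(step_probs)
    return [sum(ps) / n for ps in zip(*step_probs)]


print(average_checkpoints([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]))  # {'w': [2.0, 3.0]}
print(ensemble_step([[0.25, 0.75], [0.75, 0.25]]))                  # [0.5, 0.5]
```

The usual rationale for the split is that checkpoints from a single run lie close together in parameter space, so their average is itself a usable model, whereas independently trained models do not share this property and are therefore combined only at the output level.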