This paper presents the results of the premier
shared task organized alongside the Conference
on Machine Translation (WMT) 2018.
Participants were asked to build machine
translation systems for any of 7 language pairs
in both directions, to be evaluated on a test set
of news stories. The main metric for this task
is human judgment of translation quality. This
year, we also opened up the task to additional
test suites to probe specific aspects of translation.