This research investigates statistical machine translation (SMT) approaches to
translating speech automatically in real time. Such systems can be used in a
pipeline with speech recognition and synthesis software to produce a real-time
voice communication system between speakers of different languages. We obtained three main
data sets from spoken proceedings that represent three different types of human
speech. TED, Europarl, and OPUS parallel text corpora were used as the basis
for training language models and for tuning and testing the
translation system. We also conducted experiments involving part-of-speech
tagging, compound splitting, linear language model interpolation, TrueCasing
and morphosyntactic analysis. We evaluated the effects of a variety of data
preparation steps on the translation results using the BLEU, NIST, METEOR and TER
metrics and tried to determine which metric is most suitable for the PL-EN
language pair.