Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb
Abstract
Statistical language modelling estimates the regularities in natural languages. Language models are used in speech recognition, machine translation and other applications for speech and language technologies. In this paper we will present a procedure for language models building for the Croatian weather domain corpus. Different types of n-gram statistic language models and smoothing methods for language modelling are presented. Those models are compared in terms of their estimated perplexity