Most modern processor architectures provide SIMD (single instruction, multiple data) instructions to speed up algorithms based on vector or matrix operations. This paper describes the use of SIMD instructions to calculate Gaussian or Laplacian densities in a large vocabulary speech recognition system. We present a simple, robust method based on scalar quantization of the mean and observation vector components that speeds up the whole system's runtime by a factor of 3 without any loss in recognition performance. Combining the approach with vector space partitioning techniques accelerated the overall system by a factor of over 7. The experiments show that the approach can also be applied to Viterbi training without any loss of accuracy. All experiments were conducted on a German, 10,000-word, spontaneous speech task using two architectures, namely Intel Pentium III and SUN UltraSPARC.

1. INTRODUCTION

The number of log-likelihood calculations for the emission probabilities of HMMs inc..