Spoken term detection (STD) is a fundamental task for multimedia information
retrieval. A major challenge faced by an STD system is the serious performance reduction
when detecting out-of-vocabulary (OOV) terms. The difficulties arise not only
from the absence of pronunciations for such terms in the system dictionaries, but from
intrinsic uncertainty in pronunciations, significant diversity in term properties and a
high degree of weakness in acoustic and language modelling.
To tackle the OOV issue, we first applied the joint-multigram model to predict pronunciations
for OOV terms in a stochastic way. Based on this, we propose a stochastic
pronunciation model that considers all possible pronunciations for OOV terms so that
the high pronunciation uncertainty is compensated for.
Furthermore, to deal with the diversity in term properties, we propose a termdependent
discriminative decision strategy, which employs discriminative models to
integrate multiple informative factors and confidence measures into a classification
probability, which gives rise to minimum decision cost.
In addition, to address the weakness in acoustic and language modelling, we propose
a direct posterior confidence measure which replaces the generative models with
a discriminative model, such as a multi-layer perceptron (MLP), to obtain a robust
confidence for OOV term detection.
With these novel techniques, the STD performance on OOV terms was improved
substantially and significantly in our experiments set on meeting speech data