Phonetic Searching
An improved method and apparatus are disclosed that use probabilistic techniques to match an input search string against a prestored audio file and to recognize certain portions of the search string phonetically. An improved interface is also disclosed that permits users to enter search strings linguistically, phonetically, or as a combination of both, and to specify logic functions by indicating how far apart in time specific phonemes are.
Georgia Tech Research Corporatio
Fast and Accurate OOV Decoder on High-Level Features
This work proposes a novel approach to the out-of-vocabulary (OOV) keyword search
(KWS) task. The proposed approach is based on using high-level features from an
automatic speech recognition (ASR) system, so-called phoneme posterior based
(PPB) features, for decoding. These features are obtained by calculating
time-dependent phoneme posterior probabilities from word lattices, followed by
their smoothing. For the PPB features we developed a novel OOV decoder that is
very fast, simple, and efficient. Experimental results are presented on the
Georgian language from the IARPA Babel Program, which was the test language in
the OpenKWS 2016 evaluation campaign. The results show that, in terms of the maximum
term-weighted value (MTWV) metric and computational speed, for single ASR
systems, the proposed approach significantly outperforms the state-of-the-art
approach based on using in-vocabulary proxies for OOV keywords in the indexed
database. The comparison of the two OOV KWS approaches on the fusion results of
the nine different ASR systems demonstrates that the proposed OOV decoder
outperforms the proxy-based approach in terms of the MTWV metric at
comparable processing speed. Other important advantages of the OOV decoder
include extremely low memory consumption and simplicity of its implementation
and parameter optimization.
Comment: Interspeech 2017, August 2017, Stockholm, Sweden. 201
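For readers unfamiliar with the MTWV metric mentioned in this abstract, the following is a minimal sketch of how a term-weighted value can be computed and maximized over decision thresholds. It is not taken from the paper. It assumes the commonly used NIST-style definition, TWV = 1 - mean over terms of (P_miss + beta * P_FA), with beta = 999.9 and one non-target trial per second of speech. The function names `twv` and `mtwv` and the input layout are hypothetical.

```python
from collections import defaultdict

BETA = 999.9  # assumed NIST-style weight; not stated in the abstract


def twv(detections, n_true, speech_seconds, threshold, beta=BETA):
    """Term-weighted value at one decision threshold.

    detections: list of (term, score, is_correct) tuples (hypothetical layout).
    n_true: dict mapping each term to its number of reference occurrences.
    speech_seconds: total audio duration, used as the number of trials.
    """
    hits = defaultdict(int)
    false_alarms = defaultdict(int)
    for term, score, is_correct in detections:
        if score >= threshold:
            if is_correct:
                hits[term] += 1
            else:
                false_alarms[term] += 1

    # Only terms that actually occur in the reference contribute.
    terms = [t for t, n in n_true.items() if n > 0]
    total = 0.0
    for term in terms:
        p_miss = 1.0 - hits[term] / n_true[term]
        p_fa = false_alarms[term] / (speech_seconds - n_true[term])
        total += p_miss + beta * p_fa
    return 1.0 - total / len(terms)


def mtwv(detections, n_true, speech_seconds):
    """Maximum TWV over all thresholds that change the accepted set."""
    thresholds = sorted({score for _, score, _ in detections})
    thresholds.append(float("inf"))  # accept nothing at all
    return max(twv(detections, n_true, speech_seconds, th) for th in thresholds)


if __name__ == "__main__":
    toy = [("a", 0.9, True), ("a", 0.4, False),
           ("b", 0.8, True), ("b", 0.3, True)]
    print(mtwv(toy, {"a": 1, "b": 2}, 10000.0))
```

Because beta is large, a single false alarm is costly unless the audio collection is long, which is why the threshold sweep matters.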
Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean
A new tightly coupled speech and natural language integration model is
presented for a TDNN-based continuous, possibly large-vocabulary, speech
recognition system for Korean. Unlike popular n-best techniques developed for
integrating mainly HMM-based speech recognition and natural language processing
at the word level, which is obviously inadequate for morphologically
complex agglutinative languages, our model constructs a spoken language system
based on morpheme-level speech and language integration. With this
integration scheme, the spoken Korean processing engine (SKOPE) is designed and
implemented using a TDNN-based diphone recognition module integrated with
Viterbi-based lexical decoding and symbolic phonological/morphological
co-analysis. Our experimental results show that speaker-dependent continuous
eojeol (Korean word) recognition and integrated morphological analysis
can be achieved with a success rate of over 80.6% directly from speech inputs for
middle-level vocabularies.
Comment: latex source with a4 style, 15 pages, to be published in computer processing of oriental language journa
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.