
    LOW RESOURCE HIGH ACCURACY KEYWORD SPOTTING

    Keyword spotting (KWS) is the task of automatically detecting keywords of interest in continuous speech, and it has been an active research topic for over 40 years. Recently, demand has been rising for KWS techniques that work under resource constraints. For example, as of 2016, the USC Shoah Foundation covers audio-visual testimonies from survivors and other witnesses of the Holocaust in 63 countries and 39 languages. Providing search capability for those testimonies requires KWS technologies that work in low language resource conditions, because for most languages the resources for developing KWS systems are not as rich as those for English. Although KWS has been in the literature for a long time, KWS techniques for resource constrained conditions have not been researched extensively. In this dissertation, we improve KWS performance in two low resource conditions: the low language resource condition, where language-specific data is inadequate, and the low computation resource condition, where KWS runs on computation constrained devices. For low language resource KWS, we focus on applications for speech data mining, where large vocabulary continuous speech recognition (LVCSR)-based KWS techniques are widely used. Keyword spotting for those applications is also known as keyword search (KWS) or spoken term detection (STD). A key issue for this type of KWS technique is the out-of-vocabulary (OOV) keyword problem: LVCSR-based KWS can only search for words that are defined in the LVCSR's lexicon, which is typically very small in a low language resource condition. To alleviate the OOV keyword problem, we propose a technique named "proxy keyword search" that enables us to search for OOV keywords with regular LVCSR-based KWS systems. We also develop a technique that expands the LVCSR's lexicon automatically by adding hallucinated words, which increases keyword coverage and therefore improves KWS performance. Finally, we explore the possibility of building LVCSR-based KWS systems with a limited lexicon, or even without an expert pronunciation lexicon. For low computation resource KWS, we focus on wake-word applications, which usually run on computation constrained devices such as mobile phones or tablets. We first develop a deep neural network (DNN)-based keyword spotter, which is lightweight and accurate enough to run continuously on such devices. This keyword spotter typically requires a pre-defined keyword, such as "Okay Google". We then propose a long short-term memory (LSTM)-based feature extractor for query-by-example KWS, which enables users to define their own keywords.

    Statistical morphological disambiguation with application to disambiguation of pronunciations in Turkish

    The statistical morphological disambiguation of agglutinative languages suffers from data sparseness. In this study, we introduce the notion of distinguishing tag sets (DTS) to overcome the problem. The morphological analyses of words are modeled with DTS and the root major part-of-speech tags. The disambiguator based on these representations performs the statistical morphological disambiguation of Turkish with a recall as high as 95.69 percent. In text-to-speech systems and in developing transcriptions for acoustic speech data, the problem arises when disambiguating the pronunciation of a token in context, so that the correct pronunciation can be produced or the transcription uses the correct set of phonemes. We apply the morphological disambiguator to this problem of pronunciation disambiguation and achieve 99.54 percent recall with 97.95 percent precision. Most text-to-speech systems perform phrase-level accentuation based on the content word/function word distinction. This approach seems easy and adequate for some right-headed languages such as English, but it is not suitable for languages such as Turkish. We then use a heuristic approach based on dependency parsing to mark up the phrase boundaries, as a basis for phrase-level accentuation in Turkish TTS synthesizers.

    Syllable segmentation, normalization, and lexicographic ordering of Myanmar-language text using formal methods

    Nagaoka University of Technology (National University Corporation)