127 research outputs found

    Subword balance, position indices and power sums

    In this paper, we investigate various ways of characterizing words, mainly over a binary alphabet, using information about the positions of occurrences of letters in words. We introduce two new measures associated with words, the position index and the sum of position indices. We establish some characterizations, connections with Parikh matrices, and connections with power sums. One particular emphasis concerns the effect of morphisms and iterated morphisms on words.
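    The abstract does not reproduce the paper's definitions of the position index or the sum of position indices, so the sketch below only illustrates the standard notions it builds on: the positions of letter occurrences in a binary word, and the Parikh matrix, whose above-diagonal entries count occurrences of a, of b, and of the scattered subword ab. The word used is a made-up example.

```python
# Parikh matrix of a binary word over the ordered alphabet (a, b).
# It is built letter by letter as a 3x3 upper-triangular matrix.
# NOTE: the paper's "position index" and "sum of position indices" are its
# own new measures and are NOT implemented here; only the standard positions
# of letter occurrences are listed for orientation.

def parikh_matrix(word: str) -> list[list[int]]:
    """Return the 3x3 Parikh matrix of a word over {'a', 'b'}."""
    m = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]
    for ch in word:
        if ch == 'a':
            m[0][1] += 1        # one more occurrence of a
        elif ch == 'b':
            m[0][2] += m[0][1]  # each earlier a extends a scattered subword "ab"
            m[1][2] += 1        # one more occurrence of b
        else:
            raise ValueError(f"unexpected letter: {ch!r}")
    return m

def letter_positions(word: str) -> dict[str, list[int]]:
    """1-based positions of each letter's occurrences."""
    pos: dict[str, list[int]] = {}
    for i, ch in enumerate(word, start=1):
        pos.setdefault(ch, []).append(i)
    return pos

if __name__ == "__main__":
    w = "abba"
    print(parikh_matrix(w))     # [[1, 2, 2], [0, 1, 2], [0, 0, 1]]
    print(letter_positions(w))  # {'a': [1, 4], 'b': [2, 3]}
```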

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
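    As a rough illustration of the pipeline the survey describes, the sketch below indexes hypothetical one-best ASR transcripts with a plain text-IR inverted index and answers a text query by term overlap. Real SCR systems also handle lattices, recognition confidences and richer ranking models, none of which are modelled here.

```python
# Toy SCR pipeline: ASR output is represented by already-transcribed text
# (a placeholder assumption); the transcripts are indexed as in text IR,
# and a text query retrieves documents by counting matching terms.

from collections import defaultdict

def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Rank documents by how many query terms they contain."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, set()):
            scores[doc_id] += 1
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    asr_output = {  # hypothetical 1-best transcripts of two spoken documents
        "doc1": "the meeting covered budget planning",
        "doc2": "spoken content retrieval techniques",
    }
    idx = build_index(asr_output)
    print(search(idx, "content retrieval"))  # ['doc2']
```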

    Out-of-vocabulary spoken term detection

    Spoken term detection (STD) is a fundamental task for multimedia information retrieval. A major challenge faced by an STD system is the serious performance reduction when detecting out-of-vocabulary (OOV) terms. The difficulties arise not only from the absence of pronunciations for such terms in the system dictionaries, but also from intrinsic uncertainty in pronunciations, significant diversity in term properties, and a high degree of weakness in acoustic and language modelling. To tackle the OOV issue, we first applied the joint-multigram model to predict pronunciations for OOV terms in a stochastic way. Based on this, we propose a stochastic pronunciation model that considers all possible pronunciations for OOV terms so that the high pronunciation uncertainty is compensated for. Furthermore, to deal with the diversity in term properties, we propose a term-dependent discriminative decision strategy, which employs discriminative models to integrate multiple informative factors and confidence measures into a classification probability, giving rise to minimum decision cost. In addition, to address the weakness in acoustic and language modelling, we propose a direct posterior confidence measure which replaces the generative models with a discriminative model, such as a multi-layer perceptron (MLP), to obtain a robust confidence for OOV term detection. With these novel techniques, the STD performance on OOV terms was improved substantially and significantly in our experiments on meeting speech data.
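    A minimal sketch of the intuition behind the stochastic pronunciation model described above: rather than committing to one predicted pronunciation for an OOV term, every candidate pronunciation contributes to the detection score in proportion to its probability. The pronunciation probabilities and per-pronunciation scores below are invented placeholders; in the paper they would come from the joint-multigram model and the STD search, respectively.

```python
# Expected detection confidence over all candidate pronunciations of a term.
# Both dictionaries are hypothetical inputs standing in for model outputs.

def stochastic_detection_score(
    pron_probs: dict[str, float],        # pronunciation -> P(pron | term), assumed given
    detection_scores: dict[str, float],  # pronunciation -> detection confidence
) -> float:
    """Probability-weighted sum of per-pronunciation detection scores."""
    return sum(p * detection_scores.get(pron, 0.0)
               for pron, p in pron_probs.items())

if __name__ == "__main__":
    # Hypothetical OOV term with two competing predicted pronunciations.
    probs = {"g uw g ah l": 0.7, "g ow g ah l": 0.3}
    scores = {"g uw g ah l": 0.9, "g ow g ah l": 0.4}
    print(stochastic_detection_score(probs, scores))  # 0.75
```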

    Author index


    Subword-based approaches for spoken document retrieval

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (p. 181-187). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. This thesis explores approaches to the problem of spoken document retrieval (SDR), which is the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user-specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for a priori, or requiring a very large recognition vocabulary in order to cover the contents of growing and diverse message collections. The use of subword units in the recognizer constrains the size of the vocabulary needed to cover the language, and the use of subword units as indexing terms allows for the detection of new user-specified query terms during retrieval. Four research issues are addressed. First, what are suitable subword units and how well can they perform? Second, how can these units be reliably extracted from the speech signal? Third, what is the behavior of the subword units when there are speech recognition errors and how well do they perform? And fourth, how can the indexing and retrieval methods be modified to take into account the fact that the speech recognition output will be errorful? We first explore a range of subword units of varying complexity derived from error-free phonetic transcriptions and measure their ability to effectively index and retrieve speech messages. We find that many subword units capture enough information to perform effective retrieval and that it is possible to achieve performance comparable to that of text-based word units. Next, we develop a phonetic speech recognizer and process the spoken document collection to generate phonetic transcriptions. We then measure the ability of subword units derived from these transcriptions to perform spoken document retrieval and examine the effects of recognition errors on retrieval performance. Retrieval performance degrades for all subword units (to 60% of the clean reference), but remains reasonable for some subword units even without the use of any error compensation techniques. We then investigate a number of robust methods that take into account the characteristics of the recognition errors and try to compensate for them in an effort to improve spoken document retrieval performance when there are speech recognition errors. We study the methods individually and explore the effects of combining them. Using these robust methods improves retrieval performance by 23%. We also propose a novel approach to SDR where the speech recognition and information retrieval components are more tightly integrated. This is accomplished by developing new recognizer and retrieval models where the interface between the two components is better matched and the goals of the two components are consistent with each other and with the overall goal of the combined system. Using this new integrated approach improves retrieval performance by 28%. ... by Kenney Ng, Ph.D.
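    A small sketch of the subword-unit indexing idea the thesis investigates, under simplifying assumptions: phonetic transcriptions are given directly (standing in for the phonetic recognizer), overlapping phone trigrams serve as indexing terms, and a query expressed as a phone sequence is matched by counting shared units. The phone sequences and unit choice are illustrative, not the thesis's actual data or its exact unit inventory.

```python
# Subword-unit indexing and retrieval over phone sequences (hypothetical data).

from collections import defaultdict

def phone_ngrams(phones: list[str], n: int = 3) -> list[str]:
    """Overlapping phone n-grams used as indexing terms."""
    return [" ".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def index_documents(docs: dict[str, list[str]], n: int = 3) -> dict[str, set[str]]:
    """Inverted index: subword unit -> set of message ids."""
    index = defaultdict(set)
    for doc_id, phones in docs.items():
        for unit in phone_ngrams(phones, n):
            index[unit].add(doc_id)
    return index

def retrieve(index: dict[str, set[str]], query_phones: list[str], n: int = 3) -> list[str]:
    """Rank messages by the number of subword units shared with the query."""
    scores = defaultdict(int)
    for unit in phone_ngrams(query_phones, n):
        for doc_id in index.get(unit, set()):
            scores[doc_id] += 1
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    # One spoken message containing the (made-up) phone string for "weather forecast".
    docs = {"msg1": ["w", "eh", "dh", "er", "f", "ao", "r", "k", "ae", "s", "t"]}
    idx = index_documents(docs)
    # Query phones for "forecast"; the word need not be in any recognizer vocabulary.
    print(retrieve(idx, ["f", "ao", "r", "k", "ae", "s", "t"]))  # ['msg1']
```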

    Multimedia Retrieval
