
    Audio Information Retrieval using Sinusoidal Modeling based Features

    In this paper, we propose an approach for searching for a spoken query word in an audio database. The approach explores features derived from sinusoidal modeling of the speech signal: amplitude, frequency, and phase. Three independent systems are built using these features, and majority voting across them is used to locate the query word (its time stamp) in the reference utterances. The studies are performed on the TIMIT database. The results show that sinusoidal-model features can be used for speech processing in place of conventional approaches.
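The abstract does not give the analysis settings or the exact form of the voting. As a minimal sketch under assumed settings (a Hann window, a plain DFT, and a hypothetical 50 ms agreement tolerance), per-frame amplitude, frequency, and phase can be read off the spectral peaks, as in classic sinusoidal analysis, and time stamps from three detectors can be merged by majority voting. The function names sinusoidal_peaks and majority_vote are illustrative, not the authors' code.

```python
import cmath
import math


def sinusoidal_peaks(frame, sample_rate, max_peaks=5):
    """Return (amplitude, frequency_hz, phase) for the strongest spectral peaks of one frame.

    Amplitude here is the raw DFT magnitude of the windowed frame (no normalization).
    """
    n = len(frame)
    # Hann window reduces leakage so each sinusoid appears as a clear local maximum.
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]
    # Plain O(n^2) DFT over the non-negative frequency bins (adequate for short frames).
    spectrum = [
        sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]
    mags = [abs(c) for c in spectrum]
    # A peak is a bin whose magnitude exceeds its left neighbour and is not below its right one.
    peaks = [k for k in range(1, len(mags) - 1) if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    peaks.sort(key=lambda k: mags[k], reverse=True)
    return [(mags[k], k * sample_rate / n, cmath.phase(spectrum[k])) for k in peaks[:max_peaks]]


def majority_vote(detections, tolerance=0.05):
    """Merge time stamps (seconds) from several systems, keeping those most systems agree on.

    detections holds one list of time stamps per system. A stamp is accepted when more
    than half of the systems report a stamp within `tolerance` (hypothetical value) of it.
    """
    n_systems = len(detections)
    accepted = []
    for stamp in sorted(t for system in detections for t in system):
        votes = sum(any(abs(t - stamp) <= tolerance for t in system) for system in detections)
        # Skip stamps already covered by an accepted detection.
        if votes * 2 > n_systems and all(abs(stamp - a) > tolerance for a in accepted):
            accepted.append(stamp)
    return accepted
```

For example, with three systems reporting [1.00, 2.50], [1.02], and [3.00, 1.01], only the detection near 1.0 s is supported by a majority and survives.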

    Spoken document retrieval based on approximated sequence alignment

    This paper presents a new approach to spoken document retrieval for spontaneous speech corpora. The classical approach to this problem combines an automatic speech recognizer (ASR) with standard information retrieval techniques. However, ASRs tend to transcribe spontaneous speech with a significant word error rate, which is a drawback for standard retrieval techniques. To overcome this limitation, our method uses an approximated sequence alignment algorithm to search for “sounds like” sequences. The approach does not depend on extra information from the ASR and, in our experiments, improves precision by up to 7 points over state-of-the-art techniques. (Peer reviewed. Postprint, author’s final draft.)
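The abstract does not describe the approximated alignment itself. One standard way to find “sounds like” occurrences is approximate substring matching over phone strings (Sellers' variant of edit distance, in which a match may start and end anywhere in the document). The sketch below illustrates that general idea under assumed unit costs; it is not the paper's algorithm, and best_approximate_match is a hypothetical name.

```python
def best_approximate_match(query, document):
    """Smallest edit distance between `query` and any substring of `document`.

    Both arguments are sequences of phone symbols. Returns (distance, end_index), where
    end_index is the exclusive end position of the best-matching substring in `document`.
    """
    m = len(query)
    # prev[i] = cheapest alignment of query[:i] ending just before the current document symbol.
    prev = list(range(m + 1))
    best = (prev[m], 0)
    for j, symbol in enumerate(document, start=1):
        curr = [0] * (m + 1)  # curr[0] = 0, so a match may start at any document position
        for i in range(1, m + 1):
            substitution = prev[i - 1] + (query[i - 1] != symbol)
            extra_document_symbol = prev[i] + 1
            missing_query_symbol = curr[i - 1] + 1
            curr[i] = min(substitution, extra_document_symbol, missing_query_symbol)
        if curr[m] < best[0]:
            best = (curr[m], j)
        prev = curr
    return best
```

For instance, the query phones k ae t match a document containing b ae t with distance 1, which a term-based index built on the recognized word would miss entirely.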

    Open-vocabulary spoken utterance retrieval using confusion networks

    This paper presents a novel approach to open-vocabulary spoken utterance retrieval using confusion networks. When out-of-vocabulary (OOV) words are present in the queries and the corpus, word-based indexing is not sufficient. To address this problem, we apply phone confusion networks and combine them with word confusion networks. This approach produces a more compact index table than typical lattice-based methods while still enabling robust keyword matching. In retrieval experiments on speech recordings from the MIT lecture corpus, our method using phone confusion networks outperformed lattice-based methods, especially for OOV queries.
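To make the indexing idea concrete, here is a minimal sketch assuming a phone confusion network is stored as a list of bins, each a dict mapping phone to posterior, and that a query matches consecutive bins with a score equal to the product of the arc posteriors. It omits epsilon (skip) arcs and the combination with word confusion networks; build_index, search, and the 0.1 threshold are hypothetical, not the paper's index format.

```python
from collections import defaultdict


def build_index(networks):
    """Inverted index: phone -> list of (utterance_id, bin_position, posterior).

    `networks` maps an utterance id to its phone confusion network, given as a list
    of bins; each bin is a dict {phone: posterior}.
    """
    index = defaultdict(list)
    for utt_id, bins in networks.items():
        for pos, arcs in enumerate(bins):
            for phone, posterior in arcs.items():
                index[phone].append((utt_id, pos, posterior))
    return index


def search(index, networks, query, threshold=0.1):
    """Score every place where the query phones occupy consecutive bins.

    Score = product of the arc posteriors along the match (0 if a phone is absent).
    Returns (score, utterance_id, start_bin) tuples, best first.
    """
    hits = []
    # Only positions where the first query phone appears need to be checked.
    for utt_id, start, first_posterior in index.get(query[0], []):
        bins = networks[utt_id]
        if start + len(query) > len(bins):
            continue
        score = first_posterior
        for offset, phone in enumerate(query[1:], start=1):
            score *= bins[start + offset].get(phone, 0.0)
        if score >= threshold:
            hits.append((score, utt_id, start))
    return sorted(hits, reverse=True)
```

Because each bin keeps only competing phone alternatives rather than full lattice paths, the index grows with the number of arcs per bin, which is one intuition for why confusion-network indexes can be more compact.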

    A critical assessment of spoken utterance retrieval through approximate lattice representations


    Lexical-phonetic automata for indexing and retrieving speech segments (Automates lexico-phonétiques pour l'indexation et la recherche de segments de parole)

    This paper presents a method for indexing spoken utterances that combines lexical and phonetic hypotheses in a hybrid index built from automata. Retrieval is performed by semi-imperfect lexical-phonetic matching, which tolerates some mismatches in order to improve recall. A feature vector containing edit distance scores and a confidence measure weights each transition; it characterizes the relevance of the candidate segments so that the candidate list can be filtered for a more precise search. Experimental results show that the lexical and phonetic representations are complementary, and we compare the hybrid search with the state-of-the-art cascaded search for retrieving named entity queries. (National audience.)
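The candidate filtering step can be illustrated with a simplified sketch. Assumptions: a flat list of (segment, word hypothesis, phone hypothesis, confidence) entries stands in for the automaton-based index, the feature vector is (normalized phone edit distance, confidence), and a linear weighting with hypothetical weights (-1, 1) and threshold 0 produces the score. This shows the general idea of combining lexical and phonetic evidence, not the paper's weighting scheme.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def hybrid_search(index, query_word, query_phones, weights=(-1.0, 1.0), threshold=0.0):
    """Rank candidate segments using lexical and phonetic hypotheses.

    index: list of (segment_id, word, phones, confidence) entries, where word is a lexical
    hypothesis (None if only phones are available), phones a tuple of phone symbols, and
    confidence a recognizer confidence in [0, 1]. Weights and threshold are hypothetical.
    """
    results = []
    for segment_id, word, phones, confidence in index:
        if word == query_word:
            distance = 0.0  # exact lexical match
        else:
            # Phonetic fallback: tolerate mismatches to improve recall.
            distance = edit_distance(query_phones, phones) / max(len(query_phones), 1)
        features = (distance, confidence)
        score = sum(w * f for w, f in zip(weights, features))
        if score >= threshold:  # filter the candidate list
            results.append((score, segment_id))
    return sorted(results, reverse=True)
```

With this design, a named entity missing from the recognizer's vocabulary can still be retrieved through a close phone string, while low-confidence or phonetically distant candidates are filtered out.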

    PHAST: Spoken document retrieval based on sequence alignment

    This paper presents a new approach to spoken document retrieval for spontaneous speech corpora. The classical approach to this problem combines an automatic speech recognizer (ASR) with standard information retrieval techniques based on terms or n-grams. However, state-of-the-art large-vocabulary continuous ASRs transcribe spontaneous speech with a word error rate of 25% or higher, which is a drawback for retrieval techniques based on terms or n-grams. To overcome this limitation, our method is based on a sequence alignment algorithm drawn from the field of bioinformatics to search … (Postprint, published version.)
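The abstract is cut off before it names the alignment algorithm. For illustration only, here is the best-known bioinformatics method for finding a similar region inside a longer sequence, Smith-Waterman local alignment, applied to phone strings with hypothetical scores (match +2, mismatch -1, gap -1). It is not necessarily the algorithm PHAST uses.

```python
def smith_waterman(query, document, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two phone sequences (Smith-Waterman).

    Returns (score, end_in_document), where end_in_document is the exclusive end index
    of the best-scoring aligned region in `document`. Scoring values are hypothetical.
    """
    rows, cols = len(query) + 1, len(document) + 1
    h = [[0] * cols for _ in range(rows)]
    best_score, best_end = 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = h[i - 1][j - 1] + (match if query[i - 1] == document[j - 1] else mismatch)
            # Clamping at zero lets a local alignment start anywhere.
            h[i][j] = max(0, diagonal, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best_score:
                best_score, best_end = h[i][j], j
    return best_score, best_end
```

Unlike exact term lookup, a misrecognized phone only lowers the score (for example, k ae t against k ah t still scores 3), so documents with recognition errors can still rank highly.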