65 research outputs found

    Zero-resource audio-only spoken term detection based on a combination of template matching techniques

    Keywords: spoken term detection, template matching, unsupervised learning, posterior features

    Spoken term detection is a well-known information retrieval task that seeks to extract meaningful information from audio by locating occurrences of known query words of interest. This paper describes a zero-resource approach to this task based on pattern matching of spoken term queries at the acoustic level. The template matching module comprises a cascade of a segmental variant of dynamic time warping and a self-similarity matrix comparison to further improve robustness to speech variability. This solution differs notably from more traditional train-and-test methods that, while shown to be very accurate, rely on the availability of large amounts of linguistic resources. We evaluate our framework on different parameterizations of the speech templates: raw MFCC features, Gaussian posteriorgrams, and French and English phonetic posteriorgrams output by two different state-of-the-art phoneme recognizers.

    Comparison of ALBAYZIN query-by-example spoken term detection 2012 and 2014 evaluations

    Query-by-example spoken term detection (QbE STD) aims at retrieving data from a speech repository given an acoustic query containing the term of interest as input. It is currently receiving much interest due to the large volume of multimedia information. This paper presents the systems submitted to the ALBAYZIN QbE STD 2014 evaluation, held as part of the ALBAYZIN 2014 Evaluation campaign within the context of the IberSPEECH 2014 conference. This is the second QbE STD evaluation in Spanish, which allows us to evaluate the progress in this technology for this language. The evaluation consists of retrieving the speech files that contain the input queries, indicating the start and end times where the input queries were found, along with a score value that reflects the confidence given to the detection of the query. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from workshops, which amount to about 7 h of speech. We present the database, the evaluation metric, the systems submitted to the evaluation, and the results, and compare this second evaluation with the first ALBAYZIN QbE STD evaluation held in 2012. Four different research groups took part in the evaluations held in 2012 and 2014. In 2014, new multi-word and foreign queries were added to the single-word and in-language queries used in 2012. Systems submitted to the second evaluation are hybrid systems which integrate letter transcription- and template matching-based systems. Despite the significant improvement obtained by the systems submitted to this second evaluation compared to those of the first evaluation, results still show the difficulty of this task and indicate that there is still room for improvement.

    This research was funded by the Spanish Government ('SpeechTech4All Project' TEC2012 38939 C03 01 and 'CMC-V2 Project' TEC2012 37585 C02 01), the Galician Government through the research contract GRC2014/024 (Modalidade: Grupos de Referencia Competitiva 2014) and 'AtlantTIC Project' CN2012/160, and also by the Spanish Government and the European Regional Development Fund (ERDF) under project TACTICA.

    Experimental studies on effect of speaking mode on spoken term detection

    The objective of this paper is to study the effect of speaking mode on a spoken term detection (STD) system. The experiments are conducted with query words recorded in isolation and with words cut out from continuous speech. Phoneme durations in the query words vary greatly between these two modes. Hence the pattern matching stage, which handles temporal variations, plays a crucial role. Matching is done using subsequence dynamic time warping (DTW) on posterior features of the query and reference utterances, obtained by training a multilayer perceptron (MLP). The difference in performance of the STD system for different phoneme groupings (45, 25, 15 and 6 classes) is also analyzed. Our STD system is tested on Telugu broadcast news. A major difference in STD system performance is observed between the recorded and cut-out types of query words: performance is better with query words cut out from continuous speech than with words recorded in isolation. This performance difference can be attributed to large temporal variations.
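The subsequence DTW matching mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the negative-log inner-product frame cost, the three-way step pattern, and the length-normalized score are all assumptions chosen for illustration. Posterior features are represented as plain lists of per-class probabilities.

```python
import math

def frame_cost(p, q, eps=1e-10):
    """Assumed local cost: negative log of the inner product of two posterior vectors."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), eps))

def subsequence_dtw(query, reference):
    """Find the reference span that best matches the whole query.

    Returns (start, end, score): inclusive reference frame indices and the
    accumulated cost divided by the query length (lower is better).
    """
    n, m = len(query), len(reference)
    cost = [[frame_cost(query[i], reference[j]) for j in range(m)] for i in range(n)]
    acc = [[0.0] * m for _ in range(n)]

    # Free start: the first query frame may align to any reference frame.
    for j in range(m):
        acc[0][j] = cost[0][j]

    for i in range(1, n):
        acc[i][0] = acc[i - 1][0] + cost[i][0]
        for j in range(1, m):
            acc[i][j] = cost[i][j] + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]
            )

    # Free end: pick the cheapest cell in the last query row.
    end = min(range(m), key=lambda j: acc[n - 1][j])

    # Backtrack to the first query row to recover the start frame.
    i, j = n - 1, end
    while i > 0:
        if j == 0:
            i -= 1
            continue
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    return j, end, acc[n - 1][end] / n
```

Because the start and end points are free, a short query can be located anywhere inside a long reference utterance, and the warping path absorbs differences in phoneme duration such as those between isolated and cut-out queries.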

    Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency

    Subsequence matching has proved to be an ideal approach for solving many problems in the fields of data mining and similarity retrieval. It has been shown that almost any data class (audio, image, biometrics, signals) is or can be represented by some kind of time series or string of symbols, which can serve as input for various subsequence matching approaches. The variety of data types, specific tasks and their partial or full solutions is so wide that the choice, implementation and parametrization of a suitable solution for a given task can be complicated and time-consuming; a potentially fruitful combination of fragments from different research areas may be neither obvious nor easy to realize. The leading authors in this field also mention the implementation bias that makes a proper comparison of competing approaches difficult. We therefore present a new generic Subsequence Matching Framework (SMF) that tries to overcome these problems with a uniform frame that simplifies and speeds up the design, development and evaluation of subsequence matching systems. We identify several relatively separate subtasks that are solved differently across the literature, and SMF allows them to be combined in a straightforward manner, achieving new quality and efficiency. The framework can be used in many application domains and its components can be reused effectively. Its strictly modular architecture and openness also allow efficient solutions from different fields to be incorporated, for instance efficient metric-based indexes. This is an extended version of a paper published at DEXA 2012.