33 research outputs found

    State dependent feature component selection for noise robust ASR

    No full text
    The acoustic environment in which speech is recorded strongly influences the statistical distributions of the observed acoustic features. To make ASR insensitive to noise, it is crucial that these distributions are similar in the training and testing conditions. Most approaches attempt to compensate for the impact of noise by estimating the noise characteristics from the signal. In this paper we explore the feasibility of a new method to increase noise robustness: we try to exploit a priori knowledge stored in clean speech models. Using Mel bank log-energy features, recognition is done by ignoring the model components for features that contained little energy during training. This strategy aims at recognition results that are determined more strongly by the match in the high-energy model components than by the mismatch in the low-energy ones. Application of the new method to clean speech data confirms that discarding components below a certain energy threshold does not deteriorate recognition performance. Experiments with noisy data, however, show that performance gains are relatively small. This paper explains why that is the case and why, despite the limited success, the outcomes suggest that the method could still prove to be a valuable addition to data-driven methods like (bounded) marginalisation.

    A computational model for unsupervised word discovery

    No full text
    We present an unsupervised algorithm for the discovery of words and word-like fragments from the speech signal, without using a predefined lexicon or acoustic phone models. The algorithm is based on a combination of acoustic pattern discovery, clustering, and temporal sequence learning. It exploits the acoustic similarity between multiple acoustic tokens of the same words or word-like fragments. In its current form, the algorithm is able to discover words in speech with low perplexity (connected digits). Although its performance still falls short of mainstream ASR approaches, the value of the algorithm lies in its potential to serve as a computational model in two research directions. First, the algorithm may lead to an approach for speech recognition that is fundamentally liberated from the modelling constraints in conventional ASR. Second, the proposed algorithm can be interpreted as a computational model of language acquisition that takes actual speech as input and is able to find words as 'emergent' properties from raw input.

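    Fotballbanen som "sosialt hengested": en sosialantropologisk studie av jenter, fotball og integrering [The football pitch as a "social hangout": a social-anthropological study of girls, football and integration]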

    This thesis focuses on interaction between girls of different ethnic backgrounds, aged 12 to 14, at a football arena. Based on fieldwork with a football team in Oslo Øst (Oslo East) in which the majority of the girls have an ethnic minority background (Muslims), I have examined whether football is a suitable arena for integrating girls of different ethnic backgrounds, and whether integration actually takes place. The research question can be divided into three parts: is there structural integration on the team, is football a suitable arena for integration, and does social integration take place through participation in football? In addition, I have looked at what it is like for the ethnic Norwegian girls to be in the minority on the team. This is an interesting topic because they otherwise belong to the majority in society. What happens when the majority-minority relationship is turned on its head? To shed light on these questions, I carried out participant observation, as well as a number of informal conversations with the girls, their families and the coach, over a period of seven months. My study shows that girls of different ethnic backgrounds meet at the football arena; that is, there is structural integration, since they actually meet. Friends are the most important factor in whether the girls play football, in addition to the fact that they find it fun. The family and the coach also have an influence, but to a lesser degree. I show that football is a suitable arena for these girls to take part in by looking at how they balance football and religion, with attention to clothing style, hijab, boys, the mosque and Ramadan. Friendships also developed among the girls and carried over to other arenas, but only to a small extent. The conclusion is therefore that football is a suitable arena for integrating girls of different ethnic backgrounds, and that integration actually takes place, but not to as great an extent as one would wish at such an arena.
    The main reason the ethnic Norwegians are in the minority on the team is that the football pitch and the surrounding area are characterised by young people with minority backgrounds. The ethnic Norwegians differ in how they experience being a minority on the team. One girl creates similarity with the majority in order to fit in and feel equal, while others create significant differences through their behaviour towards boys and their ways of dressing. I would argue that it is those with an ethnic minority background who have the greatest power of definition at this arena, even though they are otherwise a minority in Norwegian society.

    Acoustic Backing-off as an implementation of missing feature theory

    Contains full text: 75056.pdf (author's version) (Open Access), 19 p.

    Collecting a Corpus of Dutch Noise-induced 'Slips of the Ear'

    No full text
    When trying to understand how listeners recognise words, listeners' misperceptions, so-called 'slips of the ear', can reveal important aspects of the underlying mechanisms of normal word recognition. Such misperceptions shed light on how listeners make inferences about acoustic details in the speech signal and how these details interact with other sound sources in the background. On the other hand, if speech from a particular speaker is more prone to being misperceived than that from another speaker, these misperceptions may also shed light on speaker characteristics. To study these phenomena, misperceptions that occur consistently are invaluable. Although such confusions are quite rare, software has been developed within the Marie Curie INSPIRE project to efficiently collect such consistent confusions for different languages. Using this software, we have started to collect Dutch consistent confusions. Single words, embedded in five different types of noise at different SNRs and produced by four speakers, were presented to Dutch listeners. In a preliminary analysis, consistent confusions were analysed in terms of phoneme substitutions, insertions, and deletions, reconstructions of words using background noise, and eccentric cases. Moreover, the number and types of consistent confusions obtained in the different noise types and from different speakers are compared.

    Acoustic Backing-Off In The Local Distance Computation For Robust Automatic Speech Recognition

    No full text
    In this paper we propose introducing backing-off in the acoustic contributions of the local distance functions used during Viterbi decoding, as an operationalisation of missing feature theory for increased recognition robustness. Acoustic backing-off effectively removes the detrimental influence of outlier values from the local decisions in the Viterbi algorithm. It does so without the need for prior knowledge that specific features are missing, and it avoids any kind of explicit outlier detection. This paper provides a proof of concept of acoustic backing-off in the context of connected digit recognition over the telephone, using artificial distortions of the acoustic observations. It is shown that the word error rate can be maintained at the level of 2.5% obtained for undisturbed features, even in the case where a conventional local distance computation without backing-off leads to a word error rate ? 80.0%. The approach appears to be able to handle up to four independe..

    Acoustic Backing-Off As An Implementation Of Missing Feature Theory

    No full text
    In this paper, we discuss acoustic backing-off as a method to improve automatic speech recognition robustness. Acoustic backing-off aims to achieve the same objective as the marginalization approach of Missing Feature Theory: the detrimental influence of outlier values is effectively removed from the local distance computation in the Viterbi algorithm. The proposed method is based on one of the principles of Robust Statistical Pattern Matching: during recognition, the local distance function is modeled using a mixture of the distribution observed during training and a distribution describing observations not previously seen. To assess the effectiveness of the new method, we used artificial distortions of the acoustic vectors in connected digit recognition over telephone lines. We found that acoustic backing-off is capable of restoring recognition performance almost to the level observed for the undisturbed features, even in cases where a conventional local distance function completely fails. These results show that recognition robustness can be improved using a marginalization approach in which the distinction between reliable and corrupted feature values is wired into the recognition process. In addition, the results show that application of acoustic backing-off is not limited to feature representations based on filter bank outputs.