680 research outputs found

    Analysing the importance of different visual feature coefficients

    A study is presented to determine the relative importance of different visual features for speech recognition, including pixel-based, model-based, contour-based and physical features. The discriminability of the features is analysed through F-ratio and J-measures for both static and temporal derivatives, and the results were found to correlate highly with speech recognition accuracy (r = 0.97). Principal component analysis is then used to combine all visual features into a single feature vector, and the resulting basis functions are analysed further. An optimal feature vector is obtained which outperforms the best individual feature (AAM) with 93.5% word accuracy.
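As a rough illustration of the F-ratio idea mentioned in this abstract, the sketch below scores a single scalar feature by dividing the variance of the class means by the average within-class variance. This is one common textbook definition and is only an assumption here; the abstract does not state the exact formula the authors used. The function name `f_ratio` and the toy word classes are hypothetical.

```python
from statistics import mean, pvariance


def f_ratio(samples_by_class):
    """Variance of the class means divided by the mean within-class variance.

    Assumed definition (not taken from the paper): a larger value means the
    classes are better separated along this feature.
    """
    class_means = [mean(values) for values in samples_by_class.values()]
    between = pvariance(class_means)
    within = mean([pvariance(values) for values in samples_by_class.values()])
    return between / within


# Hypothetical toy data: one scalar visual feature measured for three words.
well_separated = {
    "yes": [1.0, 1.2, 0.8],
    "no": [5.0, 5.2, 4.8],
    "stop": [9.0, 9.2, 8.8],
}
overlapping = {
    "yes": [1.0, 5.0, 9.0],
    "no": [1.2, 5.2, 9.2],
    "stop": [0.8, 4.8, 8.8],
}

print(f_ratio(well_separated) > f_ratio(overlapping))
```

A feature whose classes form tight, distant clusters gets a high score, which is the sense in which such a measure could track recognition accuracy.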

    Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

    We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics together in a global discriminability score reveals that the bigger separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implication of these findings for the view that the functional role of IDS is to improve language learnability.

    Physiology-based model of multi-source auditory processing

    Our auditory systems have evolved to process a myriad of acoustic environments. In complex listening scenarios, we can tune our attention to one sound source (e.g., a conversation partner), while monitoring the entire acoustic space for cues we might be interested in (e.g., our names being called, or the fire alarm going off). While normal hearing listeners handle complex listening scenarios remarkably well, hearing-impaired listeners experience difficulty even when wearing hearing-assist devices. This thesis presents both theoretical work in understanding the neural mechanisms behind this process, as well as the application of neural models to segregate mixed sources and potentially help the hearing impaired population. On the theoretical side, auditory spatial processing has been studied primarily up to the midbrain region, and studies have shown how individual neurons can localize sounds using spatial cues. Yet, how higher brain regions such as the cortex use this information to process multiple sounds in competition is not clear. This thesis demonstrates a physiology-based spiking neural network model, which provides a mechanism illustrating how the auditory cortex may organize up-stream spatial information when there are multiple competing sound sources in space. Based on this model, an engineering solution to help hearing-impaired listeners segregate mixed auditory inputs is proposed. Using the neural model to perform sound-segregation in the neural domain, the neural outputs (representing the source of interest) are reconstructed back to the acoustic domain using a novel stimulus reconstruction method. (2017-09-22)

    Quantitative Analyses Of Acceptable Noise Level For Air Conduction Listening

    This study was conducted to develop quantitative models for Acceptable Noise Level (ANL) under air conduction (AC) listening conditions. Experimental results on the effects of frequency bandwidths on ANL under two listening conditions involving earphones and loudspeaker (sound field) with high and low frequencies and babble noise and white noise revealed: (a) there are statistically significant interactions among the background noise types, the background noise frequency bandwidths and signal source; (b) background noise and noise frequency bandwidths have effects on listener discriminability bias toward the noise and the signal intensity; (c) different listening conditions had different ANL thresholds; and (d) a significant difference existed between listeners' minimum ANL threshold under earphone listening and air conduction listening. The findings revealed that ANLs at different loudspeaker locations were not significantly different from one another. The psychophysical parameters revealed that males had a higher positive discriminability bias toward signal and noise intensities at all locations, except at the 315 degree azimuth; female listeners had higher discriminability biases (β) toward sound at the 315 degree azimuth. For example, the β value for males under signal alone was 0.2095 compared to females' value of 0.23 at the 315 degree location. Under noise only, male β values were all higher than those of females, with values above 0.22 against less than 0.1 for females at the 180-, 225-, and 315-degree locations. The results showed that the minimum ANL threshold and the listeners' discriminability biases toward sound could be found at the 315-degree loudspeaker location. Finally, the difference between ANL and Speech Comprehension in Noise Level (SCNL) was not significant. However, the sensitivity toward sound intensity was higher under ANL than SCNL. This is because ANL is the willingness to work in noisy conditions while SCNL seeks meaning out of signals.

    The pressure to communicate efficiently continues to shape language use later in life

    Language use is shaped by a pressure to communicate efficiently, yet the tendency towards redundancy is said to increase in older age. The longstanding assumption is that saying more than is necessary is inefficient and may be driven by age-related decline in inhibition (i.e. the ability to filter out irrelevant information). However, recent work proposes an alternative account of efficiency: In certain contexts, redundancy facilitates communication (e.g., when the colour or size of an object is perceptually salient and its mention aids the listener’s search). A critical question follows: Are older adults indiscriminately redundant, or do they modulate their use of redundant information to facilitate communication? We tested efficiency and cognitive capacities in 200 adults aged 19–82. Irrespective of age, adults with better attention switching skills were redundant in efficient ways, demonstrating that the pressure to communicate efficiently continues to shape language use later in life

    Memory as discrimination: what distraction reveals

    Recalling information involves the process of discriminating between relevant and irrelevant information stored in memory. Not infrequently, the relevant information needs to be selected from amongst a series of related possibilities. This is likely to be particularly problematic when the irrelevant possibilities are not only temporally or contextually appropriate but also overlap semantically with the target or targets. Here, we investigate the extent to which purely perceptual features which discriminate between irrelevant and target material can be used to overcome the negative impact of contextual and semantic relatedness. Adopting a distraction paradigm, it is demonstrated that when distracters are interleaved with targets presented either visually (Experiment 1) or auditorily (Experiment 2), a within-modality semantic distraction effect occurs; semantically-related distracters impact upon recall more than unrelated distracters. In the semantically-related condition, the number of intrusions in recall is reduced whilst the number of correctly recalled targets is simultaneously increased by the presence of perceptual cues to relevance (color features in Experiment 1 or speaker’s gender in Experiment 2). However, as demonstrated in Experiment 3, even presenting semantically-related distracters in a language and a sensory modality (spoken Welsh) distinct from that of the targets (visual English) is insufficient to eliminate false recalls completely, or to restore correct recall to levels seen with unrelated distracters. Together, the study shows how semantic and non-semantic discriminability shape patterns of both erroneous and correct recall.

    A Framework for Bioacoustic Vocalization Analysis Using Hidden Markov Models

    Using Hidden Markov Models (HMMs) as a recognition framework for automatic classification of animal vocalizations has a number of benefits, including the ability to handle duration variability through nonlinear time alignment, the ability to incorporate complex language or recognition constraints, and easy extendibility to continuous recognition and detection domains. In this work, we apply HMMs to several different species and bioacoustic tasks using generalized spectral features that can be easily adjusted across species and HMM network topologies suited to each task. This experimental work includes a simple call type classification task using one HMM per vocalization for repertoire analysis of Asian elephants, a language-constrained song recognition task using syllable models as base units for ortolan bunting vocalizations, and a stress stimulus differentiation task in poultry vocalizations using a non-sequential model via a one-state HMM with Gaussian mixtures. Results show strong performance across all tasks and illustrate the flexibility of the HMM framework for a variety of species, vocalization types, and analysis tasks
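The "one HMM per vocalization" classification described in this abstract can be sketched as follows: score the observation sequence under each call-type model with the forward algorithm and pick the model with the highest log-likelihood. This is a minimal discrete-symbol sketch under stated assumptions, not the authors' system, which the abstract says uses spectral features. The call-type names, the two-symbol alphabet, and all probabilities are hypothetical.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | model) for a discrete HMM."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob


def classify(obs, models):
    """Return the name of the model that assigns obs the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical models: (start, transition, emission) per call type.
# Symbols: 0 = low-frequency frame, 1 = high-frequency frame.
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "rumble": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.8, 0.2]]),
    "trumpet": ([1.0, 0.0], left_to_right, [[0.2, 0.8], [0.1, 0.9]]),
}

print(classify([0, 0, 0, 1, 0], models))
print(classify([1, 1, 0, 1], models))
```

The two test sequences have different lengths on purpose: because the forward recursion sums over all state paths, each model absorbs duration differences without any explicit time alignment step.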

    Visual speech alters the discrimination and identification of non-intact auditory speech in children with hearing loss

    OBJECTIVES: Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. METHODS: Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/–B/aa or /–B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/–B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same—as opposed to different—responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /–B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz—as opposed to az— responses in the audiovisual than auditory mode. 
    RESULTS: Performance in the audiovisual mode showed more same responses for the intact vs. non-intact different pairs (e.g., Baa:/–B/aa) and more intact onset responses for nonword repetition (Baz for /–B/az). Thus visual speech altered both discrimination and identification in the CHL, to a large extent for the /B/ onsets but only minimally for the /G/ onsets. The CHL identified the stimuli similarly to the CNH but did not discriminate the stimuli similarly. A bias-free measure of the children’s discrimination skills (i.e., d′ analysis) revealed that the CHL had greater difficulty discriminating intact from non-intact speech in both modes. As the degree of HL worsened, the ability to discriminate the intact vs. non-intact onsets in the auditory mode worsened. Discrimination ability in CHL significantly predicted their identification of the onsets, even after variation due to the other variables was controlled. CONCLUSIONS: These results clearly established that visual speech can fill in non-intact auditory speech, and this effect, in turn, made the non-intact onsets more difficult to discriminate from intact speech and more likely to be perceived as intact. Such results 1) demonstrate the value of visual speech at multiple levels of linguistic processing and 2) support intervention programs that view visual speech as a powerful asset for developing spoken language in CHL.
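For readers unfamiliar with the bias-free d′ measure mentioned in the results, the sketch below computes it as the difference between the z-transformed hit rate and false-alarm rate. It assumes the simple yes/no signal detection model and a log-linear correction for extreme rates; the abstract does not say which model or correction the authors applied, and same-different designs are often analysed with other models. The function name and the response counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumption: yes/no model with a log-linear correction (add 0.5 to each
    count) so that rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: "different" responses to different pairs are hits,
# "different" responses to same pairs are false alarms.
print(d_prime(hits=18, misses=2, false_alarms=2, correct_rejections=18))
print(d_prime(hits=10, misses=10, false_alarms=10, correct_rejections=10))
```

Because a listener who simply says "same" more often lowers both rates together, the difference of z-scores separates sensitivity from that response bias.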
