
    Optimizing Frequency Channels for Adults with Cochlear Implants

    Cochlear implants (CIs) are devices used by individuals with hearing loss to improve communication through the use of an electrode array that directly stimulates the auditory nerve. Existing signal processing strategies utilize a logarithmic frequency-to-electrode allocation, mimicking the representation of frequencies along the basilar membrane (high frequencies at the base and low frequencies at the apex). These strategies support some degree of open-set speech recognition for CI users; however, average speech recognition remains well below what normal-hearing adults are capable of. To enhance speech recognition in adult CI users, this study examined one promising alternative to the standard logarithmic frequency-to-electrode allocation maps. The frequency-to-electrode allocation maps were modified to provide more refined representations of the first two (and most important) vowel formant frequencies (energy peaks in vowels that are critical to speech perception). Twelve participants were tested using two different CI maps: one based on existing clinical frequency-to-electrode allocation strategies (Standard) and one designed to improve the resolution of the first two formants, which should especially enhance vowel recognition (Speech). Alternating between these maps, participants listened to and repeated three kinds of stimulus materials: (1) highly meaningful five-word sentences, (2) syntactically correct but not meaningful four-word sentences, and (3) phonetically balanced consonant-vowel-consonant words in isolation. Analyses revealed that some participants benefitted from the Speech strategy. Moreover, an improvement in vowel recognition in words strongly predicted an improvement in recognition of words in sentences. These findings suggest that optimizing the representation of the first two formants enhances speech recognition for CI users. Future efforts should focus on better representing this speech-specific information in modern-day signal processing strategies.
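
    The study contrasts a clinical logarithmic frequency-to-electrode allocation with a map that devotes finer resolution to the F1/F2 region. The Python sketch below illustrates that general idea only; the 22-electrode count, frequency limits, the assumed F1/F2 range of roughly 300-2500 Hz, and the 60% electrode share are placeholders, not the study's actual Standard and Speech maps.

        import numpy as np

        def log_allocation(n_electrodes=22, f_lo=188.0, f_hi=7938.0):
            # Logarithmically spaced band edges, one analysis band per electrode
            # (illustrative defaults; clinical maps differ by manufacturer).
            return np.geomspace(f_lo, f_hi, n_electrodes + 1)

        def formant_weighted_allocation(n_electrodes=22, f_lo=188.0, f_hi=7938.0,
                                        f1f2_lo=300.0, f1f2_hi=2500.0, share=0.6):
            # Hypothetical "Speech"-style map: a larger share of electrodes covers
            # the F1/F2 region, the remainder covers the rest of the range.
            n_mid = int(round(share * n_electrodes))
            n_low = max(1, round(0.2 * (n_electrodes - n_mid)))
            n_high = n_electrodes - n_mid - n_low
            return np.concatenate([
                np.geomspace(f_lo, f1f2_lo, n_low + 1)[:-1],
                np.geomspace(f1f2_lo, f1f2_hi, n_mid + 1)[:-1],
                np.geomspace(f1f2_hi, f_hi, n_high + 1),
            ])

        print(np.round(log_allocation()[:6]))               # coarse low-frequency bands
        print(np.round(formant_weighted_allocation()[:6]))  # finer bands around F1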

    Glyph guessing for 'oo' and 'ee': spatial frequency information in sound symbolic matching for ancient and unfamiliar scripts.

    In three experiments, we asked whether diverse scripts contain interpretable information about the speech sounds they represent. When presented with a pair of unfamiliar letters, adult readers correctly guess which is /i/ (the 'ee' sound in 'feet'), and which is /u/ (the 'oo' sound in 'shoe') at rates higher than expected by chance, as shown in a large sample of Singaporean university students (Experiment 1) and replicated in a larger sample of international Internet users (Experiment 2). To uncover what properties of the letters contribute to different scripts' 'guessability,' we analysed the visual spatial frequencies in each letter (Experiment 3). We predicted that the lower spectral frequencies in the formants of the vowel /u/ would pattern with lower spatial frequencies in the corresponding letters. Instead, we found that across all spatial frequencies, the letter with more black/white cycles (i.e. more ink) was more likely to be guessed as /u/, and the larger the difference between the glyphs in a pair, the higher the script's guessability. We propose that diverse groups of humans across historical time and geographical space tend to employ similar iconic strategies for representing speech in visual form, and provide norms for letter pairs from 56 diverse scripts.
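
    Experiment 3 analyses the visual spatial frequencies of letter images. The abstract does not spell out the procedure, so the sketch below shows one common approach under assumption: a radially averaged 2-D FFT power spectrum plus a simple 'ink' measure (fraction of dark pixels), computed on two toy glyphs rather than real letters.

        import numpy as np

        def radial_power_spectrum(glyph, n_bins=16):
            # Average 2-D FFT power over rings of increasing spatial frequency.
            # `glyph` is a 2-D array with ink = 1, background = 0.
            f = np.fft.fftshift(np.fft.fft2(glyph - glyph.mean()))
            power = np.abs(f) ** 2
            h, w = glyph.shape
            yy, xx = np.indices((h, w))
            r = np.hypot(yy - h / 2, xx - w / 2)
            bins = np.linspace(0, r.max(), n_bins + 1)
            return np.array([power[(r >= lo) & (r < hi)].mean()
                             for lo, hi in zip(bins[:-1], bins[1:])])

        # Toy 'glyphs': a thin vertical bar versus a filled block (more ink).
        bar = np.zeros((64, 64))
        bar[8:56, 30:34] = 1
        block = np.zeros((64, 64))
        block[16:48, 16:48] = 1
        print("ink fraction:", bar.mean(), block.mean())
        print("low-frequency energy:", radial_power_spectrum(bar)[:3].sum(),
              radial_power_spectrum(block)[:3].sum())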

    Speech analyzer

    A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train is derived whose pulse rate approximately represents the average frequency of the first formant; second and third pulse trains are derived whose pulse rates respectively represent zero crossings of the second and third formants. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.
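
    The second and third pulse trains track zero crossings of the band-pass-filtered formant signals. A minimal sketch of that step, assuming a Butterworth band-pass as a stand-in for the analyzer's formant filter and a synthetic two-component test signal (the level-band method for the first formant is not reproduced here):

        import numpy as np
        from scipy.signal import butter, sosfilt

        def formant_band(signal, fs, lo, hi, order=4):
            # Band-pass filter standing in for one of the analyzer's formant filters.
            sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
            return sosfilt(sos, signal)

        def zero_crossing_pulses(x):
            # 1 at every positive-going zero crossing, 0 elsewhere.
            return ((x[:-1] < 0) & (x[1:] >= 0)).astype(int)

        fs = 16000
        t = np.arange(0, 0.05, 1 / fs)
        # Toy 'speech': two sinusoids standing in for energy near F1 and F2.
        speech = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
        f2_band = formant_band(speech, fs, 900, 2500)
        pulses = zero_crossing_pulses(f2_band)
        print("pulse rate (Hz), ~second-formant frequency:", pulses.sum() / t[-1])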

    Sparse Codes for Speech Predict Spectrotemporal Receptive Fields in the Inferior Colliculus

    We have developed a sparse mathematical representation of speech that minimizes the number of active model neurons needed to represent typical speech sounds. The model learns several well-known acoustic features of speech such as harmonic stacks, formants, onsets and terminations, but we also find more exotic structures in the spectrogram representation of sound such as localized checkerboard patterns and frequency-modulated excitatory subregions flanked by suppressive sidebands. Moreover, several of these novel features resemble neuronal receptive fields reported in the Inferior Colliculus (IC), as well as auditory thalamus and cortex, and our model neurons exhibit the same tradeoff in spectrotemporal resolution as has been observed in IC. To our knowledge, this is the first demonstration that receptive fields of neurons in the ascending mammalian auditory pathway beyond the auditory nerve can be predicted based on coding principles and the statistical properties of recorded sounds. Comment: For Supporting Information, see PLoS website: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.100259
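
    The core operation is sparse inference: choosing the smallest set of active model neurons (dictionary atoms) that reconstructs a spectrogram patch. The sketch below shows one standard way to solve the L1-penalized version of that problem (ISTA) with a random dictionary; the paper's dictionary is learned from speech, so the atoms, sizes and sparsity penalty here are illustrative assumptions.

        import numpy as np

        def ista(D, x, lam=0.1, n_iter=200):
            # ISTA for min_a 0.5*||x - D a||^2 + lam*||a||_1:
            # a sparse code `a` (few active model neurons) for one patch `x`.
            L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant of the gradient
            a = np.zeros(D.shape[1])
            for _ in range(n_iter):
                g = a - D.T @ (D @ a - x) / L             # gradient step
                a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
            return a

        rng = np.random.default_rng(0)
        D = rng.standard_normal((64, 256))                # dictionary: 256 'model neurons'
        D /= np.linalg.norm(D, axis=0)                    # unit-norm columns
        x = D[:, rng.choice(256, 3, replace=False)] @ rng.standard_normal(3)  # patch built from 3 atoms
        a = ista(D, x)
        print("active model neurons:", np.count_nonzero(np.abs(a) > 1e-6))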

    Disentangling the effects of phonation and articulation: Hemispheric asymmetries in the auditory N1m response of the human brain

    BACKGROUND: The cortical activity underlying the perception of vowel identity has typically been addressed by manipulating the first and second formant frequencies (F1 & F2) of the speech stimuli. These two values, originating from articulation, are already sufficient for the phonetic characterization of vowel category. In the present study, we investigated how the spectral cues caused by articulation are reflected in cortical speech processing when combined with phonation, the other major part of speech production, manifested as the fundamental frequency (F0) and its harmonic integer multiples. To study the combined effects of articulation and phonation we presented vowels with either high (/a/) or low (/u/) formant frequencies which were driven by three different types of excitation: a natural periodic pulseform reflecting the vibration of the vocal folds, an aperiodic noise excitation, or a tonal waveform. The auditory N1m response was recorded with whole-head magnetoencephalography (MEG) from ten human subjects in order to resolve whether brain events reflecting articulation and phonation are specific to the left or right hemisphere of the human brain. RESULTS: The N1m responses for the six stimulus types spanned a considerable latency range of 115–135 ms, and were elicited ~10 ms faster by the high-formant /a/ than by the low-formant /u/, indicating an effect of articulation. While excitation type had no effect on the latency of the right-hemispheric N1m, the left-hemispheric N1m elicited by the tonally excited /a/ was some 10 ms earlier than that elicited by the periodic and the aperiodic excitation. The amplitude of the N1m in both hemispheres was systematically stronger to stimulation with natural periodic excitation. Also, stimulus type had a marked (up to 7 mm) effect on the source location of the N1m, with periodic excitation resulting in more anterior sources than aperiodic and tonal excitation. CONCLUSION: The auditory brain areas of the two hemispheres exhibit differential tuning to natural speech signals, observable already in the passive recording condition. The variations in the latency and strength of the auditory N1m response can be traced back to the spectral structure of the stimuli. More specifically, the combined effects of the harmonic comb structure originating from the natural voice excitation caused by the fluctuating vocal folds and the location of the formant frequencies originating from the vocal tract lead to asymmetric behaviour of the left and right hemispheres.
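
    The stimuli combine articulation (formant frequencies of /a/ or /u/) with three excitation types (periodic pulseform, aperiodic noise, tonal waveform). A rough source-filter sketch of how such stimuli can be generated, assuming generic two-pole resonators and textbook formant values rather than the study's actual stimuli:

        import numpy as np
        from scipy.signal import lfilter

        fs = 16000
        t = np.arange(0, 0.2, 1 / fs)

        def excitation(kind, f0=110.0):
            # Three excitation types standing in for the study's stimuli:
            # glottal-like pulse train, white noise, or a pure tone.
            if kind == "periodic":
                x = np.zeros_like(t)
                x[::int(fs // f0)] = 1.0
                return x
            if kind == "noise":
                return np.random.default_rng(0).standard_normal(len(t))
            return np.sin(2 * np.pi * f0 * t)          # tonal

        def vocal_tract(x, formants, bw=80.0):
            # Cascade of two-pole resonators, one per formant frequency.
            for f in formants:
                r = np.exp(-np.pi * bw / fs)
                a = [1.0, -2.0 * r * np.cos(2 * np.pi * f / fs), r * r]
                x = lfilter([1.0], a, x)
            return x

        vowel_a = vocal_tract(excitation("periodic"), formants=(700, 1200))  # /a/-like F1, F2
        vowel_u = vocal_tract(excitation("periodic"), formants=(300, 800))   # /u/-like F1, F2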

    Waveguide physical modeling of vocal tract acoustics: flexible formant bandwidth control from increased model dimensionality

    Digital waveguide physical modeling is often used as an efficient representation of acoustical resonators such as the human vocal tract. Building on the basic one-dimensional (1-D) Kelly-Lochbaum tract model, various speech synthesis techniques demonstrate improvements to the wave scattering mechanisms in order to better approximate wave propagation in the complex vocal system. Some of these techniques are discussed in this paper, with particular reference to an alternative approach in the form of a two-dimensional waveguide mesh model. Emphasis is placed on its ability to produce vowel spectra similar to those present in natural speech, and how it improves upon the 1-D model. The tract area function is accommodated as model width, rather than translated into acoustic impedance, and as such offers extra control as an additional bounding limit to the model. Results show that the two-dimensional (2-D) model introduces approximately linear control over formant bandwidths, leading to attainable realistic values across a range of vowels. Similarly, the 2-D model allows for the application of theoretical reflection values within the tract, which, when applied to the 1-D model, result in small formant bandwidths and, hence, unnatural-sounding synthesized vowels.
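
    The 2-D waveguide mesh itself is beyond a short example, but the 1-D Kelly-Lochbaum model it builds on is compact: each change in tract area produces a scattering junction whose reflection coefficient follows from the adjacent areas. A minimal sketch, assuming a one-sample delay per section, placeholder termination reflectances, and a crude two-tube area function:

        import numpy as np

        def kelly_lochbaum_impulse(areas, n_samples=512, r_glottis=0.99, r_lips=-0.9):
            # Kelly-Lochbaum tube model driven by an impulse at the glottis;
            # returns the pressure radiated at the lips.
            n = len(areas)
            # Reflection coefficients at each junction (pressure-wave convention).
            k = [(areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1]) for i in range(n - 1)]
            fwd = np.zeros(n)            # right-going wave in each section
            bwd = np.zeros(n)            # left-going wave in each section
            out = np.zeros(n_samples)
            for t in range(n_samples):
                src = 1.0 if t == 0 else 0.0
                nxt_fwd = np.zeros(n)
                nxt_bwd = np.zeros(n)
                nxt_fwd[0] = src + r_glottis * bwd[0]          # glottal end
                nxt_bwd[n - 1] = r_lips * fwd[n - 1]           # lip end
                for i in range(n - 1):                         # scattering junctions
                    nxt_fwd[i + 1] = (1 + k[i]) * fwd[i] - k[i] * bwd[i + 1]
                    nxt_bwd[i] = k[i] * fwd[i] + (1 - k[i]) * bwd[i + 1]
                out[t] = (1 + r_lips) * fwd[n - 1]             # radiated pressure
                fwd, bwd = nxt_fwd, nxt_bwd
            return out

        # Crude two-tube /a/-like area function, purely illustrative.
        areas = [1.0] * 8 + [7.0] * 8
        h = kelly_lochbaum_impulse(areas)
        spectrum = np.abs(np.fft.rfft(h))    # formants appear as spectral peaks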

    The new accent technologies: recognition, measurement and manipulation of accented speech


    Perceptual adaptation by normally hearing listeners to a simulated "hole" in hearing

    Simulations of cochlear implants have demonstrated that the deleterious effects of a frequency misalignment between analysis bands and characteristic frequencies at basally shifted simulated electrode locations are significantly reduced with training. However, a distortion of frequency-to-place mapping may also arise due to a region of dysfunctional neurons that creates a "hole" in the tonotopic representation. This study simulated a 10 mm hole in the mid-frequency region. Noise-band processors were created with six output bands (three apical and three basal to the hole). The spectral information that would have been represented in the hole was either dropped or reassigned to bands on either side. Such reassignment preserves information but warps the place code, which may in itself impair performance. Normally hearing subjects received three hours of training in two reassignment conditions. Speech recognition improved considerably with training. Scores were much lower in a baseline (untrained) condition where information from the hole region was dropped. A second group of subjects trained in this dropped condition did show some improvement; however, scores after training were significantly lower than in the reassignment conditions. These results are consistent with the view that speech processors should present the most informative frequency range irrespective of frequency misalignment. © 2006 Acoustical Society of America.
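
    The simulation uses noise-band processors: band envelopes of the input drive noise carriers in the output bands, and information falling in the simulated hole is either dropped or reassigned to neighbouring bands. A sketch of that processing chain, assuming Butterworth analysis bands, Hilbert envelopes, and placeholder band edges rather than the study's six-band configuration:

        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def noise_vocoder(x, fs, analysis_edges, output_edges, assign):
            # The envelope of analysis band i drives a noise carrier filtered into
            # output band assign[i]; bands mapped to None are dropped (the 'hole').
            def bandpass(sig, lo, hi):
                sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
                return sosfiltfilt(sos, sig)
            rng = np.random.default_rng(0)
            y = np.zeros_like(x)
            for i, (lo, hi) in enumerate(zip(analysis_edges[:-1], analysis_edges[1:])):
                if assign[i] is None:
                    continue                                   # dropped condition
                envelope = np.abs(hilbert(bandpass(x, lo, hi)))
                o_lo, o_hi = output_edges[assign[i]], output_edges[assign[i] + 1]
                carrier = bandpass(rng.standard_normal(len(x)), o_lo, o_hi)
                y += envelope * carrier
            return y

        fs = 16000
        t = np.arange(0, 0.5, 1 / fs)
        x = np.sin(2 * np.pi * 440 * t)                        # placeholder input signal
        edges = np.geomspace(100, 6000, 9)                     # 8 bands; pretend bands 3-4 sit in the hole
        reassigned = [0, 1, 2, 2, 5, 5, 6, 7]                  # hole bands pushed to neighbours
        dropped = [0, 1, 2, None, None, 5, 6, 7]               # hole bands discarded
        y_reassigned = noise_vocoder(x, fs, edges, edges, reassigned)
        y_dropped = noise_vocoder(x, fs, edges, edges, dropped)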