
    Production and perception of individual speaking styles

    As explanation of between-speaker differences in speech production moves beyond sex- and age-related differences in physiology, discussion has focused on individual vocal tract morphology. While it is interesting to relate, say, variable recruitment of the jaw to extent of palate doming, there is a substantial residue of arbitrary differences that constitute the speaker's "style". Style differences observed across a well-defined social group indicate group membership. Other style differences are idiosyncratic "habits" of articulation, individual solutions to the many-to-many mapping between motoric and acoustic representations and to the many different attentional trading relationships that can exploit the typical patterns of redundant variation in independent acoustic correlates of any minimal contrast. Perceptual studies of social style differences suggest that perceptibility depends upon the task and upon the hearer's own group membership. The few studies of idiosyncratic differences suggest that speakers perceive each other's productions in terms of their own habits. Thus, perceptual compensation for speaker differences must go beyond mere vocal tract normalization. A promising route for describing how listeners compensate for the arbitrary variation of style is an instance-based (or exemplar) model of speech perception in which the distribution of exemplars is heavily weighted by instances of the speaker's own productions.

    Weak biases emerging from vocal tract anatomy shape the repeated transmission of vowels

    Linguistic diversity is affected by multiple factors, but it is usually assumed that variation in the anatomy of our speech organs plays no explanatory role. Here we use realistic computer models of the human speech organs to test whether inter-individual and inter-group variation in the shape of the hard palate (the bony roof of the mouth) affects acoustics of speech sounds. Based on 107 midsagittal MRI scans of the hard palate of human participants, we modelled with high accuracy the articulation of a set of five cross-linguistically representative vowels by agents learning to produce speech sounds. We found that different hard palate shapes result in subtle differences in the acoustics and articulatory strategies of the produced vowels, and that these individual-level speech idiosyncrasies are amplified by the repeated transmission of language across generations. Therefore, we suggest that, besides culture and environment, quantitative biological variation can be amplified, also influencing language

    Automatic Speech Recognition Using LP-DCTC/DCS Analysis Followed by Morphological Filtering

    Front-end feature extraction techniques have long been a critical component in Automatic Speech Recognition (ASR). Nonlinear filtering techniques are becoming increasingly important in this application, and are often better than linear filters at removing noise without distorting speech features. However, design and analysis of nonlinear filters are more difficult than for linear filters. Mathematical morphology, which creates filters based on shape and size characteristics, is a design structure for nonlinear filters. These filters are limited to minimum and maximum operations that introduce a deterministic bias into filtered signals. This work develops filtering structures based on mathematical morphology that utilize the bias while emphasizing spectral peaks. The combination of peak emphasis via LP analysis with morphological filtering yields more noise-robust speech recognition. To help understand the behavior of these pre-processing techniques, the deterministic and statistical properties of the morphological filters are compared to the properties of feature extraction techniques that do not employ such algorithms. The robust behavior of these algorithms for automatic speech recognition in the presence of rapidly fluctuating speech signals with additive and convolutional noise is illustrated. Examples of these nonlinear feature extraction techniques are given using the Aurora 2.0 and Aurora 3.0 databases. Features are computed using LP analysis alone to emphasize peaks, morphological filtering alone, or a combination of the two approaches. Although absolute best results are normally obtained using a combination of the two methods, morphological filtering alone is nearly as effective and much more computationally efficient.
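
The minimum and maximum operations mentioned in this abstract can be illustrated with a minimal sketch of one-dimensional morphological filtering. This is not the authors' LP-DCTC/DCS front end: the function names, the flat structuring element, the truncated-window border handling, and the toy signal are all assumptions chosen for illustration. It shows the deterministic bias the abstract refers to: an opening never exceeds the input, and a closing never falls below it.

```python
def erode(signal, width):
    """Sliding-window minimum with a flat window of odd `width` (truncated at the borders)."""
    half = width // 2
    n = len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def dilate(signal, width):
    """Sliding-window maximum with a flat window of odd `width` (truncated at the borders)."""
    half = width // 2
    n = len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def opening(signal, width):
    """Erosion followed by dilation: removes peaks narrower than the window."""
    return dilate(erode(signal, width), width)


def closing(signal, width):
    """Dilation followed by erosion: fills valleys narrower than the window."""
    return erode(dilate(signal, width), width)


# Toy signal (hypothetical): a one-sample spike and a three-sample plateau.
toy = [0, 0, 5, 0, 0, 3, 3, 3, 0, 0]
opened = opening(toy, 3)
closed = closing(toy, 3)
print(opened)  # the narrow spike is removed, the wider plateau survives
print(closed)
```

In this toy case the opening discards the isolated spike but keeps the plateau, which is the sense in which such filters select features by shape and size.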

    Speaker Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker dependent models with parallel acoustic and kinematic training data. However, in many practical applications inversion is needed for new speakers for whom no articulatory data is available. In order to address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMM). This approach uses a robust normalized articulatory space and palate referenced articulatory features combined with speaker-weighted adaptation to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette electromagnetic articulography - Mandarin Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that given a good selection of reference speakers with consistent acoustic and articulatory patterns, the PRSW approach gives good speaker independent inversion performance even without kinematic training data

    Numerical simulation of the influence of the orifice aperture on the flow around a teeth-shaped obstacle

    The sound generated during the production of the sibilant [s] results from the impact of a turbulent jet on the incisors. Several geometric characteristics of the oral tract can affect the properties of the flow-induced noise, so characterizing the influence of different geometric parameters on the acoustic source properties makes it possible to determine control factors of the noise production. In this study, a simplified vocal tract/teeth geometric model is used to numerically investigate the flow around a teeth-shaped obstacle placed in a channel and to analyze the influence of the aperture at the teeth on the spectral properties of the fluctuating pressure force exerted on the surface of the obstacle, which is at the origin of the dipole sound source. The results obtained for Re = 4000 suggest that the aperture of the constriction formed by the teeth modifies the characteristics of the turbulent jet downstream of the teeth. Thus, the variations of the flow due to the modification of the constriction aperture lead to variations of the spectral properties of the sound source, even if the levels predicted are lower than during the production of a real sibilant fricative.

    A multilinear tongue model derived from speech related MRI data of the human vocal tract

    We present a multilinear statistical model of the human tongue that captures anatomical and tongue pose related shape variations separately. The model is derived from 3D magnetic resonance imaging data of 11 speakers sustaining speech related vocal tract configurations. The extraction is performed by using a minimally supervised method that uses as basis an image segmentation approach and a template fitting technique. Furthermore, it uses image denoising to deal with possibly corrupt data, palate surface information reconstruction to handle palatal tongue contacts, and a bootstrap strategy to refine the obtained shapes. Our evaluation concludes that limiting the degrees of freedom for the anatomical and speech related variations to 5 and 4, respectively, produces a model that can reliably register unknown data while avoiding overfitting effects. Furthermore, we show that it can be used to generate a plausible tongue animation by tracking sparse motion capture data

    A Cervid Vocal Fold Model Suggests Greater Glottal Efficiency in Calling at High Frequencies

    Male Rocky Mountain elk (Cervus elaphus nelsoni) produce loud and high fundamental frequency bugles during the mating season, in contrast to the male European Red Deer (Cervus elaphus scoticus), which produces loud and low fundamental frequency roaring calls. A critical step in understanding vocal communication is to relate sound complexity to anatomy and physiology in a causal manner. Experimentation at the sound source, often difficult in vivo in mammals, is simulated here by a finite element model of the larynx and a wave propagation model of the vocal tract, both based on the morphology and biomechanics of the elk. The model can produce a wide range of fundamental frequencies. Low fundamental frequencies require low vocal fold strain, but large lung pressure and large glottal flow if sound intensity level is to exceed 70 dB at 10 m distance. A high-frequency bugle requires both large muscular effort (to strain the vocal ligament) and high lung pressure (to overcome phonation threshold pressure), but at least 10 dB more intensity level can be achieved. Glottal efficiency, the ratio of radiated sound power to aerodynamic power at the glottis, is higher in elk, suggesting an advantage of high-pitched signaling. This advantage is based on two aspects: first, the lower airflow required for aerodynamic power and, second, an acoustic radiation advantage at higher frequencies. Both signal types are used by the respective males during the mating season and probably serve as honest signals. The two signal types relate differently to physical qualities of the sender. The low-frequency sound (Red Deer call) relates to overall body size via a strong relationship between acoustic parameters and the size of vocal organs and body size. The high-frequency bugle may signal muscular strength and endurance, via a ‘vocalizing at the edge’ mechanism, for which efficiency is critical.
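
The glottal efficiency defined in this abstract, radiated sound power divided by aerodynamic power, can be worked through with a rough sketch. It is not the paper's finite element model. It assumes that aerodynamic power is lung pressure times glottal flow, that sound spreads spherically from a point source, and that the level is referenced to 1e-12 W/m^2. The pressure and flow values are hypothetical and are not taken from the study; only the 70 dB at 10 m figure and the 10 dB gain come from the abstract.

```python
import math

REFERENCE_INTENSITY = 1e-12  # W/m^2, conventional reference for sound intensity level in air


def radiated_power_watts(level_db, distance_m):
    """Power implied by an intensity level at a distance, assuming spherical spreading."""
    intensity = REFERENCE_INTENSITY * 10 ** (level_db / 10)
    return intensity * 4 * math.pi * distance_m ** 2


def aerodynamic_power_watts(lung_pressure_pa, glottal_flow_m3_per_s):
    """Aerodynamic power at the glottis, taken here as pressure times volume flow."""
    return lung_pressure_pa * glottal_flow_m3_per_s


def glottal_efficiency(level_db, distance_m, lung_pressure_pa, glottal_flow_m3_per_s):
    """Ratio of radiated sound power to aerodynamic power."""
    return (radiated_power_watts(level_db, distance_m)
            / aerodynamic_power_watts(lung_pressure_pa, glottal_flow_m3_per_s))


# Hypothetical operating points: the low call uses more airflow, the high call more
# pressure; both are chosen (not measured) to give the same aerodynamic power.
low = glottal_efficiency(70, 10, lung_pressure_pa=2000, glottal_flow_m3_per_s=0.002)
high = glottal_efficiency(80, 10, lung_pressure_pa=4000, glottal_flow_m3_per_s=0.001)
print(f"low-frequency call efficiency:  {low:.4f}")
print(f"high-frequency call efficiency: {high:.4f}")
```

With equal aerodynamic power in both cases, the 10 dB difference in level corresponds to a tenfold difference in efficiency, which is the kind of advantage the abstract attributes to high-pitched calling.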

    Volumetric Manganese Enhanced Magnetic Resonance Imaging in mice (Mus musculus)

    The present doctoral thesis introduces a method for semi-automatic volumetric analysis of the hippocampus and other distinct brain regions in laboratory mice. The method of volumetric manganese enhanced magnetic resonance imaging (vMEMRI) makes use of the paramagnetic property of the manganese ion, Mn2+, which results in a positive contrast enhancement of specific brain areas on the MR image and enables a more detailed image of brain morphology. The chemical similarity of Mn2+ to calcium leads to an accumulation of Mn2+ in excited cells and consequently an enhanced signal in certain brain regions in an activity-dependent manner. However, one major drawback of vMEMRI is the toxicity of Mn2+. Therefore, the aims of the thesis have been: (1) establishment of a MEMRI protocol in mice; (2) optimization of a Mn2+ application procedure to reduce toxic side effects; (3) development of an automated method to determine hippocampal volume; (4) validation of vMEMRI analysis; (5) application of volumetric analysis in mouse models of psychopathology. This thesis splits into 3 studies. Study 1 deals with Mn2+ toxicity and introduces an application method that considerably reduces the toxic side effects of Mn2+. Study 2 validates vMEMRI as a method to reliably determine hippocampal volume and explores its utilization in animals with genetically and chemically modified hippocampi. Study 3 displays the application of vMEMRI in a mouse model of a psychiatric disorder. Study 1 shows that a single application of Mn2+ in dosages used in current MEMRI studies leads to considerable toxic side effects measurable with physiological, behavioral and endocrine markers. In contrast, a fractionated application of a low dose of Mn2+ is proposed as an alternative to a single injection of a high dose. Repeated application of low dosages of 30 mg/kg Mn2+ showed fewer toxic side effects compared to the application schemes with higher dosages of 60 mg/kg.
Additionally, the best vMEMRI signal contrast was seen for an injection protocol of 30 mg/kg 8 times with an inter-injection interval of 24 h (8x30/24 protocol). The impact of the 8x30/24 application protocol on longitudinal studies was tested by determining whether learning processes are disturbed. Mice were injected with the 8x30/24 protocol 2 weeks prior to receiving a single footshock. Manganese-injected mice showed less contextual freezing to the shock context and a shock context reminder one month after shock application. Furthermore, mice showed increased hyperarousal and no avoidance of shock context related odors. This impairment in fear conditioning indicates a disturbed associative learning of Mn2+-injected mice. Therefore, it was investigated whether Mn2+ application shows a specific disturbance of hippocampus-dependent learning. Mice were subjected to habitual and spatial learning protocols 12 h after each injection in a water cross-maze. There was no impairment in learning protocols which allowed for hippocampus-independent habitual learning. However, Mn2+-injected mice were specifically impaired in the hippocampus-dependent spatial learning protocol. Furthermore, it was shown that only mice with higher Mn2+ accumulation showed this impairment. Altogether, the results of this chapter argue for a fractionated application scheme such as 30 mg/kg every 24 h for 8 days to provide sufficient MEMRI signal contrast while minimizing toxic side effects. However, the treatment procedure has to be further improved to allow for an analysis of hippocampus-dependent learning processes as well. Because of the potential side effects, the vMEMRI method was applied as a final experiment in studies 2 and 3. Study 2 introduces the method of vMEMRI, which allows, for the first time, an in vivo semi-automatic detection of hippocampal volume.
Hippocampal volume of mice with genetically altered adult neurogenesis and those with chemically lesioned hippocampi could be analyzed with vMEMRI. Even the highly variable differences in hippocampal volume of these animals could be detected with vMEMRI. vMEMRI data correlated with manually obtained volumes and are in agreement with previously reported histological findings, indicating the high reliability of this method. Study 3 investigates the ability of vMEMRI to detect even small differences in brain morphology by examining volumetric changes of the hippocampus and other brain structures in a mouse model of PTSD supplemented with enriched housing conditions. It was shown that exposure to a brief inescapable foot shock led to a volume reduction in both the left hippocampus and right central amygdala two months later. Enriched housing decreased the intensity of trauma-associated contextual fear independently of whether it was provided before or after the shock. vMEMRI analysis revealed that enriched housing led to an increase in whole brain volume, including the lateral ventricles and the hippocampus. Furthermore, the enhancement of hippocampal volume through enriched housing was accompanied by the amelioration of trauma-associated PTSD-like symptoms. Hippocampal volume gain and loss were mirrored by ex vivo ultramicroscopic measurements of the hippocampus. Together, these data demonstrate that vMEMRI is able to detect small changes in hippocampal and central amygdalar volumes induced by a traumatic experience in mice. In conclusion, vMEMRI proves to be very reliable and able to detect small volumetric differences in various brain regions in living mice.
vMEMRI opens up a great number of possibilities for future research determining neuroanatomical structure, volumes and activity in vivo, as well as the ability to repeatedly determine such characteristics within each subject, given an improvement of the Mn2+ treatment protocols to minimize potential toxic side effects.