59 research outputs found

    Psychological stress measurement through voice output analysis

    Audio tape recordings of selected Skylab communications were processed by a psychological stress evaluator. Strip chart tracings were read blind and scores were assigned based on characteristics reported by the manufacturer to indicate psychological stress. These scores were analyzed for their empirical relationships with operational variables in Skylab judged to represent varying degrees of situational stress. Although some statistically significant relationships were found, the technique was not judged to be sufficiently predictive to warrant its use in assessing the degree of psychological stress of crew members in future space missions

    On the design of visual feedback for the rehabilitation of hearing-impaired speech

    Mapping across feature spaces in forensic voice comparison: the contribution of auditory-based voice quality to (semi-)automatic system testing

    In forensic voice comparison, there is increasing focus on the integration of automatic and phonetic methods to improve the validity and reliability of voice evidence to the courts. In line with this, we present a comparison of long-term measures of the speech signal to assess the extent to which they capture complementary speaker-specific information. Likelihood ratio-based testing was conducted using MFCCs and (linear and Mel-weighted) long-term formant distributions (LTFDs). Fusing automatic and semi-automatic systems yielded limited improvement in performance over the baseline MFCC system, indicating that these measures capture essentially the same speaker-specific information. The output from the best performing system was used to evaluate the contribution of auditory-based analysis of supralaryngeal (filter) and laryngeal (source) voice quality in system testing. Results suggest that the problematic speakers for the (semi-)automatic system are, to some extent, predictable from their supralaryngeal voice quality profiles, with the least distinctive speakers producing the weakest evidence and most misclassifications. However, the misclassified pairs were still easily differentiated via auditory analysis. Laryngeal voice quality may thus be useful in resolving problematic pairs for (semi-)automatic systems, potentially improving their overall performance
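A minimal sketch of the likelihood ratio idea mentioned in the abstract above, assuming a much simpler setup than the study's: comparison scores are modelled with one Gaussian for same-speaker pairs and one for different-speaker pairs. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def gaussian_logpdf(x, mu, sigma):
    """Natural log of the normal density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def log10_lr(score, same_scores, diff_scores):
    """log10 of p(score | same speaker) / p(score | different speakers).

    Each hypothesis is modelled by a single Gaussian fitted to training
    scores (an illustrative assumption, not the paper's system).
    """
    lp_same = gaussian_logpdf(score, mean(same_scores), stdev(same_scores))
    lp_diff = gaussian_logpdf(score, mean(diff_scores), stdev(diff_scores))
    return (lp_same - lp_diff) / math.log(10.0)


# Hypothetical training scores for the two hypotheses.
same_scores = [2.1, 1.8, 2.5, 2.0, 1.6]
diff_scores = [-1.9, -2.2, -1.5, -2.4, -2.0]

supports_same = log10_lr(1.9, same_scores, diff_scores)   # positive value
supports_diff = log10_lr(-2.0, same_scores, diff_scores)  # negative value
```

A positive log10 LR supports the same-speaker hypothesis and a negative one supports the different-speaker hypothesis; values near zero correspond to weak evidence either way.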

    The female-to-male transsexual voice: Physiology vs. performance in production

    Results of the three studies on the speech production of female-to-male transgender individuals (transmen) present phonetic evidence that speech produces the transmen by what I termed triple decoupling. Transmen successfully decouple gender from biological sex. The results of the longitudinal studies exemplified that speakers born and raised female do not necessarily need to have a female voicing source or filter function. Both qualitative changes can be achieved (to different degrees) by bringing exogenous testosterone into the system that virilizes both source and filter over time. Moreover, the cross-sectional study showed that articulatory gestures can be modified to move the acoustic targets towards a gendered target one is striving to present. The acoustic manifestations of transmen with different partner attraction offer the next type of decoupling, that between sexual orientation and gender identity. The results of the cross-sectional study imply that female-born individuals attracted to men do not necessarily have to identify as women. They can opt out of this self-identification by selectively adopting features associated with the gay cismale speaking style. This is suggested by the fact that sexual orientation was found to have a significant effect on the durational and spectral quality of fricatives /s/ and /ʃ/, formant values and sentential pitch range. Finally, the longitudinal studies provide evidence for the third type of decoupling, which comes in the form of gender breaking free from physiology. The recurring "reverse J-pattern" of both the transitioning source and filter, as well as the mean fundamental frequency rising above the pitch floor, illustrate the fact that transmen do not feel obliged to sound as masculine (as low-pitched and "low-formanted") as testosterone enables them to.
This final type of decoupling also serves to demonstrate that many transmen decidedly do not opt in to the binary system of sex / gender even though they are physiologically able to do so. Although LGB speaking styles have been investigated before, this dissertation is the first to discuss a number of acoustic descriptors specifically in transmen's speech and place them into the context of hormone treatment, sexual orientation and disclosure status

    Broadcast speech and the effect of voice quality on the listener : a study of the various components which categorise listener perception by vocal characteristics.

    Voice quality is crucial to the art of the broadcast speaker. Acceptable voice quality is a necessity for an acceptable microphone voice and essential therefore for employment as a broadcaster. This thesis investigates the characteristics of the voice which provide that acceptability, and categorises the features which lead the listener to make judgements about their vocal likes and dislikes. These subjective judgements are explored by investigating the psychological, medical, and innate features contributing to the vocal perceptions of the listener. Voice quality is related to the efficiency of the larynx and its importance to voice production, and to the various vocal disorders which can affect the broadcaster. It becomes evident throughout the thesis that each listener receives a clear impression of the personality of the speaker through the features present in the voice. Many of these impressions, however, are based on stereotypes. The thesis relates these stereotypical judgements to accents, investigating their relationship to the 'BBC' voice, the 'World Service' voice, the 'ILR' voice and the 'reporter's voice'. It is shown that the listener's subjective impression of the voice and the broadcaster's personality is formed by the presentational and physical aspects of voice quality. Listener perceptions of voice acceptability are tested and discussed. The data is analysed to provide a set of dominant characteristics from which are drawn voice histograms and frequency polygons. The result is a set of preferred voice characteristics which apply specifically to the broadcast speaker and which can be sought during the selection process

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The MAVEBA Workshop proceedings, held on a biennial basis, collect the scientific papers presented both as oral and poster contributions during the conference. The main subjects are: development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as the biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and classification of vocal pathologies

    Book Reviews

    Book Reviews

    Detection of clinical depression in adolescents' using acoustic speech analysis

    Clinical depression is a major risk factor in suicides and is associated with high mortality rates, therefore making it one of the leading causes of death worldwide every year. Symptoms of depression often first appear during adolescence at a time when the voice is changing, in both males and females, suggesting that specific studies of these phenomena in adolescent populations are warranted. The acoustic properties of speech have previously been investigated as possible cues for depression in adults. However, these studies were restricted to small populations of patients and the speech recordings were made during patients' clinical interviews or fixed-text reading sessions. A collaborative effort with the Oregon Research Institute (ORI), USA allowed the development of a new speech corpus consisting of a large sample of 139 adolescents (46 males and 93 females) who were divided into two groups (68 clinically depressed and 71 controls). The speech recordings were made during naturalistic interactions between adolescents and parents. Instead of covering a plethora of acoustic features in the investigation, this study takes the knowledge base from speech science and groups the acoustic features into five categories that relate to the physiological and perceptual areas of the speech production mechanism. These five acoustic feature categories consisted of the prosodic, cepstral, spectral, glottal and Teager energy operator (TEO) based features. The effectiveness of applying these acoustic feature categories in detecting adolescent depression was measured. The salient feature categories were determined by testing the feature categories and their combinations within a binary classification framework. Consistent with previous studies, it was observed that:
    - there are strong gender-related differences in classification accuracy;
    - the glottal features provide an important enhancement of the classification accuracy when combined with other types of features.
    An important new contribution provided by this thesis was to observe that the TEO based features significantly outperformed prosodic, cepstral, spectral and glottal features and their combinations. An investigation into the possible reasons for such strong performance of the TEO features pointed to the importance of nonlinear mechanisms associated with the glottal flow formation as possible cues for depression
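For readers unfamiliar with the Teager energy operator named in the abstract above, here is a minimal sketch of its standard discrete-time form, psi[n] = x[n]^2 - x[n-1] * x[n+1]. This shows the operator only. The thesis's actual TEO-based feature pipeline is not described in this listing, so the function name and the test tone are illustrative assumptions.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator for the interior samples of x.

    psi[n] = x[n]^2 - x[n-1] * x[n+1], so the output is two samples
    shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure tone A*cos(omega*n), the operator returns the constant
# A^2 * sin(omega)^2, which depends on both amplitude and frequency.
amplitude, omega = 0.5, 0.2  # hypothetical values for the demonstration
tone = [amplitude * math.cos(omega * n) for n in range(200)]
psi = teager_energy(tone)
expected = (amplitude * math.sin(omega)) ** 2
```

Because the output tracks both amplitude and frequency of a component, the operator is sensitive to changes in how energy is produced, which is one commonly cited reason for using it on stressed or affected speech.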

    Stress and emotion recognition in natural speech in the work and family environments

    Speech stress and emotion recognition and classification technology has the potential to provide significant benefits to national and international industry and to society in general. The accuracy of automatic stress and emotion recognition relies heavily on the discriminative power of the characteristic features. This work introduced and examined a number of new linear and nonlinear feature extraction methods for the automatic detection of stress and emotion in speech. The proposed linear feature extraction methods included features derived from the speech spectrograms (SS-CB/BARK/ERB-AE, SS-AF-CB/BARK/ERB-AE, SS-LGF-OFS, SS-ALGF-OFS, SS-SP-ALGF-OFS and SS-sigma-pi), wavelet packets (WP-ALGF-OFS) and the empirical mode decomposition (EMD-AER). The proposed nonlinear feature extraction methods were based on the results of recent laryngological studies and nonlinear modelling of the phonation process. The proposed nonlinear features included the area under the TEO autocorrelation envelope based on different spectral decompositions (TEO-DWT, TEO-WP, TEO-PWP-S and TEO-PWP-G), as well as features representing the spectral energy distribution of speech (AUSEES) and of the glottal waveform (AUSEEG). The proposed features were compared with features based on the classical linear model of speech production, including F0, formants, MFCC and glottal time/frequency parameters. Two classifiers, GMM and KNN, were tested for consistency. The experiments used speech under actual stress from the SUSAS database (7 speakers; 3 female and 4 male) and speech with five naturally expressed emotions (neutral, anger, anxious, dysphoric and happy) from the ORI corpora (71 speakers; 27 female and 44 male). The nonlinear features clearly outperformed all the linear features. The classification results demonstrated consistency with the nonlinear model of the phonation process, indicating that the harmonic structure and the spectral distribution of the glottal energy provide the most important cues for stress and emotion recognition in speech. The study also investigated whether automatic emotion recognition can determine differences in emotion expression between parents of depressed adolescents and parents of non-depressed adolescents, and whether there are differences in emotion expression between mothers and fathers in general. The experimental results indicated that parents of depressed adolescents produce stronger, more exaggerated expressions of affect than parents of non-depressed adolescents, and that females in general produce expressions of affect that are easier to discriminate (more exaggerated) than those of males
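The abstract above names KNN as one of the two classifiers. A minimal sketch of a k-nearest-neighbour vote is given below, assuming Euclidean distance and a simple majority rule. The feature vectors, labels and function name are hypothetical toy values, not data or settings from the study.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbours."""
    # Pair each training vector's distance to the query with its label,
    # then sort so the closest neighbours come first.
    neighbours = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional feature vectors with made-up labels.
train_features = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3),
                  (0.9, 0.8), (1.0, 0.9), (0.8, 1.0)]
train_labels = ["neutral", "neutral", "neutral",
                "stressed", "stressed", "stressed"]

near_stressed = knn_predict(train_features, train_labels, (0.85, 0.9))
near_neutral = knn_predict(train_features, train_labels, (0.1, 0.1))
```

With a classifier this simple, the result depends almost entirely on how well the features separate the classes, which fits the abstract's emphasis on the discriminative power of the features.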