
    A cross-lingual adaptation approach for rapid development of speech recognizers for learning disabled users

    Building a voice-operated system for learning disabled users is a difficult task that requires considerable time and effort. Because of the wide spectrum of disabilities and their associated phonopathies, most available approaches target a specific pathology. This may improve their accuracy for some users but makes them unsuitable for others. In this paper, we present a cross-lingual approach to adapting a general-purpose modular speech recognizer for learning disabled people. The main advantage of this approach is that it allows rapid and cost-effective development: it reuses an already built speech recognition engine and its modules, and uses existing resources for standard speech in different languages to recognize the users’ atypical voices. Although recognizers built with the proposed technique achieve lower accuracy than those trained for specific pathologies, they can be used by a wide population and developed more rapidly, which makes it possible to design various types of speech-based applications accessible to learning disabled users. This research was supported by the project ‘Favoreciendo la vida autónoma de discapacitados intelectuales con problemas de comunicación oral mediante interfaces personalizados de reconocimiento automático del habla’ (‘Promoting independent living for intellectually disabled people with oral communication problems through personalized automatic speech recognition interfaces’), financed by the Centre of Initiatives for Development Cooperation (Centro de Iniciativas de Cooperación al Desarrollo, CICODE), University of Granada, Spain. It was also supported by the Student Grant Scheme 2014 (SGS) at the Technical University of Liberec.

    The effect of audio recordings and photographs of autistic and typical children on social judgments

    In a counterbalanced 2x2 mixed factorial design, 61 randomly assigned participants rated two audio recordings and two photographs of autistic or typical children. The hypothesis was that participants would judge autistic children most negatively when listening to audio recordings of them, but would judge photographs of autistic and typical children similarly. A two-way mixed ANOVA found a statistically significant main effect of child type (autistic vs. typical), but no significant main effect of medium (recording vs. photograph) and no significant interaction. This suggests that autistic children are judged more negatively than their typical peers, although it is unclear how those judgments are formed; the finding has implications for diagnosis and therapeutic success.

    Behaviorally Measuring Ease-of-Use by Analyzing Users’ Mouse Cursor Movements

    Ease-of-use, the extent to which a technology is free of effort, is a hallmark of many successful websites and predicts important user outcomes, including intentions to use a system and the system’s perceived usefulness. We propose a behavior-based measure of ease-of-use based on the analysis of users’ mouse cursor movements. As a basis for this measure, we extend Attentional Control Theory and the Response Activation Model to explain how ease-of-use influences the precision of users’ mouse cursor movements. We propose two mousing statistics, Normalized Area under the Curve and Normalized Additional Distance, and predict that they correlate with perceived ease-of-use (PEOU) and can be used to differentiate ease-of-use among tasks. We end by describing next steps for testing our hypotheses and highlighting potential implications.
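    The abstract names the two mousing statistics but does not define them formally. The Python sketch below assumes one common reading: the ideal path is the straight line from the first to the last cursor sample, Normalized Additional Distance is the extra path length divided by that straight-line length, and Normalized Area under the Curve is the area between the trajectory and the straight line divided by the squared straight-line length. Both normalizations and all function names are hypothetical, not the authors' definitions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _to_path_frame(points: Sequence[Point]) -> Tuple[List[Point], float]:
    """Re-express samples in a frame whose x-axis is the straight start-to-end line."""
    if len(points) < 2:
        raise ValueError("need at least two cursor samples")
    (x0, y0), (x1, y1) = points[0], points[-1]
    ideal = math.hypot(x1 - x0, y1 - y0)
    if ideal == 0:
        raise ValueError("start and end points coincide")
    ux, uy = (x1 - x0) / ideal, (y1 - y0) / ideal  # unit vector along the ideal path
    frame = [((x - x0) * ux + (y - y0) * uy,    # distance along the ideal line
              -(x - x0) * uy + (y - y0) * ux)   # signed deviation from that line
             for x, y in points]
    return frame, ideal


def normalized_additional_distance(points: Sequence[Point]) -> float:
    """(actual path length - straight-line length) / straight-line length (assumed)."""
    _, ideal = _to_path_frame(points)
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return (travelled - ideal) / ideal


def normalized_auc(points: Sequence[Point]) -> float:
    """Area between trajectory and straight line / straight-line length squared (assumed)."""
    frame, ideal = _to_path_frame(points)
    area = 0.0
    for (u0, v0), (u1, v1) in zip(frame, frame[1:]):
        # Trapezoid between consecutive samples, using the unsigned deviation.
        area += 0.5 * (abs(v0) + abs(v1)) * abs(u1 - u0)
    return area / ideal ** 2
```

    The area is approximated with trapezoids over the along-line coordinate, so trajectories that loop back on themselves are only roughly captured; a perfectly straight movement scores 0 on both statistics.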

    Perceptual and acoustic assessment of a child’s speech before and after laryngeal web surgery

    The aim of this paper is to highlight the importance of early diagnosis and surgery in patients with laryngeal web in order to achieve normal breathing, and to stress the need for an interdisciplinary approach to observing voice quality and prosodic features at an early age. The subject was a 6.5-year-old girl who had previously been diagnosed with irregular breathing (R06). An endoscopic exam revealed a laryngeal web between the vocal folds and showed that the posterior intercartilaginous part of the glottis was normal. The child’s speech was recorded in an acoustic studio before the vocal-fold surgery and six and twelve months after it. Because of severe dysphonia, difficulty breathing, and frequent noisy breathing (stridor), before the surgery we recorded only phonation of the vowel [a] and spontaneous speech. In addition, there was intense glottic and supraglottic strain before the surgery, which in phonetic terms corresponds to laryngeal and supralaryngeal strain and pathological creaky-whispery phonation (according to the Vocal Profile Analysis (VPA) protocol). This strain was visible in the chest, neck, and head, and audible in the voice quality. Acoustic analysis showed that the average F0 of the vowel [a] was remarkably high (442 Hz), and pathological values were found for local jitter (1.68%), local shimmer (0.7 dB), and the harmonics-to-noise ratio (17.6 dB). In contrast, six months after the surgery, the F0 of [a] was about half its preoperative value (220.5 Hz, p < 0.001), and local jitter for all vowels (0.30-0.47%) and the harmonics-to-noise ratio (22.46 dB, p = 0.05) were within the normal range. There was also significant improvement in F0, its standard deviation, and minimum and maximum F0. The average and median F0 in spontaneous speech were also lower postoperatively. The voice quality showed a more balanced timbre in the long-term average speech spectrum (LTASS), particularly after one year. Some other prosodic features also improved.
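    The perturbation measures reported above are typically computed in Praat. The sketch below follows Praat's documented definitions of local jitter (mean absolute difference between consecutive periods divided by the mean period) and local shimmer in dB (mean absolute 20·log10 ratio of consecutive period amplitudes), plus the autocorrelation-based HNR formula 10·log10(r / (1 − r)), where r is the normalized autocorrelation peak. It assumes periods and amplitudes have already been extracted, and it omits Praat's period-floor, period-ceiling, and maximum-period-factor checks; the function names are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def local_jitter_percent(periods: Sequence[float]) -> float:
    """Mean absolute difference of consecutive periods / mean period, in percent."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return 100.0 * mean(diffs) / mean(periods)


def local_shimmer_db(amplitudes: Sequence[float]) -> float:
    """Mean absolute 20*log10 ratio of consecutive period peak amplitudes, in dB."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two amplitudes")
    return mean(abs(20.0 * math.log10(b / a))
                for a, b in zip(amplitudes, amplitudes[1:]))


def hnr_db(r_max: float) -> float:
    """Harmonics-to-noise ratio from a normalized autocorrelation peak, 0 < r_max < 1."""
    return 10.0 * math.log10(r_max / (1.0 - r_max))


def mean_f0_hz(periods: Sequence[float]) -> float:
    """Simple F0 estimate: reciprocal of the mean period (periods in seconds)."""
    return 1.0 / mean(periods)
```

    A perfectly periodic voice gives 0% jitter and 0 dB shimmer, and r = 0.5 (equal harmonic and noise energy) gives an HNR of 0 dB; higher jitter and shimmer and lower HNR indicate a less regular voice.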

    A method for calculating the strength of evidence associated with an earwitness’s claimed recognition of a familiar speaker

    The present paper proposes and demonstrates a method for assessing strength of evidence when an earwitness claims to recognize the voice of a speaker who is familiar to them. The method calculates a Bayes factor that answers the question: what is the probability that the earwitness would claim to recognize the offender as the suspect if the offender was the suspect, versus the probability that the earwitness would make the same claim if the offender was not the suspect but some other speaker from the relevant population? By “claim” we mean a claim made by a cooperative earwitness, not one made by an earwitness who is intentionally deceptive. The relevant data are derived from naïve listeners’ responses to recordings of familiar speakers presented in a speaker lineup. The method is demonstrated under recording conditions that broadly reflect those of a real case.
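    The Bayes factor described above is a ratio of two conditional probabilities of the same claim. A minimal sketch, assuming each probability is estimated from counts of listener claims in same-speaker and different-speaker lineup trials, with a Beta(1, 1) prior acting as pseudo-counts so that a zero count never produces a zero denominator. The data layout, the prior, and the names `ResponseCounts`, `claim_rate`, and `bayes_factor` are hypothetical; the paper's own estimation procedure may differ.

```python
from dataclasses import dataclass


@dataclass
class ResponseCounts:
    """Listener responses in one lineup condition (hypothetical data layout)."""
    claimed_suspect: int  # trials where the listener claimed to recognize the suspect
    total: int            # all trials in that condition


def claim_rate(c: ResponseCounts, prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """Posterior-mean probability of the claim under a Beta(prior_a, prior_b) prior."""
    if not 0 <= c.claimed_suspect <= c.total:
        raise ValueError("invalid counts")
    return (c.claimed_suspect + prior_a) / (c.total + prior_a + prior_b)


def bayes_factor(same: ResponseCounts, different: ResponseCounts,
                 prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """p(claim | offender is the suspect) / p(claim | offender is someone else)."""
    return (claim_rate(same, prior_a, prior_b)
            / claim_rate(different, prior_a, prior_b))
```

    With hypothetical counts of 88 claims in 100 same-speaker trials and 1 claim in 100 different-speaker trials, the default prior gives (89/102) / (2/102) = 44.5; values above 1 favor the same-speaker hypothesis.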

    Perceptual judgment of hypernasality and audible nasal emission in cleft palate speakers

    Objective: The purpose of this study is to determine whether a novel, user-friendly rating system, visual sort and rate (VSR), provides ratings comparable to those of the currently used direct magnitude estimation (DME) rating system for perceptions of audible nasal emission (ANE) and hypernasality in cleft palate speakers. Methods: Twelve naïve listeners rated 152 speech samples of speakers with cleft palate across four conditions: rating hypernasality and ANE using either a VSR or a DME rating scale. Raters received a short training session before rating each day. Inter- and intra-rater reliabilities, as well as the line of best fit between VSR and DME scores, were calculated to determine the usability of VSR as a novel rating system. Results: DME yielded higher inter-rater reliability than VSR when rating both hypernasality (DME r = .48; VSR r = .14) and ANE (DME r = .27; VSR r = .15). Most raters demonstrated high intra-rater reliability across conditions. A curvilinear line of best fit most accurately captured the relationship between DME and VSR scores when rating hypernasality (r = .64) and ANE (r = .66). Conclusions: A curvilinear relationship between ratings suggests that both variables are prothetic and are therefore best captured using a DME rating scale (Eadie & Doyle, 2002). The results support the continued use of DME for rating hypernasality, even among naïve listeners given a training session. Rating ANE was difficult, as ratings yielded low inter-rater reliability regardless of the scale used. Further research on perceptions of audible nasal emission is warranted.
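    The abstract reports r for a curvilinear line of best fit without naming the curve. As a hedged illustration, the pure-Python sketch below fits a polynomial by least squares (a quadratic is assumed as the curvilinear model) and reports r as the correlation between fitted and observed scores, which can then be compared with the r of a straight-line fit. It is not the study's analysis code, and the function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def poly_least_squares(x: Sequence[float], y: Sequence[float], degree: int) -> List[float]:
    """Coefficients [c0, c1, ...] minimizing squared error, via the normal equations."""
    n = degree + 1
    # Augmented matrix [X^T X | X^T y] for the polynomial design matrix X.
    m = [[sum(xi ** (i + j) for xi in x) for j in range(n)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))] for i in range(n)]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system; too few distinct x values")
        for row in range(n):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [a - f * b for a, b in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_r(x: Sequence[float], y: Sequence[float], degree: int) -> float:
    """Correlation between observed y and the fitted polynomial values."""
    coeffs = poly_least_squares(x, y, degree)
    fitted = [sum(c * xi ** k for k, c in enumerate(coeffs)) for xi in x]
    return pearson_r(fitted, y)
```

    When `fit_r(x, y, 2)` clearly exceeds `fit_r(x, y, 1)` for paired VSR and DME scores, the relationship is better described by a curve than by a straight line.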

    Quantitative measurement of nasality in EMR children

    A new bioelectronic system for detecting and measuring voice parameters (TONAR) was used to quantify nasality in 50 educable mentally retarded children. Results indicated that over one-third of the children evaluated were hypernasal. This high prevalence of excessive nasality in EMR children was contrasted with (1) normative data on 78 nonretarded children collected with the bioelectronic system and (2) prior incidence data based on listener judgments reported for retarded and nonretarded populations. The unique advantages of instrumental measurement for the elusive problem of hypernasality are discussed. Peer reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/22214/1/0000647.pd
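    The abstract does not say how TONAR turns its signals into a nasality score. The sketch below assumes the score is the ratio of nasal to total (nasal plus oral) acoustic energy from separately recorded nasal and oral channels, the same ratio used by later nasalance measures, and shows how a prevalence figure such as "over one-third" follows from applying a cutoff. The channel setup, the cutoff, and the function names are hypothetical; the abstract does not give the criterion used to classify a child as hypernasal.

```python
from typing import Sequence


def signal_energy(samples: Sequence[float]) -> float:
    """Sum of squared samples for one channel (nasal or oral microphone)."""
    return sum(s * s for s in samples)


def nasal_ratio_percent(nasal: Sequence[float], oral: Sequence[float]) -> float:
    """Nasal energy as a percentage of nasal + oral energy (assumed scoring rule)."""
    n, o = signal_energy(nasal), signal_energy(oral)
    if n + o == 0:
        raise ValueError("both channels are silent")
    return 100.0 * n / (n + o)


def share_above(scores: Sequence[float], cutoff: float) -> float:
    """Fraction of speakers whose score exceeds a (hypothetical) hypernasality cutoff."""
    if not scores:
        raise ValueError("no scores")
    return sum(s > cutoff for s in scores) / len(scores)
```

    In practice the cutoff would come from normative data such as the 78 nonretarded children mentioned above; any specific value plugged in here is illustrative only.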

    Pitch Perfect: Impression Formation and Impression Management in Women’s Pitch Modulation

    How does the pitch of a woman’s voice affect how she is perceived, and how might women change the pitch of their voices to fit the situation at hand? Study 1 examined whether pitch plays a role in impression formation. Participants listened to two women’s voices at three pitch levels (raised, unchanged, lowered) and rated the speakers’ personality traits. Ratings of speaker competence, confidence, and intelligence were significantly lower for the pitch-raised voices than for the unchanged or pitch-lowered voices. Additionally, ratings of speaker persuasiveness and attractiveness were significantly lower for the pitch-raised voices than for the unchanged voices. No effect of pitch on sociability ratings was observed, but ratings of femininity were significantly lower for the pitch-lowered voices than for the unchanged or pitch-raised voices. Study 2 investigated whether women would modulate their pitch in different conversational contexts. Female participants were recorded over Zoom answering questions in neutral, flirtatious, and professional conversational contexts. No effects of context were observed for participants’ minimum, maximum, or median pitch, but participants’ mean pitch was significantly lower in the professional context than in the neutral context. These results suggest that pitch may be a factor in forming impressions of female speakers, and that women may, whether or not they are aware of the role of pitch in impression formation, modulate their voices to appear more professional.
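    Study 2 compares minimum, maximum, median, and mean pitch across contexts, and Study 1 shifts voices up and down in pitch. The sketch below shows one way to compute those summaries from a frame-by-frame F0 track, assuming unvoiced frames are marked as 0 or None, and to express a pitch shift in semitones. The track format and function names are hypothetical. Because a few extreme frames pull the mean but not the median, the two statistics can respond differently to the same recording.

```python
import math
from statistics import mean, median
from typing import Dict, Optional, Sequence


def f0_summary(track: Sequence[Optional[float]]) -> Dict[str, float]:
    """Min, max, median, and mean F0 (Hz) over voiced frames only."""
    voiced = [f for f in track if f is not None and f > 0]  # 0 or None = unvoiced
    if not voiced:
        raise ValueError("no voiced frames")
    return {"min": min(voiced), "max": max(voiced),
            "median": median(voiced), "mean": mean(voiced)}


def semitone_shift(f_from: float, f_to: float) -> float:
    """Size of a pitch change in semitones (positive = raised, negative = lowered)."""
    return 12.0 * math.log2(f_to / f_from)
```

    For example, a single 400 Hz frame among frames near 200 Hz raises the mean well above the median, and doubling F0 from 220 Hz to 440 Hz corresponds to a shift of 12 semitones.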