    Explaining Schizophrenia: Auditory Verbal Hallucination and Self‐Monitoring

    Do self-monitoring accounts, a dominant explanation of the positive symptoms of schizophrenia, explain auditory verbal hallucination? In this essay, I argue that the account fails to answer crucial questions that any explanation of auditory verbal hallucination must address. Where the account does provide a plausible answer, I make the case for an alternative explanation: auditory verbal hallucination is not the result of a failed control mechanism, namely failed self-monitoring, but rather of the persistent automaticity of auditory experience of a voice. My argument emphasizes the importance of careful examination of phenomenology as providing substantive constraints on causal models of the positive symptoms of schizophrenia.

    An acoustic analysis of the Cantonese whispered tones

    "A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, December 31, 2004."Also available in print.Thesis (B.Sc)--University of Hong Kong, 2004.published_or_final_versionSpeech and Hearing SciencesBachelorBachelor of Science in Speech and Hearing Science

    Addressee Identification In Face-to-Face Meetings

    We present results on addressee identification in four-participant face-to-face meetings using Bayesian Network and Naive Bayes classifiers. First, we investigate how well the addressee of a dialogue act can be predicted from gaze, utterance, and conversational context features. Then, we explore whether information about the meeting context can aid classifier performance. Both classifiers perform best when conversational context and utterance features are combined with the speaker's gaze information. The classifiers show little gain from information about the meeting context.
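    As a rough illustration of this kind of setup, the sketch below fits scikit-learn's Naive Bayes classifier for categorical data on gaze, dialogue-act, and context features. The feature names and toy data are hypothetical placeholders, not the feature set or corpus used in the paper.

        # Minimal sketch: Naive Bayes addressee prediction from categorical
        # gaze, dialogue-act, and context features (all values are made up).
        import numpy as np
        from sklearn.preprocessing import OrdinalEncoder
        from sklearn.naive_bayes import CategoricalNB

        # Each row: (speaker gaze target, dialogue-act type, previous addressee)
        X_raw = np.array([
            ["P2",  "question",  "P2"],
            ["P3",  "statement", "P2"],
            ["all", "statement", "P3"],
            ["P2",  "question",  "P4"],
            ["P4",  "question",  "P3"],
            ["all", "statement", "all"],
        ])
        y = np.array(["P2", "P3", "all", "P2", "P4", "all"])  # true addressees

        enc = OrdinalEncoder(dtype=int)
        X = enc.fit_transform(X_raw)        # map category strings to integer codes
        clf = CategoricalNB().fit(X, y)     # Naive Bayes over categorical features

        probe = enc.transform([["P2", "question", "P3"]])
        print(clf.predict(probe))           # most probable addressee for this act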

    Real-time interactive speech technology at Threshold Technology, Incorporated

    Basic real-time isolated-word recognition techniques are reviewed. Industrial applications of voice technology are described in chronological order of their development. Future research efforts are also discussed.

    Speaker and Speech Recognition Using Hierarchy Support Vector Machine and Backpropagation

    Voice signal processing has been proposed to improve effectiveness and convenience for the public, for example in smart homes. This study develops a smart home simulation model that operates doors, a TV, and lights from voice instructions. The sound signals are processed with Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction. The speaker is then recognized using a hierarchical Support Vector Machine (SVM), so that unregistered speakers are not processed and are denied access rights. Spoken commands such as "Open the Door", "Close the Door", "Turn on the TV", "Turn off the TV", "Turn on the Lights", and "Turn off the Lights" are recognized using Backpropagation. The results showed that the hierarchical SVM provided an accuracy of 71%, compared to 45% for a single SVM.
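    A minimal sketch of a pipeline in this spirit, assuming librosa for MFCC extraction and scikit-learn for the classifiers: an SVM acts as a speaker gate and a backpropagation-trained MLP recognizes the command. The single SVM stands in for the paper's hierarchical SVM, and the file path, labels, and training data are placeholders.

        # Illustrative two-stage pipeline (not the paper's exact implementation):
        # MFCC features -> SVM speaker gate -> backpropagation (MLP) word classifier.
        import numpy as np
        import librosa
        from sklearn.svm import SVC
        from sklearn.neural_network import MLPClassifier

        def mfcc_features(wav_path, sr=16000, n_mfcc=13):
            """Load audio and return a fixed-length MFCC vector (mean over time)."""
            y, sr = librosa.load(wav_path, sr=sr)
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
            return mfcc.mean(axis=1)

        # Placeholder training data standing in for real MFCC vectors and labels.
        X_train = np.random.randn(60, 13)
        speakers = np.random.choice(["alice", "bob", "unknown"], size=60)
        commands = np.random.choice(["open_door", "close_door", "tv_on",
                                     "tv_off", "lights_on", "lights_off"], size=60)

        speaker_svm = SVC(kernel="rbf").fit(X_train, speakers)          # stage 1: speaker gate
        word_mlp = MLPClassifier(hidden_layer_sizes=(32,),
                                 max_iter=1000).fit(X_train, commands)  # stage 2: command recognizer

        def recognize(x):
            if speaker_svm.predict([x])[0] == "unknown":
                return None                   # unregistered speaker: no access
            return word_mlp.predict([x])[0]   # registered speaker: return the command

        print(recognize(X_train[0]))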

    Simultaneous interpreting as an aid in parallel-medium tertiary education

    No Abstract

    Speech Mode Classification using the Fusion of CNNs and LSTM Networks

    Speech mode classification is an area that has not been as widely explored in the field of sound classification as others such as environmental sounds, music genre, and speaker identification. But what is speech mode? While mode is defined as the way or manner in which something occurs or is expressed, speech mode is defined as the style in which speech is delivered by a person. There are some reports on classifying speech modes such as whispered and normally phonated speech using conventional methods. However, to the best of our knowledge, deep learning-based methods have not been reported in the open literature for this classification scenario. Specifically, in this work we assess the performance of image-based classification algorithms on this challenging speech mode classification problem, including the use of pre-trained deep neural networks, namely AlexNet, ResNet18, and SqueezeNet. Thus, we compare the classification efficiency of a set of deep learning-based classifiers, and we also assess the impact of different 2D image representations (spectrograms, mel-spectrograms, and their image-based fusion) on classification accuracy. These representations are generated from the original audio signals and used as input to the networks. Next, we compare the accuracy of the DL-based classifiers to a set of machine learning (ML) classifiers that use Mel-Frequency Cepstral Coefficient (MFCC) features as their inputs. Then, after determining the most efficient sampling rate for our classification problem (i.e., 32 kHz), we study the performance of our proposed method of combining CNNs with LSTM (Long Short-Term Memory) networks, using the features extracted from the deep networks of the previous step. We conclude our study by evaluating the role of sampling rate on classification accuracy, generating two sets of 2D image representations, one sampled at 32 kHz and the other at 16 kHz. Experimental results show that after cross-validation the accuracy of the DL-based approaches is 15% higher than that of the ML ones, with SqueezeNet yielding an accuracy of more than 91% at 32 kHz, whether we use transfer learning, feature-level fusion, or score-level fusion (92.5%). Our proposed method using LSTMs further increases that accuracy by more than 3%, resulting in an average accuracy of 95.7%.
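    The following is a minimal PyTorch sketch of the general CNN + LSTM idea over log mel-spectrogram "images": a small convolutional stack extracts per-frame features that an LSTM then summarizes for classification. The hand-rolled CNN (in place of the pre-trained AlexNet/ResNet18/SqueezeNet backbones), the layer sizes, and the two-class output are illustrative assumptions, not the authors' configuration.

        # Minimal sketch of a CNN + LSTM speech-mode classifier over mel-spectrograms.
        # Shapes and layer sizes are illustrative, not those used in the paper.
        import torch
        import torch.nn as nn
        import librosa

        def mel_image(wav_path, sr=32000, n_mels=64):
            """Audio -> log mel-spectrogram 'image' of shape (1, n_mels, time)."""
            y, _ = librosa.load(wav_path, sr=sr)
            m = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
            return torch.tensor(librosa.power_to_db(m), dtype=torch.float32).unsqueeze(0)

        class CnnLstmClassifier(nn.Module):
            def __init__(self, n_mels=64, n_classes=2):
                super().__init__()
                self.cnn = nn.Sequential(   # per-frame feature extractor
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
                )
                self.lstm = nn.LSTM(input_size=32 * (n_mels // 4),
                                    hidden_size=64, batch_first=True)
                self.head = nn.Linear(64, n_classes)

            def forward(self, x):                     # x: (batch, 1, n_mels, time)
                f = self.cnn(x)                       # (batch, 32, n_mels/4, time)
                f = f.permute(0, 3, 1, 2).flatten(2)  # (batch, time, 32 * n_mels/4)
                _, (h, _) = self.lstm(f)              # final hidden state summarizes the clip
                return self.head(h[-1])               # logits over speech modes

        model = CnnLstmClassifier()
        dummy = torch.randn(4, 1, 64, 120)   # 4 clips, 64 mel bands, 120 frames
        print(model(dummy).shape)            # torch.Size([4, 2])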