Explaining Schizophrenia: Auditory Verbal Hallucination and Self-Monitoring
Do self-monitoring accounts, a dominant account of the positive symptoms of schizophrenia, explain auditory verbal hallucination? In this essay, I argue that the account fails to answer crucial questions any explanation of auditory verbal hallucination must address. Where the account provides a plausible answer, I make the case for an alternative explanation: auditory verbal hallucination is not the result of a failed control mechanism, namely failed self-monitoring, but, rather, of the persistent automaticity of auditory experience of a voice. My argument emphasizes the importance of careful examination of phenomenology as providing substantive constraints on causal models of the positive symptoms in schizophrenia.
An acoustic analysis of the Cantonese whispered tones
"A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, December 31, 2004." Also available in print. Thesis (B.Sc.)--University of Hong Kong, 2004.
Junctions podcast
This report outlines the development process for Junctions, an original mini-drama podcast that explores racial bias in the context of everyday interaction. The report consists of three sections (The Podcast, Pilot Episode, and Future Episodes), each of which speaks to a different aspect of the program's framework. It also includes nearly all of the planning documents I created during pre-production, production, and post-production. Radio-Television-Fil
Addressee Identification In Face-to-Face Meetings
We present results on addressee identification in four-participant face-to-face meetings using Bayesian Network and Naive Bayes classifiers. First, we investigate how well the addressee of a dialogue act can be predicted from gaze, utterance, and conversational-context features. Then, we explore whether information about meeting context can aid the classifiers' performance. Both classifiers perform best when conversational-context and utterance features are combined with the speaker's gaze information. The classifiers show little gain from information about meeting context.
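The abstract above describes classifying the addressee of each dialogue act from categorical features such as gaze and utterance type. A minimal sketch of that idea is a Naive Bayes classifier over such features; the feature values, labels, and training rows below are hypothetical toy data, not the paper's corpus or feature set.

```python
from collections import defaultdict
from math import log

# Toy Naive Bayes for addressee identification from categorical features.
# Feature values and labels are illustrative stand-ins; the paper combines
# gaze, utterance, and conversational-context features on real meetings.
def train_nb(rows, labels, alpha=1.0):
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = defaultdict(lambda: defaultdict(int))  # (class, feature) -> value -> count
    values = defaultdict(set)                       # feature -> observed value set
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(y, i)][v] += 1
            values[i].add(v)

    def predict(row):
        def log_score(c):
            s = log(prior[c])
            for i, v in enumerate(row):
                cnt = counts[(c, i)]
                # Laplace smoothing over the values observed for this feature
                s += log((cnt[v] + alpha) / (sum(cnt.values()) + alpha * len(values[i])))
            return s
        return max(classes, key=log_score)
    return predict

# Each row: (gaze target, dialogue-act type); label: inferred addressee.
rows = [("P2", "question"), ("P2", "statement"), ("group", "statement"),
        ("P3", "question"), ("group", "question"), ("P3", "statement")]
labels = ["P2", "P2", "all", "P3", "all", "P3"]
predict = train_nb(rows, labels)
print(predict(("P2", "question")))  # gaze at P2 plus a question -> "P2"
```

The sketch also makes the paper's finding plausible: gaze is the most discriminative feature here, so adding weakly informative context features changes predictions little.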
Real-time interactive speech technology at Threshold Technology, Incorporated
Basic real-time isolated-word recognition techniques are reviewed. Industrial applications of voice technology are described in chronological order of their development. Future research efforts are also discussed
Speaker and Speech Recognition Using Hierarchy Support Vector Machine and Backpropagation
Voice signal processing has been proposed to improve effectiveness and convenience for the public, for example in smart homes. This study develops a smart-home simulation model that controls doors, a TV, and lights from voice instructions. Sound signals are processed with Mel-Frequency Cepstrum Coefficients (MFCC) for feature extraction. The speaker is then recognized using a hierarchical Support Vector Machine (SVM), so that unregistered speakers are not processed and are denied access rights. Spoken commands such as "Open the Door", "Close the Door", "Turn on the TV", "Turn off the TV", "Turn on the Lights", and "Turn off the Lights" are recognized using backpropagation. The results showed that the hierarchical SVM achieved an accuracy of 71%, compared with 45% for a single SVM.
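The pipeline in this abstract is a two-stage gate: speaker recognition first, command recognition only for enrolled speakers. A minimal sketch of that control flow is below; the speaker names and the lookup-based stand-in classifiers are hypothetical, whereas the real system operates on MFCC features with a hierarchical SVM and a backpropagation network.

```python
# Two-stage pipeline sketch: stage 1 gates on speaker identity,
# stage 2 maps the utterance to a home-automation command.
# Both stages are simple lookups standing in for trained models.

REGISTERED = {"alice", "bob"}  # hypothetical enrolled speakers
COMMANDS = {"open the door", "close the door",
            "turn on the tv", "turn off the tv",
            "turn on the lights", "turn off the lights"}

def recognize(speaker_id, utterance):
    # Stage 1: reject unregistered speakers outright (no access rights).
    if speaker_id not in REGISTERED:
        return None
    # Stage 2: recognize the spoken command for an enrolled speaker.
    cmd = utterance.lower().strip()
    return cmd if cmd in COMMANDS else None

print(recognize("alice", "Turn on the TV"))   # -> "turn on the tv"
print(recognize("mallory", "Open the Door"))  # -> None (not enrolled)
```

The design point the sketch captures is that the hierarchy short-circuits: command recognition never runs for a rejected speaker, which is both an access-control and an efficiency benefit.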
Speech Mode Classification using the Fusion of CNNs and LSTM Networks
Speech mode classification is an area that has not been as widely explored in the field of sound classification as others such as environmental sounds, music genre, and speaker identification. But what is speech mode? While mode is defined as the way or the manner in which something occurs or is expressed or done, speech mode is defined as the style in which the speech is delivered by a person.
There are some reports on speech mode classification using conventional methods, for modes such as whispering and talking in a normal phonetic voice. However, to the best of our knowledge, deep learning-based methods have not been reported in the open literature for this classification scenario. Specifically, in this work we assess the performance of image-based classification algorithms on this challenging speech mode classification problem, including the use of pre-trained deep neural networks, namely AlexNet, ResNet18, and SqueezeNet. We compare the classification efficiency of a set of deep learning-based classifiers, and we also assess the impact of different 2D image representations (spectrograms, mel-spectrograms, and their image-based fusion) on classification accuracy. These representations are generated from the original audio signals and used as input to the networks. Next, we compare the accuracy of the DL-based classifiers to a set of machine learning (ML) classifiers that take Mel-Frequency Cepstral Coefficients (MFCCs) as input features. Then, after determining the most efficient sampling rate for our classification problem (i.e. 32 kHz), we study the performance of our proposed method of combining CNNs with LSTM (Long Short-Term Memory) networks, using the features extracted from the deep networks of the previous step. We conclude our study by evaluating the role of sampling rate on classification accuracy, generating two sets of 2D image representations, one at 32 kHz and the other at 16 kHz. Experimental results show that, after cross-validation, the accuracy of the DL-based approaches is 15% higher than that of the ML ones, with SqueezeNet yielding an accuracy of more than 91% at 32 kHz, whether we use transfer learning, feature-level fusion, or score-level fusion (92.5%). Our proposed method using LSTMs further increased that accuracy by more than 3%, resulting in an average accuracy of 95.7%.
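The abstract contrasts feature-level fusion (combine the representations, classify once) with score-level fusion (classify each representation, combine the scores). The toy sketch below illustrates that structural difference; the feature vectors and the linear scoring function are hypothetical stand-ins for the CNN feature extractors and classifier heads used in the paper.

```python
# Toy illustration of feature-level vs score-level fusion.
# The vectors stand in for features from the spectrogram and
# mel-spectrogram branches; score() stands in for a trained head.

def score(features, weights):
    # Linear score as a stand-in for a trained classifier head.
    return sum(f * w for f, w in zip(features, weights))

spec_feat = [0.2, 0.9]  # hypothetical spectrogram-branch features
mel_feat = [0.7, 0.1]   # hypothetical mel-spectrogram-branch features

# Feature-level fusion: concatenate the features, score once.
feature_level = score(spec_feat + mel_feat, [0.5, 0.5, 0.5, 0.5])

# Score-level fusion: score each branch separately, average the scores.
score_level = 0.5 * score(spec_feat, [0.5, 0.5]) + 0.5 * score(mel_feat, [0.5, 0.5])

print(feature_level, score_level)
```

Even with identical weights the two strategies generally produce different decisions, which is why the paper evaluates both (and transfer learning) separately.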