34 research outputs found

    An Effective Speech Understanding Method with a Multiple Speech Recognizer based on Output Selection using Edit Distance

    PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200
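The title's approach can be illustrated with a short sketch: compute the edit (Levenshtein) distance between the outputs of multiple recognizers and select the hypothesis closest to the others. The selection rule below is a plausible reading of the title, not necessarily the paper's actual method.

```python
def edit_distance(a, b):
    # classic single-row dynamic-programming Levenshtein distance
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def select_output(hypotheses):
    # hypothetical selection rule: pick the hypothesis with the smallest
    # total edit distance to all other recognizers' outputs
    return min(hypotheses,
               key=lambda h: sum(edit_distance(h, o) for o in hypotheses))
```

A hypothesis that most recognizers roughly agree on accumulates the smallest total distance, so consensus outputs win over outliers.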

    Speech Recognition

    Chapters in the first part of the book cover the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other speech processing applications able to operate in real-world environments, such as mobile communication services and smart homes.

    Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled

    In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditions is presented. For automatic speech recognition systems based on statistical methods, it is important that the conditions of the audio used for training the statistical models match the conditions of the audio to be processed. Any mismatch will decrease the accuracy of the recognition. If it is unpredictable what kind of data can be expected, or in other words if the conditions of the audio to be processed are unknown, it is impossible to tune the models. If the material consists of `surprise data', the output of the system is likely to be poor. In this thesis, methods are presented for which no external training data is required for training models. These novel methods have been implemented in a large vocabulary continuous speech recognition system called SHoUT. This system consists of three subsystems: speech/non-speech classification, speaker diarization and automatic speech recognition.

    The speech/non-speech classification subsystem separates speech from silence and unknown audible non-speech events. The type of non-speech present in audio recordings can vary from paper shuffling in recordings of meetings to sound effects in television shows. Because it is unknown what type of non-speech needs to be detected, it is not possible to train high-quality statistical models for each type of non-speech sound. The speech/non-speech classification subsystem, also called the speech activity detection subsystem, does not attempt to classify all audible non-speech in a single run. Instead, a bootstrap speech/silence classification is first obtained using a standard speech activity component. Next, the models for speech, silence and audible non-speech are trained on the target audio using the bootstrap classification. This approach makes it possible to classify speech and non-speech with high accuracy, without the need to know what kinds of sound are present in the audio recording.

    Once all non-speech is filtered out of the audio, it is the task of the speaker diarization subsystem to determine how many speakers occur in the recording and exactly when they are speaking. The speaker diarization subsystem applies agglomerative clustering to create clusters of speech fragments for each speaker in the recording. First, statistical speaker models are created on random chunks of the recording; then, by iteratively realigning the data, retraining the models and merging models that represent the same speaker, accurate speaker models are obtained for speaker clustering. This method does not require any statistical models developed on a training set, which makes the diarization subsystem insensitive to variation in audio conditions. Unfortunately, because the algorithm has complexity O(n^3), this clustering method is slow for long recordings. Two variations of the subsystem are presented that reduce the required computational effort, so that the subsystem is applicable to long audio recordings as well.

    The automatic speech recognition subsystem developed for this research is based on Viterbi decoding on a fixed pronunciation prefix tree. Using the fixed tree, a flexible modular decoder could be developed, but it was not straightforward to apply full language model look-ahead efficiently. In this thesis, a novel method is discussed that makes it possible to apply language model look-ahead effectively on the fixed tree. Also, to obtain higher speech recognition accuracy on audio with unknown acoustic conditions, a selection of the numerous known methods for robust automatic speech recognition is applied and evaluated in this thesis.

    The three individual subsystems as well as the entire system have been successfully evaluated on three international benchmarks. The diarization subsystem was evaluated at the NIST RT06s benchmark, and the speech activity detection subsystem was tested at RT07s. The entire system was evaluated at N-Best, the first automatic speech recognition benchmark for Dutch.
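The agglomerative clustering step described in the diarization subsystem can be sketched as follows. The Euclidean centroid distance and the stop threshold are illustrative stand-ins for the statistical speaker models and merge criterion used in practice; the exhaustive pair search at each merge is what makes the naive algorithm O(n^3) overall.

```python
import math

def dist(c1, c2):
    # distance between cluster centroids (stand-in for a model-based
    # merge criterion such as BIC)
    m1 = [sum(x) / len(c1) for x in zip(*c1)]
    m2 = [sum(x) / len(c2) for x in zip(*c2)]
    return math.dist(m1, m2)

def agglomerate(chunks, stop=1.0):
    # start with one cluster per speech chunk, then repeatedly merge
    # the closest pair until no pair is closer than the stop threshold
    clusters = [[c] for c in chunks]
    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), dist(clusters[i], clusters[j]))
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda t: t[1])
        if d > stop:                     # stopping criterion: no similar pair left
            break
        clusters[i] += clusters.pop(j)   # merge the closest pair
    return clusters
```

The remaining clusters correspond to the hypothesized speakers; the thesis's faster variants reduce the cost of this pairwise search for long recordings.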

    Detecting early signs of dementia in conversation

    Dementia can affect a person's speech, language and conversational interaction capabilities. The early diagnosis of dementia is of great clinical importance. Recent studies using the qualitative methodology of Conversation Analysis (CA) demonstrated that communication problems may be picked up during conversations between patients and neurologists, and that this can be used to differentiate between patients with Neurodegenerative Disorders (ND) and those with non-progressive Functional Memory Disorder (FMD). However, conducting manual CA is expensive and difficult to scale up for routine clinical use.

    This study introduces an automatic approach for processing such conversations which can help in identifying the early signs of dementia and distinguishing them from other clinical categories (FMD, Mild Cognitive Impairment (MCI) and Healthy Control (HC)). The dementia detection system starts with a speaker diarisation module to segment an input audio file (determining who talks when). The segmented files are then passed to an automatic speech recogniser (ASR) to transcribe the utterances of each speaker. Next, a feature extraction unit extracts a number of features (CA-inspired, acoustic, lexical and word-vector) from the transcripts and audio files. Finally, a classifier is trained on the features to determine the clinical category of the input conversation.

    Moreover, we investigate replacing the neurologist in the conversation with an Intelligent Virtual Agent (IVA) asking similar questions. We show that despite differences between the IVA-led and the neurologist-led conversations, the results achieved with the IVA are as good as those obtained with the neurologists. Furthermore, the IVA can be used for administering more standard cognitive tests, such as verbal fluency tests, and for producing automatic scores, which can then boost the performance of the classifier. The final blind evaluation of the system shows that the classifier can identify early signs of dementia with an acceptable level of accuracy and robustness (considering both sensitivity and specificity).
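The four-stage pipeline described in the abstract (diarisation, ASR, feature extraction, classification) can be wired together as in the sketch below. Every stage here is a hypothetical stand-in for the real modules, and the single-feature decision rule is purely illustrative, not the study's trained classifier.

```python
def diarise(audio):
    # stand-in: split the audio into (speaker, segment) pairs
    return [("patient", audio[:len(audio) // 2]),
            ("interviewer", audio[len(audio) // 2:])]

def transcribe(segment):
    return segment  # stand-in for an ASR engine

def extract_features(turns):
    # illustrative CA-inspired feature: proportion of talk produced
    # by the patient across all turns
    patient = sum(len(t) for s, t in turns if s == "patient")
    total = sum(len(t) for _, t in turns) or 1
    return {"patient_talk_ratio": patient / total}

def classify(features, threshold=0.5):
    # hypothetical rule standing in for the trained classifier
    return "FMD" if features["patient_talk_ratio"] >= threshold else "ND"

def detect(audio):
    # diarisation -> ASR -> feature extraction -> classification
    turns = [(spk, transcribe(seg)) for spk, seg in diarise(audio)]
    return classify(extract_features(turns))
```

The value of this decomposition is that each stage (diariser, recogniser, feature set, classifier) can be swapped or retrained independently, which is what allows the neurologist to be replaced by an IVA without rebuilding the rest of the system.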

    Physics-constrained Hyperspectral Data Exploitation Across Diverse Atmospheric Scenarios

    Hyperspectral target detection promises new operational advantages, with increasing instrument spectral resolution and robust material discrimination. Resolving surface materials requires a fast and accurate accounting of atmospheric effects to increase detection accuracy while minimizing false alarms. This dissertation investigates deep learning methods constrained by the processes governing radiative transfer to efficiently perform atmospheric compensation on data collected by long-wave infrared (LWIR) hyperspectral sensors. These compensation methods depend on generative modeling techniques and permutation-invariant neural network architectures to predict LWIR spectral radiometric quantities. The compensation algorithms developed in this work were examined from the perspective of target detection performance using collected data. These deep learning-based compensation algorithms resulted in detection performance comparable to established methods while accelerating the image processing chain by a factor of eight.
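Permutation-invariant architectures of the kind mentioned above typically follow the sum-pooling (Deep Sets) pattern: encode each set element independently, pool the encodings with a symmetric function, then decode the pooled representation. A toy sketch of the pattern, where `phi` and `rho` are arbitrary stand-ins rather than the dissertation's actual networks:

```python
def phi(x):
    # per-element encoder: maps each element to a small feature tuple
    return (x, x * x)

def rho(pooled):
    # decoder applied to the pooled (order-independent) representation
    s, sq = pooled
    return sq - s * s / 4  # arbitrary toy readout

def deep_set(elements):
    # sum-pool the per-element encodings; summation is symmetric, so
    # the output cannot depend on the order of the inputs
    pooled = tuple(map(sum, zip(*(phi(x) for x in elements))))
    return rho(pooled)
```

Because the pooling operation is a sum, reordering the input set provably leaves the output unchanged, which is the property that makes such networks suitable for sets of spectral samples.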

    Automatic detection of drusen associated with age-related macular degeneration in optical coherence tomography: a graph-based approach

    Doctoral thesis in Leaders for Technological Industries (Líderes para Indústrias Tecnológicas).

    Age-related macular degeneration (AMD) starts to manifest itself with the appearance of drusen. Progressively, the drusen increase in size and in number without causing alterations to vision. Nonetheless, their quantification is important because it correlates with the evolution of the disease to an advanced stage, which could lead to the loss of central vision. Manual quantification of drusen is impractical, since it is time-consuming and requires specialized knowledge. Therefore, this work proposes a method for quantifying drusen automatically.

    In this work, a method is proposed for segmenting the boundaries limiting drusen, and another method for locating them through classification. The segmentation method is based on a multiple-surface framework that is adapted for segmenting the limiting boundaries of drusen: the inner boundary of the retinal pigment epithelium + drusen complex (IRPEDC) and Bruch's membrane (BM). Several segmentation methods have been considerably successful in segmenting layers of healthy retinas in optical coherence tomography (OCT) images. These methods were successful because they incorporate prior information and regularization. However, these factors have the side effect of hindering the segmentation in regions of altered morphology that often occur in diseased retinas. The proposed segmentation method takes into account the presence of lesions related to AMD, i.e., drusen and geographic atrophies (GAs). To that end, a segmentation scheme is proposed that excludes prior information and regularization that are only valid for healthy regions. Even with this scheme, the prior information and regularization can still cause the over-smoothing of some drusen. To address this problem, the integration of local shape priors in the form of sparse high-order potentials (SHOPs) into the multiple-surface framework is also proposed.

    Drusen are commonly detected by thresholding the distance between the boundaries that limit them. This approach misses drusen, or portions of drusen, with a height below the threshold. To improve the detection of drusen, Dufour et al. [1] proposed a classification method that detects drusen using textural information. In this work, the method of Dufour et al. [1] is extended by adding new features and performing multi-label classification, which allows the individual detection of drusen when these occur in clusters. Furthermore, local information is incorporated into the classification by combining the classifier with a hidden Markov model (HMM).

    Both the segmentation and detection methods were evaluated on a database of patients with intermediate AMD. The results suggest that both methods frequently perform better than some methods in the literature. Furthermore, the results of these two methods form drusen delimitations that are closer to expert delimitations than those of two methods from the literature.

    This work was supported by FCT with the reference project UID/EEA/04436/2013 and by FEDER funds through COMPETE 2020 – Programa Operacional Competitividade e Internacionalização (POCI) with the reference project POCI-01-0145-FEDER-006941. Furthermore, the Portuguese funding institution Fundação Calouste Gulbenkian conceded me a Ph.D. grant for this work, for which I wish to acknowledge this institution. Additionally, I want to thank one of its members, Teresa Burnay, for all her assistance with issues related to the grant, for believing that my work was worth supporting and for encouraging me to apply for the grant.
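The baseline detection approach that the classification method improves on, thresholding the per-column distance between the two segmented boundaries, can be sketched as follows. The boundary arrays (pixel depths of the IRPEDC and BM surfaces per image column) and the threshold value are illustrative, and the sketch shows exactly the weakness the abstract notes: columns whose elevation stays below the threshold are missed.

```python
def detect_drusen(irpedc, bm, threshold=3):
    # report maximal runs of columns where the separation between the
    # inner RPE+drusen boundary and Bruch's membrane exceeds the
    # threshold; each run is a candidate druse (start_col, end_col)
    drusen, start = [], None
    for i, (top, bottom) in enumerate(zip(irpedc, bm)):
        if bottom - top > threshold:          # elevated RPE complex
            start = i if start is None else start
        elif start is not None:
            drusen.append((start, i - 1))
            start = None
    if start is not None:                      # run extends to last column
        drusen.append((start, len(irpedc) - 1))
    return drusen
```

Lowering the threshold recovers flatter drusen but admits more noise from segmentation jitter, which is why the thesis replaces this rule with texture-based classification smoothed by an HMM.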