Radio Oranje: Enhanced Access to a Historical Spoken Word Collection
Access to historical audio collections is typically very restricted: content is often only available on physical (analog) media, and the metadata is usually limited to keywords, giving access at the level of relatively large fragments, e.g., an entire tape. Many spoken word heritage collections are now being digitized, which allows the introduction of more advanced search technology. This paper presents an approach that supports online access and search for recordings of historical speeches. A demonstrator has been built, based on the so-called Radio Oranje collection, which contains radio speeches by the Dutch Queen Wilhelmina that were broadcast during World War II. The audio has been aligned with its original 1940s manual transcriptions to create a time-stamped index that enables the speeches to be searched at the word level. Results are presented together with related photos from an external database.
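The time-stamped index described above can be sketched as a simple inverted index from words to time spans; the recording names, words, and timestamps below are invented for illustration and are not from the actual collection:

```python
# A minimal sketch of a time-stamped word index, assuming alignment has
# already produced (word, start_sec, end_sec) triples per recording.
from collections import defaultdict

def build_index(alignments):
    """Map each word to the (recording, start, end) spans where it occurs."""
    index = defaultdict(list)
    for recording, words in alignments.items():
        for word, start, end in words:
            index[word.lower()].append((recording, start, end))
    return index

def search(index, query):
    """Return all time-stamped occurrences of a query word."""
    return index.get(query.lower(), [])

# Hypothetical alignment output for two recordings.
alignments = {
    "speech_1940_07_28": [("nederland", 12.4, 13.1), ("vrijheid", 45.0, 45.8)],
    "speech_1941_05_10": [("vrijheid", 8.2, 9.0)],
}
index = build_index(alignments)
print(search(index, "Vrijheid"))
# [('speech_1940_07_28', 45.0, 45.8), ('speech_1941_05_10', 8.2, 9.0)]
```

Each query result is a jump target into the audio, which is what makes word-level search over the digitized recordings possible.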
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error.
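The HMM view of discourse structure described above can be illustrated with a toy Viterbi decoder over dialogue act states. The act inventory, bigram transition probabilities, and per-utterance emission scores below are invented for illustration; in the paper the emission evidence comes from word n-grams, decision trees, and neural networks:

```python
# Toy sketch: dialogue acts are hidden states linked by a dialogue act
# bigram; per-utterance evidence scores are the emissions. Viterbi recovers
# the most likely act sequence. All probabilities here are hypothetical.
import math

acts = ["Statement", "Question", "Backchannel"]
trans = {  # P(act_t | act_{t-1}), a hypothetical dialogue act bigram
    "Statement":   {"Statement": 0.6, "Question": 0.2, "Backchannel": 0.2},
    "Question":    {"Statement": 0.7, "Question": 0.1, "Backchannel": 0.2},
    "Backchannel": {"Statement": 0.5, "Question": 0.3, "Backchannel": 0.2},
}
start = {"Statement": 0.5, "Question": 0.3, "Backchannel": 0.2}

def viterbi(emissions):
    """emissions: list of dicts giving P(evidence_t | act) per utterance."""
    v = [{a: math.log(start[a]) + math.log(emissions[0][a]) for a in acts}]
    back = []
    for e in emissions[1:]:
        col, ptr = {}, {}
        for a in acts:
            best = max(acts, key=lambda p: v[-1][p] + math.log(trans[p][a]))
            col[a] = v[-1][best] + math.log(trans[best][a]) + math.log(e[a])
            ptr[a] = best
        v.append(col)
        back.append(ptr)
    last = max(acts, key=lambda a: v[-1][a])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

utterances = [
    {"Statement": 0.2, "Question": 0.7, "Backchannel": 0.1},  # e.g. "are you there?"
    {"Statement": 0.6, "Question": 0.1, "Backchannel": 0.3},  # e.g. "yes, I am"
]
print(viterbi(utterances))  # ['Question', 'Statement']
```

The discourse-grammar constraint shows up in the transition table: a Question makes a following Statement more likely, which is exactly the kind of sequence prior the dialogue act n-gram contributes.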
Automatic detection of drusen associated with age-related macular degeneration in optical coherence tomography: a graph-based approach
Doctoral thesis in Leaders for Technological Industries.
Age-related macular degeneration (AMD) first manifests itself with the appearance of drusen. Progressively, the drusen increase in size and in number without causing alterations to vision. Nonetheless, their quantification is important because it correlates with the evolution of the disease to an advanced stage, which can lead to the loss of central vision. Manual quantification of drusen is impractical, since it is time-consuming and requires specialized knowledge. Therefore, this work proposes a method for quantifying drusen automatically.
In this work, a method is proposed for segmenting the boundaries that limit drusen, together with another method for locating them through classification. The segmentation method is based on a multiple-surface framework that is adapted to segment the limiting boundaries of drusen: the inner boundary of the retinal pigment epithelium + drusen complex (IRPEDC) and Bruch's membrane (BM). Several segmentation methods have been considerably successful in segmenting the layers of healthy retinas in optical coherence tomography (OCT) images. These methods succeed because they incorporate prior information and regularization. However, these factors have the side effect of hindering segmentation in regions of altered morphology, which often occur in diseased retinas. The proposed segmentation method takes into account the presence of lesions related to AMD, i.e., drusen and geographic atrophies (GAs). To that end, a segmentation scheme is proposed that excludes prior information and regularization that are only valid for healthy regions. Even with this scheme, the prior information and regularization can still cause oversmoothing of some drusen. To address this problem, the integration of local shape priors in the form of sparse high-order potentials (SHOPs) into the multiple-surface framework is also proposed.
Drusen are commonly detected by thresholding the distance among the boundaries that limit drusen. This
approach misses drusen or portions of drusen with a height below the threshold. To improve the detection
of drusen, Dufour et al. [1] proposed a classification method that detects drusen using textural information.
In this work, the method of Dufour et al. [1] is extended by adding new features and performing multi-label
classification, which allows the individual detection of drusen when they occur in clusters. Furthermore, local
information is incorporated into the classification by combining the classifier with a hidden Markov model (HMM).
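The baseline thresholding approach described at the start of this passage can be sketched as follows; per A-scan column, a druse is flagged where the distance between the segmented IRPEDC and BM surfaces exceeds a height threshold, and contiguous flagged columns are grouped into individual drusen. The surface positions and threshold below are hypothetical pixel values, not values from the thesis:

```python
# Baseline drusen detection by thresholding the distance between the two
# segmented surfaces (IRPEDC and BM), per image column.
def detect_drusen(irpedc, bm, threshold):
    """Return (start, end) column ranges where IRPEDC-to-BM distance > threshold."""
    flagged = [abs(b - i) > threshold for i, b in zip(irpedc, bm)]
    drusen, start = [], None
    for col, f in enumerate(flagged):
        if f and start is None:
            start = col                      # a druse begins
        elif not f and start is not None:
            drusen.append((start, col - 1))  # a druse ends
            start = None
    if start is not None:
        drusen.append((start, len(flagged) - 1))
    return drusen

# Hypothetical surface rows (pixels); the IRPEDC is elevated over drusen.
irpedc = [100, 100, 95, 90, 95, 100, 100, 92, 100]
bm     = [103, 103, 103, 103, 103, 103, 103, 103, 103]
print(detect_drusen(irpedc, bm, threshold=5))  # [(2, 4), (7, 7)]
```

This also makes the stated limitation concrete: any druse whose elevation stays below the threshold never gets flagged, which is what motivates the classification-based detection.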
Both the segmentation and detection methods were evaluated on a database of patients with intermediate
AMD. The results suggest that both methods frequently perform better than some methods from the
literature. Furthermore, the two methods produce drusen delimitations that are closer to expert
delimitations than those of two methods from the literature.
This work was supported by FCT with the reference project UID/EEA/04436/2013, by FEDER funds through
the COMPETE 2020 – Programa Operacional Competitividade e Internacionalização (POCI) with the reference
project POCI-01-0145-FEDER-006941.
Furthermore, the Portuguese funding institution Fundação Calouste Gulbenkian granted me a Ph.D.
scholarship for this work, for which I wish to acknowledge the institution. Additionally, I want to thank one of its
members, Teresa Burnay, for all her assistance with issues related to the grant, for believing that my work was worth supporting, and for encouraging me to apply for the grant.
A detection-based pattern recognition framework and its applications
The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation.
Inspired by the studies of modern cognitive psychology and real-world pattern recognition systems, a detection-based pattern recognition framework is proposed to provide an alternative solution for some complicated pattern recognition problems. The primitive features are first detected and the task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined together and the high-level context is incorporated as additional information at certain stages.
A detection-based framework is a "divide-and-conquer" design paradigm for pattern recognition problems, which will decompose a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Some information fusion strategies will be employed to integrate the evidence from a lower level to form the evidence at a higher level. Such a fusion procedure continues until reaching the top level. Generally, a detection-based framework has many advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts
can be optimized separately; (2) parallel and distributed computational components in primitive feature detection. In such a component-based framework, any primitive component can be replaced by a new one while other components remain unchanged; (3) incremental information integration; (4) high level context information as additional information sources, which can be combined with bottom-up processing at any stage.
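The bottom-up fusion of detector evidence described above can be sketched as follows; the primitive detectors, frame data, and fusion weights here are illustrative assumptions, not components of the actual dissertation systems:

```python
# Sketch of the divide-and-conquer design: independent primitive detectors
# each score the input, and a fusion stage combines their evidence into a
# higher-level score. Detectors and weights are hypothetical examples.
def energy_detector(frame):
    """Crude speech-activity cue: 1.0 if the frame energy is high."""
    return 1.0 if sum(x * x for x in frame) > 0.5 else 0.0

def zero_crossing_detector(frame):
    """Crude spectral cue: normalized count of sign changes in the frame."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return min(crossings / len(frame), 1.0)

def fuse(scores, weights):
    """Weighted linear fusion of lower-level evidence into higher-level evidence."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

frame = [0.4, -0.5, 0.6, -0.3, 0.7]          # one hypothetical signal frame
scores = [energy_detector(frame), zero_crossing_detector(frame)]
print(round(fuse(scores, weights=[0.6, 0.4]), 2))  # 0.92
```

Because each detector is an independent component, any one of them can be replaced or retrained without touching the others, which is the flexibility advantage (1) and (2) above refer to; stacking such fusion stages level by level yields the hierarchy the framework describes.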
This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on statistical detection and decision theory. In addition, evidence fusion strategies were investigated. Several novel detection algorithms and evidence fusion methods were proposed, and their effectiveness was demonstrated in automatic speech recognition and broadcast news video segmentation systems. We believe such a detection-based framework can be employed in more applications in the future.
Ph.D. Committee Chair: Lee, Chin-Hui; Committee Members: Clements, Mark; Ghovanloo, Maysam; Romberg, Justin; Yuan, Min
Acoustic-Phonetic Approaches for Improving Segment-Based Speech Recognition for Large Vocabulary Continuous Speech
Segment-based speech recognition has been shown to be a competitive alternative to state-of-the-art HMM-based techniques. Its accuracy relies heavily on the quality of the segment graph from which the recognizer searches for the most likely recognition hypotheses. In order to increase the inclusion rate of actual segments in the graph, it is important to recover segments missed by the segmentation algorithm. One aspect of this research focuses on determining the segments missing due to missed detection of segment boundaries. Acoustic discontinuities, together with manner-distinctive features, are utilized to recover the missing segments. Another improvement to our segment-based framework tackles the restriction of a limited amount of training speech data, which prevents the use of more complex covariance matrices for the acoustic models. Feature dimensionality reduction in the form of Principal Component Analysis (PCA) is applied to enable the training of full covariance matrices, and it results in improved segment-based phoneme recognition. Furthermore, to benefit from the fact that the segment-based approach allows the integration of phonetic knowledge, we incorporate the probability of each segment being one type of sound unit of a specific common manner of articulation into the scoring of the segment graphs. Our experiments show that, with the proposed improvements, our segment-based framework increases phoneme recognition accuracy by approximately 25% over the baseline segment-based speech recognition system.
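The PCA step described above can be sketched as follows; the feature dimensionality and component count are illustrative assumptions (13-dimensional frames reduced to 5 components), not the configuration used in the thesis:

```python
# Sketch of PCA-based feature dimensionality reduction: projecting acoustic
# feature vectors onto their top principal components, so that fewer
# parameters are needed per full covariance matrix. Data here is synthetic.
import numpy as np

def pca_project(features, n_components):
    """Project (n_frames, dim) features onto the top n_components directions."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return centered @ top

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 13))                     # e.g. MFCC-like frames
reduced = pca_project(feats, n_components=5)
print(reduced.shape)  # (200, 5)
```

The payoff is in parameter count: a full covariance matrix over 5 dimensions needs 15 free parameters instead of 91 for 13 dimensions, which is what makes full covariances trainable from limited data.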
A digital neural network approach to speech recognition
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
This thesis presents two novel methods for isolated word speech recognition based on sub-word components. A digital neural network is the fundamental processing strategy in both methods. The first design is based on the 'Separate Segmentation & Labelling' (SS&L) approach. The spectral data of the input utterance is first segmented into phoneme-like units, which are then time-normalised by linear time normalisation. The neural network labels the time-normalised phoneme-like segments; 78.36% recognition accuracy is achieved for the phoneme-like unit. In the second design, no time normalisation is required. After segmentation, recognition is performed by classifying the data in a window as it is slid one frame at a time, from the start to the end of each phoneme-like segment in the utterance. 73.97% recognition accuracy for the phoneme-like unit is achieved in this application. The parameters of the neural net have been optimised for maximum recognition performance. A segmentation strategy using the sum of the difference in filterbank channel energy over successive spectra produced 80.27% correct segmentation of isolated utterances into phoneme-like units. A linguistic processor based on that of Kashyap & Mittal [84] enables 93.11% and 93.49% word recognition accuracy to be achieved for the SS&L and 'Sliding Window' recognisers respectively. The linguistic processor has been redesigned to make it portable, so that it can be easily applied to any phoneme-based isolated word speech recogniser.
This work is funded by the Ministry of Science & Technology, Government of Pakistan.
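The segmentation cue described above, the summed difference in filterbank channel energy over successive spectra, can be sketched as follows; the energy vectors and threshold are made-up values, not data from the thesis:

```python
# Sketch of boundary detection from filterbank energies: the summed absolute
# difference between successive spectra is small within a phoneme-like unit
# and peaks at its boundaries.
def spectral_change(frames):
    """frames: list of per-frame filterbank channel energy vectors."""
    return [sum(abs(a - b) for a, b in zip(f1, f2))
            for f1, f2 in zip(frames, frames[1:])]

def boundaries(frames, threshold):
    """Frame indices where the spectral change exceeds the threshold."""
    return [i + 1 for i, d in enumerate(spectral_change(frames)) if d > threshold]

frames = [
    [1.0, 1.0, 1.0],
    [1.1, 0.9, 1.0],   # small change: same phoneme-like unit
    [3.0, 0.2, 0.1],   # large change: a boundary
    [3.1, 0.2, 0.1],
]
print(boundaries(frames, threshold=1.0))  # [2]
```

A real recogniser would use many more filterbank channels and a tuned threshold, but the principle, thresholding the frame-to-frame spectral difference, is the one stated above.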