Radio Oranje: Enhanced Access to a Historical Spoken Word Collection
Access to historical audio collections is typically very restricted: content is often only available on physical (analog) media, and the metadata is usually limited to keywords, giving access at the level of relatively large fragments, e.g., an entire tape. Many spoken word heritage collections are now being digitized, which allows the introduction of more advanced search technology. This paper presents an approach that supports online access and search for recordings of historical speeches. A demonstrator has been built, based on the so-called Radio Oranje collection, which contains radio speeches by the Dutch Queen Wilhelmina that were broadcast during World War II. The audio has been aligned with its original 1940s manual transcriptions to create a time-stamped index that enables the speeches to be searched at the word level. Results are presented together with related photos from an external database.
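The time-stamped index described above can be sketched as a simple inverted index from words to time spans; the recording names, words, and timestamps below are invented for illustration and are not from the actual collection:

```python
# A minimal sketch of a time-stamped word index, assuming alignment has
# already produced (word, start_sec, end_sec) triples per recording.
from collections import defaultdict

def build_index(alignments):
    """Map each word to the (recording, start, end) spans where it occurs."""
    index = defaultdict(list)
    for recording, words in alignments.items():
        for word, start, end in words:
            index[word.lower()].append((recording, start, end))
    return index

def search(index, query):
    """Return all time-stamped occurrences of a query word."""
    return index.get(query.lower(), [])

# Hypothetical alignment output for two recordings.
alignments = {
    "speech_1940_07_28": [("nederland", 12.4, 13.1), ("vrijheid", 45.0, 45.8)],
    "speech_1941_05_10": [("vrijheid", 8.2, 9.0)],
}
index = build_index(alignments)
print(search(index, "Vrijheid"))
# [('speech_1940_07_28', 45.0, 45.8), ('speech_1941_05_10', 8.2, 9.0)]
```

Each query result is a jump target into the audio, which is what makes word-level search over the digitized recordings possible.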
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error.
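The HMM view of discourse structure described above can be illustrated with a toy Viterbi decoder over dialogue act states. The act inventory, bigram transition probabilities, and per-utterance emission scores below are invented for illustration; in the paper the emission evidence comes from word n-grams, decision trees, and neural networks:

```python
# Toy sketch: dialogue acts are hidden states linked by a dialogue act
# bigram; per-utterance evidence scores are the emissions. Viterbi recovers
# the most likely act sequence. All probabilities here are hypothetical.
import math

acts = ["Statement", "Question", "Backchannel"]
trans = {  # P(act_t | act_{t-1}), a hypothetical dialogue act bigram
    "Statement":   {"Statement": 0.6, "Question": 0.2, "Backchannel": 0.2},
    "Question":    {"Statement": 0.7, "Question": 0.1, "Backchannel": 0.2},
    "Backchannel": {"Statement": 0.5, "Question": 0.3, "Backchannel": 0.2},
}
start = {"Statement": 0.5, "Question": 0.3, "Backchannel": 0.2}

def viterbi(emissions):
    """emissions: list of dicts giving P(evidence_t | act) per utterance."""
    v = [{a: math.log(start[a]) + math.log(emissions[0][a]) for a in acts}]
    back = []
    for e in emissions[1:]:
        col, ptr = {}, {}
        for a in acts:
            best = max(acts, key=lambda p: v[-1][p] + math.log(trans[p][a]))
            col[a] = v[-1][best] + math.log(trans[best][a]) + math.log(e[a])
            ptr[a] = best
        v.append(col)
        back.append(ptr)
    last = max(acts, key=lambda a: v[-1][a])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

utterances = [
    {"Statement": 0.2, "Question": 0.7, "Backchannel": 0.1},  # e.g. "are you there?"
    {"Statement": 0.6, "Question": 0.1, "Backchannel": 0.3},  # e.g. "yes, I am"
]
print(viterbi(utterances))  # ['Question', 'Statement']
```

The discourse-grammar constraint shows up in the transition table: a Question makes a following Statement more likely, which is exactly the kind of sequence prior the dialogue act n-gram contributes.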
Automatic detection of drusen associated with age-related macular degeneration in optical coherence tomography: a graph-based approach
Doctoral thesis in Leaders for Technological Industries.
Age-related macular degeneration (AMD) first manifests itself with the appearance of drusen. Progressively, the drusen increase in size and in number without causing alterations to vision. Nonetheless, their quantification is important because it correlates with the evolution of the disease to an advanced stage, which can lead to the loss of central vision. Manual quantification of drusen is impractical, since it is time-consuming and requires specialized knowledge. Therefore, this work proposes a method for quantifying drusen automatically.
In this work, a method is proposed for segmenting the boundaries that limit drusen, together with another method for locating them through classification. The segmentation method is based on a multiple-surface framework that is adapted to segment the limiting boundaries of drusen: the inner boundary of the retinal pigment epithelium + drusen complex (IRPEDC) and Bruch's membrane (BM). Several segmentation methods have been considerably successful in segmenting the layers of healthy retinas in optical coherence tomography (OCT) images. These methods succeed because they incorporate prior information and regularization. However, these factors have the side effect of hindering segmentation in regions of altered morphology, which often occur in diseased retinas. The proposed segmentation method takes into account the presence of lesions related to AMD, i.e., drusen and geographic atrophies (GAs). To that end, a segmentation scheme is proposed that excludes prior information and regularization that are only valid for healthy regions. Even with this scheme, the prior information and regularization can still cause oversmoothing of some drusen. To address this problem, the integration of local shape priors in the form of sparse high-order potentials (SHOPs) into the multiple-surface framework is also proposed.
Drusen are commonly detected by thresholding the distance among the boundaries that limit drusen. This
approach misses drusen or portions of drusen with a height below the threshold. To improve the detection
of drusen, Dufour et al. [1] proposed a classification method that detects drusen using textural information.
In this work, the method of Dufour et al. [1] is extended by adding new features and performing multi-label
classification, which allows the individual detection of drusen when they occur in clusters. Furthermore, local
information is incorporated into the classification by combining the classifier with a hidden Markov model (HMM).
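The baseline thresholding approach described at the start of this passage can be sketched as follows; per A-scan column, a druse is flagged where the distance between the segmented IRPEDC and BM surfaces exceeds a height threshold, and contiguous flagged columns are grouped into individual drusen. The surface positions and threshold below are hypothetical pixel values, not values from the thesis:

```python
# Baseline drusen detection by thresholding the distance between the two
# segmented surfaces (IRPEDC and BM), per image column.
def detect_drusen(irpedc, bm, threshold):
    """Return (start, end) column ranges where IRPEDC-to-BM distance > threshold."""
    flagged = [abs(b - i) > threshold for i, b in zip(irpedc, bm)]
    drusen, start = [], None
    for col, f in enumerate(flagged):
        if f and start is None:
            start = col                      # a druse begins
        elif not f and start is not None:
            drusen.append((start, col - 1))  # a druse ends
            start = None
    if start is not None:
        drusen.append((start, len(flagged) - 1))
    return drusen

# Hypothetical surface rows (pixels); the IRPEDC is elevated over drusen.
irpedc = [100, 100, 95, 90, 95, 100, 100, 92, 100]
bm     = [103, 103, 103, 103, 103, 103, 103, 103, 103]
print(detect_drusen(irpedc, bm, threshold=5))  # [(2, 4), (7, 7)]
```

This also makes the stated limitation concrete: any druse whose elevation stays below the threshold never gets flagged, which is what motivates the classification-based detection.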
Both the segmentation and detection methods were evaluated on a database of patients with intermediate
AMD. The results suggest that both methods frequently perform better than some methods from the
literature. Furthermore, the two methods produce drusen delimitations that are closer to expert
delimitations than those of two methods from the literature.
This work was supported by FCT with the reference project UID/EEA/04436/2013, by FEDER funds through
the COMPETE 2020 – Programa Operacional Competitividade e Internacionalização (POCI) with the reference
project POCI-01-0145-FEDER-006941.
Furthermore, the Portuguese funding institution Fundação Calouste Gulbenkian granted me a Ph.D.
scholarship for this work, for which I wish to acknowledge the institution. Additionally, I want to thank one of its
members, Teresa Burnay, for all her assistance with issues related to the grant, for believing that my work was worth supporting, and for encouraging me to apply for the grant.
A detection-based pattern recognition framework and its applications
The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation.
Inspired by the studies of modern cognitive psychology and real-world pattern recognition systems, a detection-based pattern recognition framework is proposed to provide an alternative solution for some complicated pattern recognition problems. The primitive features are first detected and the task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined together and the high-level context is incorporated as additional information at certain stages.
A detection-based framework is a "divide-and-conquer" design paradigm for pattern recognition problems, which will decompose a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Some information fusion strategies will be employed to integrate the evidence from a lower level to form the evidence at a higher level. Such a fusion procedure continues until reaching the top level. Generally, a detection-based framework has many advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts
can be optimized separately; (2) parallel and distributed computational components in primitive feature detection. In such a component-based framework, any primitive component can be replaced by a new one while other components remain unchanged; (3) incremental information integration; (4) high level context information as additional information sources, which can be combined with bottom-up processing at any stage.
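The bottom-up fusion of detector evidence described above can be sketched as follows; the primitive detectors, frame data, and fusion weights here are illustrative assumptions, not components of the actual dissertation systems:

```python
# Sketch of the divide-and-conquer design: independent primitive detectors
# each score the input, and a fusion stage combines their evidence into a
# higher-level score. Detectors and weights are hypothetical examples.
def energy_detector(frame):
    """Crude speech-activity cue: 1.0 if the frame energy is high."""
    return 1.0 if sum(x * x for x in frame) > 0.5 else 0.0

def zero_crossing_detector(frame):
    """Crude spectral cue: normalized count of sign changes in the frame."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return min(crossings / len(frame), 1.0)

def fuse(scores, weights):
    """Weighted linear fusion of lower-level evidence into higher-level evidence."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

frame = [0.4, -0.5, 0.6, -0.3, 0.7]          # one hypothetical signal frame
scores = [energy_detector(frame), zero_crossing_detector(frame)]
print(round(fuse(scores, weights=[0.6, 0.4]), 2))  # 0.92
```

Because each detector is an independent component, any one of them can be replaced or retrained without touching the others, which is the flexibility advantage (1) and (2) above refer to; stacking such fusion stages level by level yields the hierarchy the framework describes.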
This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on statistical detection and decision theory. In addition, evidence fusion strategies were investigated. Several novel detection algorithms and evidence fusion methods were proposed, and their effectiveness was demonstrated in automatic speech recognition and broadcast news video segmentation systems. We believe such a detection-based framework can be employed in more applications in the future.
Ph.D. Committee Chair: Lee, Chin-Hui; Committee Members: Clements, Mark; Ghovanloo, Maysam; Romberg, Justin; Yuan, Min
Acoustic-Phonetic Approaches for Improving Segment-Based Speech Recognition for Large Vocabulary Continuous Speech
Segment-based speech recognition has been shown to be a competitive alternative to state-of-the-art HMM-based techniques. Its accuracy relies heavily on the quality of the segment graph from which the recognizer searches for the most likely recognition hypotheses. In order to increase the inclusion rate of actual segments in the graph, it is important to recover segments missed by the segmentation algorithm. One aspect of this research focuses on determining the segments missing due to missed detection of segment boundaries. Acoustic discontinuities, together with manner-distinctive features, are utilized to recover the missing segments. Another improvement to our segment-based framework tackles the restriction of a limited amount of training speech data, which prevents the use of more complex covariance matrices for the acoustic models. Feature dimensionality reduction in the form of Principal Component Analysis (PCA) is applied to enable the training of full covariance matrices, and it results in improved segment-based phoneme recognition. Furthermore, to benefit from the fact that the segment-based approach allows the integration of phonetic knowledge, we incorporate the probability of each segment being one type of sound unit of a specific common manner of articulation into the scoring of the segment graphs. Our experiments show that, with the proposed improvements, our segment-based framework increases phoneme recognition accuracy by approximately 25% over the baseline segment-based speech recognition system.
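The PCA step described above can be sketched as follows; the feature dimensionality and component count are illustrative assumptions (13-dimensional frames reduced to 5 components), not the configuration used in the thesis:

```python
# Sketch of PCA-based feature dimensionality reduction: projecting acoustic
# feature vectors onto their top principal components, so that fewer
# parameters are needed per full covariance matrix. Data here is synthetic.
import numpy as np

def pca_project(features, n_components):
    """Project (n_frames, dim) features onto the top n_components directions."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return centered @ top

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 13))                     # e.g. MFCC-like frames
reduced = pca_project(feats, n_components=5)
print(reduced.shape)  # (200, 5)
```

The payoff is in parameter count: a full covariance matrix over 5 dimensions needs 15 free parameters instead of 91 for 13 dimensions, which is what makes full covariances trainable from limited data.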
A digital neural network approach to speech recognition
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
This thesis presents two novel methods for isolated word speech recognition based on sub-word components. A digital neural network is the fundamental processing strategy in both methods. The first design is based on the 'Separate Segmentation & Labelling' (SS&L) approach. The spectral data of the input utterance is first segmented into phoneme-like units, which are then time-normalised by linear time normalisation. The neural network labels the time-normalised phoneme-like segments; 78.36% recognition accuracy is achieved for the phoneme-like unit. In the second design, no time normalisation is required. After segmentation, recognition is performed by classifying the data in a window as it is slid one frame at a time, from the start to the end of each phoneme-like segment in the utterance. 73.97% recognition accuracy for the phoneme-like unit is achieved in this application. The parameters of the neural net have been optimised for maximum recognition performance. A segmentation strategy using the sum of the difference in filterbank channel energy over successive spectra produced 80.27% correct segmentation of isolated utterances into phoneme-like units. A linguistic processor based on that of Kashyap & Mittal [84] enables 93.11% and 93.49% word recognition accuracy to be achieved for the SS&L and 'Sliding Window' recognisers respectively. The linguistic processor has been redesigned to make it portable, so that it can be easily applied to any phoneme-based isolated word speech recogniser.
This work is funded by the Ministry of Science & Technology, Government of Pakistan.
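The segmentation cue described above, the summed difference in filterbank channel energy over successive spectra, can be sketched as follows; the energy vectors and threshold are made-up values, not data from the thesis:

```python
# Sketch of boundary detection from filterbank energies: the summed absolute
# difference between successive spectra is small within a phoneme-like unit
# and peaks at its boundaries.
def spectral_change(frames):
    """frames: list of per-frame filterbank channel energy vectors."""
    return [sum(abs(a - b) for a, b in zip(f1, f2))
            for f1, f2 in zip(frames, frames[1:])]

def boundaries(frames, threshold):
    """Frame indices where the spectral change exceeds the threshold."""
    return [i + 1 for i, d in enumerate(spectral_change(frames)) if d > threshold]

frames = [
    [1.0, 1.0, 1.0],
    [1.1, 0.9, 1.0],   # small change: same phoneme-like unit
    [3.0, 0.2, 0.1],   # large change: a boundary
    [3.1, 0.2, 0.1],
]
print(boundaries(frames, threshold=1.0))  # [2]
```

A real recogniser would use many more filterbank channels and a tuned threshold, but the principle, thresholding the frame-to-frame spectral difference, is the one stated above.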