13 research outputs found

    Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation

    Get PDF
    We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using the DARPA-TDT evaluation metrics. Results show that the prosodic model alone is competitive with word-based segmentation methods. Furthermore, we achieve a significant reduction in error by combining the prosodic and word-based knowledge sources. Comment: 27 pages, 8 figures
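
    A rough illustration of the combination idea: the sketch below fuses a lexical boundary posterior and a prosodic boundary posterior under a naive conditional-independence assumption. The function, the prior value, and the input posteriors are hypothetical stand-ins, not the paper's actual HMM/decision-tree formulation.

    # Minimal sketch: fuse per-position boundary posteriors from a lexical
    # model and a prosodic model, assuming the two cue streams are
    # conditionally independent given the boundary/no-boundary decision.
    import numpy as np

    def combine_boundary_scores(p_lexical, p_prosodic, prior=0.05):
        """Return a fused P(boundary) per inter-word position."""
        lex = np.clip(np.asarray(p_lexical, dtype=float), 1e-6, 1 - 1e-6)
        pro = np.clip(np.asarray(p_prosodic, dtype=float), 1e-6, 1 - 1e-6)
        # Naive-Bayes style fusion: divide out the shared prior once,
        # then renormalize against the no-boundary hypothesis.
        num = lex * pro / prior
        den = num + (1 - lex) * (1 - pro) / (1 - prior)
        return num / den

    # Three hypothetical inter-word positions
    print(combine_boundary_scores([0.6, 0.1, 0.4], [0.7, 0.2, 0.1]))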

    Improving automated segmentation of radio shows with audio embeddings

    Full text link
    Audio features have been proven useful for increasing the performance of automated topic segmentation systems. This study explores the novel task of using audio embeddings for automated, topically coherent segmentation of radio shows. We created three different audio embedding generators using multi-class classification tasks on three datasets from different domains. We evaluate topic segmentation performance of the audio embeddings and compare it against a text-only baseline. We find that a set-up including audio embeddings generated through a non-speech sound event classification task significantly outperforms our text-only baseline by 32.3% in F1-measure. In addition, we find that different classification tasks yield audio embeddings that vary in segmentation performance. Comment: 5 pages, 2 figures, submitted to ICASSP202
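
    One simple way to use such embeddings for segmentation (a sketch under our own assumptions, not the authors' pipeline) is to score each candidate boundary by the cosine distance between the mean audio embeddings of the windows on either side:

    # Sketch: score candidate topic boundaries by the cosine distance between
    # the averaged audio embeddings of adjacent windows. The embeddings are
    # assumed to come from the penultimate layer of a separately trained
    # audio classifier; here they are random stand-ins.
    import numpy as np

    def boundary_scores(seg_embeddings, window=4):
        """seg_embeddings: (n_segments, dim) array. Returns one score per
        inter-segment position; higher means a more likely topic boundary."""
        seg_embeddings = np.asarray(seg_embeddings, dtype=float)
        scores = []
        for i in range(1, len(seg_embeddings)):
            left = seg_embeddings[max(0, i - window):i].mean(axis=0)
            right = seg_embeddings[i:i + window].mean(axis=0)
            cos = left @ right / (np.linalg.norm(left) * np.linalg.norm(right) + 1e-9)
            scores.append(1.0 - cos)
        return np.array(scores)

    emb = np.random.randn(20, 128)          # stand-in embeddings
    print(boundary_scores(emb).round(3))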

    Prosody-Based Automatic Segmentation of Speech into Sentences and Topics

    Get PDF
    A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models -- for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task and corpus dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration and word-based cues dominate for natural conversation. Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2), Special Issue on Accessing Information in Spoken Audio, September 2000
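
    The decision-tree side of such a system can be pictured in a few lines of scikit-learn; the prosodic feature columns, toy training rows, and test values below are illustrative assumptions, not the paper's feature set.

    # Sketch: classify each inter-word position as boundary / non-boundary
    # from a few prosodic features. Feature columns and training rows are
    # invented for illustration.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # columns: pause duration (s), pitch reset (semitones), pre-boundary lengthening
    X = np.array([[0.45, 4.0, 1.8],
                  [0.02, 0.5, 1.0],
                  [0.60, 6.0, 2.0],
                  [0.05, 1.0, 0.9]])
    y = np.array([1, 0, 1, 0])              # 1 = boundary

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    # posterior P(boundary | prosody) for a new inter-word position
    print(tree.predict_proba([[0.30, 3.5, 1.5]])[0, 1])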

    News Story Segmentation in Multiple Modalities

    Full text link

    Extracting and using prosodic information for Turkish spoken language processing

    Get PDF
    The general aim of this project is to extract the prosodic and lexical features of spoken Turkish and to use these features for the automatic processing of spoken language, in particular for sentence segmentation of automatic speech recognition (ASR) output. The text produced by ASR systems lacks punctuation, capitalization, and basic speech-related parameters such as stress, intonation, pitch, and pauses, which can change the meaning. Enriching this output, in other words restoring these features, makes such text easier for humans to read and interpret correctly and for machines to process. The goal of this project is to perform this enrichment and restoration by exploiting the prosodic features of the spoken language. In this proposal, we would like to examine the extraction and use of prosodic information, in addition to lexical features, for spoken language processing of Turkish. Specifically, we would like to research the use of prosodic features for sentence segmentation of Turkish speech. Another outcome of the project is a database of prosodic features at the word and morpheme level, which can be used for other purposes such as morphological disambiguation or word sense disambiguation. Turkish is an agglutinative language, so the text must be analyzed morphologically to determine the root forms and suffixes of the words before further analysis. Within the framework of this project, we also examine the interaction of prosodic features with morphological information. The role of sentence segmentation is to detect sentence boundaries in the stream of words provided by the ASR module for further downstream processing. This is helpful for various language processing tasks, such as parsing, machine translation, and question answering. We formulate sentence segmentation as a binary classification task: for each position between two consecutive words, the system must decide whether the position marks a boundary between two sentences or whether the two neighboring words belong to the same sentence. The sentence segmentation system combines hidden event language models (HELMs) with discriminative classification methods. The HELM models the word sequence, while discriminative classifiers such as decision trees use prosodic features such as pause durations. The new approach combines the HELMs, which exploit lexical information, with maximum entropy and boosting classifiers that tightly integrate lexical, prosodic, speaker-change, and syntactic features. The boosting-based classifier alone performs better than all the other classification schemes. When combined with a hidden event language model, the improvement is even more pronounced. Publisher's Version
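
    A hedged sketch of the combination step described above: interpolate the boundary posterior of a hidden event language model with that of a boosting classifier over prosodic features. AdaBoost stands in for whichever boosting implementation the project used, and the HELM posteriors, features, and interpolation weight are invented for illustration.

    # Sketch: linearly interpolate HELM and boosting posteriors per
    # inter-word position, then threshold to place sentence boundaries.
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    # toy prosodic features: pause duration (s), speaker-change flag
    X_train = np.array([[0.50, 1], [0.00, 0], [0.70, 0],
                        [0.10, 0], [0.40, 1], [0.02, 0]])
    y_train = np.array([1, 0, 1, 0, 1, 0])
    boost = AdaBoostClassifier(n_estimators=50).fit(X_train, y_train)

    p_boost = boost.predict_proba(np.array([[0.30, 0], [0.05, 0]]))[:, 1]
    p_helm = np.array([0.6, 0.1])           # hypothetical HELM posteriors
    lam = 0.5                               # interpolation weight, tuned on held-out data
    boundaries = lam * p_helm + (1 - lam) * p_boost > 0.5
    print(boundaries)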

    Automatic sentence segmentation of spoken Turkish using co-training with prosodic, lexical and morphological information

    Get PDF
    Co-training is a very effective machine learning technique that has been used successfully in several classification tasks such as web page classification, word sense disambiguation, and named entity recognition. It is a semi-supervised learning method that aims to improve the performance of a supervised learning algorithm by incorporating large amounts of unlabeled data into the training set. Co-training algorithms work by generating two or more classifiers trained on different views of the labeled input data, which are then used to label the unlabeled data separately. The most confidently labeled examples of the automatically labeled data are added to the set of manually labeled data, and the process may continue for several iterations. In this project, we describe the application of co-training to sentence segmentation, using prosodic, lexical, and morphological information as the views of the data. Sentence segmentation from speech is part of a process that aims at enriching the unstructured stream of words output by standard speech recognizers; its role is to find the sentence units in this stream of words. Sentence segmentation is a preliminary step toward speech understanding and is of particular importance for speech-related applications, as most further processing steps, such as parsing, machine translation, and information extraction, assume the presence of sentence boundaries. Once sentence boundaries have been determined, further syntactic and/or semantic analysis can be performed on the resulting sentences. In this project, we treat the speech features (prosodic, lexical, and morphological) as disjoint, natural feature sets, or views, and we try to improve the performance of the baseline system by using these feature sets with the co-training algorithm. Furthermore, we investigate different learning strategies for co-training, namely agreement and disagreement. In addition to these strategies, we propose a new approach, which we call self-combined, that combines the self-training and co-training approaches. TÜBİTAK
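
    The loop below is a minimal co-training sketch under our own simplifying assumptions: two feature views, logistic-regression view classifiers, and a fixed number of confidently auto-labeled examples promoted to the labeled pool each round. It is not the project's actual setup, which uses prosodic, lexical, and morphological views and agreement/disagreement strategies.

    # Minimal co-training sketch: two views, each with its own classifier;
    # the most confident automatic labels are promoted to the labeled pool.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def co_training(X1, X2, y, n_labeled, rounds=5, grow=10):
        """X1, X2: the two feature views; only the first n_labeled labels in y
        are treated as known. Returns a classifier on the concatenated views."""
        lab = np.arange(n_labeled)
        unlab = np.arange(n_labeled, len(y))
        y = y.copy()
        for _ in range(rounds):
            if len(unlab) == 0:
                break
            c1 = LogisticRegression().fit(X1[lab], y[lab])
            c2 = LogisticRegression().fit(X2[lab], y[lab])
            # each view classifier scores the unlabeled pool
            p1 = c1.predict_proba(X1[unlab]).max(axis=1)
            p2 = c2.predict_proba(X2[unlab]).max(axis=1)
            # promote the most confident examples, labeled by the surer view
            pick = np.argsort(np.maximum(p1, p2))[-grow:]
            chosen = unlab[pick]
            y[chosen] = np.where(p1[pick] >= p2[pick],
                                 c1.predict(X1[chosen]),
                                 c2.predict(X2[chosen]))
            lab = np.concatenate([lab, chosen])
            unlab = np.delete(unlab, pick)
        return LogisticRegression().fit(np.hstack([X1, X2])[lab], y[lab])

    # toy data: two 3-dimensional views, 200 samples, only 20 labeled
    rng = np.random.default_rng(0)
    X1, X2 = rng.normal(size=(200, 3)), rng.normal(size=(200, 3))
    y_true = (X1[:, 0] + X2[:, 0] > 0).astype(int)
    model = co_training(X1, X2, y_true, n_labeled=20)
    print(model.score(np.hstack([X1, X2]), y_true))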