14 research outputs found

    A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents

    In recent years, statistical modeling approaches have steadily gained in popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and the various structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with word- and syllable-level indexing features and comparisons to the conventional vector-space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. Fusion of information via indexing word- and syllable-level features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Automatic bilingual text document summarization.

    Lo Sau-Han Silvia. Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. Includes bibliographical references (leaves 137-143). Abstracts in English and Chinese.

    Chapter 1: Introduction
    1.1 Definition of a summary
    1.2 Definition of text summarization
    1.3 Previous work
    1.3.1 Extract-based text summarization
    1.3.2 Abstract-based text summarization
    1.3.3 Sophisticated text summarization
    1.4 Summarization evaluation methods
    1.4.1 Intrinsic evaluation
    1.4.2 Extrinsic evaluation
    1.4.3 The TIPSTER SUMMAC text summarization evaluation
    1.4.4 Text Summarization Challenge (TSC)
    1.5 Research contributions
    1.5.1 Text summarization based on thematic term approach
    1.5.2 Bilingual news summarization based on an event-driven approach
    1.6 Thesis organization
    Chapter 2: Text Summarization based on a Thematic Term Approach
    2.1 System overview
    2.2 Document preprocessor
    2.2.1 English corpus
    2.2.2 English corpus preprocessor
    2.2.3 Chinese corpus
    2.2.4 Chinese corpus preprocessor
    2.3 Corpus thematic term extractor
    2.4 Article thematic term extractor
    2.5 Sentence score generator
    2.6 Chapter summary
    Chapter 3: Evaluation for Summarization using the Thematic Term Approach
    3.1 Content-based similarity measure
    3.2 Experiments using content-based similarity measure
    3.2.1 English corpus and parameter training
    3.2.2 Experimental results using content-based similarity measure
    3.3 Average inverse rank (AIR) method
    3.4 Experiments using average inverse rank method
    3.4.1 Corpora and parameter training
    3.4.2 Experimental results using AIR method
    3.5 Comparison between the content-based similarity measure and the average inverse rank method
    3.6 Chapter summary
    Chapter 4: Bilingual Event-Driven News Summarization
    4.1 Corpora
    4.2 Topic and event definitions
    4.3 Architecture of bilingual event-driven news summarization system
    4.4 Bilingual event-driven approach summarization
    4.4.1 Dictionary-based term translation applying on English news articles
    4.4.2 Preprocessing for Chinese news articles
    4.4.3 Event clusters generation
    4.4.4 Cluster selection and summary generation
    4.5 Evaluation for summarization based on event-driven approach
    4.6 Experimental results on event-driven summarization
    4.6.1 Experimental settings
    4.6.2 Results and analysis
    4.7 Chapter summary
    Chapter 5: Applying Event-Driven Summarization to a Parallel Corpus
    5.1 Parallel corpus
    5.2 Parallel documents preparation
    5.3 Evaluation methods for the event-driven summaries generated from the parallel corpus
    5.4 Experimental results and analysis
    5.4.1 Experimental settings
    5.4.2 Results and analysis
    5.5 Chapter summary
    Chapter 6: Conclusions and Future Work
    6.1 Conclusions
    6.2 Future work
    Bibliography
    Chapter A: English Stop Word List
    Chapter B: Chinese Stop Word List
    Chapter C: Event List Items on the Corpora
    C.1 Event list items for the topic "Upcoming Philippine election"
    C.2 Event list items for the topic "German train derail"
    C.3 Event list items for the topic "Electronic service delivery (ESD) scheme"
    Chapter D: The sample of an English article (9505001.xml)