    Clustering of Imperfect Transcripts Using a Novel Similarity Measure

    There has been a surge of interest in the last several years in methods for automatically generating content indices for multimedia documents, particularly video and audio documents. As a result, there is much interest in methods for analyzing transcribed documents from audio and video broadcasts and from telephone conversations and messages. This paper addresses such analysis by presenting a clustering technique that partitions a set of transcribed documents into distinct, meaningful topics. Our method determines the intersection between matching transcripts, evaluates the information contributed by each transcript, assesses the information closeness of overlapping words, and calculates similarity using the chi-square method. The main novelty of our method lies in the proposed similarity measure, which is designed to withstand the imperfections of transcribed documents. Preliminary experimental results on an archive of transcribed news broadcasts demonstrate the efficacy of the proposed methodology.
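    The abstract names the steps of the similarity measure but not its formulas, so the following is only a hypothetical Python sketch of a pipeline in the same spirit, not the authors' measure. It assumes bag-of-words counts per transcript, takes the set of shared words as the "intersection", uses the fraction of each transcript's tokens covered by that intersection as its "information contribution", compares the relative frequencies of the shared words with a chi-square distance as the "closeness" term, and then groups transcripts by single-link threshold clustering. The names bag_of_words, similarity, and cluster, the coverage weighting, and the threshold of 0.5 are all illustrative assumptions.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercased whitespace tokens; real ASR output would need more normalization.
    return Counter(text.lower().split())


def similarity(a: Counter, b: Counter) -> float:
    """Hypothetical chi-square-based similarity in [0, 1] (not the paper's formula)."""
    shared = set(a) & set(b)  # step 1 (assumed): intersection of the two transcripts
    if not shared:
        return 0.0
    tot_a = sum(a[w] for w in shared)
    tot_b = sum(b[w] for w in shared)
    # Step 2 (assumed): share of each transcript's tokens that lie in the intersection.
    cov_a = tot_a / sum(a.values())
    cov_b = tot_b / sum(b.values())
    # Steps 3-4 (assumed): chi-square distance between the relative frequencies of
    # the shared words, renormalized per transcript; the distance lies in [0, 2].
    chi2 = 0.0
    for w in shared:
        p, q = a[w] / tot_a, b[w] / tot_b
        chi2 += (p - q) ** 2 / (p + q)  # p + q > 0 because w occurs in both
    closeness = 1.0 - chi2 / 2.0
    return math.sqrt(cov_a * cov_b) * closeness


def cluster(texts, threshold=0.5):
    """Single-link grouping: transcripts linked by similarity >= threshold share a cluster."""
    bags = [bag_of_words(t) for t in texts]
    parent = list(range(len(texts)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            if similarity(bags[i], bags[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


docs = [
    "stock market falls as shares drop",
    "shares drop and the stock market falls sharply",
    "storm brings heavy rain to the coast",
    "heavy rain and storm hit the coast",
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

    Weighting the chi-square closeness by coverage keeps a pair of transcripts that share only a few common words (here, "and" and "the") from looking similar just because those words have matching frequencies; single-link threshold clustering is a swapped-in choice for brevity, and the paper's own partitioning step may differ.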