
    Different Approaches for Speaker Diarization

    Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source or channel characteristics. Speaker diarization is the task of determining "who spoke when?" in an audio or video recording that contains an unknown amount of speech and an unknown number of speakers. Diarization can help speech recognition, facilitate the searching and indexing of audio archives, and enrich automatic transcriptions, making them more readable. Over recent years, speaker diarization has become a key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported at conferences in the area. The application domains, from broadcast news to lectures and meetings, vary greatly and pose different problems, such as access to multiple microphones and multimodal information, or overlapping speech.