1,955 research outputs found
Automatic Segmentation of Broadcast News Audio using Self Similarity Matrix
Generally audio news broadcast on radio is com- posed of music, commercials,
news from correspondents and recorded statements in addition to the actual news
read by the newsreader. When news transcripts are available, automatic
segmentation of audio news broadcast to time align the audio with the text
transcription to build frugal speech corpora is essential. We address the
problem of identifying segmentation in the audio news broadcast corresponding
to the news read by the newsreader so that they can be mapped to the text
transcripts. The existing techniques produce sub-optimal solutions when used to
extract newsreader read segments. In this paper, we propose a new technique
which is able to identify the acoustic change points reliably using an acoustic
Self Similarity Matrix (SSM). We describe the two pass technique in detail and
verify its performance on real audio news broadcast of All India Radio for
different languages.Comment: 4 pages, 5 image
Searching Spontaneous Conversational Speech
The ACM SIGIR Workshop on Searching Spontaneous Conversational Speech was held as part of the 2007 ACM SIGIR Conference in Amsterdam.\ud
The workshop program was a mix of elements, including a keynote speech, paper presentations and panel discussions. This brief report describes the organization of this workshop and summarizes the discussions
Comparing non-verbal vocalisations in conversational speech corpora
Conversations do not only consist of spoken words but they also consist of non-verbal vocalisations. Since there is no standard to define and to classify (possible) non-speech sounds the annotations for these vocalisations differ very much for various corpora of conversational speech. There seems to be agreement in the six inspected corpora that hesitation sounds and feedback vocalisations are considered as words (without a standard orthography). The most frequent non-verbal vocalisation are laughter on the one hand and, if considered a vocal sound, breathing noises on the other
Access to recorded interviews: A research agenda
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed
Energy-based Self-attentive Learning of Abstractive Communities for Spoken Language Understanding
Abstractive community detection is an important spoken language understanding
task, whose goal is to group utterances in a conversation according to whether
they can be jointly summarized by a common abstractive sentence. This paper
provides a novel approach to this task. We first introduce a neural contextual
utterance encoder featuring three types of self-attention mechanisms. We then
train it using the siamese and triplet energy-based meta-architectures.
Experiments on the AMI corpus show that our system outperforms multiple
energy-based and non-energy based baselines from the state-of-the-art. Code and
data are publicly available.Comment: Update baseline
- âŚ