Exploration of audiovisual heritage using audio indexing technology
This paper discusses audio indexing tools that have been implemented for the disclosure of Dutch audiovisual cultural heritage collections. It explains the role of language models and their adaptation to historical settings, and the adaptation of acoustic models for homogeneous audio collections. In addition to the benefits of cross-media linking, the paper reviews the requirements for successfully tuning and improving the available tools for indexing heterogeneous A/V collections from the cultural heritage domain. Finally, the paper argues that research is needed to cope with the varying information needs of different types of users.
Access to recorded interviews: A research agenda
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.
An experiment in audio classification from compressed data
In this paper we present an algorithm for automatic classification of sound into speech, instrumental sound/music and silence. The method is based on thresholding features derived from the modulation envelope of the frequency-limited audio signal. Four characteristics are examined for discrimination: the occurrence and duration of energy peaks, rhythmic content and the level of harmonic content. The proposed algorithm allows classification directly on MPEG-1 audio bitstreams. The performance of the classifier was evaluated on TRECVID test data. The test results are above average among all TREC participants. The approaches adopted by other research groups participating in TREC are also discussed.
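The thresholding idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it works on decoded PCM samples rather than MPEG-1 bitstreams, it uses only one of the four characteristics (the occurrence of energy peaks), and the function names and all threshold values (`silence_rms`, `peak_ratio`, `speech_peak_rate`) are assumptions chosen for illustration.

```python
import numpy as np

def energy_envelope(signal, sample_rate, frame_ms=20):
    """Short-time RMS energy over non-overlapping frames (hypothetical helper)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    return np.sqrt(np.mean(frames ** 2, axis=1))

def classify_segment(signal, sample_rate, silence_rms=0.01,
                     peak_ratio=0.5, speech_peak_rate=(2.0, 8.0)):
    """Label a segment 'silence', 'speech' or 'music' by simple thresholds.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    env = energy_envelope(signal, sample_rate)
    # Very low average energy: treat the segment as silence.
    if env.mean() < silence_rms:
        return "silence"
    # Count energy peaks as upward crossings of a relative threshold.
    above = env > peak_ratio * env.max()
    onsets = np.count_nonzero(above[1:] & ~above[:-1])
    peaks_per_second = onsets / (len(signal) / sample_rate)
    # Assumption: speech shows energy bursts at a roughly syllabic rate,
    # while sustained instrumental sound shows few such bursts.
    low, high = speech_peak_rate
    return "speech" if low <= peaks_per_second <= high else "music"

# Synthetic usage example: a steady tone, a 4 Hz gated tone, and zeros.
sample_rate = 8000
t = np.arange(2 * sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
gated = tone * (np.sin(2 * np.pi * 4 * t) > 0)
labels = [classify_segment(x, sample_rate)
          for x in (np.zeros_like(t), gated, tone)]
```

A classifier along the lines of the abstract would combine several such thresholded features (peak duration, rhythmic content, harmonic content) rather than rely on peak rate alone.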
Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data
The development of audio event recognition models requires labeled training data, which are generally hard to obtain. One promising source of recordings of audio events is the large amount of multimedia data on the web. In particular, if the audio content analysis must itself be performed on web audio, it is important to train the recognizers themselves from such data. Training from these web data, however, poses several challenges, the most important being the availability of labels: labels, if any, that may be obtained for the data are generally weak, and not of the kind conventionally required for training detectors or classifiers. We propose that learning algorithms that can exploit weak labels offer an effective method to learn from web data. We then propose a robust and efficient deep convolutional neural network (CNN) based framework to learn audio event recognizers from weakly labeled data. The proposed method can train from and analyze recordings of variable length in an efficient manner and outperforms a network trained with strongly labeled web data by a considerable margin.
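One common way to learn from weak, recording-level labels is to pool segment-level predictions into a single recording-level prediction and compute the loss only at that level. The sketch below shows that idea with max pooling; the abstract does not state which pooling the proposed framework uses, so the pooling choice, the function names and the example values are all assumptions. The segment probabilities stand in for the output of a CNN.

```python
import numpy as np

def recording_level_probs(segment_probs):
    """Max-pool segment probabilities (n_segments, n_classes) over time."""
    return np.max(segment_probs, axis=0)

def weak_label_loss(segment_probs, weak_labels, eps=1e-7):
    """Binary cross-entropy between pooled predictions and weak labels.

    weak_labels holds one 0/1 entry per class for the whole recording;
    no information about when an event occurs is needed.
    """
    p = np.clip(recording_level_probs(segment_probs), eps, 1 - eps)
    y = np.asarray(weak_labels, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Hypothetical segment-level outputs for two recordings of different length.
short_clip = np.array([[0.1, 0.2],
                       [0.9, 0.3],
                       [0.2, 0.1]])
long_clip = np.vstack([short_clip, np.full((5, 2), 0.05)])

loss_matching = weak_label_loss(short_clip, [1, 0])
loss_mismatch = weak_label_loss(short_clip, [0, 1])
loss_long = weak_label_loss(long_clip, [1, 0])
```

Because pooling collapses the time axis, the same loss applies to recordings with any number of segments, which is one way to handle variable-length input.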