5 research outputs found

    On the Information Geometry of Audio Streams with Applications to Similarity Computing

    This paper proposes methods for processing audio streams using information geometry. We lay the theoretical groundwork for a framework that treats signal information as information entities suitable for similarity and symbolic computing on audio signals. The theory rests on the information geometry of statistical structures representing audio spectrum features, specifically on the bijection between the generic families of Bregman divergences and exponential distributions. The proposed framework, called Music Information Geometry, allows online segmentation of audio streams into metric balls, where each ball represents a quasi-stationary continuous chunk of audio, and discusses methods to qualify and quantify information between entities for similarity computing. We define an information geometry that approximates a similarity metric space, redefine general notions in music information retrieval such as similarity between entities, and address methods for dealing with the non-stationarity of audio signals. We demonstrate the framework on two sample applications: online audio structure discovery and audio matching.
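As a rough illustration of the Bregman divergences mentioned in the abstract above, the sketch below computes the divergence generated by negative entropy, which reduces to the Kullback-Leibler divergence on normalized spectra. The function names, the choice of generator, and the toy frames are assumptions made for illustration and are not taken from the paper.

```python
import math

def bregman_divergence(F, grad_F, p, q):
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q> for a convex generator F."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))
    return F(p) - F(q) - inner

def neg_entropy(x):
    # Convex generator F(x) = sum_i x_i log x_i (assumed here for illustration).
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x):
    # Gradient of the generator: dF/dx_i = log x_i + 1.
    return [math.log(xi) + 1.0 for xi in x]

def kl_divergence(p, q):
    # Direct Kullback-Leibler divergence, used only as a cross-check.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def normalize(v):
    # Scale a non-negative magnitude spectrum so that it sums to one.
    total = sum(v)
    return [x / total for x in v]

# Two hypothetical magnitude-spectrum frames (toy values, not real audio).
frame_a = normalize([6.0, 2.0, 1.0, 1.0])
frame_b = normalize([1.0, 1.0, 1.0, 1.0])

d_ab = bregman_divergence(neg_entropy, neg_entropy_grad, frame_a, frame_b)
d_ba = bregman_divergence(neg_entropy, neg_entropy_grad, frame_b, frame_a)
```

The two directions differ (`d_ab` is not equal to `d_ba`), which shows why such a divergence only approximates a metric when used as a similarity measure.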

    A Unified Multi-Functional Dynamic Spectrum Access Framework: Tutorial, Theory and Multi-GHz Wideband Testbed

    Dynamic spectrum access is a must-have ingredient for future sensors that are ideally cognitive. The goal of this paper is a tutorial treatment of wideband cognitive radio and radar—a convergence of (1) algorithms survey, (2) hardware platforms survey, (3) challenges for multi-function (radar/communications) multi-GHz front end, (4) compressed sensing for multi-GHz waveforms—revolutionary A/D, (5) machine learning for cognitive radio/radar, (6) quickest detection, and (7) overlay/underlay cognitive radio waveforms. One focus of this paper is to address the multi-GHz front end, which is the challenge for the next-generation cognitive sensors. The unifying theme of this paper is to spell out the convergence for cognitive radio, radar, and anti-jamming. Moore’s law drives the system functions into digital parts. From a system viewpoint, this paper gives the first comprehensive treatment for the functions and the challenges of this multi-function (wideband) system. This paper brings together the inter-disciplinary knowledge

    Blind change detection for audio segmentation

    Automatic segmentation of audio streams according to speaker identities and environmental and channel conditions has become an important preprocessing step for speech recognition, speaker recognition, and audio data mining. In most previous approaches, automatic segmentation was evaluated in terms of the performance of the final system, such as the word error rate of a speech recognition system. In many applications, such as online audio indexing and information retrieval systems, the actual boundaries of the segments are required. We therefore present an approach to automatic segmentation based on the cumulative sum (CuSum) algorithm, which minimizes the miss probability for a given false alarm rate. In this paper, we compare the CuSum algorithm with the Bayesian information criterion (BIC) algorithm and with a generalization of the Kolmogorov-Smirnov test for automatic segmentation of audio streams. We present a two-step variation of the three algorithms that improves the performance significantly. We also present a novel approach that combines hypothesized boundaries from the three algorithms to produce the final segmentation of the audio stream. Our experiments on the 1998 Hub4 broadcast news data show that a variation of the CuSum algorithm significantly outperforms the other two approaches, and that combining the three approaches using a voting scheme improves the performance slightly compared with using a two-step variation of the CuSum algorithm alone.
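For readers unfamiliar with the CuSum test named in this abstract, here is a minimal sketch of the textbook one-sided form for a shift in the mean of a Gaussian signal. It assumes the pre-change and post-change means and the variance are known, which is a simplification; the parameter names are hypothetical and this is not the paper's two-step variation.

```python
def cusum_detect(samples, mu0, mu1, sigma, threshold):
    """Return the index of the first alarm, or None if no change is flagged.

    Assumes Gaussian samples with known mean mu0 before the change,
    known mean mu1 after it, and a common standard deviation sigma.
    """
    g = 0.0  # running decision statistic
    for k, x in enumerate(samples):
        # Log-likelihood ratio of one sample under post- vs pre-change model.
        llr = (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)
        # Accumulate evidence, resetting to zero whenever it goes negative.
        g = max(0.0, g + llr)
        if g > threshold:
            return k
    return None

# Toy noise-free signal whose mean jumps from 0 to 1 at index 50.
signal = [0.0] * 50 + [1.0] * 50
alarm = cusum_detect(signal, mu0=0.0, mu1=1.0, sigma=1.0, threshold=5.0)
# The alarm is raised at index 60, ten samples after the true change.
```

Raising `threshold` lowers the false alarm rate at the cost of a longer detection delay, which is the trade-off the abstract refers to.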

    Blind Change Detection for Audio Segmentation

    Automatic segmentation of audio streams according to speaker identities and environmental and channel conditions has become an important preprocessing step for speech recognition, speaker recognition, and audio data mining [7], [8]. In this paper, we test and compare the cumulative sum (CuSum) algorithm [2], the Bayesian information criterion (BIC) algorithm [4], and the Kolmogorov-Smirnov test for detecting changes in speaker identity, environmental conditions, and channel conditions in audio signals. We present a novel approach that combines hypothesized boundaries from the three algorithms to produce the final segmentation of the audio signal. Our experiments on the 1998 EARS Hub4 Broadcast News data show that a variation of the CuSum algorithm significantly outperforms the other two approaches, and that combining the three approaches using a voting scheme improves the performance slightly compared with using the CuSum algorithm alone.
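The abstract does not describe how the voting scheme works, so the following is only one plausible sketch: boundaries hypothesized by different detectors are grouped when they fall within a tolerance window, and a group is kept if enough distinct detectors contributed to it. The function name, the tolerance, the vote count, and the boundary times are all hypothetical.

```python
def vote_boundaries(hypotheses, tolerance, min_votes):
    """Merge boundary hypotheses (in seconds) from several detectors.

    hypotheses: one list of boundary times per detector.
    A cluster collects boundaries within `tolerance` of its earliest member
    and is kept when at least `min_votes` distinct detectors contributed.
    """
    tagged = sorted((t, d) for d, times in enumerate(hypotheses) for t in times)
    clusters, current = [], []
    for t, d in tagged:
        # Start a new cluster once we move past the tolerance window.
        if current and t - current[0][0] > tolerance:
            clusters.append(current)
            current = []
        current.append((t, d))
    if current:
        clusters.append(current)

    merged = []
    for cluster in clusters:
        detectors = {d for _, d in cluster}
        if len(detectors) >= min_votes:
            # Report the mean time of the agreeing hypotheses.
            merged.append(sum(t for t, _ in cluster) / len(cluster))
    return merged

# Made-up boundary times for three detectors (not experimental results).
cusum_times = [10.0, 25.0, 40.2]
bic_times = [10.4, 40.0]
ks_times = [9.8, 31.0, 40.1]
final = vote_boundaries([cusum_times, bic_times, ks_times],
                        tolerance=1.0, min_votes=2)
```

With these toy inputs, the isolated hypotheses at 25.0 and 31.0 are dropped because only one detector proposed each of them.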