6 research outputs found
Audio Fingerprinting to Identify Multiple Videos of an Event
The proliferation of consumer recording devices and video sharing websites makes it increasingly likely that multiple recordings of the same occurrence will be available. These co-synchronous recordings can be identified via their audio tracks, despite local noise and channel variations. We explore a robust fingerprinting strategy to do this. Matching pursuit is used to obtain a sparse set of the most prominent elements in a video soundtrack. Pairs of these elements are hashed and stored, to be efficiently compared with one another. This fingerprinting is tested on a corpus of over 700 YouTube videos related to the 2009 U.S. presidential inauguration. Reliable matching of identical events across different recordings is demonstrated, even under difficult conditions.
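The pipeline the abstract describes — prominent time-frequency elements, pairwise hashes, and comparison across recordings — can be illustrated with a minimal sketch. Simple spectrogram peak picking stands in here for the matching-pursuit decomposition used in the paper, and all names and parameters (`n_fft`, `fan_out`, `max_dt`) are illustrative choices, not the authors' exact method:

```python
import numpy as np
from collections import Counter, defaultdict

def spectrogram_peaks(x, n_fft=512, hop=256, top_k=5):
    """Return (frame, bin) pairs for the top_k strongest bins per frame.
    A crude stand-in for the matching-pursuit element selection."""
    peaks = []
    window = np.hanning(n_fft)
    for start in range(0, len(x) - n_fft, hop):
        spec = np.abs(np.fft.rfft(x[start:start + n_fft] * window))
        t = start // hop
        peaks.extend((t, int(f)) for f in np.argsort(spec)[-top_k:])
    return peaks

def hash_pairs(peaks, fan_out=10, max_dt=32):
    """Pair each peak with nearby later peaks; the (f1, f2, dt) triple
    is the hash, stored with the anchor's frame time."""
    peaks = sorted(peaks)
    entries = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                entries.append(((f1, f2, dt), t1))
    return entries

def match_offset(entries_a, entries_b):
    """Vote over time offsets of shared hashes; a dominant offset
    indicates the two recordings overlap at that frame shift."""
    index = defaultdict(list)
    for h, t in entries_b:
        index[h].append(t)
    votes = Counter(tb - ta for h, ta in entries_a for tb in index[h])
    return votes.most_common(1)[0] if votes else None
```

Because the hashes encode only local peak geometry, two noisy recordings of the same event still share many entries, and the offset histogram peaks sharply at their true time shift.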
Spectral vs. spectro-temporal features for acoustic event detection
Automatic detection of different types of acoustic events is an interesting problem in soundtrack processing. Typical approaches to the problem use short-term spectral features to describe the audio signal, with additional modeling on top to take temporal context into account. We propose an approach to detecting and modeling acoustic events that directly describes temporal context, using convolutive non-negative matrix factorization (NMF). NMF is useful for finding parts-based decompositions of data; here it is used to discover a set of spectro-temporal patch bases that best describe the data, with the patches corresponding to event-like structures. We derive features from the activations of these patch bases, and perform event detection on a database consisting of 16 classes of meeting-room acoustic events. We compare our approach with a baseline using standard short-term mel-frequency cepstral coefficient (MFCC) features. We demonstrate that the event-based system is more robust in the presence of added noise than the MFCC-based system, and that a combination of the two systems performs even better than either individually.
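The patch-based decomposition above can be sketched with plain (non-convolutive) NMF applied to stacked spectrogram frames — a deliberate simplification of the convolutive NMF the abstract names, since stacking consecutive frames into patch vectors captures the same spectro-temporal context. Patch length, rank, and iteration count below are illustrative, not the paper's settings:

```python
import numpy as np

def stack_patches(S, context=8):
    """Stack `context` consecutive spectrogram frames (S is freq x time)
    into patch columns, so each column spans a spectro-temporal patch."""
    F, T = S.shape
    return np.stack([S[:, t:t + context].ravel()
                     for t in range(T - context + 1)], axis=1)

def nmf(V, rank=20, n_iter=300, eps=1e-9, seed=0):
    """Multiplicative-update NMF (squared-error objective): V ~ W @ H.
    Columns of W are learned patch bases; rows of H are activations."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In this sketch, the rows of `H` play the role of the activation-derived features: each one traces how strongly an event-like patch basis is present over time, and those traces would feed the event detector.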
Joint Audio-Visual Signatures for Web Video Analysis
Presentation of a video classification project.
Joint Audio-Visual Signatures for Web Video Analysis
Presentation of a video classification project, including the TRECVID MED2010 system.
Soundtrack classification by transient events
We present a method for video classification based on information in the soundtrack. Unlike previous approaches, which describe the audio via statistics of mel-frequency cepstral coefficient (MFCC) features calculated on uniformly-spaced frames, we investigate an approach that focuses our representation on audio transients corresponding to soundtrack events. These event-related features can reflect the "foreground" of the soundtrack and capture its short-term temporal structure better than conventional frame-based statistics. We evaluate our method on a test set of 1873 YouTube videos labeled with 25 semantic concepts. Retrieval results based on transient features alone are comparable to an MFCC-based system, and fusing the two representations achieves a relative improvement of 7.5% in mean average precision (MAP).
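The two ingredients this abstract describes — locating audio transients rather than sampling uniform frames, and fusing two retrieval scores — can be sketched minimally as follows. The spectral-flux onset detector and the equal-weight fusion are illustrative stand-ins, not the paper's exact method:

```python
import numpy as np

def spectral_flux_onsets(x, n_fft=512, hop=256, thresh=None):
    """Frames where the positive spectral change exceeds a simple
    adaptive threshold (mean + one standard deviation of the flux)."""
    window = np.hanning(n_fft)
    specs = [np.abs(np.fft.rfft(x[s:s + n_fft] * window))
             for s in range(0, len(x) - n_fft, hop)]
    flux = np.array([np.maximum(b - a, 0.0).sum()
                     for a, b in zip(specs, specs[1:])])
    if thresh is None:
        thresh = flux.mean() + flux.std()
    return np.flatnonzero(flux > thresh) + 1  # +1: flux[i] marks frame i+1

def fuse_scores(score_a, score_b, alpha=0.5):
    """Late fusion of two per-video concept scores by weighted sum,
    e.g. transient-feature scores with MFCC-based scores."""
    return alpha * np.asarray(score_a) + (1 - alpha) * np.asarray(score_b)
```

Features would then be computed only around the detected onset frames, concentrating the representation on the soundtrack's "foreground" events instead of spreading it uniformly over time.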