Multimodal segmentation of lifelog data
A personal lifelog of visual and audio information can be very helpful as a human memory augmentation tool. The SenseCam, a passive wearable camera, used in conjunction with an iRiver MP3 audio recorder, will capture over 20,000 images and 100 hours of audio per week. Used constantly, this quickly builds into a substantial collection of personal data. To gain real value from this collection it is important to automatically segment the data into meaningful units or activities. This paper investigates the optimal combination of data sources for segmenting personal data into such activities. Five data sources were logged and processed to segment a collection of personal data: image processing on captured SenseCam images; audio processing on captured iRiver audio data; and processing of the temperature, white-light level, and accelerometer sensors onboard the SenseCam device. The results indicate that a combination of the image, light, and accelerometer sensor data segments our collection of personal data better than a combination of all five data sources. The accelerometer sensor is good for detecting when the wearer moves to a new location, while the image and light sensors are good for detecting changes in wearer activity within the same location, as well as detecting when the wearer socially interacts with others.
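As an illustration of this kind of sensor fusion, the minimal sketch below combines per-sensor boundary scores into a single segmentation. It assumes each data source has already been reduced to one feature vector per time step; the stream names, weights, and threshold are illustrative choices, not the paper's actual parameters.

```python
import numpy as np

def boundary_scores(features: np.ndarray) -> np.ndarray:
    """Z-scored dissimilarity between consecutive feature vectors (one row per time step)."""
    diffs = np.linalg.norm(np.diff(features, axis=0), axis=1)
    return (diffs - diffs.mean()) / (diffs.std() + 1e-9)

def segment(streams: dict, weights: dict, threshold: float = 3.0) -> np.ndarray:
    """Fuse per-sensor boundary scores and return indices of detected event boundaries.
    Weights and threshold are illustrative, not tuned values."""
    fused = sum(weights[name] * boundary_scores(feats) for name, feats in streams.items())
    return np.flatnonzero(fused > threshold) + 1  # +1 because diff shifts indices by one

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = 200  # e.g. one feature vector per minute
    # Synthetic stand-ins for image, light and accelerometer features,
    # with activity changes around minutes 50, 100 and 150.
    streams = {
        "image": rng.normal(size=(t, 8)) + np.repeat([0, 3, 0, 2], t // 4)[:, None],
        "light": rng.normal(size=(t, 1)) + np.repeat([0, 5, 5, 1], t // 4)[:, None],
        "accel": rng.normal(size=(t, 3)),
    }
    weights = {"image": 1.0, "light": 0.5, "accel": 0.5}
    print("detected boundaries (minutes):", segment(streams, weights))
```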
Analysis of Everyday Sounds
Describes work on analyzing environmental sounds from personal audio recorders and from the soundtracks of short consumer-shot videos; the audio analysis is fused with video analysis to produce remarkably usable automatic tags.
Audio-Based Semantic Concept Classification for Consumer Video
This paper presents a novel method for automatically classifying consumer video clips based on their soundtracks. We use a set of 25 overlapping semantic classes, chosen for their usefulness to users, their viability for automatic detection and annotator labeling, and their sufficient representation in available video collections. A set of 1873 videos from real users has been annotated with these concepts. Starting with a basic representation of each video clip as a sequence of mel-frequency cepstral coefficient (MFCC) frames, we experiment with three clip-level representations: single Gaussian modeling, Gaussian mixture modeling, and probabilistic latent semantic analysis of a Gaussian component histogram. Using such summary features, we produce support vector machine (SVM) classifiers based on the Kullback-Leibler, Bhattacharyya, or Mahalanobis distance measures. Quantitative evaluation shows that our approaches are effective for detecting interesting concepts in a large collection of real-world consumer video clips.
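A minimal sketch of the single-Gaussian clip model with a divergence-based SVM kernel described above. It assumes MFCC frames have already been extracted for each clip, models each clip with a diagonal Gaussian, and uses a symmetrised Kullback-Leibler divergence; the kernel width and synthetic data are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC

def clip_gaussian(mfcc: np.ndarray):
    """Summarise a clip's MFCC frames (n_frames x n_dims) with a diagonal Gaussian."""
    return mfcc.mean(axis=0), mfcc.var(axis=0) + 1e-6

def sym_kl(g1, g2) -> float:
    """Symmetrised Kullback-Leibler divergence between two diagonal Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    kl12 = 0.5 * np.sum(v1 / v2 + (m2 - m1) ** 2 / v2 - 1.0 + np.log(v2 / v1))
    kl21 = 0.5 * np.sum(v2 / v1 + (m1 - m2) ** 2 / v1 - 1.0 + np.log(v1 / v2))
    return float(kl12 + kl21)

def kl_kernel(models_a, models_b, gamma: float = 0.01) -> np.ndarray:
    """Turn divergences into a similarity matrix usable as a precomputed SVM kernel."""
    return np.exp(-gamma * np.array([[sym_kl(a, b) for b in models_b] for a in models_a]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for per-clip MFCC matrices: two toy "concepts".
    clips = [rng.normal(loc=c, size=(300, 13)) for c in (0, 0, 0, 2, 2, 2)]
    labels = [0, 0, 0, 1, 1, 1]
    models = [clip_gaussian(c) for c in clips]
    K = kl_kernel(models, models)
    svm = SVC(kernel="precomputed").fit(K, labels)
    print("training-set predictions:", svm.predict(K))
```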
Segmenting and Classifying Long-Duration Recordings of "Personal Audio"
An overview of the project at the Laboratory for Recognition and Organization of Speech and Audio, Department of Electrical Engineering, Columbia University, on segmenting, classifying, and accessing near-continuous recordings collected by a body-worn audio recorder.
Voice Activity Detection in Personal Audio Recordings Using Autocorrelogram Compensation
This paper presents a novel method for identifying regions of speech in the kinds of energetic and highly variable noise present in 'personal audio' collected by body-worn continuous recorders. Motivated by psychoacoustic evidence that pitch is crucial in the perception and organization of sound, we use a noise-robust pitch detection algorithm to locate speech-like regions. To avoid false alarms caused by background noise with strong periodic components (such as air conditioning), we add a new channel selection scheme that suppresses frequency subbands where the autocorrelation is more stationary than that of voiced speech. Quantitative evaluation shows that such harmonic noises are effectively removed by this compensation in the autocorrelogram domain, and that detection performance is significantly better than that of existing algorithms for detecting the presence of speech in real-world personal audio recordings.
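The channel-selection idea can be sketched roughly as below. The band edges, frame sizes, and stationarity cutoff are assumptions for illustration rather than the paper's settings, and the voicing measure here is a simple summary-autocorrelation peak rather than the full algorithm.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def frame_autocorr(x: np.ndarray, frame_len: int, hop: int, max_lag: int) -> np.ndarray:
    """Normalised short-time autocorrelation of a 1-D signal (n_frames x max_lag)."""
    out = []
    for i in range(0, len(x) - frame_len, hop):
        f = x[i:i + frame_len] - x[i:i + frame_len].mean()
        ac = np.correlate(f, f, mode="full")[frame_len - 1:frame_len - 1 + max_lag]
        out.append(ac / (ac[0] + 1e-12))
    return np.array(out)

def voicing_scores(x: np.ndarray, sr: int,
                   bands=((100, 400), (400, 1000), (1000, 2500)),
                   stationarity_cutoff: float = 0.95) -> np.ndarray:
    """Per-frame voicing score from subband autocorrelograms. Bands whose
    autocorrelation barely changes over time (steadier than voiced speech,
    e.g. an air-conditioner hum) are suppressed before the summary."""
    frame_len, hop, max_lag = int(0.032 * sr), int(0.010 * sr), int(sr / 60)
    summary, kept = None, 0
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        ac = frame_autocorr(sosfiltfilt(sos, x), frame_len, hop, max_lag)
        # Stationarity: average similarity of consecutive frames' autocorrelations.
        steps = [np.corrcoef(ac[i], ac[i + 1])[0, 1] for i in range(len(ac) - 1)]
        if np.nanmean(steps) > stationarity_cutoff:
            continue  # channel dominated by steady periodic noise: drop it
        summary = ac if summary is None else summary + ac
        kept += 1
    if kept == 0:
        return np.zeros(1)
    lag_lo = int(sr / 400)  # look for pitch peaks between roughly 60 and 400 Hz
    return summary[:, lag_lo:].max(axis=1) / kept

if __name__ == "__main__":
    sr, rng = 8000, np.random.default_rng(0)
    seg = int(0.3 * sr)
    t = np.arange(seg) / sr
    pieces = [np.sign(np.sin(2 * np.pi * f0 * t)) if f0 else 0.5 * rng.normal(size=seg)
              for f0 in (110, 0, 160, 0, 220, 0)]  # alternate voiced-like and noise bursts
    x = np.concatenate(pieces) + 0.1 * rng.normal(size=6 * seg)
    scores = voicing_scores(x, sr)
    print(f"{len(scores)} frames, mean voicing score {scores.mean():.2f}")
```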
Detecting Music in Ambient Audio by Long-Window Autocorrelation
We address the problem of detecting music in the background of ambient real-world audio recordings, such as the soundtracks of consumer-shot videos. Such material may contain high levels of noise, so we seek features that reveal music content under these conditions. Sustained, steady musical pitches show significant, structured autocorrelation when calculated over windows of hundreds of milliseconds, at lags where the autocorrelation of aperiodic noise has become negligible once the signal is whitened by LPC. Using such features, further compensated by their long-term average to remove the effect of stationary periodic noise, we build GMM- and SVM-based classifiers that perform well compared with previous approaches, as verified on a corpus of real consumer video.
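A rough sketch of the long-window autocorrelation feature under stated assumptions: LPC whitening via the autocorrelation method, half-second analysis windows, a minimum lag beyond which whitened noise should have decayed, and subtraction of the long-term average. The LPC order, window length, minimum lag, and toy signal are illustrative only.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter, sawtooth

def lpc_whiten(x: np.ndarray, order: int = 12) -> np.ndarray:
    """LPC prediction residual: flattens the spectrum so the autocorrelation of
    aperiodic noise decays quickly, while sustained pitches keep long-lag structure."""
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz(r[:order], r[1:order + 1])           # autocorrelation-method LPC
    return lfilter(np.concatenate(([1.0], -a)), [1.0], x)   # prediction-error filter

def music_scores(x: np.ndarray, sr: int, win_s: float = 0.5, min_lag_s: float = 0.04) -> np.ndarray:
    """One long-window autocorrelation peak per window, compensated by the
    long-term average so that stationary periodic noise is removed."""
    win, min_lag = int(win_s * sr), int(min_lag_s * sr)
    feats = []
    for i in range(0, len(x) - win, win):
        w = x[i:i + win]
        spec = np.fft.rfft(w, 2 * win)
        ac = np.fft.irfft(spec * np.conj(spec))[:win]        # FFT-based autocorrelation
        feats.append(ac[min_lag:win // 2] / (ac[0] + 1e-12))
    feats = np.array(feats)
    feats -= feats.mean(axis=0)   # subtract the long-term average across the recording
    return feats.max(axis=1)

if __name__ == "__main__":
    sr, rng = 8000, np.random.default_rng(0)
    t = np.arange(4 * sr) / sr
    music = 1.5 * sawtooth(2 * np.pi * 220 * t) * (t >= 2.0)  # harmonic tone enters after 2 s
    x = lpc_whiten(music + rng.normal(size=t.size))
    print("per-window music scores:", music_scores(x, sr).round(2))
```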
Minimal-Impact Audio-Based Personal Archives
An overview of work by the Laboratory for Recognition and Organization of Speech and Audio, Department of Electrical Engineering, Columbia University, on accessing recordings made by body-worn audio recorders, including examples of speech scrambling, and a screenshot of the improved visualization/user interface
Minimal-Impact Personal Audio Archives
Review of personal audio work at the Laboratory for Recognition and Organization of Speech and Audio, Department of Electrical Engineering, Columbia University, as part of a meeting for Microsoft's Digital Memories initiative