6,943 research outputs found

Overview of VideoCLEF 2008: Automatic generation of topic-based feeds for dual language audio-visual content

Author: Jones Gareth J.F.
Larson Martha
Newman Eamonn
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

The VideoCLEF track, introduced in 2008, aims to develop and evaluate tasks related to analysis of and access to multilingual multimedia content. In its first year, VideoCLEF piloted the Vid2RSS task, whose main subtask was the classification of dual language video (Dutch-language television content featuring English-speaking experts and studio guests). The task offered two additional discretionary subtasks: feed translation and automatic keyframe extraction. Task participants were supplied with Dutch archival metadata, Dutch speech transcripts, English speech transcripts and 10 thematic category labels, which they were required to assign to the test set videos. The videos were grouped by class label into topic-based RSS feeds, displaying title, description and keyframe for each video. Five groups participated in the 2008 VideoCLEF track. Participants were required to collect their own training data; both Wikipedia and general web content were used. Groups deployed various classifiers (SVM, Naive Bayes and k-NN) or treated the problem as an information retrieval task. Both the Dutch speech transcripts and the archival metadata performed well as sources of indexing features, but no group succeeded in exploiting combinations of feature sources to significantly enhance performance. A small-scale fluency/adequacy evaluation of the translation task output revealed the translation to be of sufficient quality to make it valuable to a non-Dutch-speaking English speaker. For keyframe extraction, the strategy chosen was to select the keyframe from the shot with the most representative speech transcript content. The automatically selected shots were shown, in a small user study, to be competitive with manually selected shots. Future years of VideoCLEF will aim to expand the corpus and the class label list, as well as to extend the track to additional tasks.

CiteSeerX

Irish Universities

DCU Online Research Access Service

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Weakly Labelled AudioSet Tagging with Attention Neural Networks

Author: Iqbal Turab
Kong Qiuqiang
Plumbley Mark D.
Wang Wenwu
Xu Yong
Yu Changsong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/08/2019

Audio tagging is the task of predicting the presence or absence of sound classes within an audio clip. Previous work in audio tagging focused on relatively small datasets limited to recognising a small number of sound classes. We investigate audio tagging on AudioSet, which is a dataset consisting of over 2 million audio clips and 527 classes. AudioSet is weakly labelled, in that only the presence or absence of sound classes is known for each clip, while the onset and offset times are unknown. To address the weakly-labelled audio tagging problem, we propose attention neural networks as a way to attend to the most salient parts of an audio clip. We bridge the connection between attention neural networks and multiple instance learning (MIL) methods, and propose decision-level and feature-level attention neural networks for audio tagging. We investigate attention neural networks modeled by different functions, depths and widths. Experiments on AudioSet show that the feature-level attention neural network achieves a state-of-the-art mean average precision (mAP) of 0.369, outperforming the best multiple instance learning (MIL) method of 0.317 and Google's deep neural network baseline of 0.314. In addition, we discover that the audio tagging performance on AudioSet embedding features has a weak correlation with the number of training samples and the quality of labels of each sound class.
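The decision-level attention idea in this abstract can be illustrated with a minimal sketch. It assumes that per-segment class probabilities and unnormalised attention scores have already been produced by some network, and it normalises the scores over time with a softmax; the abstract says several attention functions were studied, so the softmax is an assumption here, and the function name `attention_pool` is hypothetical rather than taken from the paper.

```python
import math


def attention_pool(segment_probs, attention_logits):
    """Pool per-segment class probabilities into clip-level probabilities.

    segment_probs[t][k]    -- probability of class k in segment t
    attention_logits[t][k] -- unnormalised attention score for class k in segment t
    Softmax normalisation over segments is an assumption of this sketch.
    """
    num_segments = len(segment_probs)
    num_classes = len(segment_probs[0])
    clip_probs = []
    for k in range(num_classes):
        logits = [attention_logits[t][k] for t in range(num_segments)]
        peak = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        weights = [e / total for e in exps]  # weights sum to 1 over segments
        # Clip-level probability is the attention-weighted average over segments.
        clip_probs.append(
            sum(w * segment_probs[t][k] for t, w in enumerate(weights))
        )
    return clip_probs
```

With equal attention scores the pooling reduces to a plain average over segments; as one segment's score grows, the clip-level prediction approaches that segment's prediction, which is how the weak clip-level label can be explained by a few salient segments.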

arXiv.org e-Print Archive

University of Surrey

Surrey Research Insight

Fingerprinting Smart Devices Through Embedded Acoustic Components

Author: Borisov Nikita
Caesar Matthew
Das Anupam
Publication date: 13/03/2014

The widespread use of smart devices gives rise to both security and privacy concerns. Fingerprinting smart devices can assist in authenticating physical devices, but it can also jeopardize privacy by allowing remote identification without user awareness. We propose a novel fingerprinting approach that uses the microphones and speakers of smartphones to uniquely identify an individual device. During fabrication, subtle imperfections arise in device microphones and speakers which induce anomalies in produced and received sounds. We exploit this observation to fingerprint smart devices through playback and recording of audio samples. We use audio-metric tools to explore different acoustic features and analyze their ability to successfully fingerprint smart devices. Our experiments show that it is even possible to fingerprint devices that have the same vendor and model; we were able to accurately distinguish over 93% of all recorded audio clips from 15 different units of the same model. Our study identifies the prominent acoustic features capable of fingerprinting devices with a high success rate and examines the effect of background noise and other variables on fingerprinting accuracy.

arXiv.org e-Print Archive

CiteSeerX

Imaging time series for the classification of EMI discharge sources

Author: Boreham Philip
Hughes-Narborough Michael
Mitiche Imene
Morison Gordon
Nesbitt Alan
Stewart Brian G.
Publication venue: 'MDPI AG'
Publication date: 01/09/2018

In this work, we aim to classify a wider range of Electromagnetic Interference (EMI) discharge sources collected from new power plant sites across multiple assets. This engenders a more complex and challenging classification task. The study involves an investigation and development of new and improved feature extraction and data dimension reduction algorithms based on image processing techniques. The approach is to exploit the Gramian Angular Field technique to map the measured EMI time signals to an image, from which the significant information is extracted while removing redundancy. The image of each discharge type contains a unique fingerprint. Two feature reduction methods, the Local Binary Pattern (LBP) and the Local Phase Quantisation (LPQ), are then applied to the mapped images. This provides feature vectors that can be fed into a Random Forest (RF) classifier. The performance of a previous method and the two newly proposed methods, on the new database set, is compared in terms of classification accuracy, precision, recall, and F-measure. Results show that the new methods outperform the previous one, with LBP features achieving the best outcome.
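The time-series-to-image mapping mentioned in this abstract can be sketched as follows. The sketch assumes the summation variant of the Gramian Angular Field, in which the series is rescaled to [-1, 1], each value is encoded as an angle via arccos, and each pixel is the cosine of a pair-wise angle sum; the abstract does not say which variant or rescaling the authors used, and the function name `gramian_angular_field` is hypothetical.

```python
import math


def gramian_angular_field(series):
    """Map a 1-D time series to a square image (summation variant, assumed).

    Steps: min-max rescale to [-1, 1], encode each value as an angle
    phi = arccos(value), then set pixel (i, j) to cos(phi_i + phi_j).
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    scaled = [(2.0 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    scaled = [max(-1.0, min(1.0, s)) for s in scaled]
    angles = [math.acos(s) for s in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]
```

The resulting matrix is symmetric and has one row and column per time step, so texture descriptors such as LBP or LPQ can then be computed on it like on any greyscale image before classification.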

Multidisciplinary Digital Publishing Institute

University of Strathclyde Institutional Repository

Directory of Open Access Journals

ResearchOnline@GCU

Proceedings of the 1st Computer Science Student Workshop: Koc University Istinye Campus, Istanbul, Turkey, February 21, 2010

Publication venue: Sabancı University
Publication date: 01/01/2010

Sabanci University Research Database

A flexible class of dependence-aware multi-label loss functions

Author: Fürnkranz Johannes
Hüllermeier Eyke
Loza Mencia Eneldo
Rapp Michael
Wever Marcel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2022

Open Access LMU