187 research outputs found

Learning Local Feature Aggregation Functions with Backpropagation

Author: csurka
krizhevsky
liu
moosmann
perronnin
simonyan
soomro
Publication date: 26/06/2017

This paper introduces a family of local feature aggregation functions and a novel method to estimate their parameters, such that they generate optimal representations for classification (or any task that can be expressed as a cost function minimization problem). To achieve that, we compose the local feature aggregation function with the classifier cost function and backpropagate the gradient of this cost function in order to update the local feature aggregation function parameters. Experiments on synthetic datasets indicate that our method discovers parameters that model the class-relevant information in addition to the local feature space. Further experiments on a variety of motion and visual descriptors, both on image and video datasets, show that our method outperforms other state-of-the-art local feature aggregation functions, such as Bag of Words, Fisher Vectors and VLAD, by a large margin. Comment: In Proceedings of the 25th European Signal Processing Conference (EUSIPCO 2017)

arXiv.org e-Print Archive

Crossref

A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition

Author: Bellet Aurelien
Collins Michael
Fan Linxi
Garakani Alireza Bagheri
Guo Dong
Kingsbury Brian
Liu Kuan
Lu Zhiyun
May Avner
Picheny Michael
Sha Fei
Publication date: 18/03/2016

We study large-scale kernel methods for acoustic modeling and compare them to DNNs on performance metrics related to both acoustic modeling and recognition. Measured by perplexity and frame-level classification accuracy, kernel-based acoustic models are as effective as their DNN counterparts. However, on token error rates, DNN models can be significantly better. We have discovered that this might be attributed to DNNs' unique strength in reducing both the perplexity and the entropy of the predicted posterior probabilities. Motivated by our findings, we propose a new technique, entropy regularized perplexity, for model selection. This technique can noticeably improve the recognition performance of both types of models, and reduces the gap between them. While effective on Broadcast News, this technique could also be applicable to other tasks. Comment: arXiv admin note: text overlap with arXiv:1411.400

arXiv.org e-Print Archive

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

AXES at TRECVid 2011

Author: Aly Robin
Arandjelovic Relja
Beunders Henri
Chen Shu
Frappier Mathieu
Jawahar C. V.
Juneja Mayank
Lee Hyowon
Kleppe Martijn
McGuinness Kevin
O'Connor Noel E.
Ordelman Roeland
Schneider Daniel
Schwenninger Jochen
Smeaton Alan F.
Tschopel Sebastian
Vedaldi Andrea
Zisserman Andrew
Publication date: 01/01/2011

The AXES project participated in the interactive known-item search task (KIS) and the interactive instance search task (INS) for TRECVid 2011. We used the same system architecture and a nearly identical user interface for both the KIS and INS tasks. Both systems made use of text search on ASR, visual concept detectors, and visual similarity search. The user experiments were carried out with media professionals and media students at the Netherlands Institute for Sound and Vision, with media professionals performing the KIS task and media students participating in the INS task. This paper describes the results and findings of our experiments.

Fraunhofer-ePrints

Irish Universities

DCU Online Research Access Service

Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording

Author: Badlani Rohan
Elizalde Benjamin
Kumar Anurag
Lane Ian
Raj Bhiksha
Shah Ankit
Vincent Emmanuel
Publication date: 25/08/2016

In this paper we present our work on Task 1, Acoustic Scene Classification, and Task 3, Sound Event Detection in Real Life Recordings. Our experiments cover low-level and high-level features, classifier optimization, and other heuristics specific to each task. Our systems improved on the DCASE baseline for both tasks: for Task 1 we achieved an overall accuracy of 78.9% compared to the baseline of 72.6%, and for Task 3 we achieved a Segment-Based Error Rate of 0.76 compared to the baseline of 0.91.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1