19,170 research outputs found

Incremental Art: A Neural Network System for Recognition by Incremental Feature Extraction

Author: Aguilar Mario J.
Ross William D.
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/02/1994

Incremental ART extends adaptive resonance theory (ART) by incorporating mechanisms for efficient recognition through incremental feature extraction. The system achieves efficient, confident prediction through the controlled acquisition of only those features necessary to discriminate an input pattern. These capabilities are achieved through three modifications to the fuzzy ART system: (1) a partial feature vector complement coding rule extends fuzzy ART logic to allow recognition based on partial feature vectors; (2) an F2 decision criterion measures ART predictive confidence; (3) an incremental feature extraction layer computes the next feature to extract based on a measure of predictive value. Our system is demonstrated on a face recognition problem but has general applicability as a machine vision solution and as a model for studying scanning patterns. Office of Naval Research (N00014-92-J-4015, N00014-92-J-1309, N00014-91-4100); Air Force Office of Scientific Research (90-0083); National Science Foundation (IRI 90-00530
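The abstract does not spell out the partial feature vector complement coding rule, so the following is only a minimal sketch under stated assumptions. Standard fuzzy ART complement coding maps a feature a to the pair (a, 1 - a). The sketch assumes that a not-yet-extracted feature is coded as (1, 1), so that the fuzzy AND (component-wise minimum) with any weight vector leaves the corresponding weights unchanged. That convention, the function names, and the use of `None` for a missing feature are illustrative assumptions, not the authors' stated rule.

```python
def complement_code(features):
    """Complement-code a feature vector whose known values lie in [0, 1].

    Known feature a   -> (a, 1 - a), as in standard fuzzy ART.
    Missing (None)    -> (1.0, 1.0), an ASSUMED convention for partial
                         vectors; the abstract does not give the rule.
    """
    on = [1.0 if a is None else float(a) for a in features]
    off = [1.0 if a is None else 1.0 - float(a) for a in features]
    return on + off


def fuzzy_and_norm(i, w):
    """City-block norm of the component-wise minimum, |I ^ w|."""
    return sum(min(x, y) for x, y in zip(i, w))


def choice(i, w, alpha=0.001):
    """Fuzzy ART choice function T = |I ^ w| / (alpha + |w|)."""
    return fuzzy_and_norm(i, w) / (alpha + sum(w))


# A category that learned the full pattern [0.2, 0.8].
weights = complement_code([0.2, 0.8])

# Only the first feature has been extracted so far.
partial = complement_code([0.2, None])

# Under the assumed rule, the missing feature does not penalise the match.
print(choice(partial, weights))
```

Under this assumption, a category whose known features agree with the partial input keeps a choice value close to 1, while a disagreement on any extracted feature still lowers it.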

Boston University Institutional Repository (OpenBU)

Adversarial Network Bottleneck Features for Noise Robust Speaker Verification

Author: Guo Jun
Ma Zhanyu
Tan Zheng-Hua
Yu Hong
Publication date: 01/01/2017

In this paper, we propose a noise-robust bottleneck feature representation generated by an adversarial network (AN). The AN consists of two cascade-connected networks, an encoding network (EN) and a discriminative network (DN). Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are used as input to the EN, and the output of the EN is used as the noise-robust feature. The EN and DN are trained in turn: when training the DN, the noise types are used as the training labels, and when training the EN, all labels are set to the same value, i.e., the clean speech label. This aims to make the AN features invariant to noise and thus achieve noise robustness. We evaluate the proposed feature on a Gaussian Mixture Model-Universal Background Model based speaker verification system and compare it with MFCC features of speech enhanced by short-time spectral amplitude minimum mean square error (STSA-MMSE) and deep neural network-based speech enhancement (DNN-SE) methods. Experimental results on the RSR2015 database show that the proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE and DNN-SE based MFCCs for different noise types and signal-to-noise ratios. Furthermore, the AN-BN feature is able to improve speaker verification performance under the clean condition.
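The alternating label scheme described in this abstract can be illustrated with a small sketch. It shows only how the training targets differ between a DN update and an EN update. The network models, the loss, and the optimiser are omitted, and the function names, the string labels, and the one-update-each schedule are hypothetical choices made for illustration.

```python
CLEAN = "clean"  # hypothetical name for the clean speech label


def dn_labels(noise_types):
    """Targets for a DN update: each utterance keeps its true noise type."""
    return list(noise_types)


def en_labels(noise_types, clean_label=CLEAN):
    """Targets for an EN update: every utterance is labelled as clean speech."""
    return [clean_label for _ in noise_types]


def alternating_targets(noise_types, rounds):
    """Yield (network, targets) pairs, alternating DN and EN updates.

    One DN update followed by one EN update per round is an assumption;
    the abstract only says the two networks are trained in turn.
    """
    for _ in range(rounds):
        yield "DN", dn_labels(noise_types)
        yield "EN", en_labels(noise_types)


# Noise type of each utterance in a toy batch.
batch = ["clean", "babble", "car"]

for network, targets in alternating_targets(batch, rounds=1):
    print(network, targets)
```

During the EN step the discriminator is asked to call every utterance clean, which pushes the encoder toward features from which the noise type cannot be recovered.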

arXiv.org e-Print Archive

Crossref

VBN

SEGAN: Speech Enhancement Generative Adversarial Network

Author: Bonafonte Antonio
Pascual Santiago
Serrà Joan
Publication date: 09/06/2017

Current speech enhancement techniques operate in the spectral domain and/or exploit some higher-level feature. The majority of them tackle a limited number of noise conditions and rely on first-order statistics. To circumvent these issues, deep networks are increasingly being used, thanks to their ability to learn complex functions from large example sets. In this work, we propose the use of generative adversarial networks for speech enhancement. In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them. We evaluate the proposed model using an independent, unseen test set with two speakers and 20 alternative noise conditions. The enhanced samples confirm the viability of the proposed model, and both objective and subjective evaluations confirm its effectiveness. With that, we open the exploration of generative architectures for speech enhancement, which may progressively incorporate further speech-centric design choices to improve their performance. Comment: 5 pages, 4 figures, accepted in INTERSPEECH 201

arXiv.org e-Print Archive

Crossref

The neural space: a physiologically inspired noise reduction strategy based on fractional derivatives

Author: Bleeck Stefan
Hu Hongmei
Sang Jinqiu
Winter I.M.
Wright M.C.M.

Southampton (e-Prints Soton)