4,671 research outputs found

    Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification

    There have been a number of studies on extracting bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases, and triphone states in order to improve the performance of text-dependent speaker verification (TD-SV). However, only moderate success has been achieved. A recent study [1] presented a time-contrastive learning (TCL) concept that exploits the non-stationarity of brain signals for classifying brain states. Speech signals have a similar non-stationarity property, and TCL has the further advantage of requiring no labeled data. We therefore present a TCL-based BN feature extraction method. The method uniformly partitions each speech utterance in a training dataset into a predefined number of multi-frame segments. Each segment in an utterance corresponds to one class, and class labels are shared across utterances. DNNs are then trained to discriminate all speech frames among the classes, thereby exploiting the temporal structure of speech. In addition, we propose a segment-based unsupervised clustering algorithm to re-assign class labels to the segments. TD-SV experiments were conducted on the RedDots challenge database. The TCL-DNNs were trained on speech data of fixed pass-phrases that were excluded from the TD-SV evaluation set, so the learned features can be considered phrase-independent. We compare the performance of the proposed TCL bottleneck (TCL-BN) feature with those of short-time cepstral features and of BN features extracted from DNNs discriminating speakers, pass-phrases, speaker+pass-phrase combinations, and monophones whose labels and boundaries are generated by three different automatic speech recognition (ASR) systems. Experimental results show that the proposed TCL-BN outperforms cepstral features and speaker+pass-phrase discriminant BN features, and that its performance is on par with that of the ASR-derived BN features. Moreover, …
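    The core of the method, as described above, is a self-supervised labeling scheme: each utterance is split into a fixed number of segments, segment index serves as the class label, and a DNN with a narrow hidden layer is trained to classify frames. Below is a minimal sketch of that idea in PyTorch. The segment count, layer widths, 39-dimensional frame features, and the helper names (`tcl_labels`, `TCLBottleneckDNN`) are illustrative assumptions, not the paper's exact configuration, and the unsupervised label re-assignment step is omitted.

```python
# Hedged sketch of time-contrastive labeling + bottleneck-feature extraction.
import torch
import torch.nn as nn

N_SEGMENTS = 10   # predefined number of segments per utterance (assumed)
FRAME_DIM = 39    # e.g. cepstral features + deltas (assumed)
BN_DIM = 64       # bottleneck width (assumed)

def tcl_labels(n_frames: int, n_segments: int = N_SEGMENTS) -> torch.Tensor:
    """Uniformly partition an utterance's frames into n_segments classes.

    Segment i of every utterance gets label i, so labels are shared across
    utterances and no manual annotation is needed.
    """
    seg_len = n_frames // n_segments
    labels = torch.arange(n_segments).repeat_interleave(seg_len)
    # Put any leftover frames into the last segment.
    leftover = torch.full((n_frames - len(labels),), n_segments - 1)
    return torch.cat([labels, leftover])

class TCLBottleneckDNN(nn.Module):
    """Frame classifier whose narrow penultimate layer yields the BN features."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(FRAME_DIM, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, BN_DIM),            # bottleneck layer
        )
        self.head = nn.Linear(BN_DIM, N_SEGMENTS)

    def forward(self, frames):
        bn = self.trunk(frames)                # BN features for downstream TD-SV
        return self.head(bn), bn

# Toy training step on one synthetic "utterance" of 200 frames.
model = TCLBottleneckDNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
frames = torch.randn(200, FRAME_DIM)
labels = tcl_labels(200)
logits, _ = model(frames)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
opt.step()

# After training, the bottleneck outputs would replace or augment
# cepstral features in the TD-SV front-end.
with torch.no_grad():
    _, bn_features = model(frames)
```

    Because the labels come purely from temporal position, this objective forces the bottleneck layer to encode how speech content evolves within an utterance, which is what makes the resulting features usable without transcriptions.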

    G2C: A Generator-to-Classifier Framework Integrating Multi-Stained Visual Cues for Pathological Glomerulus Classification

    Pathological glomerulus classification plays a key role in the diagnosis of nephropathy. Because the differences between subcategories are subtle, doctors often refer to slides from different staining methods when making decisions. However, creating correspondences across stains is labor-intensive, which makes it difficult to collect data and to train a vision-based algorithm that assists nephropathy diagnosis. This paper provides an alternative solution for integrating multi-stained visual cues for glomerulus classification. Our approach, named generator-to-classifier (G2C), is a two-stage framework. Given an input image from a specified stain, several generators first estimate its appearance under the other staining methods, and a classifier then combines the visual cues from the different stains for prediction (whether the glomerulus is pathological, or which type of pathology it has). We optimize the two stages jointly. To provide a reasonable initialization, we pre-train the generators on an unlabeled reference set using an unpaired image-to-image translation task, and then fine-tune them together with the classifier. We conduct experiments on a glomerulus type classification dataset that we collected ourselves (there are no publicly available datasets for this purpose). Although joint optimization slightly harms the authenticity of the generated patches, it boosts classification performance, suggesting that more effective visual cues are extracted automatically. We also transfer our model to a public dataset for breast cancer classification and significantly outperform the state of the art. (Accepted by AAAI 2019.)
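    To make the two-stage structure concrete, here is a hedged PyTorch sketch of the G2C idea: one generator per missing stain translates the input patch, and a classifier fuses the real and generated views, with both stages updated by a single classification loss. The stain count, class count, tiny network shapes, and names such as `make_generator` are assumptions for illustration; the unpaired-translation pre-training of the generators (e.g. CycleGAN-style) is omitted.

```python
# Hedged sketch of the generator-to-classifier (G2C) two-stage framework.
import torch
import torch.nn as nn

N_STAINS = 4    # input stain + 3 generated stains (assumed)
N_CLASSES = 2   # e.g. pathological vs. normal (assumed)

def make_generator():
    # Tiny stand-in for a pre-trained image-to-image translation generator.
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

class G2C(nn.Module):
    def __init__(self):
        super().__init__()
        # One generator per target stain; in the paper these are pre-trained
        # on an unlabeled reference set, then fine-tuned jointly.
        self.generators = nn.ModuleList(
            make_generator() for _ in range(N_STAINS - 1))
        self.classifier = nn.Sequential(
            nn.Conv2d(3 * N_STAINS, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, N_CLASSES),
        )

    def forward(self, x):
        # Estimate the patch's appearance under the other stains, then
        # classify from the channel-concatenated multi-stain views.
        views = [x] + [g(x) for g in self.generators]
        return self.classifier(torch.cat(views, dim=1))

# Joint optimization step: the classification loss back-propagates through
# the generators as well, which is what trades off patch authenticity
# against classification performance.
model = G2C()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
patches = torch.randn(8, 3, 64, 64)          # a batch of glomerulus patches
labels = torch.randint(0, N_CLASSES, (8,))
loss = nn.functional.cross_entropy(model(patches), labels)
loss.backward()
opt.step()
```

    Freezing the generators after pre-training would preserve the fidelity of the synthesized stains; letting the gradient flow through them, as sketched here, is what the abstract credits for the accuracy gain.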