2,555 research outputs found
Factorization of Discriminatively Trained i-vector Extractor for Speaker Recognition
In this work, we continue in our research on i-vector extractor for speaker
verification (SV) and we optimize its architecture for fast and effective
discriminative training. We were motivated by computational and memory
requirements caused by the large number of parameters of the original
generative i-vector model. Our aim is to preserve the power of the original
generative model, and at the same time focus the model towards extraction of
speaker-related information. We show that it is possible to represent a
standard generative i-vector extractor by a model with significantly less
parameters and obtain similar performance on SV tasks. We can further refine
this compact model by discriminative training and obtain i-vectors that lead to
better performance on various SV benchmarks representing different acoustic
domains.Comment: Submitted to Interspeech 2019, Graz, Austria. arXiv admin note:
substantial text overlap with arXiv:1810.1318
Joint 3D Proposal Generation and Object Detection from View Aggregation
We present AVOD, an Aggregate View Object Detection network for autonomous
driving scenarios. The proposed neural network architecture uses LIDAR point
clouds and RGB images to generate features that are shared by two subnetworks:
a region proposal network (RPN) and a second stage detector network. The
proposed RPN uses a novel architecture capable of performing multimodal feature
fusion on high resolution feature maps to generate reliable 3D object proposals
for multiple object classes in road scenes. Using these proposals, the second
stage detection network performs accurate oriented 3D bounding box regression
and category classification to predict the extents, orientation, and
classification of objects in 3D space. Our proposed architecture is shown to
produce state of the art results on the KITTI 3D object detection benchmark
while running in real time with a low memory footprint, making it a suitable
candidate for deployment on autonomous vehicles. Code is at:
https://github.com/kujason/avodComment: For any inquiries contact aharakeh(at)uwaterloo(dot)c
Text-Independent Speaker Verification Using 3D Convolutional Neural Networks
In this paper, a novel method using 3D Convolutional Neural Network (3D-CNN)
architecture has been proposed for speaker verification in the text-independent
setting. One of the main challenges is the creation of the speaker models. Most
of the previously-reported approaches create speaker models based on averaging
the extracted features from utterances of the speaker, which is known as the
d-vector system. In our paper, we propose an adaptive feature learning by
utilizing the 3D-CNNs for direct speaker model creation in which, for both
development and enrollment phases, an identical number of spoken utterances per
speaker is fed to the network for representing the speakers' utterances and
creation of the speaker model. This leads to simultaneously capturing the
speaker-related information and building a more robust system to cope with
within-speaker variation. We demonstrate that the proposed method significantly
outperforms the traditional d-vector verification system. Moreover, the
proposed system can also be an alternative to the traditional d-vector system
which is a one-shot speaker modeling system by utilizing 3D-CNNs.Comment: Accepted to be published in IEEE International Conference on
Multimedia and Expo (ICME) 201
- …