Phonetic Temporal Neural Model for Language Identification
Deep neural models, particularly the LSTM-RNN model, have shown great
potential for language identification (LID). However, the use of phonetic
information has been largely overlooked by most existing neural LID methods,
although this information has been used very successfully in conventional
phonetic LID systems. We present a phonetic temporal neural model for LID,
which is an LSTM-RNN LID system that accepts phonetic features produced by a
phone-discriminative DNN as the input, rather than raw acoustic features. This
new model is similar to traditional phonetic LID methods, but the phonetic
knowledge here is much richer: it is at the frame level and involves compacted
information of all phones. Our experiments on the Babel database and
the AP16-OLR database demonstrate that the phonetic temporal neural approach is
very effective and significantly outperforms existing acoustic neural models.
It also outperforms the conventional i-vector approach on short utterances and
in noisy conditions.
Comment: Submitted to TASL
A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification
For practical automatic speaker verification (ASV) systems, replay attacks
pose a real risk. By replaying a pre-recorded speech signal of the genuine
speaker, an attacker can easily fool an ASV system. An effective replay detection
method is therefore highly desirable. In this study, we investigate a major
difficulty in replay detection: the over-fitting problem caused by variability
factors in the speech signal. An F-ratio probing tool is proposed, and three
variability factors are investigated using this tool: speaker identity, speech
content, and playback and recording device. The analysis shows that the device is
the most influential factor and contributes the highest over-fitting risk. A
frequency warping approach is studied to alleviate the over-fitting problem, as
verified on the ASVspoof 2017 database.
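The F-ratio mentioned in the abstract above can be illustrated with a minimal sketch. This assumes the textbook definition (variance of the class means divided by the average within-class variance, computed per feature dimension). The paper's probing tool may differ in detail. The function name `f_ratio` and the toy data are hypothetical.

```python
import numpy as np


def f_ratio(features, labels):
    """Per-dimension F-ratio under the textbook definition (an assumption here):
    variance of the class means divided by the mean within-class variance.

    features: array of shape (n_samples, n_dims)
    labels:   array of shape (n_samples,), one factor label per sample
              (e.g. a device ID, speaker ID, or content ID)
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    # Mean and (population) variance of each class, per feature dimension.
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    class_vars = np.stack([features[labels == c].var(axis=0) for c in classes])

    between = class_means.var(axis=0)  # spread of the class means
    within = class_vars.mean(axis=0)   # average spread inside each class

    # Note: a dimension with zero within-class variance would divide by zero.
    return between / within


if __name__ == "__main__":
    # Toy data: dimension 0 separates the two classes, dimension 1 does not.
    x = [[0.0, 0.0], [2.0, 1.0], [10.0, 0.0], [12.0, 1.0]]
    y = [0, 0, 1, 1]
    print(f_ratio(x, y))
```

A high F-ratio in a dimension means the chosen factor explains much of the variation there, which is the kind of dependence a detector could over-fit to.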
Phone-aware Neural Language Identification
Pure acoustic neural models, particularly the LSTM-RNN model, have shown
great potential in language identification (LID). However, phonetic
information has been largely overlooked by most existing neural LID models,
although this information has been used in conventional phonetic LID
systems with great success. We present a phone-aware neural LID architecture,
which is a deep LSTM-RNN LID system that accepts output from an RNN-based ASR
system. By utilizing the phonetic knowledge, the LID performance can be
significantly improved. Interestingly, even if the test language is not
involved in the ASR training, the phonetic knowledge still presents a large
contribution. Our experiments conducted on four languages within the Babel
corpus demonstrated that the phone-aware approach is highly effective.
Comment: arXiv admin note: text overlap with arXiv:1705.0315
Deep Speaker Feature Learning for Text-independent Speaker Verification
Recently, deep neural networks (DNNs) have been used to learn speaker
features. However, the quality of the learned features is not sufficiently
good, so a complex back-end model, either neural or probabilistic, has to be
used to address the residual uncertainty when applied to speaker verification,
just as with raw features. This paper presents a convolutional time-delay deep
neural network structure (CT-DNN) for speaker feature learning. Our
experimental results on the Fisher database demonstrated that this CT-DNN can
produce high-quality speaker features: even with a single feature (0.3 seconds
including the context), the EER can be as low as 7.68%. This effectively
confirmed that the speaker trait is largely a deterministic short-time property
rather than a long-time distributional pattern, and therefore can be extracted
from just dozens of frames.
Comment: deep neural networks, speaker verification, speaker featur
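The equal error rate (EER) quoted in the abstract above is the operating point where the false acceptance rate equals the false rejection rate. The following is a minimal sketch of one common way to estimate it from trial scores, assuming higher scores mean "same speaker". The function name `equal_error_rate` and the toy scores are hypothetical and are not taken from the paper.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    target_scores:    scores of genuine (same-speaker) trials
    nontarget_scores: scores of impostor (different-speaker) trials
    A trial is accepted when its score is >= the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: impostor trials scoring at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.3]
    impostor = [0.6, 0.4, 0.2, 0.1]
    print(equal_error_rate(genuine, impostor))
```

With few trials the two error curves may not cross exactly, so averaging them at the closest point is only an approximation. Evaluation toolkits often interpolate instead.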
Exploring Communities in Large Profiled Graphs
Given a graph G and a vertex q in G, the community search (CS) problem
aims to efficiently find a subgraph of G whose vertices are closely related
to q. Communities are prevalent in social and biological networks, and can be
used in product advertisement and social event recommendation. In this paper,
we study profiled community search (PCS), where CS is performed on a profiled
graph. This is a graph in which each vertex has labels arranged in a
hierarchical manner. Extensive experiments show that PCS can identify
communities with themes that are common to their vertices, and is more
effective than existing CS approaches. As a naive solution for PCS is highly
expensive, we have also developed a tree index, which facilitates efficient and
online solutions for PCS.
An Elman Model Based on GMDH Algorithm for Exchange Rate Forecasting
Since the Elman neural network was proposed, it has attracted wide attention. This method offers fast convergence and high prediction accuracy. In this study, a new hybrid model that combines the Elman neural network and the group method of data handling (GMDH) is used to forecast the exchange rate. The GMDH algorithm is used for system modeling, and input variables are selected according to external criteria. The valid input variables identified from the GMDH output are then used as inputs to the Elman neural network for time series prediction. The empirical results show that the new hybrid algorithm is a useful tool.