407 research outputs found

Phonetic Temporal Neural Model for Language Identification

Authors: Abel Andrew, Chen Yixiang, Li Lantian, Tang Zhiyuan, Wang Dong
Publication date: 25/08/2017

Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID methods, although this information has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID, which is an LSTM-RNN LID system that accepts phonetic features produced by a phone-discriminative DNN as the input, rather than raw acoustic features. This new model is similar to traditional phonetic LID methods, but the phonetic knowledge here is much richer: it is at the frame level and involves compacted information of all phones. Our experiments conducted on the Babel database and the AP16-OLR database demonstrate that the temporal phonetic neural approach is very effective, and significantly outperforms existing acoustic neural models. It also outperforms the conventional i-vector approach on short utterances and in noisy conditions.
Comment: Submitted to TASL

arXiv.org e-Print Archive

Crossref

University of Strathclyde Institutional Repository

A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification

Authors: Chen Yixiang, Li Lantian, Wang Dong, Zheng Thomas Fang
Publication date: 07/06/2017

For practical automatic speaker verification (ASV) systems, replay attacks pose a real risk. ASV systems tend to be easily fooled by a replayed, pre-recorded speech signal of the genuine speaker. An effective replay detection method is therefore highly desirable. In this study, we investigate a major difficulty in replay detection: the over-fitting problem caused by variability factors in the speech signal. An F-ratio probing tool is proposed, and three variability factors are investigated using this tool: speaker identity, speech content, and playback and recording device. The analysis shows that the device is the most influential factor and carries the highest over-fitting risk. A frequency warping approach is studied to alleviate the over-fitting problem, as verified on the ASV-spoof 2017 database.
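The abstract does not spell out how the F-ratio probing tool is defined. As a minimal sketch, assuming the classical Fisher F-ratio (between-group variance divided by within-group variance) applied to a single feature dimension, the idea can be illustrated as follows. The function name `f_ratio` and the grouping of samples by a variability factor (for example, by device) are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def f_ratio(groups):
    """Classical F-ratio for one feature dimension (illustrative assumption).

    `groups` is a list of lists; each inner list holds the feature values
    observed for one value of a variability factor (e.g. one device).
    A larger ratio means the factor explains more of the feature's spread.
    """
    all_values = [v for g in groups for v in g]
    total = len(all_values)
    grand_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]

    # Spread of the group means around the grand mean, weighted by group size.
    between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    ) / total

    # Spread of the samples around their own group mean.
    within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / total

    if within == 0:
        # All samples sit exactly on their group mean; the ratio is unbounded.
        return float("inf")
    return between / within
```

Under this assumption, computing the ratio once per factor (speaker, content, device) on the same feature would give a rough ranking of how strongly each factor drives that feature.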

arXiv.org e-Print Archive

Crossref

Phone-aware Neural Language Identification

Authors: Chen Yixiang, Li Lantian, Shi Ying, Tang Zhiyuan, Wang Dong
Publication date: 22/05/2017

Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, phonetic information has been largely overlooked by most existing neural LID models, although this information has been used with great success in conventional phonetic LID systems. We present a phone-aware neural LID architecture, a deep LSTM-RNN LID system that accepts output from an RNN-based ASR system. By utilizing the phonetic knowledge, the LID performance can be significantly improved. Interestingly, even if the test language is not involved in the ASR training, the phonetic knowledge still makes a large contribution. Our experiments conducted on four languages within the Babel corpus demonstrate that the phone-aware approach is highly effective.
Comment: arXiv admin note: text overlap with arXiv:1705.0315

arXiv.org e-Print Archive

Crossref

Deep Speaker Feature Learning for Text-independent Speaker Verification

Authors: Chen Yixiang, Li Lantian, Shi Ying, Tang Zhiyuan, Wang Dong
Publication date: 10/05/2017

Recently, deep neural networks (DNNs) have been used to learn speaker features. However, the quality of the learned features is not sufficiently good, so a complex back-end model, either neural or probabilistic, has to be used to address the residual uncertainty when applied to speaker verification, just as with raw features. This paper presents a convolutional time-delay deep neural network structure (CT-DNN) for speaker feature learning. Our experimental results on the Fisher database demonstrate that this CT-DNN can produce high-quality speaker features: even with a single feature (0.3 seconds including the context), the EER can be as low as 7.68%. This confirms that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and can therefore be extracted from just dozens of frames.
Comment: deep neural networks, speaker verification, speaker featur
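The equal error rate (EER) quoted above is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to estimate it from two lists of verification scores. It is an illustration only, not the authors' evaluation code: the function name `equal_error_rate` is hypothetical, higher scores are assumed to mean "more likely the same speaker", and the threshold is swept only over the observed score values.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from genuine-trial and impostor-trial scores.

    Assumption: a trial is accepted when its score is >= the threshold.
    Returns the average of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap = None
    best_eer = None
    for t in thresholds:
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer
```

With few trials the two error curves may not cross exactly at an observed score, which is why the sketch reports the midpoint at the closest threshold rather than interpolating.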

arXiv.org e-Print Archive

Crossref

Exploring Communities in Large Profiled Graphs

Authors: Chen Xiaojun, Chen Yankai, Cheng Reynold, Fang Yixiang, Li Yun, Zhang Jie
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

Given a graph $G$ and a vertex $q \in G$, the community search (CS) problem aims to efficiently find a subgraph of $G$ whose vertices are closely related to $q$. Communities are prevalent in social and biological networks, and can be used in product advertisement and social event recommendation. In this paper, we study profiled community search (PCS), where CS is performed on a profiled graph. This is a graph in which each vertex has labels arranged in a hierarchical manner. Extensive experiments show that PCS can identify communities with themes that are common to their vertices, and is more effective than existing CS approaches. As a naive solution for PCS is highly expensive, we have also developed a tree index, which facilitates efficient and online solutions for PCS.
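The abstract does not say how "closely related" is measured. As a minimal sketch of the plain CS problem only (not PCS, and not the authors' tree index), assume the common minimum-degree criterion: the community is the connected part of the k-core that contains $q$. The function name `k_core_community`, the adjacency-dictionary input, and the parameter `k` are hypothetical choices for this illustration.

```python
from collections import deque


def k_core_community(adj, q, k):
    """Return the connected k-core component containing q, or an empty set.

    `adj` maps each vertex to the set of its neighbours (undirected graph,
    no self-loops). Assumption: cohesiveness means every community member
    has at least k neighbours inside the community.
    """
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}

    # Repeatedly peel off vertices whose remaining degree is below k.
    queue = deque(v for v in adj if degree[v] < k)
    while queue:
        v = queue.popleft()
        if v not in alive:
            continue  # already removed via an earlier queue entry
        alive.discard(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)

    if q not in alive:
        return set()

    # Breadth-first search from q restricted to the surviving vertices.
    community = {q}
    frontier = deque([q])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if u in alive and u not in community:
                community.add(u)
                frontier.append(u)
    return community
```

For example, on a triangle `a, b, c` with a pendant vertex `d` attached to `c`, querying `a` with `k = 2` returns the triangle, while querying `d` returns an empty set because `d` is peeled away.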

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

An Elman Model Based on GMDH Algorithm for Exchange Rate Forecasting

Authors: LI Qiumin, TIAN Yixiang, ZHANG Gaoxun
Publication venue: Management Science and Engineering
Publication date: 16/12/2014

Since the Elman neural network was proposed, it has attracted wide attention. This method has fast convergence and high prediction accuracy. In this study, a new hybrid model that combines the Elman neural network and the group method of data handling (GMDH) is used to forecast the exchange rate. The GMDH algorithm is used for system modeling, and input variables are selected by external standards. The valid input variables identified from the output of the GMDH algorithm are then used as inputs to the Elman neural network for time series prediction. The empirical results show that the new hybrid algorithm is a useful tool.

CSCanada.net: E-Journals (Canadian Academy of Oriental and Occidental Culture, Canadian Research & Development Center of Sciences and Cultures)