Phone-aware Neural Language Identification

Authors: Chen Yixiang, Li Lantian, Shi Ying, Tang Zhiyuan, Wang Dong
Publication date: 22/05/2017

Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, phonetic information has been largely overlooked by most existing neural LID models, although this information has been used with great success in conventional phonetic LID systems. We present a phone-aware neural LID architecture, which is a deep LSTM-RNN LID system that accepts output from an RNN-based ASR system. By utilizing this phonetic knowledge, LID performance can be significantly improved. Interestingly, even if the test language is not involved in the ASR training, the phonetic knowledge still makes a large contribution. Our experiments on four languages within the Babel corpus demonstrated that the phone-aware approach is highly effective. Comment: arXiv admin note: text overlap with arXiv:1705.0315

arXiv.org e-Print Archive

Crossref

Deep Speaker Feature Learning for Text-independent Speaker Verification

Authors: Chen Yixiang, Li Lantian, Shi Ying, Tang Zhiyuan, Wang Dong
Publication date: 10/05/2017

Recently, deep neural networks (DNNs) have been used to learn speaker features. However, the quality of the learned features is not sufficiently good, so a complex back-end model, either neural or probabilistic, has to be used to address the residual uncertainty when the features are applied to speaker verification, just as with raw features. This paper presents a convolutional time-delay deep neural network structure (CT-DNN) for speaker feature learning. Our experimental results on the Fisher database demonstrated that this CT-DNN can produce high-quality speaker features: even with a single feature (0.3 seconds including the context), the EER can be as low as 7.68%. This confirmed that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and can therefore be extracted from just dozens of frames. Comment: deep neural networks, speaker verification, speaker featur

arXiv.org e-Print Archive

Crossref

Papillary cystadenoma of the parotid gland: A case report.

Authors: Ha Patrick K, Ma Ying, Wang Li, Wang Zhi-Ming, Zhang Shi-Kun
Publication venue: eScholarship, University of California
Publication date: 01/02/2019

Background: Papillary cystadenoma is a rare benign epithelial tumor of the salivary gland, characterized by papillary structures and oncocytic cells with rich eosinophilic cytoplasm. We found only one case of papillary cystadenoma in nearly 700 cases of salivary gland tumors. Our case was initially mistaken for a tumor of the right temporomandibular joint (TMJ) capsule rather than one of parotid gland origin. Preoperative magnetic resonance imaging (MRI) and computed tomography (CT) should be carefully studied to allow for appropriate preoperative counseling and operative planning.

Case summary: Here, we report an unusual case of a 54-year-old woman with a parotid gland papillary cystadenoma (PGPC) that was misdiagnosed as a tumor of the right TMJ capsule. She was initially admitted to our hospital due to a mass anterior to her right ear that had been found inadvertently 5 days earlier. Preoperative CT and MRI revealed a well-circumscribed tumor that was attached to the right TMJ capsule. The patient underwent a resection through an incision for the TMJ, but evaluation of an intraoperative frozen section revealed a benign tumor of the parotid gland. We then removed part of the parotid gland above the temporal facial trunk. The facial nerve was preserved. Postoperative histopathological findings revealed that the tumor was PGPC. No additional treatment was performed. There was no recurrence during a 20-month follow-up period.

Conclusion: The integrity of the interstitial space around the condyle in MRI or CT should be carefully evaluated for parotid gland or TMJ tumors.

eScholarship - University of California

The possible members of the $5^1S_0$ meson nonet

Authors: Li De-Min, Li Guan-Nan, Wang En, Wang Guan-Ying, Xue Shi-Chen
Publication venue: Springer Science and Business Media LLC
Publication date: 31/05/2018

The strong decays of the $5^1S_0$ $q\bar{q}$ states are evaluated in the $^3P_0$ model with two types of space wave functions. Comparing the model expectations with the experimental data for the $\pi(2360)$, $\eta(2320)$, $X(2370)$, and $X(2500)$, we suggest that the $\pi(2360)$, $\eta(2320)$, and $X(2500)$ can be assigned as the members of the $5^1S_0$ meson nonet, while the $5^1S_0$ assignment for the $X(2370)$ is not favored by its width. The $5^1S_0$ kaon is predicted to have a mass of about 2418 MeV and a width of about 163 MeV or 225 MeV. Comment: 10 pages, 5 figures, version accepted by Eur. Phys. J.

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Directory of Open Access Journals

Deep factorization for speech signal

Authors: Chen Yixiang, Li Lantian, Shi Ying, Tang Zhiyuan, Wang Dong, Zheng Thomas Fang
Publication date: 27/02/2018

Various informative factors are mixed in speech signals, which makes decoding any one of them very difficult. An intuitive idea is to factorize each speech frame into individual informative factors, though this turns out to be highly difficult. Recently, we found that speaker traits, which were assumed to be long-term distributional properties, are actually short-time patterns and can be learned by a carefully designed deep neural network (DNN). This discovery motivated the cascade deep factorization (CDF) framework presented in this paper. The proposed framework infers speech factors sequentially, using previously inferred factors as conditional variables when inferring other factors. We will show that this approach can effectively factorize speech signals and that, using these factors, the original speech spectrum can be recovered with high accuracy. This factorization and reconstruction approach offers potential value for many speech processing tasks, e.g., speaker recognition and emotion recognition, as will be demonstrated in the paper. Comment: Accepted by ICASSP 2018. arXiv admin note: substantial text overlap with arXiv:1706.0177

arXiv.org e-Print Archive

Crossref