9,551 research outputs found

    Phone-aware Neural Language Identification

    Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID). However, phonetic information has been largely overlooked by most existing neural LID models, although it has been used with great success in conventional phonetic LID systems. We present a phone-aware neural LID architecture: a deep LSTM-RNN LID system that accepts output from an RNN-based ASR system. By utilizing this phonetic knowledge, LID performance can be significantly improved. Interestingly, even if the test language is not involved in the ASR training, the phonetic knowledge still makes a large contribution. Our experiments on four languages from the Babel corpus demonstrate that the phone-aware approach is highly effective.
    Comment: arXiv admin note: text overlap with arXiv:1705.0315
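    As a rough illustration of the idea described above, the sketch below feeds frame-level phonetic information from an ASR front-end into an LSTM-based LID network by simple feature concatenation. It assumes PyTorch; the feature dimensions, layer sizes, mean pooling, and the names (PhoneAwareLID, acoustic_feats, phonetic_feats) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a phone-aware LSTM LID model.
# Assumptions: PyTorch, 40-dim acoustic features, 50-dim frame-level phonetic
# features (e.g. phone posteriors) from an ASR front-end; layer sizes are
# illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class PhoneAwareLID(nn.Module):
    def __init__(self, acoustic_dim=40, phonetic_dim=50,
                 hidden_dim=512, num_languages=4):
        super().__init__()
        # The LID LSTM consumes acoustic features concatenated with
        # frame-level phonetic information produced by an ASR system.
        self.lstm = nn.LSTM(acoustic_dim + phonetic_dim, hidden_dim,
                            num_layers=2, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_languages)

    def forward(self, acoustic_feats, phonetic_feats):
        # acoustic_feats: (batch, frames, acoustic_dim)
        # phonetic_feats: (batch, frames, phonetic_dim)
        x = torch.cat([acoustic_feats, phonetic_feats], dim=-1)
        out, _ = self.lstm(x)
        # Average frame-level outputs into an utterance-level decision.
        return self.classifier(out.mean(dim=1))
```

    Concatenation is only one way to inject phonetic knowledge; the paper's system may combine the acoustic and phonetic streams differently.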

    Deep Speaker Feature Learning for Text-independent Speaker Verification

    Recently, deep neural networks (DNNs) have been used to learn speaker features. However, the quality of the learned features is not sufficiently good, so a complex back-end model, either neural or probabilistic, has to be used to address the residual uncertainty when the features are applied to speaker verification, just as with raw features. This paper presents a convolutional time-delay deep neural network structure (CT-DNN) for speaker feature learning. Our experimental results on the Fisher database demonstrate that the CT-DNN can produce high-quality speaker features: even with a single feature (0.3 seconds including the context), the EER can be as low as 7.68%. This confirms that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and can therefore be extracted from just dozens of frames.
    Comment: deep neural networks, speaker verification, speaker feature
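    Below is a hedged sketch of a convolutional time-delay network in the spirit of the CT-DNN described above: a small convolutional front-end followed by dilated 1-D (time-delay) layers producing frame-level speaker features, trained with a speaker-classification head. It assumes PyTorch; every layer shape, dimension, and name here is an illustrative assumption, not the paper's topology.

```python
# Sketch of a convolutional time-delay network (CT-DNN style) for
# frame-level speaker features. All shapes are assumptions for illustration.
import torch
import torch.nn as nn

class CTDNNSpeakerFeature(nn.Module):
    def __init__(self, feat_dim=40, embed_dim=400, num_speakers=5000):
        super().__init__()
        # Convolutional front-end over (time, frequency).
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=(5, 5), padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),
        )
        # Time-delay layers realised as dilated 1-D convolutions.
        self.tdnn = nn.Sequential(
            nn.Conv1d(64 * (feat_dim // 2), 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, embed_dim, kernel_size=3, dilation=3), nn.ReLU(),
        )
        # Speaker classifier used only during training; the output of
        # self.tdnn is the learned frame-level speaker feature.
        self.classifier = nn.Linear(embed_dim, num_speakers)

    def forward(self, x):
        # x: (batch, frames, feat_dim)
        h = self.conv(x.unsqueeze(1))             # (batch, 64, frames, feat_dim//2)
        h = h.permute(0, 1, 3, 2).flatten(1, 2)   # (batch, 64*feat_dim//2, frames)
        feats = self.tdnn(h)                      # frame-level speaker features
        return self.classifier(feats.mean(dim=-1))
```

    At test time one would discard the classifier and average (or otherwise pool) the frame-level features into a speaker representation; the back-end scoring is not shown here.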

    The possible members of the $5^1S_0$ meson nonet

    The strong decays of the $5^1S_0$ $q\bar{q}$ states are evaluated in the $^3P_0$ model with two types of space wave functions. Comparing the model expectations with the experimental data for the $\pi(2360)$, $\eta(2320)$, $X(2370)$, and $X(2500)$, we suggest that the $\pi(2360)$, $\eta(2320)$, and $X(2500)$ can be assigned as members of the $5^1S_0$ meson nonet, while the $5^1S_0$ assignment for the $X(2370)$ is not favored by its width. The $5^1S_0$ kaon is predicted to have a mass of about 2418 MeV and a width of about 163 MeV or 225 MeV.
    Comment: 10 pages, 5 figures, version accepted by Eur. Phys. J.
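    For context, a commonly quoted form of the partial decay width in the $^3P_0$ model is sketched below; the normalization and conventions are assumptions on my part and may differ from those adopted in the paper.

```latex
% A commonly used form of the ^3P_0 partial width (conventions assumed,
% possibly differing from the paper's).
\begin{equation}
  \Gamma_{A \to BC} \;=\; \pi^{2}\,\frac{|\vec{p}\,|}{m_A^{2}}
  \sum_{J,L}\bigl|\mathcal{M}^{JL}(|\vec{p}\,|)\bigr|^{2}
\end{equation}
% Here |\vec{p}| is the momentum of B (or C) in the rest frame of A, and
% \mathcal{M}^{JL} are the partial-wave amplitudes of the A -> B C transition,
% evaluated with the chosen space wave functions.
```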

    Deep factorization for speech signal

    Various informative factors are mixed together in speech signals, leading to great difficulty when decoding any of them. An intuitive idea is to factorize each speech frame into individual informative factors, though this turns out to be highly difficult. Recently, we found that speaker traits, which were assumed to be long-term distributional properties, are actually short-time patterns and can be learned by a carefully designed deep neural network (DNN). This discovery motivated the cascade deep factorization (CDF) framework presented in this paper. The proposed framework infers speech factors sequentially, where factors inferred earlier are used as conditional variables when inferring the remaining factors. We show that this approach can effectively factorize speech signals and that, using these factors, the original speech spectrum can be recovered with high accuracy. This factorization-and-reconstruction approach offers potential value for many speech processing tasks, e.g., speaker recognition and emotion recognition, as demonstrated in the paper.
    Comment: Accepted by ICASSP 2018. arXiv admin note: substantial text overlap with arXiv:1706.0177
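    To make the cascade idea concrete, here is a hedged sketch of sequential factor inference with reconstruction. It assumes PyTorch, only two factors (a speaker factor and an emotion factor), and simple feed-forward extractors; the names (FactorExtractor, speaker_net, emotion_net, reconstructor) and all dimensions are illustrative assumptions rather than the paper's actual CDF networks or training recipe.

```python
# Sketch of a cascade deep factorization (CDF) style pipeline: factors are
# inferred one after another, each conditioned on the factors inferred before
# it, and the spectrum is reconstructed from the inferred factors.
import torch
import torch.nn as nn

class FactorExtractor(nn.Module):
    """Infers one factor from a frame plus any previously inferred factors."""
    def __init__(self, in_dim, factor_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, factor_dim))

    def forward(self, x):
        return self.net(x)

feat_dim, spk_dim, emo_dim = 40, 100, 30
speaker_net = FactorExtractor(feat_dim, spk_dim)               # factor 1
emotion_net = FactorExtractor(feat_dim + spk_dim, emo_dim)     # factor 2, conditioned on factor 1
reconstructor = nn.Sequential(nn.Linear(spk_dim + emo_dim, 256), nn.ReLU(),
                              nn.Linear(256, feat_dim))        # spectrum recovery

frames = torch.randn(8, feat_dim)                              # a batch of spectral frames
spk = speaker_net(frames)                                      # infer the speaker factor first
emo = emotion_net(torch.cat([frames, spk], dim=-1))            # then condition on it
recon = reconstructor(torch.cat([spk, emo], dim=-1))           # recover the spectrum
loss = nn.functional.mse_loss(recon, frames)                   # reconstruction objective
```

    In practice each extractor would be trained with its own supervision (speaker labels, emotion labels, and so on); the single reconstruction loss above only illustrates how the recovered spectrum can be compared with the original frames.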