Phone-aware Neural Language Identification
Pure acoustic neural models, particularly the LSTM-RNN model, have shown
great potential in language identification (LID). However, the phonetic
information has been largely overlooked by most existing neural LID models,
although this information has been used in conventional phonetic LID
systems with great success. We present a phone-aware neural LID architecture,
which is a deep LSTM-RNN LID system but accepts output from an RNN-based ASR
system. By utilizing the phonetic knowledge, the LID performance can be
significantly improved. Interestingly, even if the test language is not
involved in the ASR training, the phonetic knowledge still presents a large
contribution. Our experiments conducted on four languages within the Babel
corpus demonstrated that the phone-aware approach is highly effective.
Comment: arXiv admin note: text overlap with arXiv:1705.0315
Deep Speaker Feature Learning for Text-independent Speaker Verification
Recently, deep neural networks (DNNs) have been used to learn speaker
features. However, the quality of the learned features is not sufficiently
good, so a complex back-end model, either neural or probabilistic, has to be
used to address the residual uncertainty when applied to speaker verification,
just as with raw features. This paper presents a convolutional time-delay deep
neural network structure (CT-DNN) for speaker feature learning. Our
experimental results on the Fisher database demonstrated that this CT-DNN can
produce high-quality speaker features: even with a single feature (0.3 seconds
including the context), the EER can be as low as 7.68%. This effectively
confirmed that the speaker trait is largely a deterministic short-time property
rather than a long-time distributional pattern, and therefore can be extracted
from just dozens of frames.
Comment: deep neural networks, speaker verification, speaker featur
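The equal error rate (EER) quoted in the abstract above is the operating point at which the false acceptance rate and the false rejection rate coincide. A minimal sketch of how it could be estimated from verification scores follows. The function name `equal_error_rate`, the threshold sweep, and the toy scores are illustrative assumptions and are not taken from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper: returns the mean of FAR and FRR at the threshold
    where the two rates are closest to each other.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    best_gap, best_eer = None, None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a genuine-speaker trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

With the toy scores `[0.9, 0.8, 0.7, 0.4]` for genuine trials and `[0.6, 0.3, 0.2, 0.1]` for impostor trials, one trial of each kind is misclassified at the threshold 0.6, so the sketch returns 0.25.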
Papillary cystadenoma of the parotid gland: A case report.
Background: Papillary cystadenoma is a rare benign epithelial tumor of the salivary gland, characterized by papillary structures and oncocytic cells with rich eosinophilic cytoplasm. We found only one case of papillary cystadenoma in nearly 700 cases of salivary gland tumors. Our case was initially mistaken for a tumor of the right temporomandibular joint (TMJ) capsule rather than of parotid gland origin. Preoperative magnetic resonance imaging (MRI) and computed tomography (CT) should be carefully studied to allow for appropriate preoperative counseling and operative planning.
Case summary: Here, we report an unusual case of a 54-year-old woman with a parotid gland papillary cystadenoma (PGPC) that was misdiagnosed as a tumor of the right TMJ capsule. She was initially admitted to our hospital due to a mass anterior to her right ear that had been inadvertently found 5 d earlier. Preoperative CT and MRI revealed a well-circumscribed tumor that was attached to the right TMJ capsule. The patient underwent a resection through an incision for TMJ, but evaluation of an intraoperative frozen section revealed a benign tumor of the parotid gland. We then removed part of the parotid gland above the temporal facial trunk. The facial nerve was preserved. Postoperative histopathological findings revealed that the tumor was PGPC. No additional treatment was performed. There was no recurrence during a 20-mo follow-up period.
Conclusion: The integrity of the interstitial space around the condyle in MRI or CT should be carefully evaluated for parotid gland or TMJ tumors.
The possible members of the meson nonet
The strong decays of the states are evaluated in the
model with two types of space wave functions. Comparing the model
expectations with the experimental data for the , ,
, and , we suggest that the , , and
can be assigned as the members of the meson nonet, while the
assignment for the is not favored by its width. The
kaon is predicted to have a mass of about 2418 MeV and a width of about 163 MeV
or 225 MeV.
Comment: 10 pages, 5 figures, version accepted by Eur. Phys. J.
Deep factorization for speech signal
Various informative factors are mixed in speech signals, leading to great
difficulty when decoding any of the factors. An intuitive idea is to factorize
each speech frame into individual informative factors, though it turns out to
be highly difficult. Recently, we found that speaker traits, which were assumed
to be long-term distributional properties, are actually short-time patterns,
and can be learned by a carefully designed deep neural network (DNN). This
discovery motivated a cascade deep factorization (CDF) framework that will be
presented in this paper. The proposed framework infers speech factors in a
sequential way, where factors previously inferred are used as conditional
variables when inferring other factors. We will show that this approach can
effectively factorize speech signals, and using these factors, the original
speech spectrum can be recovered with a high accuracy. This factorization and
reconstruction approach offers potential value for many speech processing
tasks, e.g., speaker recognition and emotion recognition, as will be
demonstrated in the paper.
Comment: Accepted by ICASSP 2018. arXiv admin note: substantial text overlap
with arXiv:1706.0177