Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models
We investigate the effectiveness of using a large ensemble of advanced neural
language models (NLMs) for lattice rescoring on automatic speech recognition
(ASR) hypotheses. Previous studies have reported the effectiveness of combining
a small number of NLMs. In contrast, in this study, we combine up to eight
NLMs, i.e., forward/backward long short-term memory/Transformer-LMs that are
trained with two different random initialization seeds. We combine these NLMs
through iterative lattice generation. Since these NLMs work complementarily
with each other, by combining them one by one at each rescoring iteration,
language scores attached to given lattice arcs can be gradually refined.
Consequently, errors of the ASR hypotheses can be gradually reduced. We also
investigate the effectiveness of carrying over contextual information (previous
rescoring results) across a lattice sequence of a long speech such as a lecture
speech. In experiments using a lecture speech corpus, by combining the eight
NLMs and using context carry-over, we obtained a 24.4% relative word error rate
reduction from the ASR 1-best baseline. For further comparison, we performed
simultaneous (i.e., non-iterative) NLM combination and 100-best rescoring using
the large ensemble of NLMs, which confirmed the advantage of lattice rescoring
with iterative NLM combination.
Comment: Accepted to ICASSP 202
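The idea of refining arc scores one model at a time can be illustrated with a toy sketch. This is not the authors' implementation: the `Arc` type, the fixed interpolation `weight`, and the linear interpolation of log scores are all assumptions made for illustration, and real lattice rescoring also has to track word histories and regenerate the lattice at each iteration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a lattice arc: (previous word, current word).
Arc = Tuple[str, str]


def iterative_rescore(
    arc_scores: Dict[Arc, float],
    models: List[Callable[[Arc], float]],
    weight: float = 0.5,
) -> Dict[Arc, float]:
    """Fold one language model into the arc scores per rescoring iteration.

    Assumption: scores are log-probabilities and each iteration linearly
    interpolates the current score with the new model's score.
    """
    scores = dict(arc_scores)
    for model in models:  # one iteration per NLM
        scores = {
            arc: (1.0 - weight) * old + weight * model(arc)
            for arc, old in scores.items()
        }
    return scores


def best_arc(scores: Dict[Arc, float]) -> Arc:
    """Return the arc with the highest (least negative) language score."""
    return max(scores, key=scores.get)


# Toy example: the baseline prefers "cap", while both toy NLMs prefer "cat".
baseline = {("the", "cat"): -2.0, ("the", "cap"): -1.5}
nlm_a = {("the", "cat"): -1.0, ("the", "cap"): -2.0}
nlm_b = {("the", "cat"): -0.8, ("the", "cap"): -2.5}

refined = iterative_rescore(baseline, [nlm_a.get, nlm_b.get])
print(best_arc(baseline), best_arc(refined))
```

In this toy case the preferred arc changes only because both models disagree with the baseline; the abstract's claim about complementary models is an empirical result that the sketch does not reproduce.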
NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization
This paper details our speaker diarization system designed for multi-domain,
multi-microphone casual conversations. The proposed diarization pipeline uses
weighted prediction error (WPE)-based dereverberation as a front end, then
applies end-to-end neural diarization with vector clustering (EEND-VC) to each
channel separately. It integrates the diarization result obtained from each
channel using diarization output voting error reduction plus overlap
(DOVER-LAP). To harness the knowledge from the target domain and results
integrated across all channels, we apply self-supervised adaptation for each
session by retraining the EEND-VC with pseudo-labels derived from DOVER-LAP.
The proposed system was incorporated into NTT's submission for the distant
automatic speech recognition task in the CHiME-7 challenge. Our system achieved
65% and 62% relative improvements on the development and evaluation sets compared to
the organizer-provided VC-based baseline diarization system, securing third
place in diarization performance.
Comment: 5 pages, 5 figures, Submitted to ICASSP 202
Subjective intelligibility of speech sounds enhanced by ideal ratio mask via crowdsourced remote experiments with effective data screening
It is essential to perform speech intelligibility (SI) experiments with human
listeners to evaluate the effectiveness of objective intelligibility measures.
Recently, crowdsourced remote testing has become a popular way to collect a large
amount and variety of data at relatively low cost and in a short time.
However, careful data screening is essential for attaining reliable SI data. We
compared the results of laboratory and crowdsourced remote experiments to
establish an effective data screening technique. We evaluated the SI of noisy
speech sounds enhanced by a single-channel ideal ratio mask (IRM) and
multi-channel mask-based beamformers. The results demonstrated that the SI
scores were improved by these enhancement methods. In particular, the
IRM-enhanced sounds were much better than the unprocessed and other enhanced
sounds, indicating IRM enhancement may give the upper limit of speech
enhancement performance. Moreover, tone pip tests, for which participants were
asked to report the number of audible tone pips, reduced the variability of the
crowdsourced remote results so that they became similar to the laboratory results. Tone
pip tests could be useful for future crowdsourced experiments because of their
simplicity and effectiveness for data screening.
Comment: This paper was submitted to Interspeech 2022
(http://www.interspeech2022.org
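A screening step based on tone pip reports might look like the sketch below. The abstract does not state the actual screening criterion, so the threshold rule (`min_pips`), the function names, and the sample data are all hypothetical.

```python
from statistics import mean
from typing import Dict, List


def screen_by_tone_pips(reported_pips: Dict[str, int], min_pips: int) -> List[str]:
    """Keep participants who reported hearing at least `min_pips` tone pips.

    Assumption: a low count suggests a poor listening environment or level,
    so those participants are excluded from the analysis.
    """
    return sorted(pid for pid, n in reported_pips.items() if n >= min_pips)


def mean_si(si_scores: Dict[str, float], keep: List[str]) -> float:
    """Average speech intelligibility score over the retained participants."""
    return mean(si_scores[pid] for pid in keep)


# Made-up data: participant ID -> reported tone pips / SI score (proportion correct).
reported = {"p1": 8, "p2": 3, "p3": 7}
si = {"p1": 0.62, "p2": 0.30, "p3": 0.58}

kept = screen_by_tone_pips(reported, min_pips=6)
print(kept, mean_si(si, kept))
```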
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization
Combining end-to-end neural speaker diarization (EEND) with vector clustering
(VC), known as EEND-VC, has gained interest for leveraging the strengths of
both methods. EEND-VC estimates activities and speaker embeddings for all
speakers within an audio chunk and uses VC to associate these activities with
speaker identities across different chunks. EEND-VC thus generates multiple
streams of embeddings, one for each speaker in a chunk. We can cluster these
embeddings using constrained agglomerative hierarchical clustering (cAHC),
ensuring embeddings from the same chunk belong to different clusters. This
paper introduces an alternative clustering approach, a multi-stream extension
of the successful Bayesian HMM clustering of x-vectors (VBx), called MS-VBx.
Experiments on three datasets demonstrate that MS-VBx outperforms cAHC in
diarization and speaker counting performance.
Comment: Accepted at Interspeech 202
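The cannot-link constraint used by cAHC (embeddings from the same chunk must end up in different clusters) can be sketched as follows. This is a naive illustration rather than the implementation evaluated in the paper: average linkage, cosine distance, and the stopping `threshold` are assumptions, and MS-VBx itself is not shown.

```python
import math
from typing import List, Sequence, Set


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def constrained_ahc(
    embeddings: Sequence[Sequence[float]],
    chunk_ids: Sequence[int],
    threshold: float,
) -> List[List[int]]:
    """Agglomerative clustering that never merges embeddings from the same chunk."""
    clusters: List[List[int]] = [[i] for i in range(len(embeddings))]

    def chunks(cluster: List[int]) -> Set[int]:
        return {chunk_ids[i] for i in cluster}

    def avg_dist(c1: List[int], c2: List[int]) -> float:
        total = sum(
            cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2
        )
        return total / (len(c1) * len(c2))

    while True:
        best = None  # (distance, index_a, index_b) of the closest allowed pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if chunks(clusters[a]) & chunks(clusters[b]):
                    continue  # cannot-link: the clusters share a chunk
                d = avg_dist(clusters[a], clusters[b])
                if d < threshold and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            return clusters  # no allowed pair is closer than the threshold
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]


# Two chunks with two speakers each; indices 0 and 2 are one speaker, 1 and 3 the other.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
toy_chunks = [0, 0, 1, 1]
print(constrained_ahc(toy_embeddings, toy_chunks, threshold=0.5))
```

Because the search over cluster pairs is repeated after every merge, this sketch is only practical for small inputs.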
Achieving LDL cholesterol target levels <1.81 mmol/L may provide extra cardiovascular protection in patients at high risk: Exploratory analysis of the Standard Versus Intensive Statin Therapy for Patients with Hypercholesterolaemia and Diabetic Retinopathy study
Aims: To assess the benefits of intensive statin therapy on reducing cardiovascular (CV) events in patients with type 2 diabetes complicated with hyperlipidaemia and retinopathy in a primary prevention setting in Japan. In the intention-to-treat population, intensive therapy [targeting LDL cholesterol = 2.59 to = 100 to = 2.59 to <3.10 mmol/L in patients with hypercholesterolaemia and diabetic retinopathy
Diverse perspectives to address for the future treatment of heterogeneous hepatocellular carcinoma
Hepatocellular carcinomas (HCCs), which often arise from chronic liver damage, have poor conditional 5-year survival and are recognized as heterogeneous tumors. Considering the heterogeneity of HCCs, diverse perspectives need to be addressed for treating such tumors, besides the findings of conventional imaging modalities and tumor markers. Data from the latest technologies, such as liquid biopsy, and the detection of the presence of cancer cells with stem/progenitor cell markers, gene mutations and diverse pathways, crosstalk with immune cells and cancer-associated fibroblasts, and mechanisms of epithelial–mesenchymal transition provide diverse lines of information. Integration of these data with clinical data might be necessary to develop effective therapies for precision medicine. Here, we review several aspects of dealing with the complexity of heterogeneous HCCs.