43 research outputs found

    Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models

    We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition (ASR) hypotheses. Previous studies have reported the effectiveness of combining a small number of NLMs. In contrast, in this study, we combine up to eight NLMs, i.e., forward/backward long short-term memory/Transformer-LMs that are trained with two different random initialization seeds. We combine these NLMs through iterative lattice generation. Since these NLMs work complementarily with each other, by combining them one by one at each rescoring iteration, language scores attached to given lattice arcs can be gradually refined. Consequently, errors of the ASR hypotheses can be gradually reduced. We also investigate the effectiveness of carrying over contextual information (previous rescoring results) across the lattice sequence of a long speech, such as a lecture. In experiments using a lecture speech corpus, by combining the eight NLMs and using context carry-over, we obtained a 24.4% relative word error rate reduction from the ASR 1-best baseline. For further comparison, we performed simultaneous (i.e., non-iterative) NLM combination and 100-best rescoring using the large ensemble of NLMs, which confirmed the advantage of lattice rescoring with iterative NLM combination. Comment: Accepted to ICASSP 202

    NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization

    This paper details our speaker diarization system designed for multi-domain, multi-microphone casual conversations. The proposed diarization pipeline uses weighted prediction error (WPE)-based dereverberation as a front end, then applies end-to-end neural diarization with vector clustering (EEND-VC) to each channel separately. It integrates the diarization results obtained from each channel using diarization output voting error reduction plus overlap (DOVER-LAP). To harness the knowledge from the target domain and results integrated across all channels, we apply self-supervised adaptation for each session by retraining the EEND-VC with pseudo-labels derived from DOVER-LAP. The proposed system was incorporated into NTT's submission for the distant automatic speech recognition task in the CHiME-7 challenge. Our system achieved 65% and 62% relative improvements on the development and evaluation sets compared to the organizer-provided VC-based baseline diarization system, securing third place in diarization performance. Comment: 5 pages, 5 figures, Submitted to ICASSP 202

    Subjective intelligibility of speech sounds enhanced by ideal ratio mask via crowdsourced remote experiments with effective data screening

    It is essential to perform speech intelligibility (SI) experiments with human listeners to evaluate the effectiveness of objective intelligibility measures. Recently, crowdsourced remote testing has become popular for collecting a large amount and variety of data at relatively low cost and in a short time. However, careful data screening is essential for attaining reliable SI data. We compared the results of laboratory and crowdsourced remote experiments to establish an effective data screening technique. We evaluated the SI of noisy speech sounds enhanced by a single-channel ideal ratio mask (IRM) and multi-channel mask-based beamformers. The results demonstrated that the SI scores were improved by these enhancement methods. In particular, the IRM-enhanced sounds were much more intelligible than the unprocessed and other enhanced sounds, indicating that IRM enhancement may give the upper limit of speech enhancement performance. Moreover, tone pip tests, in which participants were asked to report the number of audible tone pips, reduced the variability of the crowdsourced remote results so that they became similar to the laboratory results. Tone pip tests could be useful for future crowdsourced experiments because of their simplicity and effectiveness for data screening. Comment: This paper was submitted to Interspeech 2022 (http://www.interspeech2022.org

    Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization

    Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio chunk and uses VC to associate these activities with speaker identities across different chunks. EEND-VC thus generates multiple streams of embeddings, one for each speaker in a chunk. We can cluster these embeddings using constrained agglomerative hierarchical clustering (cAHC), ensuring embeddings from the same chunk belong to different clusters. This paper introduces an alternative clustering approach, a multi-stream extension of the successful Bayesian HMM clustering of x-vectors (VBx), called MS-VBx. Experiments on three datasets demonstrate that MS-VBx outperforms cAHC in diarization and speaker counting performance. Comment: Accepted at Interspeech 202

    Achieving LDL cholesterol target levels <1.81 mmol/L may provide extra cardiovascular protection in patients at high risk: Exploratory analysis of the Standard Versus Intensive Statin Therapy for Patients with Hypercholesterolaemia and Diabetic Retinopathy study

    Aims: To assess the benefits of intensive statin therapy on reducing cardiovascular (CV) events in patients with type 2 diabetes complicated with hyperlipidaemia and retinopathy in a primary prevention setting in Japan. In the intention-to-treat population, intensive therapy [targeting LDL cholesterol = 2.59 to = 100 to = 2.59 to <3.10 mmol/L in patients with hypercholesterolaemia and diabetic retinopathy

    Feature Based Domain Adaptation for Neural Network Language Models with Factorised Hidden Layers


    Histopathological Features of Cysts in Wild Medaka Fish


    Diverse perspectives to address for the future treatment of heterogeneous hepatocellular carcinoma

    Hepatocellular carcinomas (HCCs), which often arise from chronic liver damage, have poor conditional 5-year survival and are recognized as heterogeneous tumors. Given this heterogeneity, treatment needs to draw on diverse perspectives beyond the findings of conventional imaging modalities and tumor markers. Data from the latest technologies, such as liquid biopsy, and the detection of the presence of cancer cells with stem/progenitor cell markers, gene mutations and diverse pathways, crosstalk with immune cells and cancer-associated fibroblasts, and mechanisms of epithelial–mesenchymal transition provide diverse lines of information. Integration of these data with clinical data might be necessary to develop effective therapies for precision medicine. Here, we review several aspects of dealing with the complexity of heterogeneous HCCs.