43 research outputs found

Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models

Author: Araki Shoko
Delcroix Marc
Ogawa Atsunori
Tawara Naohiro
Publication date: 19/12/2023

We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition (ASR) hypotheses. Previous studies have reported the effectiveness of combining a small number of NLMs. In contrast, in this study, we combine up to eight NLMs, i.e., forward/backward long short-term memory/Transformer-LMs that are trained with two different random initialization seeds. We combine these NLMs through iterative lattice generation. Since these NLMs work complementarily with each other, by combining them one by one at each rescoring iteration, language scores attached to given lattice arcs can be gradually refined. Consequently, errors of the ASR hypotheses can be gradually reduced. We also investigate the effectiveness of carrying over contextual information (previous rescoring results) across a lattice sequence of a long speech such as a lecture speech. In experiments using a lecture speech corpus, by combining the eight NLMs and using context carry-over, we obtained a 24.4% relative word error rate reduction from the ASR 1-best baseline. For further comparison, we performed simultaneous (i.e., non-iterative) NLM combination and 100-best rescoring using the large ensemble of NLMs, which confirmed the advantage of lattice rescoring with iterative NLM combination.

Comment: Accepted to ICASSP 202
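As a rough illustration of the one-model-at-a-time combination described in this abstract, the sketch below refines the language score of each hypothesis by folding in one scorer per iteration. It rests on several assumptions that are not in the source: it works on a small N-best list rather than a lattice, the scorers are arbitrary callables returning log-probabilities, the combination is an equal-weight running mean, and `lm_weight` is a hypothetical parameter. It is not the authors' implementation.

```python
from typing import Callable, Dict, List

# Hypothetical scorer type: maps a hypothesis string to a log-probability.
Scorer = Callable[[str], float]


def iterative_rescore(
    initial_lang: Dict[str, float],
    acoustic: Dict[str, float],
    lms: List[Scorer],
    lm_weight: float = 1.0,
) -> List[str]:
    """Fold LMs in one at a time; return the 1-best hypothesis after each iteration.

    initial_lang maps each hypothesis to its starting language score (log domain).
    After iteration k, the language score is the mean of the starting score and
    the first k LM scores (an assumed, equal-weight combination).
    """
    lang = dict(initial_lang)
    best_per_iter: List[str] = []
    for k, lm in enumerate(lms, start=1):
        for hyp in lang:
            # Running mean over (k + 1) scores: the starting score plus k LM scores.
            lang[hyp] = (lang[hyp] * k + lm(hyp)) / (k + 1)
        best = max(lang, key=lambda h: acoustic[h] + lm_weight * lang[h])
        best_per_iter.append(best)
    return best_per_iter
```

Each pass changes only the language scores, so the acoustic scores stay fixed while the ranking is allowed to shift as more models are combined.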

arXiv.org e-Print Archive

NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization

Author: Ando Atsushi
Delcroix Marc
Ogawa Atsunori
Tawara Naohiro
Publication date: 22/09/2023

This paper details our speaker diarization system designed for multi-domain, multi-microphone casual conversations. The proposed diarization pipeline uses weighted prediction error (WPE)-based dereverberation as a front end, then applies end-to-end neural diarization with vector clustering (EEND-VC) to each channel separately. It integrates the diarization result obtained from each channel using diarization output voting error reduction plus overlap (DOVER-LAP). To harness the knowledge from the target domain and results integrated across all channels, we apply self-supervised adaptation for each session by retraining the EEND-VC with pseudo-labels derived from DOVER-LAP. The proposed system was incorporated into NTT's submission for the distant automatic speech recognition task in the CHiME-7 challenge. Our system achieved 65% and 62% relative improvements on development and eval sets compared to the organizer-provided VC-based baseline diarization system, securing third place in diarization performance.

Comment: 5 pages, 5 figures, Submitted to ICASSP 202

arXiv.org e-Print Archive

Subjective intelligibility of speech sounds enhanced by ideal ratio mask via crowdsourced remote experiments with effective data screening

Author: Arai Kenichi
Araki Shoko
Irino Toshio
Kinoshita Keisuke
Nakatani Tomohiro
Ogawa Atsunori
Yamamoto Ayako
Publication date: 30/03/2022

It is essential to perform speech intelligibility (SI) experiments with human listeners to evaluate the effectiveness of objective intelligibility measures. Recently, crowdsourced remote testing has become popular for collecting a massive amount and variety of data at relatively small cost and in a short time. However, careful data screening is essential for attaining reliable SI data. We compared the results of laboratory and crowdsourced remote experiments to establish an effective data screening technique. We evaluated the SI of noisy speech sounds enhanced by a single-channel ideal ratio mask (IRM) and multi-channel mask-based beamformers. The results demonstrated that the SI scores were improved by these enhancement methods. In particular, the IRM-enhanced sounds were much better than the unprocessed and other enhanced sounds, indicating IRM enhancement may give the upper limit of speech enhancement performance. Moreover, tone pip tests, for which participants were asked to report the number of audible tone pips, reduced the variability of crowdsourced remote results so that the laboratory results became similar. Tone pip tests could be useful for future crowdsourced experiments because of their simplicity and effectiveness for data screening.

Comment: This paper was submitted to Interspeech 2022 (http://www.interspeech2022.org

arXiv.org e-Print Archive

Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization

Author: Araki Shoko
Burget Lukas
Delcroix Marc
Diez Mireia
Landini Federico
Nakatani Tomohiro
Ogawa Atsunori
Silnova Anna
Tawara Naohiro
Publication date: 22/05/2023

Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio chunk and uses VC to associate these activities with speaker identities across different chunks. EEND-VC thus generates multiple streams of embeddings, one for each speaker in a chunk. We can cluster these embeddings using constrained agglomerative hierarchical clustering (cAHC), ensuring embeddings from the same chunk belong to different clusters. This paper introduces an alternative clustering approach, a multi-stream extension of the successful Bayesian HMM clustering of x-vectors (VBx), called MS-VBx. Experiments on three datasets demonstrate that MS-VBx outperforms cAHC in diarization and speaker counting performance.

Comment: Accepted at Interspeech 202
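The cAHC baseline mentioned in this abstract can be sketched as agglomerative clustering with a cannot-link constraint between embeddings of the same chunk. The version below is a minimal sketch under assumptions not stated in the source: average linkage, Euclidean distance, a fixed stopping `threshold`, and the function name `constrained_ahc` are all illustrative choices, and it does not implement MS-VBx.

```python
import math
from itertools import combinations
from typing import List, Sequence


def constrained_ahc(
    embeddings: Sequence[Sequence[float]],
    chunk_ids: Sequence[int],
    threshold: float,
) -> List[List[int]]:
    """Average-linkage agglomerative clustering with a cannot-link constraint.

    Two clusters are never merged if they contain embeddings from the same
    chunk. Merging stops when no allowed pair remains or the closest allowed
    pair is farther apart than `threshold`.
    """
    clusters: List[List[int]] = [[i] for i in range(len(embeddings))]

    def avg_dist(a: List[int], b: List[int]) -> float:
        total = sum(math.dist(embeddings[i], embeddings[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    def allowed(a: List[int], b: List[int]) -> bool:
        # Clusters sharing any chunk id must stay apart.
        return not ({chunk_ids[i] for i in a} & {chunk_ids[j] for j in b})

    while True:
        candidates = [
            (avg_dist(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
            if allowed(clusters[x], clusters[y])
        ]
        if not candidates:
            break
        dist, x, y = min(candidates)
        if dist > threshold:
            break
        # x < y always holds, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

Because the constraint is checked on whole clusters, a merge is refused as soon as the two sides share a single chunk, which keeps speakers from the same chunk in different clusters.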

arXiv.org e-Print Archive

Achieving LDL cholesterol target levels <1.81 mmol/L may provide extra cardiovascular protection in patients at high risk: Exploratory analysis of the Standard Versus Intensive Statin Therapy for Patients with Hypercholesterolaemia and Diabetic Retinopathy study

Aims: To assess the benefits of intensive statin therapy on reducing cardiovascular (CV) events in patients with type 2 diabetes complicated with hyperlipidaemia and retinopathy in a primary prevention setting in Japan. In the intention-to-treat population, intensive therapy [targeting LDL cholesterol = 2.59 to = 100 to = 2.59 to <3.10 mmol/L in patients with hypercholesterolaemia and diabetic retinopathy

Tohoku University Repository (TOUR) / 東北大学機関リポジトリ

Institutional Repositories DataBase (IRDB)

Kobe University Repository Kernel

Feature Based Domain Adaptation for Neural Network Language Models with Factorised Hidden Layers

Author: Atsunori OGAWA
Marc DELCROIX
Michael HENTSCHEL
Tomoharu IWATA
Tomohiro NAKATANI
Publication venue: 'Institute of Electronics, Information and Communications Engineers (IEICE)'

Crossref

Histopathological Features of Cysts in Wild Medaka Fish

Author: Atsunori Oga
Hiroki Oota
Motoyuki Ogawa
Shoji Oda
Takafumi Katsumura
Toshiyuki Nishimaki
Publication venue: 'The Japanese Society of Fish Pathology'
Publication date: 01/01/2016

Crossref

Diverse perspectives to address for the future treatment of heterogeneous hepatocellular carcinoma

Author: Atsunori Tsuchiya
Junji Yokoyama
Kazunao Hayashi
Masahiro Ogawa
Naruhiro Kimura
Shuji Terai
Suguru Takeuchi
Takayuki Watanabe
Yuichi Kojima
Yusuke Watanabe
Publication venue: 'Elsevier BV'
Publication date: 01/03/2019

Hepatocellular carcinomas (HCCs), which often arise from chronic liver damage, have poor conditional 5-year survival and are recognized as heterogeneous tumors. Considering the heterogeneity of HCCs, diverse perspectives need to be addressed for treating such tumors, besides the findings of conventional imaging modalities and tumor markers. Data from the latest technologies, such as liquid biopsy, and the detection of the presence of cancer cells with stem/progenitor cell markers, gene mutations and diverse pathways, crosstalk with immune cells and cancer-associated fibroblasts, and mechanisms of epithelial–mesenchymal transition provide diverse lines of information. Integration of these data with clinical data might be necessary to develop effective therapies for precision medicine. Here, we review several aspects of dealing with the complexity of heterogeneous HCCs.

Directory of Open Access Journals