Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings
This paper presents a streaming speaker-attributed automatic speech
recognition (SA-ASR) model that can recognize "who spoke what" with low latency
even when multiple people are speaking simultaneously. Our model is based on
token-level serialized output training (t-SOT) which was recently proposed to
transcribe multi-talker speech in a streaming fashion. To further recognize
speaker identities, we propose an encoder-decoder based speaker embedding
extractor that can estimate a speaker representation for each recognized token
not only from non-overlapping speech but also from overlapping speech. The
proposed speaker embedding, named t-vector, is extracted synchronously with the
t-SOT ASR model, enabling joint execution of speaker identification (SID) or
speaker diarization (SD) with the multi-talker transcription with low latency.
We evaluate the proposed model for a joint task of ASR and SID/SD by using
LibriSpeechMix and LibriCSS corpora. The proposed model achieves substantially
better accuracy than a prior streaming model and shows comparable or sometimes
even superior results to the state-of-the-art offline SA-ASR model.
Comment: Submitted to Interspeech 202
Critical role of the gut microbiota in immune responses and cancer immunotherapy
The gut microbiota plays a critical role in the progression of human diseases, especially cancer. In recent decades, there has been accumulating evidence of the connections between the gut microbiota and cancer immunotherapy. Therefore, understanding the functional role of the gut microbiota in regulating immune responses to cancer immunotherapy is crucial for developing precision medicine. In this review, we extract insights from state-of-the-art research to decipher the complicated crosstalk among the gut microbiota, the systemic immune system, and immunotherapy in the context of cancer. Additionally, as the gut microbiota can account for immune-related adverse events, we discuss potential interventions to minimize these adverse effects and review the clinical application of five microbiota-targeted strategies that precisely increase the efficacy of cancer immunotherapy. Finally, as the gut microbiota holds promising potential as a target for precision cancer immunotherapeutics, we summarize current challenges and provide a general outlook on future directions in this field.
Dietary patterns and the risk of tuberculosis-drug-induced liver injury: a cohort study
Background and purpose: Nutrition is associated with tuberculosis drug-induced liver injury (TBLI). How dietary patterns relate to tuberculosis drug-induced liver injury is still unknown. The objective of this study is to explore the relation between dietary patterns and the risk of tuberculosis drug-induced liver injury. Methods: This cohort study was conducted at two hospitals in Shandong Province, China, between 2011 and 2013. A total of 605 tuberculosis patients were included in the final analysis. The blood aspartate aminotransferase or alanine aminotransferase level was monitored throughout the 6-month tuberculosis treatment. Semi-quantitative food frequency questionnaires were used to survey dietary intake in the second month of the tuberculosis treatment. The China Healthy Diet Index (CHDI), which was previously validated in the Chinese population, was used as an a priori dietary pattern. A posteriori dietary patterns were extracted by principal component analysis (PCA). Results: The CHDI was negatively associated with the risk of liver injury [adjusted odds ratio (aOR) per standard deviation (SD) (95% CI): 0.61 (0.40–0.94)] and liver dysfunction [aOR per SD (95% CI): 0.47 (0.35–0.64)] in the multivariate logistic model. A positive association between “Organ meat, poultry, and vegetable oil” dietary pattern scores (extracted by PCA) and the risk of liver injury [aOR (95% CI): 3.02 (1.42–6.41)] and liver dysfunction [aOR (95% CI): 1.83 (1.09–3.05)] was observed. Conclusion: A high CHDI score was a protective factor for tuberculosis drug-induced liver injury, while the “Organ meat, poultry, and vegetable oil” dietary pattern, which was rich in organ meat, poultry, and vegetable oil and low in vegetables, was an independent risk factor for tuberculosis drug-induced liver injury.
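As a reading aid for the adjusted odds ratios reported above: in a logistic model, an odds ratio per standard deviation is the exponential of the coefficient on the standardized score, and a Wald-type 95% interval is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below illustrates that conversion only. The function name and the input values are hypothetical, not taken from the study, and the abstract does not state which interval method the authors used.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with a Wald confidence interval.

    beta: coefficient on a predictor scaled to one standard deviation
    se:   standard error of that coefficient
    z:    normal quantile (1.96 gives an approximate 95% interval)
    """
    point = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return point, lower, upper


# Hypothetical inputs for illustration; these are NOT the study's estimates.
or_, lo, hi = odds_ratio_with_ci(beta=-0.5, se=0.2)
print(f"aOR per SD: {or_:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

An interval that lies entirely below 1 corresponds to a negative (protective) association, and one entirely above 1 to a positive association, which is how the two dietary-pattern results above are read.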
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Self-supervised learning (SSL) has achieved great success in speech recognition,
while other speech processing tasks have seen only limited exploration.
As the speech signal contains multi-faceted information including speaker identity,
paralinguistics, spoken content, etc., learning universal representations for
all speech tasks is challenging. To tackle the problem, we propose a new
pre-trained model, WavLM, to solve full-stack downstream speech tasks. WavLM
jointly learns masked speech prediction and denoising in pre-training. By this
means, WavLM not only keeps the speech content modeling capability through
masked speech prediction, but also improves its potential for non-ASR tasks
through speech denoising. In addition, WavLM employs gated relative position bias
for the Transformer structure to better capture the sequence ordering of input
speech. We also scale up the training dataset from 60k hours to 94k hours.
WavLM Large achieves state-of-the-art performance on the SUPERB benchmark, and
brings significant improvements for various speech processing tasks on their
representative benchmarks. The code and pre-trained models are available at
https://aka.ms/wavlm.
Comment: Submitted to the Journal of Selected Topics in Signal Processing
(JSTSP)