Personalized Acoustic Modeling by Weakly Supervised Multi-Task Deep Learning using Acoustic Tokens Discovered from Unlabeled Data
It is well known that recognizers personalized to each user are much more
effective than user-independent recognizers. With the popularity of smartphones
today, it is not difficult to collect a large set of audio data for each user,
but it is difficult to transcribe it. However, it is now possible to
automatically discover acoustic tokens from unlabeled personal data in an
unsupervised way. We therefore propose a multi-task deep learning framework
called a phoneme-token deep neural network (PTDNN), jointly trained from
unsupervised acoustic tokens discovered from unlabeled data and very limited
transcribed data for personalized acoustic modeling. We term this scenario
"weakly supervised". The underlying intuition is that the high degree of
similarity between the HMM states of acoustic token models and phoneme models
may help them learn from each other in this multi-task learning framework.
Initial experiments performed over a personalized audio data set recorded from
Facebook posts demonstrated that very good improvements can be achieved in both
frame accuracy and word accuracy over popularly-considered baselines such as
fDLR, speaker code and lightly supervised adaptation. This approach complements
existing speaker adaptation approaches and can be used jointly with such
techniques to yield improved results.

Comment: 5 pages, 5 figures, published in IEEE ICASSP 201
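The joint training described above can be illustrated as a weighted sum of two cross-entropy losses, one over phoneme targets and one over discovered acoustic-token targets, computed on top of a shared network. This is a minimal sketch under assumptions: the function names and the interpolation weight `lam` are hypothetical, not taken from the paper.

```python
import math

def cross_entropy(probs, target_idx):
    # negative log-likelihood of the correct class
    return -math.log(probs[target_idx])

def multitask_loss(phoneme_probs, phoneme_target,
                   token_probs, token_target, lam=0.5):
    # Joint objective: supervised phoneme loss plus a weighted
    # loss over unsupervised acoustic tokens, both backpropagated
    # through the same shared trunk. `lam` is a hypothetical
    # interpolation weight balancing the two tasks.
    return (cross_entropy(phoneme_probs, phoneme_target)
            + lam * cross_entropy(token_probs, token_target))
```

In practice each head would be a softmax layer over its own label set, and only the limited transcribed data contributes to the phoneme term.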
Alternative Ingredient Recommendation: A Co-occurrence and Ingredient Category Importance Based Approach
As many people refer to a recipe when cooking, several recipe-sharing websites now host large collections of recipes and make them easier to access than before. However, it is often the case that we cannot obtain all the ingredients listed in a recipe. Prior research on ingredient substitution has built recommendation systems that consider the suitability of a recommended ingredient with the remaining ingredients. In this paper, in addition to suitability, we also take into account the diversity of ingredient categories and the novelty of new ingredient combinations. Moreover, we combine suitability and novelty into a single index to examine whether our method can help discover new combinations of ingredients that could form a new dish. Our evaluation results show that our proposed method attains comparable or even better performance from each perspective.
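The suitability signal described above is co-occurrence based: an ingredient that frequently appears alongside the ingredients still on hand is a better substitute. A minimal sketch of such a score follows; all names are hypothetical, and the paper's actual formulation (and its category-diversity and novelty terms) may differ.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(recipes):
    # Count how often each unordered ingredient pair appears
    # together in the same recipe.
    pair_counts = Counter()
    for recipe in recipes:
        for a, b in combinations(sorted(set(recipe)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def suitability(candidate, remaining, pair_counts):
    # Average co-occurrence of the candidate with the remaining
    # ingredients: a simplified stand-in for the paper's score.
    total = sum(pair_counts[tuple(sorted((candidate, r)))]
                for r in remaining)
    return total / len(remaining)
```

A fuller system would normalize these counts (e.g. into pointwise mutual information) and blend the result with the category-diversity and novelty indices.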
THE DISCRIMINATION OF BARBELL WEIGHT FOR WEIGHTLIFTERS
Ten college weightlifters were recruited for this study. The standard barbell weight (Ws) of each participant was set at 80% of their personal best snatch record. Test barbell weights of Ws, Ws±1 kg, Ws±2 kg, and Ws±5 kg were presented in random order, and each lifter was asked to identify the difference between the test weight and the standard weight. Discrimination accuracy exceeded 86% when the test weight was Ws±5 kg. When the test weight equaled the standard weight, discrimination was significantly lower than for the other test weights (p < .01). Based on these results, weightlifters appear to discriminate barbell mass well at a difference of 5 kg, but seem unable to perceive slight differences (e.g., less than 2 kg) in barbell mass at 80% of their best snatch record.
Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision
This work evaluated several cutting-edge large-scale foundation models based
on self-supervision or weak supervision, including SeamlessM4T, SeamlessM4T v2,
and Whisper-large-v3, on three code-switched corpora. We found that
self-supervised models can achieve performance close to that of the supervised
model, indicating the effectiveness of multilingual self-supervised
pre-training. We also observed that these models still have room for
improvement, as they kept making similar mistakes and performed
unsatisfactorily when modeling intra-sentential code-switching. In addition, we
explored the validity of several Whisper variants and concluded that they
remain effective in code-switching scenarios; similar techniques for
self-supervised models are worth studying to boost performance on code-switched
tasks.

Comment: Submitted to ICASSP 2024 Self-supervision in Audio, Speech and Beyond workshop
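Evaluations like the one above rest on error-rate metrics such as word error rate (WER), the word-level edit distance between reference and hypothesis divided by the reference length. A self-contained sketch of the standard dynamic-programming computation follows; note that code-switched benchmarks often report a mixed error rate instead, scoring Mandarin at the character level and English at the word level.

```python
def wer(reference, hypothesis):
    # Word error rate: Levenshtein distance over word sequences,
    # normalized by the reference length.
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, a single substituted word in a three-word reference yields a WER of 1/3.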