Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
This paper proposes a powerful Visual Speech Recognition (VSR) method for
multiple languages, especially for low-resource languages that have a limited
amount of labeled data. Unlike previous methods that tried to improve
the VSR performance for the target language by using knowledge learned from
other languages, we explore whether we can increase the amount of training data
itself for the different languages without human intervention. To this end, we
employ a Whisper model which can conduct both language identification and
audio-based speech recognition. It serves to filter data of the desired
languages and transcribe labels from the unannotated, multilingual audio-visual
data pool. By comparing the performances of VSR models trained on automatic
labels and the human-annotated labels, we show that we can achieve similar VSR
performance to that of human-annotated labels even without utilizing human
annotations. Through the automated labeling process, we label large-scale
unlabeled multilingual databases, VoxCeleb2 and AVSpeech, producing 1,002 hours
of data for four languages with few VSR resources: French, Italian, Spanish, and
Portuguese. With the automatic labels, we achieve new state-of-the-art
performance on mTEDx in four languages, significantly surpassing the previous
methods. The automatic labels are available online:
https://github.com/JeongHun0716/Visual-Speech-Recognition-for-Low-Resource-Languages
Comment: Accepted at ICASSP 202
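The filter-then-transcribe step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Recognizer` signature, the `auto_label` helper, the language codes, and the stub recognizer are all hypothetical stand-ins for a real speech model that returns a detected language and a transcript.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple

# Assumed ISO codes for French, Italian, Spanish, and Portuguese.
TARGET_LANGUAGES: Set[str] = {"fr", "it", "es", "pt"}

# Hypothetical interface: audio path -> (detected language code, transcript).
Recognizer = Callable[[str], Tuple[str, str]]


@dataclass
class LabeledClip:
    clip_id: str
    language: str
    transcript: str


def auto_label(
    clips: Iterable[Tuple[str, str]],
    recognize: Recognizer,
    targets: Set[str] = TARGET_LANGUAGES,
) -> List[LabeledClip]:
    """Keep clips whose detected language is a target; attach the transcript."""
    labeled = []
    for clip_id, audio_path in clips:
        language, transcript = recognize(audio_path)
        if language not in targets:
            continue  # filter out clips in languages we do not want
        transcript = transcript.strip()
        if not transcript:
            continue  # skip clips for which nothing was recognised
        labeled.append(LabeledClip(clip_id, language, transcript))
    return labeled


def stub_recognizer(audio_path: str) -> Tuple[str, str]:
    """Illustrative stand-in for a real model; values are made up."""
    table = {
        "a.wav": ("fr", " bonjour "),
        "b.wav": ("en", "hello"),
        "c.wav": ("pt", ""),
    }
    return table[audio_path]


pool = [("clip-a", "a.wav"), ("clip-b", "b.wav"), ("clip-c", "c.wav")]
labels = auto_label(pool, stub_recognizer)
```

In this sketch only `clip-a` survives: `clip-b` is in a non-target language and `clip-c` has an empty transcript.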
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
This paper proposes a novel lip reading framework, especially for
low-resource languages, which have not been well addressed in the previous
literature. Since low-resource languages do not have enough video-text paired
data to train the model to have sufficient power to model lip movements and
language, it is regarded as challenging to develop lip reading models for
low-resource languages. In order to mitigate the challenge, we try to learn
general speech knowledge, the ability to model lip movements, from a
high-resource language through the prediction of speech units. It is known that
different languages partially share common phonemes, thus general speech
knowledge learned from one language can be extended to other languages. Then,
we try to learn language-specific knowledge, the ability to model language, by
proposing Language-specific Memory-augmented Decoder (LMDecoder). LMDecoder
saves language-specific audio features into memory banks and can be trained on
audio-text paired data which is more easily accessible than video-text paired
data. Therefore, with LMDecoder, we can transform the input speech units into
language-specific audio features and translate them into texts by utilizing the
learned rich language knowledge. Finally, by combining general speech knowledge
and language-specific knowledge, we can efficiently develop lip reading models
even for low-resource languages. Through extensive experiments using five
languages, English, Spanish, French, Italian, and Portuguese, the effectiveness
of the proposed method is evaluated.
Comment: Accepted at ICCV 202
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Visual Speech Recognition (VSR) is the task of predicting spoken words from
silent lip movements. VSR is regarded as a challenging task because of the
insufficient information in lip movements. In this paper, we propose an Audio
Knowledge empowered Visual Speech Recognition framework (AKVSR) to complement
the insufficient speech information of the visual modality by using the audio
modality. Unlike previous methods, the proposed AKVSR 1) utilizes rich audio
knowledge encoded by a large-scale pretrained audio model, 2) saves the
linguistic information of audio knowledge in a compact audio memory by
discarding the non-linguistic information from the audio through quantization,
and 3) includes an Audio Bridging Module that can find the best-matched audio
features from the compact audio memory, which makes training possible without
audio inputs once the compact audio memory has been composed. We validate the
effectiveness of the proposed method through extensive experiments, and achieve
new state-of-the-art performance on the widely used datasets LRS2 and LRS3.
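The idea of retrieving best-matched audio features from a compact memory can be illustrated with a small lookup. The abstract does not say how the Audio Bridging Module scores matches, so the cosine similarity, the softmax weighting, the `temperature` parameter, and the function names below are assumptions made purely for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def bridge(
    visual_query: Sequence[float],
    audio_memory: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> Tuple[List[float], List[float]]:
    """Return a similarity-weighted mix of memory slots and the weights used."""
    sims = [cosine(visual_query, slot) for slot in audio_memory]
    peak = max(sims)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp((s - peak) / temperature) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(audio_memory[0])
    mixed = [
        sum(w * slot[i] for w, slot in zip(weights, audio_memory))
        for i in range(dim)
    ]
    return mixed, weights


# Toy memory with two slots; the query is aligned with the first one.
memory = [[1.0, 0.0], [0.0, 1.0]]
retrieved, attn = bridge([2.0, 0.0], memory)
```

Because the lookup needs only a visual query and the stored memory, no audio input is required at this step, which is the property the abstract highlights.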
The cortical activation pattern by a rehabilitation robotic hand: a functional NIRS study
Introduction: Clarification of the relationship between external stimuli and brain response has been an important topic in neuroscience and brain rehabilitation. In the current study, using functional near-infrared spectroscopy (fNIRS), we attempted to investigate cortical activation patterns generated during execution of a rehabilitation robotic hand. Methods: Ten normal subjects were recruited for this study. Passive movements of the right fingers were performed using a rehabilitation robotic hand at a frequency of 0.5 Hz. We measured values of oxy-hemoglobin (HbO), deoxy-hemoglobin (HbR) and total-hemoglobin (HbT) in five regions of interest: the primary sensory-motor cortex (SM1), hand somatotopy of the contralateral SM1, supplementary motor area (SMA), premotor cortex (PMC), and prefrontal cortex (PFC). Results: HbO and HbT values indicated significant activation in the left SM1, left SMA, left PMC, and left PFC during execution of the rehabilitation robotic hand (uncorrected, p < 0.01). By contrast, the HbR value indicated significant activation only in the hand somatotopic area of the left SM1 (uncorrected, p < 0.01). Conclusions: Our results appear to indicate that execution of the rehabilitation robotic hand could induce cortical activation. © 2014 Chang, Lee, Gu, Lee, Jin, Yeo, Seo and Jang.
A versatile strategy for hybridizing small experimental and large simulation data: A case for ceramic tape-casting process
In the manufacturing industry, finding optimal design parameters for targeted properties has traditionally been guided by trial and error. However, because data availability in typical materials processes is limited to a few hundred sets of experimental data, machine-learning and other data-driven modeling (DDM) techniques are still far from practical. In this study, we show how a versatile design strategy, tightly coupled with physics-based modeling (PBM) data, can be applied to a small set of experimental data to improve the optimization of process parameters. Our strategy uses PBM to obtain augmented data that includes essential physics: in other words, the PBM data allows the inverse design model to ‘learn’ physics indirectly. We demonstrate that the accuracy of both forward prediction and inverse optimization is dramatically improved with the help of PBM data, which are relatively cheap and abundant. Furthermore, we find that the inverse model with augmented data can accurately optimize process parameters, even those that were not considered in the simulation. Such a versatile strategy can be helpful for processes and experiments in which the amount of collectable data is limited, which is the case in most industries.
Spin-coated Ag nanoparticles for enhancing light absorption of thin film a-Si:H solar cells
We fabricate silver (Ag) nanoparticles (NPs) on the rear surface of thin film hydrogenated amorphous silicon (a-Si:H) solar cells to enhance the light absorption by spin-coating Ag ink, which can produce Ag NPs by a simple, fast, and inexpensive method. The ink concentration and sintering temperature of the spin-coated Ag ink are optimized to maximize the light absorption in the solar cell by tuning the size and distribution as well as the surface coverage of the Ag NPs. The dependence of the solar cell's optical properties on the thickness of a SiNx spacer layer, which was embedded between the solar cell and the Ag NPs for electrical isolation, is also systematically investigated. The thin film a-Si:H solar cell with a thin SiNx spacer layer and the Ag NPs showed great potential for realizing cost-effective high-efficiency solar cells.
Tenderization of Beef Semitendinosus Muscle by Pulsed Electric Field Treatment with a Direct Contact Chamber and Its Impact on Proteolysis and Physicochemical Properties
In this study, the effects of pulsed electric field (PEF) treatment on the tenderization of beef semitendinosus muscle were investigated. An adjustable PEF chamber was designed to make direct contact with the surface of the beef sample without water as the PEF-transmitting medium. PEF treatment was conducted with electric field strengths between 0.5 and 2.0 kV/cm. The pulse width and pulse number were fixed at 30 μs and 100 pulses, respectively. The impedance spectrum of PEF-treated beef indicated that PEF treatments induced structural changes in beef muscle, and the degree of the structural changes was dependent on the strength of the electric field. Cutting force, hardness, and chewiness were significantly decreased at 2.0 kV/cm (by 35, 37, and 34%, respectively) (p < 0.05). Troponin-T was degraded to a greater extent by PEF treatment at 2.0 kV/cm (90% degradation). Fresh quality factors such as color and lipid oxidation were retained below a certain level of PEF intensity (1.0 kV/cm). These findings suggest that PEF treatment could tenderize beef texture while retaining its fresh quality.
Growth Differentiation Factor-15 as a Predictor of Idiopathic Membranous Nephropathy Progression: A Retrospective Study
Idiopathic membranous nephropathy (IMN) is a major cause of nephrotic syndrome. No biomarker to predict the long-term prognosis of IMN is currently available. Growth differentiation factor-15 (GDF-15) is a member of the transforming growth factor-β superfamily and has been associated with chronic inflammatory disease. It has the potential to be a useful prognostic marker in patients with renal diseases, such as diabetic nephropathy and IgA nephropathy. This study examined whether GDF-15 is associated with the clinical parameters in IMN and showed that GDF-15 can predict IMN disease progression. A total of 35 patients with biopsy-proven IMN, treated at Chungnam National University Hospital from January 2010 to December 2015, were included. Patients younger than 18 years, those with secondary membranous nephropathy, and those lost to follow-up before 12 months were excluded. Levels of GDF-15 at the time of biopsy were measured using enzyme-linked immunosorbent assays. Disease progression was defined as a ≥30% decline in estimated glomerular filtration rate (eGFR) or the development of end-stage renal disease. The mean follow-up was 44.1 months (range: 16–72 months). Using receiver operating characteristic curve analysis, the best serum GDF-15 cut-off value for predicting disease progression was 2.15 ng/ml (sensitivity: 75.0%, specificity: 82.1%, p=0.007). GDF-15 was significantly related to age and initial renal function. In the Kaplan-Meier analysis, the risk of disease progression was higher in patients with GDF-15 ≥ 2.15 ng/ml than in those with GDF-15 < 2.15 ng/ml (50.0% versus 9.7%) (p=0.012). In the multivariate Cox regression analysis adjusted for potential confounders, only GDF-15 was significantly associated with disease progression in IMN (p=0.032). In conclusion, the GDF-15 level at the time of diagnosis has a significant negative correlation with initial renal function and is associated with a poor prognosis in IMN. Our results suggest that GDF-15 provides useful prognostic information in patients with IMN.
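Choosing a biomarker cut-off from a receiver operating characteristic analysis can be sketched as below. The abstract does not state which criterion defined the "best" cut-off, so maximising Youden's J (sensitivity + specificity - 1) is an assumption here, and the marker values and outcomes are synthetic numbers invented for illustration, not data from the study.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    values: Sequence[float], progressed: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Classify value >= cutoff as positive; assumes both classes are present."""
    tp = sum(1 for v, p in zip(values, progressed) if p and v >= cutoff)
    fn = sum(1 for v, p in zip(values, progressed) if p and v < cutoff)
    tn = sum(1 for v, p in zip(values, progressed) if not p and v < cutoff)
    fp = sum(1 for v, p in zip(values, progressed) if not p and v >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(
    values: Sequence[float], progressed: Sequence[bool]
) -> Tuple[float, float, float]:
    """Scan observed values as candidate cut-offs and maximise Youden's J."""
    best: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    best_j = float("-inf")
    for candidate in sorted(set(values)):
        sens, spec = sensitivity_specificity(values, progressed, candidate)
        j = sens + spec - 1.0
        if j > best_j:  # on ties, the lowest cut-off is kept
            best_j = j
            best = (candidate, sens, spec)
    return best


# Synthetic marker levels (ng/ml) and outcomes, for illustration only.
levels: List[float] = [0.8, 1.1, 1.5, 1.9, 2.2, 2.6, 3.0, 3.4]
outcome: List[bool] = [False, False, False, False, True, False, True, True]

cutoff, sens, spec = best_cutoff(levels, outcome)
```

A full analysis would also report a confidence interval or p-value for the area under the curve, which this sketch omits.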
Korean clinical practice guideline for benign prostatic hyperplasia
In 2014, the Korean Urological Association organized the Benign Prostatic Hyperplasia Guideline Developing Committee composed of experts in the field of benign prostatic hyperplasia (BPH) with the participation of the Korean Academy of Family Medicine and the Korean Continence Society to develop a Korean clinical practice guideline for BPH. The purpose of this clinical practice guideline is to provide current and comprehensive recommendations for the evaluation and treatment of BPH. The committee developed the guideline mainly by adapting existing guidelines and partially by using the de novo method. A comprehensive literature review was carried out primarily from 2009 to 2013 by using medical search engines including data from Korea. Based on the published evidence, recommendations were synthesized, and the level of evidence of the recommendations was determined by using methods adapted from the 2011 Oxford Centre for Evidence-Based Medicine. Meta-analysis was done for one key question and four recommendations. A draft guideline was reviewed by expert peer reviewers and discussed at an expert consensus meeting until final agreement was achieved. This evidence-based guideline for BPH provides recommendations to primary practitioners and urologists for the diagnosis and treatment of BPH in men older than 40 years.