46 research outputs found

    Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper

    This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages, especially for low-resource languages that have a limited amount of labeled data. Unlike previous methods that tried to improve VSR performance for the target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for the different languages without human intervention. To this end, we employ a Whisper model, which can conduct both language identification and audio-based speech recognition. It serves to filter data of the desired languages and transcribe labels from the unannotated, multilingual audio-visual data pool. By comparing the performance of VSR models trained on automatic labels and on human-annotated labels, we show that we can achieve VSR performance similar to that obtained with human-annotated labels even without utilizing human annotations. Through the automated labeling process, we label large-scale unlabeled multilingual databases, VoxCeleb2 and AVSpeech, producing 1,002 hours of data for four languages with low VSR resources: French, Italian, Spanish, and Portuguese. With the automatic labels, we achieve new state-of-the-art performance on mTEDx in four languages, significantly surpassing the previous methods. The automatic labels are available online: https://github.com/JeongHun0716/Visual-Speech-Recognition-for-Low-Resource-Languages Comment: Accepted at ICASSP 202

    Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

    This paper proposes a novel lip reading framework, especially for low-resource languages, which have not been well addressed in the previous literature. Since low-resource languages do not have enough video-text paired data to train a model with sufficient power to model lip movements and language, developing lip reading models for them is regarded as challenging. To mitigate this challenge, we first learn general speech knowledge, the ability to model lip movements, from a high-resource language through the prediction of speech units. It is known that different languages partially share common phonemes, so general speech knowledge learned from one language can be extended to other languages. Then, we learn language-specific knowledge, the ability to model language, by proposing a Language-specific Memory-augmented Decoder (LMDecoder). LMDecoder saves language-specific audio features into memory banks and can be trained on audio-text paired data, which is more easily accessible than video-text paired data. Therefore, with LMDecoder, we can transform the input speech units into language-specific audio features and translate them into text by utilizing the learned rich language knowledge. Finally, by combining general speech knowledge and language-specific knowledge, we can efficiently develop lip reading models even for low-resource languages. The effectiveness of the proposed method is evaluated through extensive experiments using five languages: English, Spanish, French, Italian, and Portuguese. Comment: Accepted at ICCV 202

    AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

    Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements. VSR is regarded as a challenging task because lip movements carry insufficient information. In this paper, we propose an Audio Knowledge empowered Visual Speech Recognition framework (AKVSR) to complement the insufficient speech information of the visual modality by using the audio modality. Unlike previous methods, the proposed AKVSR 1) utilizes rich audio knowledge encoded by a large-scale pretrained audio model, 2) saves the linguistic information of audio knowledge in a compact audio memory by discarding the non-linguistic information from the audio through quantization, and 3) includes an Audio Bridging Module that can find the best-matched audio features from the compact audio memory, which makes training possible without audio inputs once the compact audio memory has been composed. We validate the effectiveness of the proposed method through extensive experiments and achieve new state-of-the-art performance on the widely used datasets LRS2 and LRS3.

    The cortical activation pattern by a rehabilitation robotic hand: a functional NIRS study

    Introduction: Clarification of the relationship between external stimuli and brain response has been an important topic in neuroscience and brain rehabilitation. In the current study, using functional near infrared spectroscopy (fNIRS), we attempted to investigate cortical activation patterns generated during execution of a rehabilitation robotic hand. Methods: Ten normal subjects were recruited for this study. Passive movements of the right fingers were performed using a rehabilitation robotic hand at a frequency of 0.5 Hz. We measured values of oxy-hemoglobin (HbO), deoxy-hemoglobin (HbR) and total-hemoglobin (HbT) in five regions of interest: the primary sensory-motor cortex (SM1), hand somatotopy of the contralateral SM1, supplementary motor area (SMA), premotor cortex (PMC), and prefrontal cortex (PFC). Results: HbO and HbT values indicated significant activation in the left SM1, left SMA, left PMC, and left PFC during execution of the rehabilitation robotic hand (uncorrected, p < 0.01). By contrast, HbR value indicated significant activation only in the hand somatotopic area of the left SM1 (uncorrected, p < 0.01). Conclusions: Our results appear to indicate that execution of the rehabilitation robotic hand could induce cortical activation. © 2014 Chang, Lee, Gu, Lee, Jin, Yeo, Seo and Jang.

    A versatile strategy for hybridizing small experimental and large simulation data: A case for ceramic tape-casting process

    In the manufacturing industry, finding optimal design parameters for targeted properties has traditionally been guided by trial and error. However, because data availability in typical materials processes is limited to a few hundred sets of experimental data, machine learning and other data-driven modeling (DDM) techniques are far from practical. In this study, we show how a versatile design strategy, tightly coupled with physics-based modeling (PBM) data, can be applied to a small set of experimental data to improve the optimization of process parameters. Our strategy uses PBM to obtain augmented data that includes essential physics: in other words, the PBM data allows the inverse design model to ‘learn’ physics indirectly. We demonstrate that the accuracy of both forward prediction and inverse optimization is dramatically improved with the help of PBM data, which are relatively cheap and abundant. Furthermore, we found that the inverse model with augmented data can accurately optimize process parameters, even those that were not considered in the simulation. Such a versatile strategy can be helpful for processes and experiments in which the amount of collectable data is limited, which is the case in most industrial settings.

    Spin-coated Ag nanoparticles for enhancing light absorption of thin film a-Si:H solar cells

    We fabricate silver (Ag) nanoparticles (NPs) on the rear surface of thin film hydrogenated amorphous silicon (a-Si:H) solar cells to enhance light absorption by spin-coating Ag ink, a simple, fast, and inexpensive method of producing Ag NPs. The ink concentration and sintering temperature of the spin-coated Ag ink are optimized to maximize light absorption in the solar cell by tuning the size, distribution, and surface coverage of the Ag NPs. The dependence of the optical properties of the solar cell on the thickness of a SiNx spacer layer, which was embedded between the solar cell and the Ag NPs for electrical isolation, is also systematically investigated. The thin film a-Si:H solar cell with a thin SiNx spacer layer and the Ag NPs showed great potential for realizing cost-effective high-efficiency solar cells.

    Tenderization of Beef Semitendinosus Muscle by Pulsed Electric Field Treatment with a Direct Contact Chamber and Its Impact on Proteolysis and Physicochemical Properties

    In this study, the effects of pulsed electric field (PEF) treatment on the tenderization of beef semitendinosus muscle were investigated. An adjustable PEF chamber was designed to make direct contact with the surface of the beef sample without water as the PEF-transmitting medium. PEF treatment was conducted with electric field strengths between 0.5 and 2.0 kV/cm. The pulse width and pulse number were fixed at 30 μs and 100 pulses, respectively. The impedance spectrum of PEF-treated beef indicated that PEF treatments induced structural changes in beef muscle, and the degree of the structural changes was dependent on the strength of the electric field. Cutting force, hardness, and chewiness were significantly decreased at 2.0 kV/cm (by 35, 37, and 34%, respectively) (p < 0.05). Troponin-T was degraded more by PEF treatment at 2.0 kV/cm (90% degradation). Fresh quality factors such as color and lipid oxidation were retained under a certain level of PEF intensity (1.0 kV/cm). These findings suggest that PEF treatment could tenderize beef texture while retaining its fresh quality.

    Growth Differentiation Factor-15 as a Predictor of Idiopathic Membranous Nephropathy Progression: A Retrospective Study

    Idiopathic membranous nephropathy (IMN) is a major cause of nephrotic syndrome. No biomarker to predict the long-term prognosis of IMN is currently available. Growth differentiation factor-15 (GDF-15) is a member of the transforming growth factor-β superfamily and has been associated with chronic inflammatory disease. It has the potential to be a useful prognostic marker in patients with renal diseases, such as diabetic nephropathy and IgA nephropathy. This study examined whether GDF-15 is associated with the clinical parameters in IMN and showed that GDF-15 can predict IMN disease progression. A total of 35 patients with biopsy-proven IMN, treated at Chungnam National University Hospital from January 2010 to December 2015, were included. Patients younger than 18 years, those with secondary membranous nephropathy, and those lost to follow-up before 12 months were excluded. Levels of GDF-15 at the time of biopsy were measured using enzyme-linked immunosorbent assays. Disease progression was defined as a ≥30% decline in estimated glomerular filtration rate (eGFR) or the development of end-stage renal disease. The mean follow-up was 44.1 months (range: 16–72 months). Using receiver operating characteristic curve analysis, the best serum GDF-15 cut-off value for predicting disease progression was 2.15 ng/ml (sensitivity: 75.0%, specificity: 82.1%, p=0.007). GDF-15 was significantly related to age and initial renal function. In the Kaplan-Meier analysis, the risk of disease progression was higher in patients with GDF-15 ≥ 2.15 ng/ml than in those with GDF-15 < 2.15 ng/ml (50.0% versus 9.7%) (p=0.012). In the multivariate Cox regression analysis adjusted for potential confounders, only GDF-15 was significantly associated with disease progression in IMN (p=0.032). In conclusion, the GDF-15 level at the time of diagnosis has a significant negative correlation with initial renal function and is associated with a poor prognosis in IMN. Our results suggest that GDF-15 provides useful prognostic information in patients with IMN.

    Advantages of laparoscopy in gynecologic surgery in elderly patients

    Objective: The number of geriatric patients requiring gynecological surgery is increasing worldwide. However, older patients are at higher risk of postoperative morbidity and mortality, particularly cardiopulmonary complications. Laparoscopic surgery is widely used as a minimally invasive method for reducing postoperative morbidities. We compared the outcomes of open and laparoscopic gynecologic surgeries in patients older than 55 years. Methods: We included patients aged >55 years who underwent gynecological surgery at a single tertiary center between 2010 and 2020; vaginal or ovarian cancer surgeries were excluded. Surgical outcomes were compared between the open surgery and laparoscopic groups, with the age cutoff set at 65 years for optimal discriminative power. We performed linear or logistic regression analyses to compare the surgical outcomes according to age and operation type. Results: Among 2,983 patients, 28.6% underwent open surgery and 71.4% underwent laparoscopic surgery. Perioperative outcomes of laparoscopic surgery were better than those of open surgery in all groups. In both the open and laparoscopic surgery groups, the older patients showed worse overall surgical outcomes. However, age-related differences in perioperative outcomes were less severe in the laparoscopic group. In the linear regression analysis, the differences in estimated blood loss, transfusion, and hospital stay between the age groups were smaller in the laparoscopy group. Similar results were observed in cancer-only and benign-only cohorts. Conclusion: Although the surgical outcomes were worse in the older patients, the difference between age groups was smaller for laparoscopic surgery. Laparoscopic surgery offers more advantages and safety in patients aged >65 years.