320 research outputs found

    AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement

    Speech enhancement systems are typically trained using pairs of clean and noisy speech. In audio-visual speech enhancement (AVSE), there is not as much ground-truth clean data available; most audio-visual datasets are collected in real-world environments with background noise and reverberation, hampering the development of AVSE. In this work, we introduce AV2Wav, a resynthesis-based audio-visual speech enhancement approach that can generate clean speech despite the challenges of real-world training data. We obtain a subset of nearly clean speech from an audio-visual corpus using a neural quality estimator, and then train a diffusion model on this subset to generate waveforms conditioned on continuous speech representations from AV-HuBERT with noise-robust training. We use continuous rather than discrete representations to retain prosody and speaker information. With this vocoding task alone, the model can perform speech enhancement better than a masking-based baseline. We further fine-tune the diffusion model on clean/noisy utterance pairs to improve the performance. Our approach outperforms a masking-based baseline in terms of both automatic metrics and a human listening test and is close in quality to the target speech in the listening test. Audio samples can be found at https://home.ttic.edu/~jcchou/demo/avse/avse_demo.html. Comment: Submitted to ICASSP 202

    Few-Shot Spoken Language Understanding via Joint Speech-Text Models

    Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space. In this paper, we leverage such shared representations to address the persistent challenge of limited data availability in spoken language understanding tasks. By employing a pre-trained speech-text model, we find that models fine-tuned on text can be effectively transferred to speech testing data. With as little as 1 hour of labeled speech data, our proposed approach achieves performance on spoken language understanding tasks (specifically, sentiment analysis and named entity recognition) comparable to that of previous methods using speech-only pre-trained models fine-tuned on 10 times more data. Beyond the proof-of-concept study, we also analyze the latent representations. We find that the bottom layers of speech-text models are largely task-agnostic and align speech and text representations into a shared space, while the top layers are more task-specific.

    The effects of rear-wheel camber on the kinematics of upper extremity during wheelchair propulsion

    BACKGROUND: The rear-wheel camber, defined as the inclination of the rear wheels, is usually used in wheelchair sports, but it is becoming increasingly employed in daily propulsion. Although the rear-wheel camber can increase stability, it alters physiological performance during propulsion. The purpose of the study is to investigate the effects of rear-wheel cambers on temporal-spatial parameters, joint angles, and propulsion patterns. METHODS: Twelve inexperienced subjects (22.3±1.6 yr) participated in the study. None had musculoskeletal disorders in their upper extremities. An eight-camera motion capture system was used to collect the three-dimensional trajectory data of markers attached to the wheelchair-user system during propulsion. All participants propelled the same wheelchair, which had an instrumented wheel with cambers of 0°, 9°, and 15°, respectively, at an average velocity of 1 m/s. RESULTS: The results show that the rear-wheel camber significantly affects the average acceleration, maximum end angle, trunk movement, elbow joint movement, wrist joint movement, and propulsion pattern. The effects are especially significant between 0° and 15°. For a 15° camber, the average acceleration and joint peak angles significantly increased (p < 0.01). A single loop pattern (SLOP) was adopted by most of the subjects. CONCLUSIONS: The rear-wheel camber affects propulsion patterns and joint range of motion. When choosing a wheelchair with camber adjustment, the increase of joint movements and the base of support should be taken into consideration.

    Changes in corneal curvature after wearing the orthokeratology lens

    Introduction: The orthokeratology lens (OK lens) is designed to reshape the cornea and correct refraction error. Owing to the convenience of ceasing the use of glasses during the day, the use of the OK lens is increasing in myopic children. In this study, changes in corneal curvature and astigmatism after wearing the OK lens were analyzed. Methods: This retrospective cohort study included 65 children (130 eyes) who underwent full and regular examinations. None of the participants had any ocular disease other than myopia and astigmatism. The OK lenses used in this study were four-zone, reverse-geometry lenses. The corneal curvature of each patient was checked annually after the patients discontinued daily wearing of the OK lens for 10 days. Student t test and repeated measures analysis of variance (ANOVA) analyses were performed to compare the results. Results: The radius of corneal curvature showed a progressive annual increase with significant differences, both in the steepest and flattest radius of the corneal curvature (p < 0.001 and p = 0.001, respectively). The mean radius of the steepest and flattest corneal curvature increased significantly from baseline to the following years consecutively (all p < 0.001). Nevertheless, astigmatism did not change significantly in any of the tests. Conclusion: Corneal curvature changed as the patients grew older. There was a statistically significant increase in the radius of the corneal curvature in the myopic children studied. For correct fit of OK lenses, the radius of the corneal curvature should be regularly checked prior to dispensing a new set of lenses.

    Toward Joint Language Modeling for Speech Units and Text

    Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly. In light of this, we explore joint language modeling for speech units and text. Specifically, we compare different speech tokenizers to transform continuous speech signals into discrete units and use different methods to construct mixed speech-text data. We introduce automatic metrics to evaluate how well the joint LM mixes speech and text. We also fine-tune the LM on downstream spoken language understanding (SLU) tasks with different modalities (speech or text) and test its performance to assess the model's learning of shared representations. Our results show that by mixing speech units and text with our proposed mixing techniques, the joint LM improves over a speech-only baseline on SLU tasks and shows zero-shot cross-modal transferability. Comment: EMNLP findings 202

    Severe pulmonary complications after initial treatment with rituximab for the Asian-variant of intravascular lymphoma

    Rituximab improves response to treatment and outcome for patients with CD20+ B-cell lymphoma. Herein, however, we report the occurrence of severe pulmonary complications shortly after rituximab infusion in three patients with the newly diagnosed Asian variant of intravascular lymphoma. It is suggested that patients with this sub-type of lymphoma be monitored carefully for possible drug reactions during the use of rituximab.