320 research outputs found
AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement
Speech enhancement systems are typically trained using pairs of clean and
noisy speech. In audio-visual speech enhancement (AVSE), there is not as much
ground-truth clean data available; most audio-visual datasets are collected in
real-world environments with background noise and reverberation, hampering the
development of AVSE. In this work, we introduce AV2Wav, a resynthesis-based
audio-visual speech enhancement approach that can generate clean speech despite
the challenges of real-world training data. We obtain a subset of nearly clean
speech from an audio-visual corpus using a neural quality estimator, and then
train a diffusion model on this subset to generate waveforms conditioned on
continuous speech representations from AV-HuBERT with noise-robust training. We
use continuous rather than discrete representations to retain prosody and
speaker information. With this vocoding task alone, the model can perform
speech enhancement better than a masking-based baseline. We further fine-tune
the diffusion model on clean/noisy utterance pairs to improve the performance.
Our approach outperforms a masking-based baseline in terms of both automatic
metrics and a human listening test and is close in quality to the target speech
in the listening test. Audio samples can be found at
https://home.ttic.edu/~jcchou/demo/avse/avse_demo.html.
Comment: Submitted to ICASSP 202
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Recent work on speech representation models jointly pre-trained with text has
demonstrated the potential of improving speech representations by encoding
speech and text in a shared space. In this paper, we leverage such shared
representations to address the persistent challenge of limited data
availability in spoken language understanding tasks. By employing a pre-trained
speech-text model, we find that models fine-tuned on text can be effectively
transferred to speech testing data. With as little as 1 hour of labeled speech
data, our proposed approach achieves comparable performance on spoken language
understanding tasks (specifically, sentiment analysis and named entity
recognition) when compared to previous methods using speech-only pre-trained
models fine-tuned on 10 times more data. Beyond the proof-of-concept study, we
also analyze the latent representations. We find that the bottom layers of
speech-text models are largely task-agnostic and align speech and text
representations into a shared space, while the top layers are more
task-specific.
The effects of rear-wheel camber on the kinematics of upper extremity during wheelchair propulsion
BACKGROUND: Rear-wheel camber, defined as the inclination of the rear wheels, is common in wheelchair sports and is increasingly used in daily propulsion. Although rear-wheel camber can increase stability, it alters physiological performance during propulsion. The purpose of this study is to investigate the effects of rear-wheel camber on temporal-spatial parameters, joint angles, and propulsion patterns. METHODS: Twelve inexperienced subjects (22.3±1.6 yr) participated in the study. None had musculoskeletal disorders in their upper extremities. An eight-camera motion capture system was used to collect the three-dimensional trajectory data of markers attached to the wheelchair-user system during propulsion. All participants propelled the same wheelchair, which had an instrumented wheel, at cambers of 0°, 9°, and 15° and an average velocity of 1 m/s. RESULTS: Rear-wheel camber significantly affects the average acceleration, maximum end angle, trunk movement, elbow joint movement, wrist joint movement, and propulsion pattern. The effects are especially significant between 0° and 15°. For a 15° camber, the average acceleration and joint peak angles significantly increased (p < 0.01). A single loop pattern (SLOP) was adopted by most of the subjects. CONCLUSIONS: Rear-wheel camber affects propulsion patterns and joint range of motion. When choosing a wheelchair with camber adjustment, the increase in joint movements and the base of support should be taken into consideration.
Measurements of Natural Carbonate Rare Earth Elements in Femtogram Quantities by Inductive Coupled Plasma Sector Field Mass Spectrometry
A rapid and precise standard-bracketing method has been developed for measuring femtogram quantity rare earth element (REE) levels in natural carbonate samples by inductively coupled plasma sector field mass spectrometry that does not require chemical separation steps. A desolvation nebulization system was used to effectively reduce polyatomic interference and enhance sensitivity. REE/Ca ratios are calculated directly from the intensities of the ion beams of 46Ca, 139La, 140Ce, 141Pr, 146Nd, 147Sm, 153Eu, 160Gd, 159Tb, 163Dy, 165Ho, 166Er, 169Tm, 172Yb, and 175Lu using external matrix-matched synthetic standards to correct for instrumental ratio drifting and mass discrimination. A routine measurement time of 3 min is typical for one sample containing 20-40 ppm Ca. Replicate measurements made on natural coral and foraminiferal samples with REE/Ca ratios of 2-242 nmol/mol show that external precisions of 1.9-6.5% (2 RSD) can be achieved with only 10-1000 fg of REEs in 10-20 μg of carbonate. We show that different sources for monthly resolved coral ultratrace REE variability can be distinguished using this method. For natural slow growth-rate carbonate materials, such as sclerosponges, tufa, and speleothems, the high sample throughput, high precision, and high temporal resolution REE records that can be produced with this procedure have the potential to provide valuable time-series records to advance our understanding of paleoclimatic and paleoenvironmental dynamics on different time scales.
Changes in corneal curvature after wearing the orthokeratology lens
INTRODUCTION: The orthokeratology lens (OK lens) is designed to reshape the cornea and correct refractive error. Because it removes the need to wear glasses during the day, use of the OK lens is increasing among myopic children. In this study, changes in corneal curvature and astigmatism after wearing the OK lens were analyzed. METHODS: This retrospective cohort study included 65 children (130 eyes) who underwent full and regular examinations. None of the participants had any ocular disease other than myopia and astigmatism. The OK lenses used in this study were four-zone, reverse-geometry lenses. The corneal curvature of each patient was checked annually after the patients discontinued daily wearing of the OK lens for 10 days. Student's t test and repeated measures analysis of variance (ANOVA) were performed to compare the results. RESULTS: The radius of corneal curvature showed a progressive annual increase with significant differences, both in the steepest and flattest radius of the corneal curvature (p < 0.001 and p = 0.001, respectively). The mean radius of the steepest and flattest corneal curvature increased significantly from baseline to the following years consecutively (all p < 0.001). Nevertheless, astigmatism did not change significantly in any of the tests. CONCLUSION: Corneal curvature changed as the patients grew older. There was a statistically significant increase in the radius of the corneal curvature in the myopic children studied. For correct fit of OK lenses, the radius of the corneal curvature should be regularly checked prior to dispensing a new set of lenses.
Toward Joint Language Modeling for Speech Units and Text
Speech and text are two major forms of human language. The research community
has been focusing on mapping speech to text or vice versa for many years.
However, in the field of language modeling, very little effort has been made to
model them jointly. In light of this, we explore joint language modeling for
speech units and text. Specifically, we compare different speech tokenizers to
transform continuous speech signals into discrete units and use different
methods to construct mixed speech-text data. We introduce automatic metrics to
evaluate how well the joint LM mixes speech and text. We also fine-tune the LM
on downstream spoken language understanding (SLU) tasks with different
modalities (speech or text) and test its performance to assess the model's
learning of shared representations. Our results show that by mixing speech
units and text with our proposed mixing techniques, the joint LM improves over
a speech-only baseline on SLU tasks and shows zero-shot cross-modal
transferability.
Comment: EMNLP findings 202
Severe pulmonary complications after initial treatment with rituximab for the Asian-variant of intravascular lymphoma
Rituximab improves response to treatment and outcome for patients with CD20+ B-cell lymphoma. Herein, however, we report the occurrence of severe pulmonary complications shortly after rituximab infusion in three patients with the newly diagnosed Asian variant of intravascular lymphoma. It is suggested that patients with this subtype of lymphoma be monitored carefully for possible drug reactions during the use of rituximab.