Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
Recent years have witnessed great strides in self-supervised learning (SSL)
for speech processing. An SSL model is normally pre-trained on a great
variety of unlabelled data and a large model size is preferred to increase the
modeling capacity. However, this might limit its potential applications due to
the expensive computation and memory costs introduced by the oversized model.
Miniaturization for SSL models has become an important research direction of
practical value. To this end, we explore the effective distillation of
HuBERT-based SSL models for automatic speech recognition (ASR). First, in order
to establish a strong baseline, a comprehensive study on different student
model structures is conducted. On top of this, as a supplement to the
regression loss widely adopted in previous works, a discriminative loss is
introduced for HuBERT to enhance the distillation performance, especially in
low-resource scenarios. In addition, we design a simple and effective algorithm
to distill the front-end input from waveform to Fbank features, resulting in a 17%
parameter reduction and a doubling of inference speed, at marginal performance
degradation. Comment: Submitted to ICASSP 202
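The combination of a regression loss with a discriminative loss described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss forms, so everything here is an assumption: an L1 distance stands in for the regression loss, a cross-entropy over discrete target units stands in for the discriminative loss, and the function names and the `weight` parameter are hypothetical.

```python
import math


def regression_loss(student_feat, teacher_feat):
    """Mean L1 distance between student and teacher features (assumed form)."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    total = sum(abs(s - t) for s, t in zip(student_feat, teacher_feat))
    return total / len(student_feat)


def discriminative_loss(student_logits, target_unit):
    """Cross-entropy of the student's logits against one discrete target unit.

    Assumes the target is a cluster index; uses a numerically stable
    log-sum-exp.
    """
    peak = max(student_logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in student_logits))
    return log_norm - student_logits[target_unit]


def distillation_loss(student_feat, teacher_feat, student_logits, target_unit,
                      weight=1.0):
    """Regression loss plus a weighted discriminative loss (hypothetical weighting)."""
    return (regression_loss(student_feat, teacher_feat)
            + weight * discriminative_loss(student_logits, target_unit))


if __name__ == "__main__":
    # Toy single-frame example with made-up numbers.
    loss = distillation_loss([1.0, 2.0], [1.0, 4.0], [0.0, 0.0], 0, weight=0.5)
    print(f"combined loss: {loss:.4f}")
```

In practice both terms would be averaged over frames and computed on tensors; the scalar version above only shows how the two objectives add up.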
Proximity effect at superconducting Sn-Bi2Se3 interface
We have investigated the conductance spectra of Sn-Bi2Se3 interface junctions
down to 250 mK and in different magnetic fields. A number of conductance
anomalies were observed below the superconducting transition temperature of Sn,
including a small gap different from that of Sn, and a zero-bias conductance
peak growing at lower temperatures. We discuss the possible origins of the
smaller gap and the zero-bias conductance peak. These phenomena support the view that a
proximity-effect-induced chiral superconducting phase is formed at the
interface between the superconducting Sn and the strong spin-orbit coupling
material Bi2Se3. Comment: 7 pages, 8 figures
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Audio-visual large language models (LLMs) have drawn significant attention,
yet the fine-grained combination of both input streams is rather
under-explored, which is challenging but necessary for LLMs to understand
general video inputs. To this end, a fine-grained audio-visual joint
representation (FAVOR) learning framework for multimodal LLMs is proposed in
this paper, which extends a text-based LLM to simultaneously perceive speech
and audio events in the audio input stream and images or videos in the visual
input stream, at the frame level. To fuse the audio and visual feature streams
into joint representations and to align the joint space with the LLM input
embedding space, we propose a causal Q-Former structure with a causal attention
module to enhance the capture of causal relations of the audio-visual frames
across time. An audio-visual evaluation benchmark (AVEB) is also proposed, which
comprises six representative single-modal tasks and five cross-modal tasks
reflecting audio-visual co-reasoning abilities. While achieving competitive
single-modal performance on audio, speech and image tasks in AVEB, FAVOR
achieved over 20% accuracy improvements on the video question-answering task
when fine-grained information or temporal causal reasoning is required. FAVOR,
in addition, demonstrated remarkable video comprehension and reasoning
abilities on tasks not previously addressed by other multimodal LLMs. An
interactive demo of FAVOR is available at
https://github.com/BriansIDP/AudioVisualLLM.git, and the training code and
model checkpoints will be released soon.
Connecting Speech Encoder and Large Language Model for ASR
The impressive capability and versatility of large language models (LLMs)
have aroused increasing attention in automatic speech recognition (ASR), with
several pioneering studies attempting to build integrated ASR models by
connecting a speech encoder with an LLM. This paper presents a comparative
study of three commonly used structures as connectors, including fully
connected layers, multi-head cross-attention, and Q-Former. Speech encoders
from the Whisper model series as well as LLMs from the Vicuna model series with
different model sizes were studied. Experiments were performed on the commonly
used LibriSpeech, Common Voice, and GigaSpeech datasets, where the LLMs with
Q-Formers demonstrated consistent and considerable word error rate (WER)
reductions over LLMs with other connector structures. Q-Former-based LLMs can
generalise well to out-of-domain datasets, where 12% relative WER reductions
over the Whisper baseline ASR model were achieved on the Eval2000 test set
without using any in-domain training data from Switchboard. Moreover, a novel
segment-level Q-Former is proposed to enable LLMs to recognise speech segments
with a duration exceeding the limitation of the encoders, which results in 17%
relative WER reductions over other connector structures on 90-second-long
speech data.
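The word error rate and relative WER reduction figures quoted in this abstract follow standard definitions, sketched below. WER is taken as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words; the function names are hypothetical and the example numbers are made up, not results from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.3f}")
    # Illustrative values only: a drop from 10.0% to 8.8% WER.
    print(f"relative reduction: {relative_wer_reduction(10.0, 8.8):.2%}")
```

A relative reduction is always measured against the baseline's own WER, so a 12% relative reduction is not a drop of 12 absolute percentage points.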
A case report of adrenocorticotropic hormone to treat recurrent focal segmental glomerular sclerosis post-transplantation and biomarker monitoring
Background: Recurrent focal segmental glomerular sclerosis (rFSGS) in renal transplant recipients (RTR) is difficult to predict and treat. Early rFSGS is likely caused by circulating factors and preformed antibodies. Methods: We present the case of a 23-year-old white man who presented with rFSGS and acute renal failure requiring dialysis 9 months after a 1-haplotype-matched living-related transplant. We retrospectively analyzed serum samples from various clinical stages for rFSGS biomarkers: serum glomerular albumin permeability (Palb), soluble urokinase-type plasminogen activator receptor (suPAR) serum level with suPAR-β3 integrin signaling on human podocytes, and angiotensin II type I receptor-antibody (AT1R-Ab) titer. Results: All biomarkers were abnormal at 1 year pre-transplant, prior to initiation of dialysis, and at the time of transplant. After initiation of hemodialysis, β3 integrin activity on human podocytes in response to patient serum, as well as AT1R-Ab, were further elevated. At the time of biopsy-proven recurrence, all biomarkers were abnormally high. One week after therapy with plasmapheresis (aborted secondary to intolerance) and high-dose steroids, the Palb and suPAR-β3 integrin activity remained significantly positive. After 12 weeks of treatment with high-dose steroids, rituximab, and galactose, the patient remained hemodialysis-dependent. Three months after his initial presentation, we commenced adrenocorticotropic hormone (ACTH, Acthar® Gel), 80 units subcutaneously twice weekly. Four weeks later he was able to discontinue dialysis. After 8 months of maintenance ACTH therapy, his serum creatinine stabilized at 1.79 mg/dL with less than 1 gram of proteinuria. Conclusion: ACTH therapy was associated with improvement in renal function within 4 weeks. The use of rFSGS biomarkers may aid in predicting development of rFSGS.
Cardiovascular Disease Biomarkers and suPAR in Predicting Decline in Renal Function: A Prospective Cohort Study
Introduction: Soluble urokinase-type plasminogen activator receptor (suPAR) strongly predicts outcomes and incident chronic kidney disease (CKD) in patients with cardiovascular disease (CVD). Whether the association between suPAR and CKD is a reflection of its overall association with chronic inflammation and poor CVD outcomes is unclear. We examined whether CVD biomarkers, including high-sensitivity C-reactive protein (hs-CRP), fibrin-degradation products (FDPs), heat-shock protein 70 (HSP-70), and high-sensitivity troponin I (hs-TnI), were associated with a decline in kidney function in the Emory Cardiovascular Biobank cohort, in which suPAR levels were shown to be predictive of both incident CKD and CVD outcomes. Methods: We measured suPAR, hs-CRP, HSP-70, FDP, and hs-TnI plasma levels in 3282 adults (mean age 63 years, 64% male, 75% estimated glomerular filtration rate [eGFR] >60 ml/min per 1.73 m2). Glomerular filtration rate was estimated using the Chronic Kidney Disease–Epidemiology Collaboration (CKD-EPI) equation at enrollment (n = 3282) and follow-up (n = 2672; median 3.5 years). Urine protein by dipstick at baseline was available for 1335 subjects. Results: There was a weak correlation among biomarkers (r range: 0.17−0.28). hs-CRP, FDPs, hs-TnI, and suPAR were independently associated with baseline eGFR and proteinuria. The median yearly decline in eGFR was −0.6 ml/min per 1.73 m2. Neither hs-CRP (β: −0.04; P = 0.46), FDPs (β: −0.13; P = 0.08), HSP-70 (β: 0.05; P = 0.84), nor hs-TnI (β: −0.01; P = 0.76) was associated with eGFR decline. suPAR remained predictive of eGFR decline even after adjusting for all biomarkers. Discussion: hs-CRP, FDP, HSP-70, and hs-TnI were not associated with eGFR decline. The specific association of suPAR with eGFR decline supports its involvement in pathways specific to the pathogenesis of kidney disease.
Podocyte-Specific Overexpression of Wild Type or Mutant Trpc6 in Mice Is Sufficient to Cause Glomerular Disease
Mutations in the TRPC6 calcium channel (Transient receptor potential channel 6) gene have been associated with familial forms of Focal and Segmental Glomerulosclerosis (FSGS) affecting children and adults. In addition, acquired glomerular diseases are associated with increased expression levels of TRPC6. However, the exact role of TRPC6 in the pathogenesis of FSGS remains to be elucidated. In this work we describe the generation and phenotypic characterization of three different transgenic mouse lines with podocyte-specific overexpression of the wild type or either of two mutant forms of Trpc6 (P111Q and E896K) previously related to FSGS. Consistent with the human phenotype, a non-nephrotic range of albuminuria was detectable in almost all transgenic lines. The histological analysis demonstrated that the transgenic mice developed a kidney disease similar to human FSGS. Two- to three-fold differences in the presence of glomerular lesions were found between the non-transgenic mice and the transgenic mice expressing Trpc6 in its wild type or mutant forms specifically in podocytes. Electron microscopy of glomeruli from transgenic mice showed extensive podocyte foot process effacement. We conclude that overexpression of Trpc6 (wild type or mutated) in podocytes is sufficient to cause a kidney disease consistent with FSGS. Our results reinforce the central role of podocytes in the etiology of FSGS. These mice constitute an important new model in which to study future therapies and outcomes of this complex disease.