Search CORE

115 research outputs found

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition

Author: Chen Xie
Ma Ziyang
Tang Changli
Wang Yujin
Zhang Wei-Qiang
Zheng Zhisheng
Publication date: 27/10/2022

Recent years have witnessed great strides in self-supervised learning (SSL) in speech processing. The SSL model is normally pre-trained on a great variety of unlabelled data, and a large model size is preferred to increase the modeling capacity. However, this might limit its potential applications due to the expensive computation and memory costs introduced by the oversized model. Miniaturization of SSL models has become an important research direction of practical value. To this end, we explore the effective distillation of HuBERT-based SSL models for automatic speech recognition (ASR). First, in order to establish a strong baseline, a comprehensive study on different student model structures is conducted. On top of this, as a supplement to the regression loss widely adopted in previous works, a discriminative loss is introduced for HuBERT to enhance the distillation performance, especially in low-resource scenarios. In addition, we design a simple and effective algorithm to distill the front-end input from waveform to Fbank feature, resulting in 17% parameter reduction and doubling inference speed, at marginal performance degradation. Comment: Submitted to ICASSP 202

arXiv.org e-Print Archive

Proximity effect at superconducting Sn-Bi2Se3 interface

Author: Chen Jun
Ding Yue
Fan Jie
Ji Zhongqing
Liu Guangtong
Lu Li
Qu Fanming
Shen Jie
Wei Zhongchao
Xiang Tao
Yang Changli
Yang Fan
Publication venue: 'American Physical Society (APS)'
Publication date: 20/02/2012

We have investigated the conductance spectra of Sn-Bi2Se3 interface junctions down to 250 mK and in different magnetic fields. A number of conductance anomalies were observed below the superconducting transition temperature of Sn, including a small gap different from that of Sn, and a zero-bias conductance peak that grows at lower temperatures. We discuss the possible origins of the smaller gap and the zero-bias conductance peak. These phenomena support the formation of a proximity-effect-induced chiral superconducting phase at the interface between the superconducting Sn and the strong spin-orbit coupling material Bi2Se3. Comment: 7 pages, 8 figures

arXiv.org e-Print Archive

Crossref

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

Author: Chen Xianzhao
Li Wei
Lu Lu
Ma Zejun
Sun Guangzhi
Tan Tian
Tang Changli
Yu Wenyi
Zhang Chao
Publication date: 10/10/2023

Audio-visual large language models (LLMs) have drawn significant attention, yet the fine-grained combination of both input streams is rather under-explored, which is challenging but necessary for LLMs to understand general video inputs. To this end, a fine-grained audio-visual joint representation (FAVOR) learning framework for multimodal LLMs is proposed in this paper, which extends a text-based LLM to simultaneously perceive speech and audio events in the audio input stream and images or videos in the visual input stream, at the frame level. To fuse the audio and visual feature streams into joint representations and to align the joint space with the LLM input embedding space, we propose a causal Q-Former structure with a causal attention module to enhance the capture of causal relations of the audio-visual frames across time. An audio-visual evaluation benchmark (AVEB) is also proposed, which comprises six representative single-modal tasks and five cross-modal tasks reflecting audio-visual co-reasoning abilities. While achieving competitive single-modal performance on audio, speech and image tasks in AVEB, FAVOR achieved over 20% accuracy improvements on the video question-answering task when fine-grained information or temporal causal reasoning is required. FAVOR, in addition, demonstrated remarkable video comprehension and reasoning abilities on tasks that are unprecedented by other multimodal LLMs. An interactive demo of FAVOR is available at https://github.com/BriansIDP/AudioVisualLLM.git, and the training code and model checkpoints will be released soon.

arXiv.org e-Print Archive

Connecting Speech Encoder and Large Language Model for ASR

Author: Chen Xianzhao
Li Wei
Lu Lu
Ma Zejun
Sun Guangzhi
Tan Tian
Tang Changli
Yu Wenyi
Zhang Chao
Publication date: 26/09/2023

The impressive capability and versatility of large language models (LLMs) have attracted increasing attention in automatic speech recognition (ASR), with several pioneering studies attempting to build integrated ASR models by connecting a speech encoder with an LLM. This paper presents a comparative study of three commonly used structures as connectors, including fully connected layers, multi-head cross-attention, and Q-Former. Speech encoders from the Whisper model series as well as LLMs from the Vicuna model series with different model sizes were studied. Experiments were performed on the commonly used LibriSpeech, Common Voice, and GigaSpeech datasets, where the LLMs with Q-Formers demonstrated consistent and considerable word error rate (WER) reductions over LLMs with other connector structures. Q-Former-based LLMs can generalise well to out-of-domain datasets, where 12% relative WER reductions over the Whisper baseline ASR model were achieved on the Eval2000 test set without using any in-domain training data from Switchboard. Moreover, a novel segment-level Q-Former is proposed to enable LLMs to recognise speech segments with a duration exceeding the limitation of the encoders, which results in 17% relative WER reductions over other connector structures on 90-second-long speech data.

arXiv.org e-Print Archive

A case report of adrenocorticotropic hormone to treat recurrent focal segmental glomerular sclerosis post-transplantation and biomarker monitoring

Author: Anwar Siddiq
Ashraf Muhammad
Brennan Daniel C
Culiberk Nancy
Larson Derek S
Liapis Helen
Naimi Nimi
Reiser Jochen
Wei Changli
Publication venue: Digital Commons@Becker
Publication date: 01/01/2015

Background: Recurrent focal segmental glomerular sclerosis (rFSGS) in renal transplant recipients (RTR) is difficult to predict and treat. Early rFSGS is likely from circulating factors and preformed antibodies. Methods: We present the case of a 23-year-old white man who presented with rFSGS and acute renal failure requiring dialysis 9 months after a 1-haplotype matched living-related transplant. We retrospectively analyzed serum samples from various clinical stages for rFSGS biomarkers: serum glomerular albumin permeability (Palb), soluble urokinase-type plasminogen activator receptor (suPAR) serum level with suPAR-β3 integrin signaling on human podocytes, and angiotensin II type I receptor-antibody (AT1R-Ab) titer. Results: All biomarkers were abnormal at 1 year pre-transplant prior to initiation of dialysis and at the time of transplant. After initiation of hemodialysis, β3 integrin activity on human podocytes, in response to patient serum, as well as AT1R-Ab were further elevated. At the time of biopsy-proven recurrence, all biomarkers were abnormally high. One week after therapy with aborted plasmapheresis (secondary to intolerance) and high-dose steroids, the Palb and suPAR-β3 integrin activity remained significantly positive. After 12 weeks of treatment with high-dose steroids, rituximab, and galactose, the patient remained hemodialysis-dependent. Three months after his initial presentation we commenced adrenocorticotropic hormone (ACTH, Acthar® Gel), 80 units subcutaneously twice weekly. Four weeks later he was able to discontinue dialysis. After 8 months of maintenance ACTH therapy, his serum creatinine stabilized at 1.79 mg/dL with less than 1 gram of proteinuria. Conclusion: ACTH therapy was associated with improvement in renal function within 4 weeks. The use of rFSGS biomarkers may aid in predicting development of rFSGS.

Crossref

Directory of Open Access Journals

Digital Commons@Becker

Frontiers - Publisher Connector

PubMed Central


Cardiovascular Disease Biomarkers and suPAR in Predicting Decline in Renal Function: A Prospective Cohort Study

Author: Ahmed Hina
Aida Hiroshi
Awad Mosaab
Gray Brandon
Hayek Salim S.
Hosny Kareem Mohammed
Ko Yi-An
Quyyumi Arshed A.
Reiser Jochen
Sever Sanja
Tracy Melissa J.
Wei Changli
Publication venue: 'Elsevier BV'
Publication date: 01/05/2017

Introduction: Soluble urokinase-type plasminogen activator receptor (suPAR) strongly predicts outcomes and incident chronic kidney disease (CKD) in patients with cardiovascular disease (CVD). Whether the association between suPAR and CKD is a reflection of its overall association with chronic inflammation and poor CVD outcomes is unclear. We examined whether CVD biomarkers, including high-sensitivity C-reactive protein (hs-CRP), fibrin-degradation products (FDPs), heat-shock protein 70 (HSP-70), and high-sensitivity troponin I (hs-TnI) were associated with a decline in kidney function in the Emory Cardiovascular Biobank cohort, in which suPAR levels were shown to be predictive of both incident CKD and CVD outcomes. Methods: We measured suPAR, hs-CRP, HSP-70, FDP, and hs-TnI plasma levels in 3282 adults (mean age 63 years, 64% male, 75% estimated glomerular filtration rate [eGFR] >60 ml/min per 1.73 m²). Glomerular filtration rate was estimated using Chronic Kidney Disease–Epidemiology Collaboration (eGFR) at enrollment (n = 3282) and follow-up (n = 2672; median 3.5 years). Urine protein by dipstick at baseline was available for 1335 subjects. Results: There was a weak correlation among biomarkers (r range: 0.17−0.28). hs-CRP, FDPs, hs-TnI, and suPAR were independently associated with baseline eGFR and proteinuria. The median yearly decline in eGFR was −0.6 ml/min per 1.73 m². hs-CRP (β: −0.04; P = 0.46), FDPs (β: −0.13; P = 0.08), HSP-70 (β: 0.05; P = 0.84), and hs-TnI (β: −0.01; P = 0.76) were not associated with eGFR decline. suPAR remained predictive of eGFR decline even after adjusting for all biomarkers. Discussion: hs-CRP, FDP, HSP-70, and hs-TnI were not associated with eGFR decline. The specific association of suPAR with eGFR decline supported its involvement in pathways specific to the pathogenesis of kidney disease.

Harvard University - DASH

Directory of Open Access Journals

Regarding Maas's editorial letter on serum suPAR levels

Author: Barisoni Laura
Burke George W.
Faul Christian
Fornoni Alessia
Ghiggeri Gian M.
Gipson Debbie S.
Gupta Vineet
Kaskel Frederick
Reiser Jochen
Schaefer Franz
Thomas David B.
Trachtman Howard
Wei Changli
Publication venue: International Society of Nephrology. Published by Elsevier Inc.
Publication date: 01/08/2012

Elsevier - Publisher Connector

Crossref

PubMed Central

University of Miami: Scholarship Miami

Podocyte-Specific Overexpression of Wild Type or Mutant Trpc6 in Mice Is Sufficient to Cause Glomerular Disease

Author: AB Fogo
C El-Aouni
C Faul
C Kitiyakara
C Montell
CC Möller
CC Yu
Cesar P. Canales
Changli Wei
Christos Chatziantoniou
DB Donoviel
DB Thomas
DE Clapham
EJ Brown
EJ Monteiro
G Boulay
H Kajiyama
H Putaala
J Juhila
J Reiser
J Schlöndorff
J. Daniel Carpio
JB Kopp
Jessica Molina
Jing Li
JL Michaud
JM Kaplan
Jochen Reiser
Juan I. Young
Katherina Walz
M Estacion
M Spassova
MJ Moeller
MP Winn
MR Pollak
N Boute
NY Shih
P Eder
P Mundel
Pamela Kairath
Paola Krall
Paulina Carmona-Mora
Phillip Ruiz
S Roselli
S Santín
SD Crowley
Sergio A. Mezzano
SF Heeringa
T Hofmann
T Okada
T Shigehara
TB Huber
VD D'Agati
W Kriz
Y Yu
Publication venue: Public Library of Science
Publication date: 20/09/2010

Mutations in the TRPC6 calcium channel (Transient receptor potential channel 6) gene have been associated with familial forms of Focal and Segmental Glomerulosclerosis (FSGS) affecting children and adults. In addition, acquired glomerular diseases are associated with increased expression levels of TRPC6. However, the exact role of TRPC6 in the pathogenesis of FSGS remains to be elucidated. In this work we describe the generation and phenotypic characterization of three different transgenic mouse lines with podocyte-specific overexpression of the wild type or either of two mutant forms of Trpc6 (P111Q and E896K) previously related to FSGS. Consistent with the human phenotype, a non-nephrotic range of albuminuria was detectable in almost all transgenic lines. The histological analysis demonstrated that the transgenic mice developed a kidney disease similar to human FSGS. Differences of 2- to 3-fold in the presence of glomerular lesions were found between the non-transgenic mice and the transgenic mice expressing Trpc6 in its wild type or mutant forms specifically in podocytes. Electron microscopy of glomeruli from transgenic mice showed extensive podocyte foot process effacement. We conclude that overexpression of Trpc6 (wild type or mutated) in podocytes is sufficient to cause a kidney disease consistent with FSGS. Our results reinforce the central role of podocytes in the etiology of FSGS. These mice constitute an important new model in which to study future therapies and outcomes of this complex disease.

Crossref

Directory of Open Access Journals

PubMed Central

University of Miami: Scholarship Miami