Search CORE


Can Whisper perform speech-based in-context learning

Author: Wang Siyin
Wu Ji
Yang Chao-Han Huck
Zhang Chao
Publication date: 13/09/2023

This paper investigates the in-context learning abilities of the Whisper automatic speech recognition (ASR) models released by OpenAI. A novel speech-based in-context learning (SICL) approach is proposed for test-time adaptation, which can reduce word error rates (WERs) using only a small number of labelled speech samples and without gradient descent. Language-level adaptation experiments on Chinese dialects showed that applying SICL to isolated word ASR achieves consistent and considerable relative WER reductions with Whisper models of every size on two dialects, averaging 32.3%. A k-nearest-neighbours-based in-context example selection technique can further improve the efficiency of SICL, raising the average relative WER reduction to 36.4%. The findings are verified on speaker adaptation and continuous speech recognition tasks, both of which achieve considerable relative WER reductions. Detailed quantitative analyses are also provided to shed light on SICL's adaptability to phonological variances and dialect-specific lexical nuances. Comment: Submitted to ICASSP 202
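
The abstract does not say how the nearest neighbours are found. The block below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each labelled speech sample has already been mapped to a fixed-length embedding vector (the encoder that produces it is an assumption), that similarity is measured with cosine similarity, and that the k most similar samples are passed to the model as in-context examples. All function names are hypothetical. It also shows relative WER reduction computed as (baseline WER - adapted WER) / baseline WER.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical pool entry: (embedding, audio_id, reference_transcript)
PoolEntry = Tuple[Sequence[float], str, str]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_in_context_examples(
    test_embedding: Sequence[float], pool: List[PoolEntry], k: int
) -> List[Tuple[str, str]]:
    """Assumed kNN selection: return the k labelled samples most similar to the test utterance."""
    ranked = sorted(
        pool,
        key=lambda entry: cosine_similarity(test_embedding, entry[0]),
        reverse=True,  # highest similarity first
    )
    return [(audio_id, transcript) for _, audio_id, transcript in ranked[:k]]


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction by which adaptation lowers the WER relative to the baseline."""
    return (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    pool = [([1.0, 0.0], "a", "text a"), ([0.0, 1.0], "b", "text b"), ([0.9, 0.1], "c", "text c")]
    print(select_in_context_examples([1.0, 0.0], pool, k=2))
    print(relative_wer_reduction(0.5, 0.25))
```

In this sketch the selected (audio, transcript) pairs would then be prepended to the test utterance as context; how SICL actually formats that context for Whisper is not described in the abstract.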

arXiv.org e-Print Archive

Multi-bump solutions for the magnetic Schrödinger-Poisson system with critical growth

Author: Ji Chao
Rădulescu Vicenţiu D.
Zhang Yongde
Publication date: 01/01/2022

University of Szeged

Detonation Output Properties of D-shape Structure

Author: Wang Shu-Shan
Wei Ji-feng
Zhang Chao
Publication venue: Defence Scientific Information and Documentation Centre
Publication date: 12/08/2014

The detonation wave propagation and output properties of a D-shape structure have been analyzed. Four initiation modes were designed to compare wavefront profiles and output pressure distributions. Simulation results show that the three-array-nine-point initiation mode (Mode-III) produces the wavefront that best matches the D-shape structure. Detonation output properties have a great influence on fragment ejection velocity and distribution density. The statistical results reveal that the fragment parameters of Mode-III are the largest. Compared with Mode-III, the kinetic energies of the other three modes are lower by 31.6 per cent, 19.6 per cent and 4.5 per cent, respectively. The computed values and the normal curve of the fragment distribution are obtained. From these analyses, it can be concluded that the initiation mode has a great influence on the output parameters of fragments. With the optimal initiation Mode-III, the ideal hitting angle should be within the range of -10° to 10°, and the corresponding probability of distribution density is close to 70 per cent. Defence Science Journal, Vol. 64, No. 5, September 2014, pp. 484-489, DOI: http://dx.doi.org/10.14429/dsj.64.474

Defence Science Journal

Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection

Author: Cui Ziyun
Wu Ji
Wu Wen
Zhang Chao
Zhang Wei-Qiang
Publication date: 06/10/2023

The detection of Alzheimer's disease (AD) from spontaneous speech has attracted increasing attention, while the sparsity of training data remains an important issue. This paper addresses the issue through knowledge transfer, drawing on both speech-generic and depression-specific knowledge. The paper first studies sequential knowledge transfer from generic foundation models pretrained on large amounts of speech and text data. A block-wise analysis is performed for AD diagnosis based on the representations extracted from different intermediate blocks of different foundation models. Apart from the knowledge in speech-generic representations, this paper also proposes to simultaneously transfer knowledge from a speech depression detection task, motivated by the high comorbidity rates of depression and AD. A parallel knowledge transfer framework is studied that jointly learns the information shared between these two tasks. Experimental results show that the proposed method improves AD and depression detection, and produces a state-of-the-art F1 score of 0.928 for AD diagnosis on the commonly used ADReSSo dataset. Comment: 8 pages, 4 figures. Accepted by ASRU 202

arXiv.org e-Print Archive

4-Methoxy-3-nitrobiphenyl

Author: Chao Xuqiang
Chen Qiang
Ji Jun
Wang Kai
Zhang Xiuqin
Publication venue: International Union of Crystallography
Publication date: 01/01/2012

In the title compound, C13H11NO3, the dihedral angle between the two benzene rings is 36.69 (2)°, and the nitro and methoxy groups are oriented at 29.12 (14) and 2.14 (12)°, respectively, with respect to the benzene ring to which they are bonded.

Directory of Open Access Journals

PubMed Central