
    Can Whisper perform speech-based in-context learning

    This paper investigates the in-context learning abilities of the Whisper automatic speech recognition (ASR) models released by OpenAI. A novel speech-based in-context learning (SICL) approach is proposed for test-time adaptation, which can reduce word error rates (WERs) using only a small number of labelled speech samples and no gradient descent. Language-level adaptation experiments using Chinese dialects showed that, when SICL is applied to isolated-word ASR, consistent and considerable relative WER reductions, averaging 32.3%, can be achieved on two dialects with Whisper models of any size. A k-nearest-neighbours-based in-context example selection technique can be applied to further improve the efficiency of SICL, raising the average relative WER reduction to 36.4%. The findings are verified on speaker adaptation and continuous speech recognition tasks, both of which achieved considerable relative WER reductions. Detailed quantitative analyses are also provided to shed light on SICL's adaptability to phonological variances and dialect-specific lexical nuances. Comment: Submitted to ICASSP 202

    Detonation Output Properties of D-shape Structure

    The detonation wave propagation and output properties of a D-shape structure have been analyzed. Four initiation modes were designed to compare wavefront profiles and output pressure distributions. Simulation results show that the three-array-nine-point initiation mode (Mode-III) produces the wavefront that best matches the D-shape structure. Detonation output properties strongly influence fragment ejection velocity and distribution density. The statistical results reveal that the fragment parameters of Mode-III are the largest. Compared with Mode-III, the kinetic energies of the other three modes are lower by 31.6 per cent, 19.6 per cent, and 4.5 per cent, respectively. The computational values and normal curve of the fragment distribution are obtained. From these analyses, it can be concluded that the initiation mode has a strong influence on the output parameters of fragments. With the optimal initiation mode, Mode-III, the ideal hitting angle should be within the range of -10° to 10°, where the probability of distribution density would be close to 70 per cent. Defence Science Journal, Vol. 64, No. 5, September 2014, pp.484-489, DOI:http://dx.doi.org/10.14429/dsj.64.474

    Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection

    The detection of Alzheimer's disease (AD) from spontaneous speech has attracted increasing attention, while the sparsity of training data remains an important issue. This paper addresses the issue through knowledge transfer, drawing on both speech-generic and depression-specific knowledge. The paper first studies sequential knowledge transfer from generic foundation models pretrained on large amounts of speech and text data. A block-wise analysis is performed for AD diagnosis based on the representations extracted from different intermediate blocks of different foundation models. Beyond speech-generic representations, this paper also proposes to simultaneously transfer knowledge from a speech depression detection task, motivated by the high comorbidity rates of depression and AD. A parallel knowledge transfer framework is studied that jointly learns the information shared between these two tasks. Experimental results show that the proposed method improves AD and depression detection, and produces a state-of-the-art F1 score of 0.928 for AD diagnosis on the commonly used ADReSSo dataset. Comment: 8 pages, 4 figures. Accepted by ASRU 202

    4-Methoxy-3-nitrobiphenyl

    In the title compound, C13H11NO3, the dihedral angle between the two benzene rings is 36.69 (2)° and the nitro and methoxy groups are oriented at 29.12 (14) and 2.14 (12)° with respect to the benzene ring to which they are bonded.