Can Whisper perform speech-based in-context learning?
This paper investigates the in-context learning abilities of the Whisper
automatic speech recognition (ASR) models released by OpenAI. A novel
speech-based in-context learning (SICL) approach is proposed for test-time
adaptation, which can reduce the word error rates (WERs) with only a small
number of labelled speech samples without gradient descent. Language-level
adaptation experiments using Chinese dialects showed that when applying SICL to
isolated word ASR, consistent and considerable relative WER reductions, 32.3% on
average, can be achieved using Whisper models of any size on two dialects. A
k-nearest-neighbours-based in-context example selection technique can
be applied to further improve the efficiency of SICL, which can increase the
average relative WER reduction to 36.4%. The findings are verified using
speaker adaptation and continuous speech recognition tasks, both of which achieved
considerable relative WER reductions. Detailed quantitative analyses are also
provided to shed light on SICL's adaptability to phonological variances and
dialect-specific lexical nuances.
Comment: Submitted to ICASSP 202
Detonation Output Properties of D-shape Structure
The detonation wave propagation and output properties have been analyzed for a D-shape structure. Four initiation modes were designed to compare wavefront profiles and output pressure distribution. Simulation results show that the three-array-nine-point initiation mode (Mode-III) produces the wavefront that best matches the D-shape structure. Detonation output properties have a great influence on fragment ejection velocity and distribution density. The statistical results reveal that the fragment parameters of Mode-III are the largest. Compared with Mode-III, the kinetic energies of the other three modes are lower by 31.6 per cent, 19.6 per cent, and 4.5 per cent, respectively. The computational values and normal curve of the fragment distribution are obtained. From these analyses, it can be concluded that the initiation mode has a great influence on the output parameters of fragments. With the optimal initiation mode, Mode-III, the ideal hitting angle should be within the range of -10° to 10°, where the probability of distribution density would be close to 70 per cent.
Defence Science Journal, Vol. 64, No. 5, September 2014, pp.484-489, DOI:http://dx.doi.org/10.14429/dsj.64.474
Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection
The detection of Alzheimer's disease (AD) from spontaneous speech has
attracted increasing attention while the sparsity of training data remains an
important issue. This paper handles the issue by knowledge transfer,
specifically from both speech-generic and depression-specific knowledge. The
paper first studies sequential knowledge transfer from generic foundation
models pretrained on large amounts of speech and text data. A block-wise
analysis is performed for AD diagnosis based on the representations extracted
from different intermediate blocks of different foundation models. Apart from
the knowledge from speech-generic representations, this paper also proposes to
simultaneously transfer the knowledge from a speech depression detection task
based on the high comorbidity rates of depression and AD. A parallel knowledge
transfer framework is studied that jointly learns the information shared
between these two tasks. Experimental results show that the proposed method
improves AD and depression detection, and produces a state-of-the-art F1 score
of 0.928 for AD diagnosis on the commonly used ADReSSo dataset.
Comment: 8 pages, 4 figures. Accepted by ASRU 202
Learning Image Demoireing from Unpaired Real Data
This paper focuses on addressing the issue of image demoireing. Unlike the
large volume of existing studies that rely on learning from paired real data,
we attempt to learn a demoireing model from unpaired real data, i.e., moire
images associated with irrelevant clean images. The proposed method, referred
to as Unpaired Demoireing (UnDeM), synthesizes pseudo moire images from
unpaired datasets, generating pairs with clean images for training demoireing
models. To achieve this, we divide real moire images into patches and group
them in compliance with their moire complexity. We introduce a novel moire
generation framework to synthesize moire images with diverse moire features,
resembling real moire patches, and details akin to real moire-free images.
Additionally, we introduce an adaptive denoise method to eliminate the
low-quality pseudo moire images that adversely impact the learning of
demoireing models. We conduct extensive experiments on the commonly used FHDMi
and UHDM datasets. Results show that our UnDeM performs better than
existing methods when using existing demoireing models such as MBCNN and
ESDNet-L. Code: https://github.com/zysxmu/UnDeM
Comment: AAAI202
4-Methoxy-3-nitrobiphenyl
In the title compound, C13H11NO3, the dihedral angle between the two benzene rings is 36.69 (2)° and the nitro and methoxy groups are oriented at 29.12 (14) and 2.14 (12)°, respectively, with respect to the benzene ring to which they are bonded.