An Ensemble Method to Automatically Grade Diabetic Retinopathy with Optical Coherence Tomography Angiography Images
Diabetic retinopathy (DR) is a complication of diabetes, and one of the major
causes of vision impairment in the global population. As the early-stage
manifestation of DR is usually very mild and hard to detect, an accurate
diagnosis via eye-screening is clinically important to prevent vision loss at
later stages. In this work, we propose an ensemble method to automatically
grade DR using ultra-wide optical coherence tomography angiography (UW-OCTA)
images available from the Diabetic Retinopathy Analysis Challenge (DRAC) 2022.
First, we adopt the state-of-the-art classification networks, i.e., ResNet,
DenseNet, EfficientNet, and VGG, and train them to grade UW-OCTA images with
different splits of the available dataset. Ultimately, we obtain 25 models, of
which the top 16 are selected and ensembled to generate the final
predictions. During the training process, we also investigate the multi-task
learning strategy, and add an auxiliary classification task, the Image Quality
Assessment, to improve the model performance. Our final ensemble model achieved
a quadratic weighted kappa (QWK) of 0.9346 and an Area Under Curve (AUC) of
0.9766 on the internal testing dataset, and a QWK of 0.839 and an AUC of
0.8978 on the DRAC challenge testing dataset.
Comment: 13 pages, 6 figures, 5 tables. To appear in Diabetic Retinopathy
Analysis Challenge (DRAC), Bin Sheng et al., MICCAI 2022 Challenge, Lecture
Notes in Computer Science, Springe
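The reported metric and the ensembling step can be illustrated with a minimal sketch. The abstract does not say how the 16 models are combined, so averaging class probabilities is an assumption made here for illustration; the function names `quadratic_weighted_kappa` and `ensemble_grades` are likewise hypothetical and not taken from the paper.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Cohen's kappa with quadratic disagreement weights (QWK)."""
    n = num_classes
    # observed[i][j]: number of samples with true grade i and predicted grade j.
    observed = [[0.0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0
    total = float(len(y_true))
    row = [sum(observed[i]) for i in range(n)]
    col = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2
            num += w * observed[i][j]
            den += w * row[i] * col[j] / total
    # Note: den is zero if all labels and predictions fall in one class.
    return 1.0 - num / den


def ensemble_grades(prob_list):
    """Assumed ensembling rule: average class probabilities, then argmax.

    prob_list[m][s][c] is the probability from model m for sample s, class c.
    """
    num_models = len(prob_list)
    grades = []
    for sample_probs in zip(*prob_list):  # one row per model for this sample
        mean = [sum(col) / num_models for col in zip(*sample_probs)]
        grades.append(mean.index(max(mean)))
    return grades
```

QWK penalizes a prediction by the squared distance between the predicted and true grade, which is why it is a common choice for ordinal grading tasks such as this one.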
Analysis of spherical indentation of materials with plastically graded surface layer
In the present work, a comprehensive parametric study for establishing contact mechanics of instrumented normal spherical indentation on homogeneous materials and materials with plastically graded surface layer (PGSL) was undertaken by dimensional analysis and finite element modeling. The spherical indentation response for homogeneous materials can be described only by two dimensionless parameters: strain hardening exponent and a unified parameter that can describe effects of both the normalized yield strength and the normalized indentation depth. The influences of these two parameters were investigated for a wide range of engineering materials, and the results may be used as an estimate of loading response and pile-up/sink-in behavior when the material properties are known. In the materials with PGSL, a linear gradient in yield strength, and no variation in elastic modulus and strain hardening exponent were explored. The indentation response of the materials with PGSL can be described only by three dimensionless parameters: the normalized indentation depth, the dimensionless strength gradient parameter, and the normalized PGSL thickness. The effects of these three parameters were studied systematically. The normalized pile-up/sink-in parameter is found to be an increasing function of the strength gradient parameter. The normalized pile-up/sink-in parameter increases (decreases) with increasing PGSL thickness for a fixed positive (negative) gradient case at large indentation depth. The results also indicate that the materials with positive PGSL can bear more loads and have significantly more resistance to contact crack formation.
ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs
In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding
Prompt-and-Refine strategy (Figure 3), two simple but effective
training-free methods to decrease the Token Display Time (TDT) of
streaming ASR models without any accuracy loss. The core idea of
ZeroPrompt is to append zeroed content to each chunk during inference, which
acts like a prompt to encourage the model to predict future tokens even before
they are spoken. We argue that streaming acoustic encoders naturally have the
modeling ability of Masked Language Models and our experiments demonstrate that
ZeroPrompt is cheap to engineer and can be applied to streaming acoustic
encoders on any dataset without any accuracy loss. Specifically, compared with
our baseline models, we achieve a 350 to 700 ms reduction in First Token
Display Time (TDT-F) and a 100 to 400 ms reduction in Last Token Display Time
(TDT-L), with theoretically and experimentally equal WER on both Aishell-1 and
Librispeech datasets.
Comment: accepted by Interspeech 202
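The core idea of appending zeroed content to each chunk can be sketched as follows. This is only an illustration under stated assumptions: features are represented as plain lists of frames, the names `append_zero_prompt` and `stream_with_zero_prompt` are hypothetical, and how the encoder caches state or discards the outputs for the zeroed frames is not described in the abstract and is not modeled here.

```python
def append_zero_prompt(chunk, prompt_frames):
    """Return the chunk followed by `prompt_frames` all-zero frames.

    chunk is a list of frames; each frame is a list of feature values.
    """
    feat_dim = len(chunk[0])
    zeros = [[0.0] * feat_dim for _ in range(prompt_frames)]
    return [list(frame) for frame in chunk] + zeros


def stream_with_zero_prompt(features, chunk_frames, prompt_frames):
    """Split features into fixed-size chunks and zero-pad each one at inference."""
    for start in range(0, len(features), chunk_frames):
        chunk = features[start:start + chunk_frames]
        yield append_zero_prompt(chunk, prompt_frames)
```

Because the padding is applied only at inference time, a sketch like this requires no retraining, which matches the training-free claim in the abstract.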
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
Recent advances in neural text-to-speech (TTS) models bring thousands of TTS
applications into daily life, where models are deployed in the cloud to provide
services for customers. Among these models are diffusion probabilistic models
(DPMs), which can be stably trained and are more parameter-efficient compared
with other generative models. As transmitting data between customers and the
cloud introduces high latency and the risk of exposing private data, deploying
TTS models on edge devices is preferred. When deploying DPMs on edge
devices, there are two practical problems. First, current DPMs are not
lightweight enough for resource-constrained devices. Second, DPMs require many
denoising steps in inference, which increases latency. In this work, we present
LightGrad, a lightweight DPM for TTS. LightGrad is equipped with a lightweight
U-Net diffusion decoder and a training-free fast sampling technique, reducing
both model parameters and inference latency. Streaming inference is also
implemented in LightGrad to reduce latency further. Compared with Grad-TTS,
LightGrad achieves a 62.2% reduction in parameters and a 65.7% reduction in latency,
while preserving comparable speech quality on both Chinese Mandarin and English
in 4 denoising steps.
Comment: Accepted by ICASSP 202
The Effect of Key Opinion Leader Type on Purchase Intention: Considering the Moderating Effect of Product Type
TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty
In this paper, we present TrimTail, a simple but effective emission
regularization method to improve the latency of streaming ASR models. The core
idea of TrimTail is to apply length penalty (i.e., by trimming trailing frames,
see Fig. 1-(b)) directly on the spectrogram of input utterances, which does not
require any alignment. We demonstrate that TrimTail is computationally cheap
and can be applied online and optimized with any training loss or any model
architecture on any dataset without any extra effort by applying it on various
end-to-end streaming ASR networks either trained with CTC loss [1] or
Transducer loss [2]. We achieve a 100 to 200 ms latency reduction with equal
or even better accuracy on both Aishell-1 and Librispeech. Moreover, by using
TrimTail, we can achieve a 400ms algorithmic improvement of User Sensitive
Delay (USD) with an accuracy loss of less than 0.2.
Comment: submitted to ICASSP 202
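Trimming trailing frames from the input spectrogram can be sketched as below. The abstract does not state how many frames are trimmed or how that number is chosen, so drawing a random trim length up to a maximum is an assumption; `trim_tail` and `max_trim_frames` are hypothetical names used only for this illustration.

```python
import random


def trim_tail(spectrogram, max_trim_frames, rng=random):
    """Drop a random number (0..max_trim_frames) of trailing frames.

    spectrogram is a list of frames; at least one frame is always kept
    when the input is non-empty.
    """
    num_frames = len(spectrogram)
    upper = max(0, min(max_trim_frames, num_frames - 1))
    trim = rng.randint(0, upper)  # inclusive on both ends
    return spectrogram[:num_frames - trim]
```

The operation touches only the input features and leaves the transcript unchanged, so no alignment is needed, consistent with the description above.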