Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring
Recent studies on pronunciation scoring have explored the effect of
introducing phone embeddings as reference pronunciation, but mostly in an
implicit manner, i.e., addition or concatenation of reference phone embedding
and actual pronunciation of the target phone as the phone-level pronunciation
quality representation. In this paper, we propose to use linguistic-acoustic
similarity to explicitly measure the deviation of non-native production from
its native reference for pronunciation assessment. Specifically, the deviation
is first estimated by the cosine similarity between reference phone embedding
and corresponding acoustic embedding. Next, a phone-level Goodness of
Pronunciation (GOP) pre-training stage is introduced to guide this
similarity-based learning for better initialization of the aforementioned two
embeddings. Finally, a transformer-based hierarchical pronunciation scorer is
used to map a sequence of phone embeddings, acoustic embeddings along with
their similarity measures to predict the final utterance-level score.
Experimental results on non-native databases suggest that the proposed
system significantly outperforms the baselines, in which the acoustic and phone
embeddings are simply added or concatenated. A further examination shows that
the phone embeddings learned in the proposed approach are able to capture
linguistic-acoustic attributes of native pronunciation as reference.
Comment: Accepted by ICASSP 202
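The core measurement described above can be sketched as a row-wise cosine similarity between reference phone embeddings and the corresponding acoustic embeddings. This is a minimal NumPy illustration of that similarity computation only, not the paper's implementation; the embedding dimensions and random inputs are purely hypothetical.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (num_phones, dim) matrices."""
    num = np.sum(a * b, axis=-1)
    denom = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / np.maximum(denom, 1e-8)  # guard against zero vectors

# Hypothetical stand-ins: 5 phones, 16-dimensional embeddings.
rng = np.random.default_rng(0)
phone_emb = rng.normal(size=(5, 16))     # reference phone embeddings
acoustic_emb = rng.normal(size=(5, 16))  # actual acoustic realizations

# One deviation score per phone, in [-1, 1]; higher = closer to native reference.
deviation = cosine_similarity(phone_emb, acoustic_emb)
```

In the paper these per-phone similarities are fed, together with both embedding sequences, into a transformer-based hierarchical scorer; that part is omitted here.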
An ASR-free Fluency Scoring Approach with Self-Supervised Learning
A typical fluency scoring system generally relies on an automatic speech
recognition (ASR) system to obtain time stamps in input speech for either the
subsequent calculation of fluency-related features or directly modeling speech
fluency with an end-to-end approach. This paper describes a novel ASR-free
approach for automatic fluency assessment using self-supervised learning (SSL).
Specifically, wav2vec2.0 is used to extract frame-level speech features,
followed by K-means clustering to assign a pseudo label (cluster index) to each
frame. A BLSTM-based model is trained to predict an utterance-level fluency
score from frame-level SSL features and the corresponding cluster indexes.
Neither speech transcription nor time stamp information is required in the
proposed system. Being ASR-free, it can potentially avoid the effect of ASR
errors in practice. Experimental results on non-native English databases
show that the proposed approach significantly improves performance in the
"open response" scenario compared to previous methods and matches the
recently reported performance in the "read aloud" scenario.
Comment: Accepted by ICASSP 202
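The pseudo-labeling step in this pipeline (cluster frame-level SSL features, then use the cluster index of each frame as a label) can be sketched as follows. This is a hedged illustration with random features standing in for wav2vec2.0 outputs and a plain Lloyd's k-means written out in NumPy; the feature dimension, frame count, and cluster count are illustrative, not the paper's settings.

```python
import numpy as np

def kmeans(features: np.ndarray, k: int, iters: int = 20, seed: int = 0):
    """Plain Lloyd's k-means; returns (centroids, per-frame cluster labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct frames.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # Assign each frame to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned frames.
        for c in range(k):
            members = features[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids, labels

# Hypothetical stand-in for wav2vec2.0 frame features: 200 frames, 32-dim.
rng = np.random.default_rng(1)
frames = rng.normal(size=(200, 32))
_, pseudo_labels = kmeans(frames, k=8)  # one cluster index per frame
```

In the actual system these pseudo labels and the SSL features are consumed by a BLSTM that regresses an utterance-level fluency score; no transcription or time stamps are needed at any point.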
Heat-Shock Protein 90 Promotes Nuclear Transport of Herpes Simplex Virus 1 Capsid Protein by Interacting with Acetylated Tubulin
Although it is known that inhibitors of heat shock protein 90 (Hsp90) can inhibit herpes simplex virus type 1 (HSV-1) infection, the role of Hsp90 in HSV-1 entry and the antiviral mechanisms of Hsp90 inhibitors remain unclear. In this study, we found that Hsp90 inhibitors have potent antiviral activity against standard and drug-resistant HSV-1 strains, and that viral gene and protein synthesis is inhibited at an early phase. More detailed studies demonstrated that Hsp90 is upregulated by virus entry and that it interacts with the virus. Hsp90 knockdown by siRNA or treatment with Hsp90 inhibitors significantly inhibited the nuclear transport of the viral capsid protein (ICP5) at the early stage of HSV-1 infection. In contrast, overexpression of Hsp90 restored the nuclear transport that was prevented by the Hsp90 inhibitors, suggesting that Hsp90 is required for nuclear transport of the viral capsid protein. Furthermore, HSV-1 infection enhanced acetylation of α-tubulin, and Hsp90 interacted with the acetylated α-tubulin, an interaction that was suppressed by Hsp90 inhibition. These results demonstrate that Hsp90, by interacting with acetylated α-tubulin, plays a crucial role in nuclear transport of the viral capsid protein; they may provide novel insight into the role of Hsp90 in HSV-1 infection and offer a promising strategy to overcome drug resistance.
An Entity Relation Extraction Method Based on Dynamic Context and Multi-Feature Fusion
A dynamic context selector is a masking scheme that divides the feature matrix into regions and dynamically selects the information of a region as the model input. We apply this idea to improve entity relation extraction (ERE) by introducing dynamic context into training. In practice, most existing models for joint extraction of entities and relations rely on static context, which suffers from missing features and thus yields poor performance. To address this problem, we propose SPERT-DC, a span-based joint extraction method based on dynamic context and multi-feature fusion. The context area is picked dynamically with the help of a threshold in the feature-selecting layer of the model. We also use Bi-LSTM_ATT to improve compatibility with longer text in the feature-extracting layer, and enhance context information by combining it with entity tags in the feature-fusion layer. The proposed model outperforms prior work by up to 1% F1 score on the public dataset, verifying the effectiveness of dynamic context for ERE models.
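The threshold-based selection at the heart of the dynamic context idea can be sketched in a few lines: given a relevance score per token for some candidate entity pair, only positions above the threshold are kept as context. This is a hypothetical NumPy illustration of the masking step only; the scores, threshold value, and function name are invented for the example and are not from SPERT-DC itself.

```python
import numpy as np

def dynamic_context_mask(scores: np.ndarray, threshold: float) -> np.ndarray:
    """Keep only context positions whose relevance score meets the threshold.

    `scores` is a hypothetical (seq_len,) relevance score per token for one
    candidate entity pair; positions below the threshold are masked out, so
    the selected context region varies per pair instead of being static.
    """
    return scores >= threshold

# Illustrative per-token relevance scores for one entity pair.
scores = np.array([0.1, 0.8, 0.4, 0.9, 0.2])
mask = dynamic_context_mask(scores, threshold=0.5)
context = scores[mask]  # only the tokens selected as dynamic context
```

In a full model, the mask would be applied to token representations before the relation classifier rather than to the raw scores, but the selection logic is the same.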