Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network
Contextual information plays a crucial role in speech recognition
technologies, and incorporating it into end-to-end speech recognition models
has drawn immense interest recently. However, previous deep bias methods lacked
explicit supervision for bias tasks. In this study, we introduce a contextual
phrase prediction network for an attention-based deep bias method. This network
predicts context phrases in utterances using contextual embeddings and
calculates bias loss to assist in the training of the contextualized model. Our
method achieved a significant word error rate (WER) reduction across various
end-to-end speech recognition models. Experiments on the LibriSpeech corpus
show that our proposed model obtains a 12.1% relative WER improvement over the
baseline model, and the WER of the context phrases decreases relatively by
40.5%. Moreover, by applying a context phrase filtering strategy, we also
effectively eliminate the WER degradation when using a larger biasing list. Comment: Accepted by interspeech202
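The relative figures quoted in this abstract follow the usual definition of a relative error-rate reduction. A minimal sketch, assuming hypothetical absolute WER values chosen only for illustration (the abstract does not state the absolute rates, and the function name is not from the paper):

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - proposed) / baseline


# Hypothetical absolute WERs in percent (not taken from the paper).
baseline_wer = 10.0
proposed_wer = 8.0
print(f"relative WER reduction: {relative_reduction(baseline_wer, proposed_wer):.1%}")
```

The same formula applies to the WER measured only on context phrases, which is why that figure can be much larger than the overall one.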
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition
By incorporating additional contextual information, deep biasing methods have
emerged as a promising solution for speech recognition of personalized words.
However, for real-world voice assistants, always biasing on such personalized
words with high prediction scores can significantly degrade the performance of
recognizing common words. To address this issue, we propose an adaptive
contextual biasing method based on Context-Aware Transformer Transducer (CATT)
that utilizes the biased encoder and predictor embeddings to perform streaming
prediction of contextual phrase occurrences. Such prediction is then used to
dynamically switch the bias list on and off, enabling the model to adapt to
both personalized and common scenarios. Experiments on LibriSpeech and internal
voice assistant datasets show that our approach achieves up to 6.7% and
20.7% relative reductions in WER and CER, respectively, compared to the baseline,
mitigating up to 96.7% and 84.9% of the relative WER and CER increase for
common cases. Furthermore, our approach has a minimal performance impact in
personalized scenarios while maintaining a streaming inference pipeline with
a negligible RTF increase.
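The idea of switching the bias list on and off from a predicted occurrence score can be illustrated with a simple gate. This is only a sketch under stated assumptions: the function name, the fixed threshold, and the use of a single scalar probability are hypothetical and are not described in the abstract, which does not specify how the switching decision is made.

```python
from typing import List


def active_bias_list(occurrence_prob: float,
                     bias_list: List[str],
                     threshold: float = 0.5) -> List[str]:
    """Return the bias list only when a context phrase is predicted to occur.

    occurrence_prob: hypothetical predicted probability that the utterance
                     contains a phrase from the bias list.
    threshold:       hypothetical decision threshold (an assumption).
    """
    if not 0.0 <= occurrence_prob <= 1.0:
        raise ValueError("occurrence_prob must lie in [0, 1]")
    # Bias is switched off (empty list) for utterances predicted to be common.
    return list(bias_list) if occurrence_prob >= threshold else []


phrases = ["call Zhang Wei", "play Bohemian Rhapsody"]
print(active_bias_list(0.9, phrases))  # biasing enabled
print(active_bias_list(0.1, phrases))  # biasing disabled
```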
Active Metamaterial Antenna with Tunable Zeroth-Order Resonances for Narrowband Internet of Things
With unique electromagnetic properties, metamaterials (MTMs) provide more freedom for antenna design, particularly when combined with active devices that enable effective tuning. By integrating active devices into the periodic cells of MTMs, the electromagnetic characteristics of individual cells can be manipulated independently, thereby realizing multiple tunable states for MTM antennas consisting of several periodic cells. In this paper, we add active devices such as PIN diodes to each periodic cell to tune each cell independently, thereby realizing 36 tunable zeroth-order resonances (ZORs) for a three-cell metamaterial antenna in the frequency range of 4.48–5.34 GHz. Moreover, each ZOR has a bandwidth as narrow as 0.09 GHz, indicating that the tunable ZOR antenna could potentially be applied to 5G Narrowband Internet of Things (NB-IoT).
A model-based method with geometric solutions for gaze correction in eye-tracking
Eyeball distortions caused by eye diseases, such as myopia and strabismus, can lead to deviations in eye-tracking data. In this paper, a model-based method with geometric solutions is proposed for gaze correction. The deviations of estimated gaze points are geometrically analyzed based on individual eyeball models that account for the distortions caused by myopia and strabismus. A set of integrated geometric solutions is derived for the various situations, including the case of strabismus and the case of myopia and strabismus, and is then used for gaze correction in eye-tracking. The experimental results demonstrate that this model-based method is effective in reducing deviations in estimated gaze points and can be used to correct modeling errors in eye-tracking. Moreover, the proposed method has the potential to provide a simple approach for correcting eye-tracking data for various populations with eye diseases.
Mesh Generation and Flexible Shape Comparisons for Bio-Molecules
Novel approaches for generating and comparing flexible (non-rigid) molecular surface meshes are
developed. The mesh-generating method is fast and memory-efficient. The resulting meshes are smooth and
accurate, and possess high mesh quality. An isometry-invariant shape descriptor based on the
Laplace-Beltrami operator is then explored for mesh comparison. The new shape descriptor is more powerful in discriminating
different surface shapes but relies only on a small set of signature values. The shape descriptor is
applied to shape comparison between molecules with deformed structures. The proposed methods are implemented
in a program that can be used as a stand-alone software tool or as a plug-in to other existing
molecular modeling tools. In particular, the code is encapsulated in a software toolkit with a user-friendly
graphical interface developed by the authors.