1,961 research outputs found

Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies

Authors: Chen Wei, Guo Dake, Guo Pengcheng, Mu Bingshen, Xie Lei, Zhou Pan
Publication date: 15/12/2023

Automatic Speech Recognition (ASR) has shown remarkable progress, yet it still faces challenges in real-world distant scenarios across various array topologies, each with multiple recording devices. The focal point of the CHiME-7 Distant ASR task is to devise a unified system capable of generalizing across various array topologies that have multiple recording devices and offering reliable recognition performance in real-world environments. Addressing this task, we introduce an ASR system that demonstrates exceptional performance across various array topologies. First, we propose two attention-based automatic channel selection modules to select the most advantageous subset of multi-channel signals from multiple recording devices for each utterance. Furthermore, we introduce inter-channel spatial features to augment the effectiveness of multi-frame cross-channel attention, improving its awareness of spatial information. Finally, we propose a multi-layer convolution fusion module, drawing inspiration from the U-Net architecture, to integrate the multi-channel output into a single-channel output. Experimental results on the CHiME-7 corpus with oracle segmentation demonstrate that the improvements introduced in our proposed ASR system lead to a relative reduction of 40.1% in the Macro Diarization Attributed Word Error Rates (DA-WER) when compared to the baseline ASR system on the Eval sets. Comment: Accepted by ICASSP 202

arXiv.org e-Print Archive

Highly efficient triazine/carbazole-based host material for green phosphorescent organic light-emitting diodes with low efficiency roll-off

Authors: Chen Yi, Gao Lei, Hu Mingming, Huang Jinhai, Liu Yang, Mu Haichuan, Song Wenxuan, Su Jianhua
Publication venue: 'Royal Society of Chemistry (RSC)'
Publication date: 01/01/2017

Two novel triazine/carbazole-based host materials were designed and synthesized, which demonstrated outstanding EL performance with maximum CE, PE and EQE of 69.3 cd A⁻¹, 54.2 lm W⁻¹ and 21.9%, respectively.

Queen's University Belfast Research Portal

Crossref

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

Authors: Chen Bin, Chen Liwei, Gai Kun, Huang Quzhe, Jin Yang, Lei Chenyi, Lei Xiaoqiang, Liao Chao, Liu An, Mu Yadong, Ou Wenwu, Song Chengru, Tan Jianchao, Xu Kun, Xu Kun, Zhang Di
Publication date: 29/09/2023

Recently, the remarkable advance of the Large Language Model (LLM) has inspired researchers to transfer its extraordinary reasoning capability to both vision and language data. However, the prevailing approaches primarily regard the visual input as a prompt and focus exclusively on optimizing the text generation process conditioned upon vision content by a frozen LLM. Such an inequitable treatment of vision and language heavily constrains the model's potential. In this paper, we break through this limitation by representing both vision and language in a unified form. Specifically, we introduce a well-designed visual tokenizer to translate the non-linguistic image into a sequence of discrete tokens, like a foreign language that an LLM can read. The resulting visual tokens encompass high-level semantics worthy of a word and also support a dynamic sequence length that varies with the image. Coupled with this tokenizer, the presented foundation model, called LaVIT, can handle both image and text indiscriminately under the same generative learning paradigm. This unification empowers LaVIT to serve as an impressive generalist interface to understand and generate multi-modal content simultaneously. Extensive experiments further showcase that it outperforms the existing models by a large margin on massive vision-language tasks. Our code and models will be available at https://github.com/jy0205/LaVIT

arXiv.org e-Print Archive

Identification and bioactivity evaluation of two novel temporins from the skin secretion of the European edible frog, Pelophylax kl. esculentus

Authors: Chen Tianbao, Chen Xiaole, Shaw Chris, Wang He, Wang Lei, Yang Mu, Zhou Mei
Publication venue: 'Elsevier BV'
Publication date: 05/08/2016

Queen's University Belfast Research Portal