
    SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

    The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods. In this paper, we propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations of a speech encoder and a text decoder with a shared unit encoder. Leveraging hidden units as an interface to align speech and text, we can decompose the speech-to-text model into a speech-to-unit model and a unit-to-text model, which can be jointly pre-trained with unpaired speech and text data, respectively. The proposed SpeechUT is fine-tuned and evaluated on automatic speech recognition (ASR) and speech translation (ST) tasks. Experimental results show that SpeechUT achieves substantial improvements over strong baselines and state-of-the-art performance on both the LibriSpeech ASR and MuST-C ST tasks. To better understand SpeechUT, detailed analyses are conducted. The code and pre-trained models are available at https://aka.ms/SpeechUT.
    Comment: 14 pages, accepted by EMNLP 202
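    A minimal sketch of the architecture described above, assuming a PyTorch implementation: a speech encoder and a text decoder connected through a shared unit encoder, with discrete hidden units as the interface. This is not the released SpeechUT code (see https://aka.ms/SpeechUT); the module sizes, vocabularies, and layer counts are illustrative assumptions.

```python
import torch.nn as nn

class SpeechUTSketch(nn.Module):
    """Hypothetical layout: speech encoder -> shared unit encoder -> text decoder."""

    def __init__(self, n_units=500, n_text_tokens=10000, d_model=768):
        super().__init__()
        # speech-to-unit branch: acoustic features (e.g. 80-dim filterbanks) -> unit space
        self.speech_encoder = nn.Sequential(nn.Linear(80, d_model), nn.GELU(),
                                            nn.Linear(d_model, d_model))
        self.unit_embedding = nn.Embedding(n_units, d_model)  # discrete hidden units
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.unit_encoder = nn.TransformerEncoder(enc_layer, num_layers=6)  # shared
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.text_decoder = nn.TransformerDecoder(dec_layer, num_layers=6)
        self.text_embedding = nn.Embedding(n_text_tokens, d_model)
        self.unit_head = nn.Linear(d_model, n_units)         # speech-to-unit targets
        self.text_head = nn.Linear(d_model, n_text_tokens)   # unit-to-text targets

    def speech_to_unit(self, fbank):
        # pre-trained on unpaired speech: predict the speech's own hidden-unit labels
        h = self.unit_encoder(self.speech_encoder(fbank))
        return self.unit_head(h)

    def unit_to_text(self, units, text_in):
        # pre-trained on unpaired text: units derived from the text are decoded back
        # into the original token sequence
        memory = self.unit_encoder(self.unit_embedding(units))
        out = self.text_decoder(self.text_embedding(text_in), memory)
        return self.text_head(out)
```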

    VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

    Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e.g., vision and text. How to design a unified framework that integrates different modalities and leverages different resources (e.g., visual-audio pairs, audio-text pairs, unlabeled speech, and unlabeled text) to facilitate speech representation learning has not been well explored. In this paper, we propose a unified cross-modal representation learning framework, VATLM (Visual-Audio-Text Language Model). The proposed VATLM employs a unified backbone network to model the modality-independent information and utilizes three simple modality-dependent modules to preprocess visual, speech, and text inputs. To integrate these three modalities into one shared semantic space, VATLM is optimized with a masked prediction task over unified tokens, given by our proposed unified tokenizer. We evaluate the pre-trained VATLM on audio-visual downstream tasks, including audio-visual speech recognition (AVSR) and visual speech recognition (VSR). Results show that the proposed VATLM outperforms previous state-of-the-art models, such as the audio-visual pre-trained AV-HuBERT model, and analysis also demonstrates that VATLM is capable of aligning different modalities into the same space. To facilitate future research, we release the code and pre-trained models at https://aka.ms/vatlm.
    Comment: 10 pages
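    A rough sketch of the structure just described, assuming a PyTorch implementation: three lightweight modality-dependent front-ends feed one shared backbone, which is trained by masked prediction of unified tokens. This is not the released VATLM code (see https://aka.ms/vatlm); the front-end feature sizes, token vocabulary, and masking details are assumptions for illustration.

```python
import torch.nn as nn
import torch.nn.functional as F

class VATLMSketch(nn.Module):
    def __init__(self, n_unified_tokens=1000, d_model=768):
        super().__init__()
        # modality-dependent preprocessing modules (shapes are illustrative)
        self.visual_frontend = nn.Linear(512, d_model)   # e.g. lip-ROI features
        self.speech_frontend = nn.Linear(80, d_model)    # e.g. log-Mel filterbanks
        self.text_frontend = nn.Embedding(30000, d_model)
        # modality-independent shared backbone
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=12)
        # predict the unified-token id at every position
        self.unified_head = nn.Linear(d_model, n_unified_tokens)

    def forward(self, x, modality):
        frontend = {"visual": self.visual_frontend,
                    "speech": self.speech_frontend,
                    "text": self.text_frontend}[modality]
        return self.unified_head(self.backbone(frontend(x)))

def masked_prediction_loss(logits, unified_targets, mask):
    # cross-entropy only at masked positions, the usual masked-prediction objective
    return F.cross_entropy(logits[mask], unified_targets[mask])
```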

    Probing the limits of optical cycling in a predissociative diatomic molecule

    Molecular predissociation is the spontaneous, nonradiative bond-breaking process that can occur upon excitation. In the context of laser cooling, predissociation is an unwanted consequence of molecular structure that limits the ability to scatter the large number of photons required to reach the ultracold regime. Unlike rovibrational branching, predissociation is irreversible, since the fragments fly apart with high kinetic energy. Of particular interest is the simple diatomic molecule CaH, for which the two lowest electronically excited states used in laser cooling lie above the dissociation threshold of the ground potential. In this work, we present measurements and calculations that quantify the predissociation probabilities affecting the cooling cycle. The results allow us to design a laser cooling scheme that will enable the creation of an ultracold and optically trapped cloud of CaH molecules. In addition, we use the results to propose a two-photon pathway to controlled dissociation of the molecules, in order to gain access to their ultracold fragments, including hydrogen.
    Comment: 16 pages, 4 figures
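    A back-of-the-envelope illustration of why an irreversible loss channel caps the photon budget: if each excitation predissociates with probability p, the molecule survives N scattering events with probability (1 - p)^N, and the mean number of photons scattered before loss is roughly 1/p. The probability used below is purely illustrative, not the measured CaH value reported in the paper.

```python
def survival_probability(p_predissociation: float, n_photons: int) -> float:
    # probability of surviving n_photons excitation/emission cycles without predissociating
    return (1.0 - p_predissociation) ** n_photons

def expected_photons(p_predissociation: float) -> float:
    # mean number of photons scattered before the molecule is lost
    return 1.0 / p_predissociation

if __name__ == "__main__":
    p = 1e-5  # illustrative per-excitation predissociation probability (not the CaH value)
    for n in (1_000, 10_000, 100_000):
        print(f"N = {n:>7}: survival probability = {survival_probability(p, n):.3f}")
    print(f"expected photon budget ~ {expected_photons(p):.0f}")
```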

    Experimental realization of a highly secure chaos communication under strong channel noise

    A one-way coupled spatiotemporally chaotic map lattice is used to construct a cryptosystem. By combining chaotic computations with conventional algebraic operations, our system achieves cryptographic properties much better than those obtained by applying known chaotic or conventional methods separately. We have carried out experiments on duplex secure voice communication over a realistic wired Public Switched Telephone Network, using our chaotic system and the Advanced Encryption Standard (AES), respectively, for cryptography. Our system works stably under strong channel noise where AES fails to work.
    Comment: 15 pages, 5 figures
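    To make the idea of a one-way coupled chaotic map lattice concrete, here is a toy keystream generator combined with a conventional algebraic operation (XOR). This is emphatically not the authors' cryptosystem: the abstract does not specify the maps, coupling, key schedule, or algebraic layer, and the toy below has none of the claimed security or noise-robustness properties.

```python
def keystream(key: float, length: int, sites: int = 8, coupling: float = 0.9):
    # one-way coupled lattice of logistic maps: site j is driven only by site j-1
    x = [(key + 0.1 * i) % 1.0 for i in range(sites)]        # lattice initial state
    out = []
    for _ in range(length):
        f = [4.0 * v * (1.0 - v) for v in x]                 # local chaotic (logistic) map
        x = [f[0]] + [(1 - coupling) * f[j] + coupling * f[j - 1] for j in range(1, sites)]
        out.append(int(x[-1] * 256) % 256)                   # coarse-grain the last site to a byte
    return out

def xor_cipher(data: bytes, key: float) -> bytes:
    # conventional algebraic layer: XOR the data with the chaotic keystream
    return bytes(b ^ k for b, k in zip(data, keystream(key, len(data))))

msg = b"duplex voice frame"
enc = xor_cipher(msg, key=0.314159)
assert xor_cipher(enc, key=0.314159) == msg                  # same key decrypts
```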

    SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

    How to boost speech pre-training with textual data remains an unsolved problem, because speech and text are very different modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation. Specifically, we introduce two alternative discrete tokenizers to bridge the speech and text modalities, a phoneme-unit tokenizer and a hidden-unit tokenizer, both of which can be trained using a small amount of paired speech-text data. Based on the trained tokenizers, we convert the unlabeled speech and text data into phoneme-unit or hidden-unit tokens. The pre-training objective is designed to unify the speech and the text into the same discrete semantic space with a unified Transformer network. Leveraging only 10K text sentences, our SpeechLM achieves a 16% relative WER reduction over the best base model performance (from 6.8 to 5.7) on the public LibriSpeech ASR benchmark. Moreover, SpeechLM with fewer parameters even outperforms previous SOTA models on the CoVoST-2 speech translation tasks. We also evaluate SpeechLM on various spoken language processing tasks under the universal representation evaluation framework SUPERB, demonstrating significant improvements on content-related tasks. Our code and models are available at https://aka.ms/SpeechLM.
    Comment: 14 pages
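    A data-flow sketch of the tokenizer idea described above, assuming a Python implementation: a tokenizer trained on a small paired corpus maps both unlabeled speech and unlabeled text into one discrete unit vocabulary, so a single Transformer can be pre-trained on the merged token streams. This is not the released SpeechLM code (see https://aka.ms/SpeechLM); the interface and names are hypothetical.

```python
from typing import List, Protocol

class UnitTokenizer(Protocol):
    # trained on a small amount of paired speech-text data; the paper describes
    # two variants, a phoneme-unit tokenizer and a hidden-unit tokenizer
    def speech_to_units(self, features) -> List[int]: ...
    def text_to_units(self, text: str) -> List[int]: ...

def build_pretraining_corpus(tokenizer: UnitTokenizer,
                             unlabeled_speech, unlabeled_text) -> List[List[int]]:
    corpus = []
    for utterance in unlabeled_speech:      # speech side: acoustic features -> units
        corpus.append(tokenizer.speech_to_units(utterance))
    for sentence in unlabeled_text:         # text side: sentences -> units
        corpus.append(tokenizer.text_to_units(sentence))
    return corpus                           # both modalities now share one discrete vocabulary
```

    As a quick consistency check on the reported numbers, the quoted 16% relative WER reduction matches the stated absolute figures: (6.8 - 5.7) / 6.8 is about 16.2%.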

    Ras-induced Epigenetic Inactivation of the RRAD (Ras-related Associated with Diabetes) Gene Promotes Glucose Uptake in a Human Ovarian Cancer Model

    Background: Increased glucose uptake is essential for carcinogenesis. Results: Ras(V12)-induced epigenetic inactivation of RRAD promotes glucose uptake and tumor formation. Conclusion: RRAD might act as a functional tumor suppressor by inhibiting glucose uptake. Significance: Down-regulation of RRAD in tumor tissues might be associated with the Warburg effect. RRAD (Ras-related associated with diabetes) is a small Ras-related GTPase that is frequently inactivated by DNA methylation of the CpG island in its promoter region in cancer tissues. However, the role of methylation-induced RRAD inactivation in tumorigenesis remains unclear. In this study, the Ras-regulated transcriptome and epigenome were profiled by comparing T29H (a Ras(V12)-transformed human ovarian epithelial cell line) with T29 (an immortalized but non-transformed cell line) through reduced representation bisulfite sequencing and digital gene expression profiling. We found that Ras(V12)-mediated oncogenic transformation was accompanied by RRAD promoter hypermethylation and a concomitant loss of RRAD expression. In addition, we found that the RRAD promoter was hypermethylated and its transcription was reduced in ovarian cancer versus normal ovarian tissues. Treatment with the DNA methyltransferase inhibitor 5-aza-2'-deoxycytidine demethylated the RRAD promoter and restored RRAD expression in T29H cells. Additionally, treatment with the farnesyltransferase inhibitor FTI277 restored RRAD expression and inhibited DNA methyltransferase expression and activity in T29H cells. By employing knockdown and overexpression in T29 and T29H cells, respectively, we found that RRAD inhibited glucose uptake and lactate production by repressing the expression of glucose transporters. Finally, RRAD overexpression in T29H cells inhibited tumor formation in nude mice, suggesting that RRAD is a tumor suppressor gene. Our results indicate that Ras(V12)-mediated oncogenic transformation induces RRAD epigenetic inactivation, which in turn promotes glucose uptake and may contribute to ovarian cancer tumorigenesis.

    Effect of megarectum on postoperative defecation of female patients with congenital rectovestibular fistula or rectoperineal fistula

    Background: To assess the effect of megarectum on postoperative defecation in female patients with congenital rectovestibular fistula or rectoperineal fistula.
    Methods: From March 2013 to February 2021, 74 female patients with congenital rectovestibular fistula or rectoperineal fistula were treated. The patients' ages ranged from 3 months to 1 year. Barium enema and spinal cord MRI were performed in all children. Four patients were excluded from the study because of spinal cord and sacral agenesis. The remaining 70 patients underwent one-stage anterior sagittal anorectoplasty (ASARP). Anal endoscopy and anorectal manometry were performed 1 year after surgery. Patients were divided into two groups according to the presence of megarectum, (+) or (−), and observed for constipation and anal sphincter function.
    Results: 16 patients (4 months to 1 year) had megarectum and 54 patients (3 months to 9 months) did not. Incision infection was seen in 3 patients. All patients were followed up for 1 to 5 years. Fecal soiling was seen in 2 patients and constipation in 14 patients. Among the 16 patients with megarectum, soiling was seen in 1 patient and constipation in 12 patients. Among the 54 patients without megarectum, soiling was seen in 1 patient and constipation in 2 patients. There was a significant difference in the incidence of postoperative constipation between the two groups (megarectum (+) 75% vs. megarectum (−) 3.7%, P < 0.05). However, there was no significant difference in the anal sphincter score between the two groups (P > 0.05), nor in anal resting pressure (P = 0.49) or length of the anal high-pressure zone (P = 0.76). 7 patients with constipation and megarectum acquired normal anal function after the dilated rectum was resected.
    Conclusion: Megarectum increases the likelihood of difficult postoperative defecation in patients with congenital rectovestibular fistula or rectoperineal fistula. However, constipation was not associated with the postoperative effect of ASARP on sphincter function. Resection of the megarectum helps improve constipation.
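    As a quick sanity check on the reported comparison (constipation in 12 of 16 patients with megarectum vs. 2 of 54 without), a Fisher's exact test on the 2x2 table gives a p-value far below 0.05. The abstract does not state which test the authors used, so this is only an illustrative recomputation.

```python
from scipy.stats import fisher_exact

table = [[12, 16 - 12],   # megarectum (+): constipated / not constipated
         [2, 54 - 2]]     # megarectum (−): constipated / not constipated
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.1f}, p = {p_value:.2e}")   # p is far below 0.05
```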

    Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

    This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-decoder based automatic speech recognition (ASR). Within a multi-task learning framework, we introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes, derived from an offline clustering model. One task predicts the pseudo codes via masked language modeling on the encoder output, as in the HuBERT model, while the other lets the decoder learn to reconstruct the pseudo codes autoregressively instead of generating text transcripts. In this way, the decoder learns to reconstruct original speech information with codes before learning to generate correct text. Comprehensive experiments on the LibriSpeech corpus show that the proposed Speech2C reduces the word error rate (WER) by 19.2% relative over the method without decoder pre-training, and also significantly outperforms the state-of-the-art wav2vec 2.0 and HuBERT models on the 10h and 100h fine-tuning subsets.
    Comment: Submitted to INTERSPEECH 202
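    A compact sketch of the multi-task objective just described, assuming a PyTorch implementation: a HuBERT-style masked prediction loss on the encoder output plus an autoregressive pseudo-code reconstruction loss on the decoder. This is an illustration of the idea, not the authors' implementation; the tensor shapes and loss weighting are assumptions.

```python
import torch.nn.functional as F

def speech2c_pretraining_loss(encoder_logits, decoder_logits, pseudo_codes, mask, alpha=1.0):
    """encoder_logits, decoder_logits: (B, T, n_codes); pseudo_codes: (B, T); mask: (B, T) bool."""
    # Task 1: HuBERT-style masked prediction of pseudo codes from the encoder output,
    # computed only at masked positions.
    masked_lm = F.cross_entropy(encoder_logits[mask], pseudo_codes[mask])
    # Task 2: the decoder reconstructs the pseudo-code sequence autoregressively
    # (teacher-forced), instead of generating text during pre-training.
    reconstruction = F.cross_entropy(decoder_logits.transpose(1, 2), pseudo_codes)
    return masked_lm + alpha * reconstruction
```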