8,208 research outputs found

    SDCL: Self-Distillation Contrastive Learning for Chinese Spell Checking

    Because homophones are easily confused, Chinese Spell Checking (CSC) has widespread applications. Existing systems typically use BERT for text encoding; however, CSC requires the model to account for both phonetic and graphemic information. To adapt BERT to the CSC task, we propose a token-level self-distillation contrastive learning method. We use BERT to encode both the corrupted sentence and its corresponding correct sentence, then apply a contrastive learning loss to regularize the hidden states of corrupted tokens so that they move closer to their counterparts in the correct sentence. On three CSC datasets, our method yields significant improvements over baselines.
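    The token-level contrastive objective described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: `corrupt_h` and `correct_h` stand in for BERT hidden states of the corrupted and correct sentences, and the InfoNCE-style form with temperature `tau` is an assumption about the loss shape.

```python
import numpy as np

def token_contrastive_loss(corrupt_h, correct_h, tau=0.1):
    """Sketch of a token-level contrastive loss (assumed InfoNCE form).

    For each corrupted-token hidden state, the positive is the hidden
    state at the same position in the correct sentence; all other
    correct-token states act as in-batch negatives.
    """
    # L2-normalize rows so dot products become cosine similarities.
    a = corrupt_h / np.linalg.norm(corrupt_h, axis=1, keepdims=True)
    b = correct_h / np.linalg.norm(correct_h, axis=1, keepdims=True)
    sim = (a @ b.T) / tau                        # (T, T) similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal (same token position).
    return -np.mean(np.diag(log_prob))
```

    When the two encodings are aligned, the diagonal similarities dominate and the loss is small; training on this objective pulls each corrupted token's representation toward its correct counterpart.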

    Expression, purification and crystallization of the (3R)-hydroxyacyl-ACP dehydratase HadAB complex from Mycobacterium tuberculosis

    Abstract: The (3R)-hydroxyacyl-ACP dehydratase HadAB, involved in the biosynthetic pathway for mycolic acid (MA) in Mycobacterium tuberculosis, catalyzes the third step of the fatty acid (FA) elongation cycle, which is an ideal and validated target for anti-tubercular agents. Although HadAB is predicted to be a member of the hotdog superfamily, it shares no sequence identity with the typical hotdog-fold isoenzyme FabZ. To characterize the significance of HadAB from the perspective of structural biology, a large amount of pure HadAB complex is required for biochemical characterization and crystallization. Here, we used a distinctive expression and purification method: HadA and HadB were cloned separately and co-expressed in Escherichia coli. After GST affinity chromatography, two steps of anion-exchange chromatography, and gel filtration, the purity of the protein as estimated by SDS–PAGE was >95%. Using the hanging-drop vapor-diffusion method, crystals were obtained that diffracted X-rays to 1.75 Å resolution. The crystal belongs to space group P41212, with unit-cell parameters a = b = 82.0 Å, c = 139.8 Å, α = β = γ = 90.0°.

    Predicting risk of preterm birth in singleton pregnancies using machine learning algorithms

    We aimed to develop, train, and validate machine learning models for predicting preterm birth (<37 weeks' gestation) in singleton pregnancies at different gestational intervals. Models were developed based on complete data from 22,603 singleton pregnancies from a prospective population-based cohort study conducted in 51 midwifery clinics and hospitals in Wenzhou City, China, between 2014 and 2016. We applied CatBoost, Random Forest, stacked-model, Deep Neural Network (DNN), and Support Vector Machine (SVM) algorithms, as well as logistic regression, for feature selection and predictive modeling. Feature selection was based on permutation-based feature-importance lists derived from the machine learning models trained on all features, using a balanced training data set. To develop prediction models, the top 10%, 25%, and 50% most important predictive features were selected. Prediction models were developed on the training data set with 5-fold cross-validation for internal validation. Model performance was assessed using area under the receiver operating characteristic curve (AUC) values. The CatBoost-based prediction model after 26 weeks' gestation performed best, with an AUC of 0.70 (0.67, 0.73), accuracy of 0.81, sensitivity of 0.47, and specificity of 0.83. The number of antenatal care visits before 24 weeks' gestation, aspartate aminotransferase level at registration, symphysis-fundal height, maternal weight, abdominal circumference, and blood pressure emerged as strong predictors after 26 completed weeks. Applying machine learning to pregnancy surveillance data is a promising approach to predicting preterm birth, and we identified several modifiable antenatal predictors.
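    The permutation-based feature-importance step described in the abstract can be sketched in plain NumPy. This is an illustrative sketch, not the study's code: `score_fn` stands in for a trained model's scoring function, the AUC is computed via the rank-sum (Mann-Whitney) identity, and the repeat count is an assumption.

```python
import numpy as np

def auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) identity; assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def permutation_importance(score_fn, X, y, n_repeats=5, seed=0):
    """Mean drop in AUC when each feature column is shuffled.

    A larger drop means the model relies more on that feature; ranking
    features by this drop gives the importance list used for selecting
    the top 10%/25%/50% of features.
    """
    rng = np.random.default_rng(seed)
    base = auc(y, score_fn(X))
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])          # break the feature-label link
            drops.append(base - auc(y, score_fn(Xp)))
        importance[j] = np.mean(drops)
    return importance
```

    For example, with a hypothetical model that scores only on the first feature (`score_fn = lambda M: M[:, 0]`), shuffling that column destroys the AUC while shuffling the others leaves it unchanged, so the first feature tops the importance ranking.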

    CMIB: unsupervised image object categorization in multiple visual contexts
