8,208 research outputs found

    SDCL: Self-Distillation Contrastive Learning for Chinese Spell Checking

    Because homophones are easily confused, Chinese Spell Checking (CSC) has widespread applications. Existing systems typically use BERT for text encoding; however, CSC requires the model to account for both phonetic and graphemic information. To adapt BERT to the CSC task, we propose a token-level self-distillation contrastive learning method. We use BERT to encode both the corrupted sentence and its corresponding correct sentence, then apply a contrastive learning loss to regularize the hidden states of corrupted tokens so that they move closer to their counterparts in the correct sentence. On three CSC datasets, our method yields significant improvements over baselines.
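    The token-level contrastive objective described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: `corrupt_h` and `correct_h` stand in for BERT hidden states of the corrupted and correct sentences, and the InfoNCE-style form with temperature `tau` is an assumption about the loss shape.

```python
import numpy as np

def token_contrastive_loss(corrupt_h, correct_h, tau=0.1):
    """Sketch of a token-level contrastive loss (assumed InfoNCE form).

    For each corrupted-token hidden state, the positive is the hidden
    state at the same position in the correct sentence; all other
    correct-token states act as in-batch negatives.
    """
    # L2-normalize rows so dot products become cosine similarities.
    a = corrupt_h / np.linalg.norm(corrupt_h, axis=1, keepdims=True)
    b = correct_h / np.linalg.norm(correct_h, axis=1, keepdims=True)
    sim = (a @ b.T) / tau                        # (T, T) similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal (same token position).
    return -np.mean(np.diag(log_prob))
```

    When the two encodings are aligned, the diagonal similarities dominate and the loss is small; training on this objective pulls each corrupted token's representation toward its correct counterpart.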

    Expression, purification and crystallization of the (3R)-hydroxyacyl-ACP dehydratase HadAB complex from Mycobacterium tuberculosis

    Abstract: The (3R)-hydroxyacyl-ACP dehydratase HadAB, involved in the biosynthetic pathway for mycolic acid (MA) in Mycobacterium tuberculosis, catalyzes the third step of the fatty acid (FA) elongation cycle, which is an ideal and validated target for anti-tubercular agents. Although HadAB is predicted to be a member of the hotdog superfamily, it shares no sequence identity with the typical hotdog-fold isoenzyme FabZ. To characterize the significance of HadAB from the perspective of structural biology, a large amount of pure HadAB complex is required for biochemical characterization and crystallization. Here, we used a distinctive expression and purification method: HadA and HadB were cloned separately and co-expressed in Escherichia coli. After GST affinity chromatography, two steps of anion-exchange chromatography, and gel filtration, the purity of the protein as estimated by SDS–PAGE was >95%. Using the hanging-drop vapor-diffusion method, crystals were obtained that diffracted X-rays to 1.75 Å resolution. The crystal belongs to space group P41212, with unit-cell parameters a = b = 82.0 Å, c = 139.8 Å, α = β = γ = 90.0°.

    Predicting risk of preterm birth in singleton pregnancies using machine learning algorithms

    We aimed to develop, train, and validate machine learning models for predicting preterm birth (<37 weeks' gestation) in singleton pregnancies at different gestational intervals. Models were developed based on complete data from 22,603 singleton pregnancies from a prospective population-based cohort study conducted in 51 midwifery clinics and hospitals in Wenzhou City, China, between 2014 and 2016. We applied CatBoost, Random Forest, stacked-model, Deep Neural Network (DNN), and Support Vector Machine (SVM) algorithms, as well as logistic regression, for feature selection and predictive modeling. Feature selection was based on permutation-based feature-importance lists derived from the machine learning models trained on all features, using a balanced training data set. To develop prediction models, the top 10%, 25%, and 50% most important predictive features were selected. Prediction models were developed on the training data set with 5-fold cross-validation for internal validation. Model performance was assessed using area under the receiver operating characteristic curve (AUC) values. The CatBoost-based prediction model after 26 weeks' gestation performed best, with an AUC of 0.70 (0.67, 0.73), accuracy of 0.81, sensitivity of 0.47, and specificity of 0.83. The number of antenatal care visits before 24 weeks' gestation, aspartate aminotransferase level at registration, symphysis-fundal height, maternal weight, abdominal circumference, and blood pressure emerged as strong predictors after 26 completed weeks. Applying machine learning to pregnancy surveillance data is a promising approach to predicting preterm birth, and we identified several modifiable antenatal predictors.
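    The permutation-based feature-importance step described in the abstract can be sketched in plain NumPy. This is an illustrative sketch, not the study's code: `score_fn` stands in for a trained model's scoring function, the AUC is computed via the rank-sum (Mann-Whitney) identity, and the repeat count is an assumption.

```python
import numpy as np

def auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) identity; assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def permutation_importance(score_fn, X, y, n_repeats=5, seed=0):
    """Mean drop in AUC when each feature column is shuffled.

    A larger drop means the model relies more on that feature; ranking
    features by this drop gives the importance list used for selecting
    the top 10%/25%/50% of features.
    """
    rng = np.random.default_rng(seed)
    base = auc(y, score_fn(X))
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])          # break the feature-label link
            drops.append(base - auc(y, score_fn(Xp)))
        importance[j] = np.mean(drops)
    return importance
```

    For example, with a hypothetical model that scores only on the first feature (`score_fn = lambda M: M[:, 0]`), shuffling that column destroys the AUC while shuffling the others leaves it unchanged, so the first feature tops the importance ranking.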

    CMIB: unsupervised image object categorization in multiple visual contexts
