    Revisiting Pre-Trained Models for Chinese Natural Language Processing

    Bidirectional Encoder Representations from Transformers (BERT) has shown remarkable improvements across various NLP tasks, and successive variants have been proposed to further improve the performance of pre-trained language models. In this paper, we revisit Chinese pre-trained language models to examine their effectiveness in a non-English language, and we release a series of Chinese pre-trained language models to the community. We also propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways, especially in its masking strategy, which adopts MLM as correction (Mac). We carried out extensive experiments on eight Chinese NLP tasks to revisit the existing pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also ablate details, with several findings that may help future research. Resources available: https://github.com/ymcui/MacBERT. Comment: 12 pages, to appear at Findings of EMNLP 202
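
    The "MLM as correction" idea can be illustrated with a small corruption function: instead of inserting an artificial [MASK] token, selected words are swapped for similar words, so the model learns to correct a plausible sentence. This is a minimal sketch, not the released MacBERT code. The 15% selection rate and the 80/10/10 split (similar word / random word / unchanged) are assumptions based on our reading of the paper; `mac_corrupt` and the `similar_words` dictionary are hypothetical stand-ins for the paper's embedding-based synonym lookup, and whole-word and N-gram span selection are omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple


def mac_corrupt(
    words: Sequence[str],
    similar_words: Dict[str, List[str]],  # hypothetical synonym lookup
    vocab: Sequence[str],
    mask_rate: float = 0.15,
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Corrupt a word sequence in an MLM-as-correction style.

    Returns the corrupted sequence and the positions the model must restore.
    No [MASK] token is ever inserted.
    """
    if not words:
        return [], []
    rng = random.Random(seed)
    corrupted = list(words)
    n_targets = max(1, round(len(words) * mask_rate))
    targets = sorted(rng.sample(range(len(words)), n_targets))
    for i in targets:
        r = rng.random()
        if r < 0.8:
            # Similar word; fall back to a random vocabulary word if none is known.
            candidates = similar_words.get(words[i]) or list(vocab)
            corrupted[i] = rng.choice(candidates)
        elif r < 0.9:
            corrupted[i] = rng.choice(list(vocab))  # random word
        # else (about 10%): keep the original word; it is still a prediction target
    return corrupted, targets


if __name__ == "__main__":
    sentence = ["我们", "使用", "语言", "模型", "来", "预测", "下一个", "词"]
    sims = {"使用": ["利用", "运用"], "预测": ["预估", "推测"]}
    print(mac_corrupt(sentence, sims, vocab=sentence + ["文本"], mask_rate=0.25))
```

    Because the corrupted input never contains [MASK], the pre-training input distribution stays closer to what the model sees during fine-tuning, which is the motivation the abstract's "correction" framing points to.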

    Expressing a cytosolic pyruvate dehydrogenase complex to increase free fatty acid production in Saccharomyces cerevisiae

    Background: Saccharomyces cerevisiae is being exploited as a cell factory to produce fatty acids and their derivatives as biofuels. Previous studies found that both precursor supply and deregulation of fatty acid metabolism are essential for enhanced fatty acid synthesis. A bacterial pyruvate dehydrogenase (PDH) complex expressed in the yeast cytosol was reported to enable production of cytosolic acetyl-CoA with lower energy cost and no toxic intermediate. Results: Overexpression of the PDH complex significantly increased cell growth and ethanol consumption and reduced glycerol accumulation. Furthermore, to correct the redox imbalance in the production of fatty acids from glucose, two endogenous NAD+-dependent glycerol-3-phosphate dehydrogenases were deleted, and a heterologous NADP+-dependent glyceraldehyde-3-phosphate dehydrogenase was introduced. The best fatty acid-producing strain, PDH7, with engineered precursor and cofactor metabolism, produced 840.5 mg/L free fatty acids (FFAs) in shake flasks, 83.2% higher than the control strain YJZ08. Free fatty acid profile analysis suggested that the cytosolic PDH complex mainly increased unsaturated fatty acids (C16:1 and C18:1). Conclusions: We demonstrated that the cytosolic PDH pathway enabled more efficient acetyl-CoA provision at lower ATP cost and improved FFA production. Together with engineering to rebalance the redox cofactors, the cytosolic PDH pathway could achieve FFA production at levels similar to those of the other best acetyl-CoA-producing pathways.
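
    The two reported numbers (840.5 mg/L and an 83.2% increase) together fix the control strain's titer. Assuming the 83.2% is a relative increase over the control's shake-flask titer, the implied YJZ08 value can be backed out as follows; `implied_baseline` is a hypothetical helper, and the result is derived arithmetic, not a number stated in the abstract.

```python
def implied_baseline(improved: float, pct_increase: float) -> float:
    """Back out the control value from an improved value and a relative % increase."""
    return improved / (1 + pct_increase / 100)


control_titer = implied_baseline(840.5, 83.2)  # mg/L FFAs, PDH7 vs. YJZ08
print(f"{control_titer:.1f} mg/L")  # → 458.8 mg/L
```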

    Cross-Lingual Machine Reading Comprehension

    Though the community has made great progress on the Machine Reading Comprehension (MRC) task, most previous work addresses English MRC problems, and there have been few efforts on other languages, mainly due to the lack of large-scale training data. In this paper, we propose the Cross-Lingual Machine Reading Comprehension (CLMRC) task for languages other than English. First, we present several back-translation approaches for the CLMRC task, which are straightforward to adopt. However, accurately aligning the answer into another language is difficult and could introduce additional noise. In this context, we propose a novel model called Dual BERT, which takes advantage of the large-scale training data provided by a rich-resource language (such as English), learns the semantic relations between the passage and question in a bilingual context, and then utilizes the learned knowledge to improve the reading comprehension performance of a low-resource language. We conduct experiments on two Chinese machine reading comprehension datasets, CMRC 2018 and DRCD. The results show consistent and significant improvements over various state-of-the-art systems by a large margin, which demonstrates the potential of the CLMRC task. Resources available: https://github.com/ymcui/Cross-Lingual-MRC. Comment: 10 pages, accepted as a conference paper at EMNLP-IJCNLP 2019 (long paper)
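
    In a back-translation pipeline, the target-language passage and question are translated into English, an English MRC model predicts an answer, and that answer is translated back. The back-translated answer rarely appears verbatim in the original passage, which is the alignment problem the abstract mentions. A minimal sketch of one simple alignment heuristic, assuming character-level F1 as the matching score: search passage spans of roughly the answer's length and keep the best-scoring one. The names `char_f1`, `align_answer`, and the `slack` window are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Tuple


def char_f1(pred: str, gold: str) -> float:
    """Character-overlap F1, a common metric for Chinese MRC answers."""
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def align_answer(passage: str, translated_answer: str, slack: int = 2) -> Tuple[int, int]:
    """Return (start, end) of the passage span that best matches the answer.

    Only spans whose length is within `slack` characters of the answer are tried;
    ties keep the earliest span.
    """
    n = len(translated_answer)
    best, best_score = (0, 0), -1.0
    for start in range(len(passage)):
        for length in range(max(1, n - slack), n + slack + 1):
            end = start + length
            if end > len(passage):
                break
            score = char_f1(passage[start:end], translated_answer)
            if score > best_score:
                best, best_score = (start, end), score
    return best


if __name__ == "__main__":
    passage = "北京是中国的首都。"
    start, end = align_answer(passage, "中国首都")
    print(passage[start:end])
```

    The heuristic is cheap but noisy: it can only recover spans that share characters with the translated answer, which is one reason a model that reasons over both languages jointly, like the proposed Dual BERT, can help.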

    NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction

    Recent methods for neural surface representation and rendering, for example NeuS, have demonstrated remarkably high-quality reconstruction of static scenes. However, training NeuS takes an extremely long time (8 hours), which makes it almost impossible to apply to dynamic scenes with thousands of frames. We propose a fast neural surface reconstruction approach, called NeuS2, which achieves a two-orders-of-magnitude speedup without compromising reconstruction quality. To accelerate the training process, we integrate multi-resolution hash encodings into a neural surface representation and implement our whole algorithm in CUDA. We also present a lightweight calculation of second-order derivatives tailored to our networks (i.e., ReLU-based MLPs), which achieves a twofold speedup. To further stabilize training, we propose a progressive learning strategy that optimizes the multi-resolution hash encodings from coarse to fine. In addition, we extend our method to reconstruct dynamic scenes with an incremental training strategy. Our experiments on various datasets demonstrate that NeuS2 significantly outperforms state-of-the-art methods in both surface reconstruction accuracy and training speed. The video is available at https://vcai.mpi-inf.mpg.de/projects/NeuS2/
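
    A multi-resolution hash encoding maps a 3D point to a feature vector by looking up small trainable tables at several grid resolutions and trilinearly interpolating the eight cell corners at each level. The sketch below shows only the lookup, in pure Python for readability rather than CUDA. It assumes the Instant-NGP spatial hash (XOR of coordinates scaled by per-axis primes) and hashes every level for simplicity, whereas practical implementations index coarse levels directly. The `active_levels` argument is a hypothetical illustration of coarse-to-fine training, zeroing finer levels early on; it is not NeuS2's exact schedule.

```python
import math
import random
from typing import List, Optional, Sequence

# Per-axis primes from the Instant-NGP spatial hash (first axis uses 1).
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(ix: int, iy: int, iz: int, table_size: int) -> int:
    """XOR the prime-scaled integer grid coordinates, then wrap into the table."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size


class HashEncoding:
    """Multi-resolution hash encoding of a 3D point (lookup only, no training)."""

    def __init__(self, levels: int = 4, features: int = 2, table_size: int = 2**12,
                 n_min: int = 16, n_max: int = 512, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.levels, self.features, self.table_size = levels, features, table_size
        # Geometric growth of grid resolution from n_min to n_max.
        growth = math.exp((math.log(n_max) - math.log(n_min)) / max(levels - 1, 1))
        self.resolutions = [math.floor(n_min * growth**l) for l in range(levels)]
        # One feature table per level, with a small random initialization.
        self.tables = [[[rng.uniform(-1e-4, 1e-4) for _ in range(features)]
                        for _ in range(table_size)] for _ in range(levels)]

    def encode(self, xyz: Sequence[float], active_levels: Optional[int] = None) -> List[float]:
        """Concatenate interpolated features from every level for a point in [0, 1]^3.

        Levels with index >= active_levels are zeroed (a simple coarse-to-fine mask).
        """
        if active_levels is None:
            active_levels = self.levels
        out: List[float] = []
        for level, res in enumerate(self.resolutions):
            if level >= active_levels:
                out.extend([0.0] * self.features)
                continue
            scaled = [c * res for c in xyz]
            base = [math.floor(s) for s in scaled]
            frac = [s - b for s, b in zip(scaled, base)]
            acc = [0.0] * self.features
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        # Trilinear weight of this cell corner.
                        w = ((frac[0] if dx else 1.0 - frac[0])
                             * (frac[1] if dy else 1.0 - frac[1])
                             * (frac[2] if dz else 1.0 - frac[2]))
                        idx = spatial_hash(base[0] + dx, base[1] + dy, base[2] + dz,
                                           self.table_size)
                        for f in range(self.features):
                            acc[f] += w * self.tables[level][idx][f]
            out.extend(acc)
        return out
```

    The encoded vector (length `levels * features`) is what a small MLP consumes instead of raw coordinates; most capacity lives in the tables, which is why training can be so much faster than with a large coordinate MLP.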

    Panoptic Scene Graph Generation with Semantics-prototype Learning

    Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships (predicates) to connect human language and visual scenes. However, annotators' differing language preferences and semantic overlaps between predicates lead to biased predicate annotations in the dataset, i.e., different predicates for the same object pairs. Biased predicate annotations make it hard for PSG models to construct a clear decision plane among predicates, which greatly hinders the real-world application of PSG models. To address this intrinsic bias, we propose a novel framework named ADTrans to adaptively transfer biased predicate annotations to informative and unified ones. To ensure consistency and accuracy during the transfer process, we propose to measure the invariance of representations in each predicate class and learn unbiased prototypes of predicates with different intensities. Meanwhile, we continuously measure the distribution changes between each representation and its prototype, and constantly screen potential biased data. Finally, with the unbiased predicate-prototype representation embedding space, biased annotations are easily identified. Experiments show that ADTrans significantly improves the performance of benchmark models, achieving new state-of-the-art performance, and shows strong generalization and effectiveness on multiple datasets.
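
    The core intuition of prototype-based label transfer can be shown with a toy version: compute one prototype per predicate, then flag a sample whose representation is much closer to another predicate's prototype than to its own, and move it to that predicate. This is a deliberately simplified sketch, not ADTrans: it assumes plain mean embeddings as prototypes and cosine similarity as the distance, whereas the paper learns unbiased prototypes with invariance measures and tracks distribution changes during training. `class_prototypes`, `transfer_labels`, and `margin` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def class_prototypes(embeddings: Sequence[Sequence[float]],
                     labels: Sequence[str]) -> Dict[str, List[float]]:
    """Mean embedding per predicate label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + e for s, e in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def transfer_labels(embeddings: Sequence[Sequence[float]], labels: Sequence[str],
                    margin: float = 0.1) -> List[str]:
    """Relabel a sample when another prototype beats its own by more than `margin`."""
    protos = class_prototypes(embeddings, labels)
    new_labels: List[str] = []
    for emb, label in zip(embeddings, labels):
        own = cosine(emb, protos[label])
        best_label, best_sim = max(((c, cosine(emb, p)) for c, p in protos.items()),
                                   key=lambda t: t[1])
        new_labels.append(best_label if best_sim > own + margin else label)
    return new_labels
```

    For example, a pair annotated with the generic predicate "on" whose representation sits next to the "standing on" prototype would be moved to "standing on", matching the abstract's goal of transferring to more informative, unified predicates. Note that biased samples also contaminate the mean prototypes here, which is exactly what the paper's unbiased prototype learning is meant to avoid.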

    Forest emissions reduction assessment from airborne LiDAR data using multiple machine learning approaches

    Objective: This study aims to evaluate the accuracy of different modeling methods and tree structural parameters extracted from airborne LiDAR for estimating carbon emissions reduction, and to assess their reliability as Certified Emission Reduction (CER) assessment techniques. Methods: LiDAR data were collected from an afforestation project in Beijing, China. Various modeling methods, including statistical regression and machine learning algorithms, were used to estimate biomass and carbon emissions reduction. The models were evaluated under two schemes, a tree-species-specific modeling scheme (Scheme 1) and an all-sample modeling scheme (Scheme 2), using cross-validation, and were compared with ground-based estimations and pre-estimated emission reductions. Results: In general, the biomass estimation models in Scheme 1 were more accurate than those in Scheme 2. In Scheme 1, the Random Forest (RF) and Cubist models achieved the highest prediction accuracy (R² = 0.89, RMSE = 22.87 kg, CV RMSE = 52.00 kg), followed by GBDT and Cubist, with SVR and GAM performing the weakest. In Scheme 2, the Cubist model had the highest accuracy (R² = 0.75, RMSE = 33.95 kg, CV RMSE = 36.05 kg), followed by RF and GBDT, with SVR and GAM performing the weakest. LiDAR-based estimates of carbon emissions reduction were closer to ground-based estimations and higher than pre-estimated values. Conclusion: This study demonstrates that LiDAR-based models using tree structural parameters can accurately assess carbon emissions reduction. The models outperformed traditional methods in terms of cost and accuracy. Considering tree species in the modeling process improved model accuracy. LiDAR technology has the potential to be a reliable assessment technique for carbon emissions reduction in forestry projects. The pre-trained models can be used for multiple predictions, reducing the cost of carbon sink surveys. Overall, LiDAR-based models provide a promising approach for assessing carbon emissions reduction and can contribute to mitigating climate change.
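
    The reported metrics (R², RMSE, and a cross-validated RMSE) and the two schemes can be made concrete with a small evaluation harness. This sketch assumes "CV RMSE" means RMSE over held-out folds of k-fold cross-validation, and it uses a one-variable least-squares fit (for example, biomass from tree height) as a hypothetical stand-in for the RF, Cubist, GBDT, SVR, and GAM models in the study. Scheme 1 corresponds to running the evaluation separately per species; Scheme 2 runs it once on the pooled samples. All function names are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Ordinary least squares y = a + b*x (stand-in for the study's models)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cv_rmse(xs: Sequence[float], ys: Sequence[float], k: int = 5) -> float:
    """k-fold CV RMSE: each sample is predicted by a model that never saw it."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit_linear([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):  # held-out indices of this fold
            preds[i] = model(xs[i])
    return rmse(ys, preds)


def species_specific_cv(data: Dict[str, Tuple[List[float], List[float]]],
                        k: int = 5) -> Dict[str, float]:
    """Scheme-1-style evaluation: one model, and one CV RMSE, per species."""
    return {species: cv_rmse(x, y, k) for species, (x, y) in data.items()}
```

    Because the training-set RMSE measures fit on data the model has already seen, a cross-validated RMSE is usually the more honest indicator of how a model would perform on new plots.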