Revisiting Pre-Trained Models for Chinese Natural Language Processing
Bidirectional Encoder Representations from Transformers (BERT) has brought
substantial improvements across various NLP tasks, and successive variants have
been proposed to further improve the performance of pre-trained language
models. In this paper, we revisit Chinese pre-trained language models to
examine their effectiveness in a non-English language, and we release a series
of Chinese pre-trained language models to the community. We also propose a
simple but effective model called MacBERT, which improves upon RoBERTa in
several ways, especially the masking strategy that adopts MLM as correction
(Mac). We carried out extensive experiments on eight Chinese NLP tasks to
revisit the existing pre-trained language models as well as the proposed
MacBERT. Experimental results show that MacBERT achieves state-of-the-art
performance on many NLP tasks, and our ablation studies yield several
findings that may help future research. Resources available:
https://github.com/ymcui/MacBERT
Comment: 12 pages, to appear at Findings of EMNLP 202
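As a rough illustration of the "MLM as correction" idea, the sketch below corrupts selected tokens with similar words instead of a [MASK] symbol, which never appears at fine-tuning time. It is not the MacBERT implementation: it works on single tokens rather than whole words or N-grams, the SIMILAR table and the mac_mask function are hypothetical, and the 80/10/10 replacement split is an assumption borrowed from BERT-style masking.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical similar-word table. MacBERT obtains similar words from a
# word-similarity tool; this toy dictionary only stands in for it.
SIMILAR: Dict[str, List[str]] = {
    "good": ["fine", "nice"],
    "movie": ["film"],
    "watched": ["saw"],
}


def mac_mask(tokens: List[str], vocab: List[str], rng: random.Random,
             mask_rate: float = 0.15) -> Tuple[List[str], List[int]]:
    """Corrupt tokens "MLM as correction"-style, never inserting [MASK].

    Returns the corrupted sequence and the selected positions; the original
    tokens at those positions are the prediction targets. The 80/10/10
    split is an assumption borrowed from BERT-style masking.
    """
    corrupted = list(tokens)
    n_select = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_select))
    for i in positions:
        r = rng.random()
        if r < 0.8:
            # Similar word if one is known, otherwise fall back to a random word.
            corrupted[i] = rng.choice(SIMILAR.get(tokens[i]) or vocab)
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # random replacement
        # Otherwise the original token is kept but still predicted.
    return corrupted, positions


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "i watched a good movie last night".split()
    corrupted, targets = mac_mask(tokens, tokens + ["bad"], rng, mask_rate=0.3)
    print(corrupted, targets)
```

Because the corrupted input contains only real words, the model learns to correct plausible substitutions rather than to fill in an artificial placeholder.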
Expressing a cytosolic pyruvate dehydrogenase complex to increase free fatty acid production in Saccharomyces cerevisiae
Background: Saccharomyces cerevisiae is being exploited as a cell factory to produce fatty acids and their derivatives as biofuels. Previous studies found that both precursor supply and deregulation of fatty acid metabolism are essential for enhanced fatty acid synthesis. A bacterial pyruvate dehydrogenase (PDH) complex expressed in the yeast cytosol was reported to enable production of cytosolic acetyl-CoA with lower energy cost and no toxic intermediate. Results: Overexpression of the PDH complex significantly increased cell growth and ethanol consumption and reduced glycerol accumulation. Furthermore, to correct the redox imbalance in the production of fatty acids from glucose, two endogenous NAD+-dependent glycerol-3-phosphate dehydrogenases were deleted, and a heterologous NADP+-dependent glyceraldehyde-3-phosphate dehydrogenase was introduced. The best fatty acid-producing strain, PDH7, with engineered precursor and cofactor metabolism, produced 840.5 mg/L free fatty acids (FFAs) in shake flasks, 83.2% higher than the control strain YJZ08. Profile analysis of the free fatty acids suggested that the cytosolic PDH complex mainly increased unsaturated fatty acids (C16:1 and C18:1). Conclusions: We demonstrated that the cytosolic PDH pathway enabled more efficient acetyl-CoA provision at lower ATP cost and improved FFA production. Combined with redox cofactor rebalancing, the cytosolic PDH pathway could achieve FFA production comparable to that of the other best acetyl-CoA-producing pathways.
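As a quick consistency check, the reported 83.2% increase implies a control-strain titer of roughly 459 mg/L. This figure is derived here from the two numbers above and is not stated in the abstract:

$$
c_{\text{YJZ08}} = \frac{c_{\text{PDH7}}}{1 + 0.832} = \frac{840.5\ \text{mg/L}}{1.832} \approx 458.8\ \text{mg/L}
$$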
Cross-Lingual Machine Reading Comprehension
Though the community has made great progress on the Machine Reading
Comprehension (MRC) task, most previous work addresses English MRC problems,
and there have been few efforts on other languages, mainly due to the lack of
large-scale training data. In this paper, we propose the Cross-Lingual Machine
Reading Comprehension (CLMRC) task for languages other than English.
First, we present several back-translation approaches for the CLMRC task, which
are straightforward to adopt. However, accurately aligning the answer into
another language is difficult and can introduce additional noise. In this
context, we propose a novel model called Dual BERT, which takes advantage of
the large-scale training data provided by a rich-resource language (such as
English), learns the semantic relations between the passage and question in a
bilingual context, and then uses the learned knowledge to improve the reading
comprehension performance of a low-resource language. We conduct experiments on
two Chinese machine reading comprehension datasets, CMRC 2018 and DRCD. The
results show consistent and significant improvements over various
state-of-the-art systems by a large margin, demonstrating the potential of the
CLMRC task. Resources available: https://github.com/ymcui/Cross-Lingual-MRC
Comment: 10 pages, accepted as a conference paper at EMNLP-IJCNLP 2019 (long
paper)
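Back-translation pipelines must map an answer translated from the rich-resource language back onto a span of the original passage, and a literal string match often fails. One simple heuristic, sketched below as an assumption rather than the paper's exact procedure, picks the passage span with the highest character-level F1 against the translated answer. The machine translation step is assumed to happen elsewhere, and char_f1 and match_span are hypothetical names.

```python
from collections import Counter
from typing import Tuple


def char_f1(pred: str, ref: str) -> float:
    """Character-level F1 overlap between two strings."""
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def match_span(passage: str, translated_answer: str,
               max_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, f1) of the passage span closest to the answer.

    Spans are passage[start:end], up to max_len characters; the first span
    reaching the best score wins ties.
    """
    best = (0, 0, 0.0)
    for start in range(len(passage)):
        for end in range(start + 1, min(len(passage), start + max_len) + 1):
            f1 = char_f1(passage[start:end], translated_answer)
            if f1 > best[2]:
                best = (start, end, f1)
    return best


if __name__ == "__main__":
    passage = "北京是中国的首都。"
    start, end, f1 = match_span(passage, "中国首都")  # noisy back-translated answer
    print(passage[start:end], round(f1, 3))
```

Character-level overlap suits Chinese, where word boundaries are implicit. The exhaustive search is quadratic in passage length, so max_len caps the span size. The residual mismatch this heuristic tolerates is exactly the alignment noise that motivates Dual BERT.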
NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction
Recent methods for neural surface representation and rendering, for example
NeuS, have demonstrated remarkably high-quality reconstruction of static
scenes. However, training NeuS takes an extremely long time (8 hours), which
makes NeuS almost impossible to apply to dynamic scenes with thousands
of frames. We propose a fast neural surface reconstruction approach, called
NeuS2, which achieves a speedup of two orders of magnitude without compromising
reconstruction quality. To accelerate the
training process, we integrate multi-resolution hash encodings into a neural
surface representation and implement our whole algorithm in CUDA. We also
present a lightweight calculation of second-order derivatives tailored to our
networks (i.e., ReLU-based MLPs), which yields a twofold speedup. To
further stabilize training, a progressive learning strategy is proposed to
optimize multi-resolution hash encodings from coarse to fine. In addition, we
extend our method to reconstruct dynamic scenes with an incremental
training strategy. Our experiments on various datasets demonstrate that NeuS2
significantly outperforms the state of the art in both surface reconstruction
accuracy and training speed. The video is available at
https://vcai.mpi-inf.mpg.de/projects/NeuS2/
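NeuS2's speed rests on multi-resolution hash encodings. The pure-Python sketch below shows the core lookup for a single 3-D point: at each level the point is scaled to a grid, the eight surrounding vertices are hashed into a small feature table with the spatial-hash primes used by Instant-NGP, and their features are trilinearly interpolated. Zeroing the finer levels, driven by the hypothetical active_levels_at schedule, is one simple way to realize coarse-to-fine optimization. The level count, table size, and growth factor are illustrative assumptions, and the real system runs CUDA kernels rather than Python loops.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

# Spatial-hash primes from Instant-NGP's multi-resolution hash encoding.
PRIMES = (1, 2654435761, 805459861)


class HashEncoding:
    """Toy 3-D multi-resolution hash encoding (illustration only)."""

    def __init__(self, levels: int = 4, features: int = 2, table_size: int = 2 ** 10,
                 base_res: int = 4, growth: float = 2.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.features = features
        self.table_size = table_size
        # Grid resolution grows geometrically from coarse to fine.
        self.resolutions = [math.floor(base_res * growth ** l) for l in range(levels)]
        # One table of trainable feature vectors per level, initialized near zero.
        self.tables = [[[rng.uniform(-1e-4, 1e-4) for _ in range(features)]
                        for _ in range(table_size)] for _ in range(levels)]

    def hash_corner(self, corner: Tuple[int, ...]) -> int:
        """XOR of coordinate-prime products, reduced modulo the table size."""
        h = 0
        for c, p in zip(corner, PRIMES):
            h ^= c * p
        return h % self.table_size

    def encode(self, x: Sequence[float], active_levels: int) -> List[float]:
        """Concatenate trilinearly interpolated features for x in [0, 1]^3.

        Levels at or beyond active_levels output zeros (coarse-to-fine masking).
        """
        out: List[float] = []
        for level, res in enumerate(self.resolutions):
            if level >= active_levels:
                out.extend([0.0] * self.features)
                continue
            scaled = [xi * res for xi in x]
            base = [math.floor(s) for s in scaled]
            frac = [s - b for s, b in zip(scaled, base)]
            feat = [0.0] * self.features
            for offset in itertools.product((0, 1), repeat=3):
                corner = tuple(b + o for b, o in zip(base, offset))
                # Trilinear weight of this vertex.
                weight = 1.0
                for f, o in zip(frac, offset):
                    weight *= f if o else 1.0 - f
                vec = self.tables[level][self.hash_corner(corner)]
                for k in range(self.features):
                    feat[k] += weight * vec[k]
            out.extend(feat)
        return out


def active_levels_at(step: int, total_levels: int, steps_per_level: int) -> int:
    """Hypothetical schedule: unlock one finer level every steps_per_level steps."""
    return min(total_levels, 1 + step // steps_per_level)


if __name__ == "__main__":
    enc = HashEncoding()
    for step in (0, 150, 400):
        n = active_levels_at(step, 4, 100)
        print(step, n, enc.encode((0.3, 0.5, 0.7), n))
```

The encoded vector would then feed a small MLP that predicts the signed distance; keeping the fine levels at zero early on lets the coarse levels settle the overall shape first.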
Panoptic Scene Graph Generation with Semantics-prototype Learning
Panoptic Scene Graph Generation (PSG) parses objects and predicts their
relationships (predicates) to connect human language and visual scenes. However,
differing language preferences among annotators and semantic overlaps between
predicates lead to biased predicate annotations in the dataset, i.e., different
predicates for the same object pairs. Biased predicate annotations make it hard
for PSG models to construct a clear decision boundary among predicates, which
greatly hinders the real-world application of PSG models. To address this
intrinsic bias, we propose a novel framework named ADTrans to adaptively transfer
biased predicate annotations to informative and unified ones. To ensure
consistency and accuracy during the transfer process, we propose to measure the
invariance of representations in each predicate class and learn unbiased
prototypes of predicates with different intensities. Meanwhile, we continuously
measure the distribution changes between each representation and its prototype
and constantly screen potentially biased data. Finally, with the unbiased
predicate-prototype representation embedding space, biased annotations are
easily identified. Experiments show that ADTrans significantly improves the
performance of benchmark models, achieving new state-of-the-art performance,
and generalizes well and remains effective across multiple datasets.
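The prototype idea can be illustrated with a deliberately simplified sketch: each predicate's prototype is taken as the mean embedding of its annotated samples, and a sample whose embedding is more cosine-similar to another predicate's prototype is flagged, with that predicate suggested as the replacement label. ADTrans itself measures representation invariance, learns prototypes at different intensities, and tracks distribution changes during training, none of which is modeled here; the function names and the margin parameter are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def predicate_prototypes(embeddings: List[Sequence[float]],
                         labels: List[str]) -> Dict[str, List[float]]:
    """Prototype = mean embedding of all samples annotated with a predicate."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for emb, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(emb))
        for k, v in enumerate(emb):
            acc[k] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def screen_annotations(embeddings: List[Sequence[float]], labels: List[str],
                       margin: float = 0.0) -> List[Tuple[int, str]]:
    """Flag samples that sit closer to another predicate's prototype.

    Returns (index, suggested_predicate) pairs.
    """
    protos = predicate_prototypes(embeddings, labels)
    flagged = []
    for i, (emb, lab) in enumerate(zip(embeddings, labels)):
        best = max(protos, key=lambda p: cosine(emb, protos[p]))
        gap = cosine(emb, protos[best]) - cosine(emb, protos[lab])
        if best != lab and gap > margin:
            flagged.append((i, best))
    return flagged


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.95, 0.05]]
    lab = ["on", "on", "sitting on", "sitting on", "sitting on"]
    print(screen_annotations(emb, lab))
```

In this toy data the last sample is labelled "sitting on" but embeds like "on", so it is flagged; a larger margin makes the screening more conservative.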
Forest emissions reduction assessment from airborne LiDAR data using multiple machine learning approaches
Objective: This study aims to evaluate the accuracy of different modeling methods and tree structural parameters extracted from airborne LiDAR for estimating carbon emissions reduction, and to assess their reliability as Certified Emission Reduction (CER) assessment techniques. Methods: LiDAR data were collected from an afforestation project in Beijing, China. Various modeling methods, including statistical regression and machine learning algorithms, were used to estimate biomass and carbon emissions reduction. The models were evaluated under two schemes, a tree-species-specific modeling scheme (Scheme 1) and an all-sample modeling scheme (Scheme 2), using cross-validation, and were compared with ground-based estimations and pre-estimated emission reductions. Results: The biomass estimation models in Scheme 1 were generally more accurate than those in Scheme 2. In Scheme 1, the Random Forest (RF) and Cubist models achieved the highest prediction accuracy (R² = 0.89, RMSE = 22.87 kg, CV RMSE = 52.00 kg), followed by GBDT and Cubist, with SVR and GAM performing the weakest. In Scheme 2, the Cubist model had the highest accuracy (R² = 0.75, RMSE = 33.95 kg, CV RMSE = 36.05 kg), followed by RF and GBDT, with SVR and GAM performing the weakest. LiDAR-based estimates of carbon emissions reduction were closer to ground-based estimations and higher than pre-estimated values. Conclusion: This study demonstrates that LiDAR-based models using tree structural parameters can accurately assess carbon emissions reduction. The models outperformed traditional methods in terms of cost and accuracy. Considering tree species in the modeling process improved the accuracy of the models. LiDAR technology has the potential to be a reliable assessment technique for carbon emissions reduction in forestry projects. The pre-trained models can be used for multiple predictions, reducing the cost of carbon sink surveys.
Overall, LiDAR-based models provide a promising approach for assessing carbon emissions reduction and can contribute to mitigating climate change.
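The evaluation metrics and the two modeling schemes can be sketched with a toy example. Assuming "CV RMSE" denotes the RMSE of cross-validated (out-of-fold) predictions, the code below computes R², RMSE, and k-fold CV RMSE, fitting either one model per species (Scheme 1) or a single pooled model (Scheme 2). An ordinary least-squares line on one hypothetical predictor, such as LiDAR-derived tree height, stands in for the RF, Cubist, GBDT, SVR, and GAM models used in the study; the data in the example are synthetic.

```python
import math
from typing import List, Sequence, Tuple


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares y = slope * x + intercept (stand-in for RF, Cubist, ...)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def out_of_fold(x: Sequence[float], y: Sequence[float], k: int) -> List[float]:
    """k-fold CV: every sample is predicted by a model that never saw it."""
    n = len(x)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k):
            preds[i] = slope * x[i] + intercept
    return preds


def cv_rmse(x: Sequence[float], y: Sequence[float], species: Sequence[str],
            k: int = 3, per_species: bool = False) -> float:
    """per_species=True mimics Scheme 1; False pools all samples (Scheme 2)."""
    if not per_species:
        return rmse(y, out_of_fold(x, y, k))
    preds = [0.0] * len(y)
    for sp in set(species):
        idx = [i for i, s in enumerate(species) if s == sp]
        sub = out_of_fold([x[i] for i in idx], [y[i] for i in idx], k)
        for i, p in zip(idx, sub):
            preds[i] = p
    return rmse(y, preds)


if __name__ == "__main__":
    heights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x = heights + heights
    y = [2 * h for h in heights] + [5 * h + 3 for h in heights]  # synthetic biomass
    species = ["A"] * 6 + ["B"] * 6
    print(cv_rmse(x, y, species, per_species=True),
          cv_rmse(x, y, species, per_species=False))
```

On this synthetic data, where two species follow different height-biomass lines, the per-species scheme recovers both lines almost exactly while the pooled model leaves large residuals, a miniature version of why accounting for species can help.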
Contour error compensation based on feed rate adjustment
To improve the performance of computer numerical control (CNC) machining, especially for large-curvature trajectories, this paper presents a contour error compensation algorithm based on reference trajectory modification. To estimate the contour error accurately and efficiently, a contour error estimation model is established. The reference trajectory is modified according to the estimated contour error and partitioned into segments, each assigned a feed rate according to a corner detection algorithm. The effectiveness of the compensation algorithm is verified by experiments on a CNC machine tool.
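A minimal sketch of the three ingredients, under stated assumptions: the reference trajectory is a 2-D polyline; contour error is taken as the shortest distance from the actual tool position to that polyline (a common definition, not necessarily the paper's estimation model); the reference is pre-compensated by adding the error vector scaled by a hypothetical gain; and corners are detected where the heading changes by more than a threshold, with a lower feed rate on the adjoining segments. All function names, the 30° threshold, and the feed values are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def closest_on_segment(p: Point, a: Point, b: Point) -> Point:
    """Orthogonal projection of p onto segment ab, clamped to the endpoints."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dy)


def contour_error(p: Point, path: Sequence[Point]) -> Tuple[float, Point]:
    """Shortest distance from actual position p to the polyline path, plus
    the error vector pointing from p to the nearest path point."""
    best_d, best_vec = math.inf, (0.0, 0.0)
    for a, b in zip(path, path[1:]):
        q = closest_on_segment(p, a, b)
        d = math.dist(p, q)
        if d < best_d:
            best_d, best_vec = d, (q[0] - p[0], q[1] - p[1])
    return best_d, best_vec


def compensate(ref: Point, error_vec: Point, gain: float = 1.0) -> Point:
    """Hypothetical pre-compensation: shift the reference along the error vector."""
    return (ref[0] + gain * error_vec[0], ref[1] + gain * error_vec[1])


def corner_indices(path: Sequence[Point], threshold_deg: float = 30.0) -> List[int]:
    """Vertices where the heading changes by more than threshold_deg."""
    corners = []
    for i in range(1, len(path) - 1):
        h1 = math.atan2(path[i][1] - path[i - 1][1], path[i][0] - path[i - 1][0])
        h2 = math.atan2(path[i + 1][1] - path[i][1], path[i + 1][0] - path[i][0])
        # Wrap the heading difference into [-pi, pi] before taking its size.
        turn = math.degrees(abs(math.atan2(math.sin(h2 - h1), math.cos(h2 - h1))))
        if turn > threshold_deg:
            corners.append(i)
    return corners


def segment_feed_rates(path: Sequence[Point], nominal: float, corner_feed: float,
                       threshold_deg: float = 30.0) -> List[float]:
    """Use the lower feed on any segment that starts or ends at a corner."""
    corners = set(corner_indices(path, threshold_deg))
    return [corner_feed if (i in corners or i + 1 in corners) else nominal
            for i in range(len(path) - 1)]


if __name__ == "__main__":
    path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
    d, vec = contour_error((0.5, 0.1), path)
    print(d, vec, compensate((0.5, 0.0), vec))
    print(corner_indices(path), segment_feed_rates(path, 100.0, 40.0))
```

Slowing down only near detected corners keeps the nominal feed on straight runs, which is the trade-off between contour accuracy and machining time that segment-wise feed rates aim for.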