560 research outputs found

Revisiting Pre-Trained Models for Chinese Natural Language Processing

Author: Che Wanxiang
Cui Yiming
Hu Guoping
Liu Ting
Qin Bing
Wang Shijin
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2020

Bidirectional Encoder Representations from Transformers (BERT) has shown marvelous improvements across various NLP tasks, and successive variants have been proposed to further improve the performance of pre-trained language models. In this paper, we revisit Chinese pre-trained language models to examine their effectiveness in a non-English language and release the Chinese pre-trained language model series to the community. We also propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways, especially the masking strategy that adopts MLM as correction (Mac). We carried out extensive experiments on eight Chinese NLP tasks to revisit the existing pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also ablate details with several findings that may help future research. Resources available: https://github.com/ymcui/MacBERT
Comment: 12 pages, to appear at Findings of EMNLP 202

arXiv.org e-Print Archive

Crossref

The Influence of Corporate Culture Innovation on Enterprise Management Innovation Based on Psychology

Author: Hu Weixuan
Jiang Guoping
Shao Yijin
Publication date: 01/01/2022

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Distinct microbial processes and functions of maize stalk- and fertilizer-N in arable soil

Author: He Hongbo
Hu Guoping
Zhang Xudong
Publication venue: Berichte aus dem Julius Kühn-Institut
Publication date: 05/09/2017

JKI Open Journal Systems (Julius Kühn-Institut)

Transcribing Content from Structural Images with Spotlight Mechanism

Author: Chen Enhong
Hu Guoping
Huang Zhenya
Liu Qi
Xie Xing
Yin Yu
Zhang Fuzheng
Publication venue: Association for Computing Machinery (ACM)
Publication date: 26/05/2019

Transcribing content from structural images, e.g., writing notes from music scores, is a challenging task because not only must the content objects be recognized, but the internal structure must also be preserved. Existing image recognition methods mainly work on images with simple content (e.g., text lines with characters), but cannot identify images with more complex content (e.g., structured symbols), which often follow a fine-grained grammar. To this end, in this paper, we propose a hierarchical Spotlight Transcribing Network (STN) framework followed by a two-stage "where-to-what" solution. Specifically, we first decide "where-to-look" through a novel spotlight mechanism to focus on different areas of the original image following its structure. Then, we decide "what-to-write" by developing a GRU-based network that uses the spotlight areas to transcribe the content accordingly. Moreover, we propose two implementations on the basis of STN, i.e., STNM and STNR, where the spotlight movement follows the Markov property and recurrent modeling, respectively. We also design a reinforcement method to refine the framework by self-improving the spotlight mechanism. We conduct extensive experiments on many structural image datasets, where the results clearly demonstrate the effectiveness of the STN framework.
Comment: Accepted by KDD2018 Research Track. In proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'18

arXiv.org e-Print Archive

Crossref

Simultaneous profiling of transcriptome and DNA methylome from a single cell.

Author: An Qin
Du Guizhen
Fan Guoping
Hu Ganlu
Hu Youjin
Huang Kevin
Wang Cun-Yu
Xue Jinfeng
Xue Zhigang
Zhu Xianmin
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

Background: Single-cell transcriptome and single-cell methylome technologies have become powerful tools to study RNA and DNA methylation profiles of single cells at a genome-wide scale. A major challenge has been to understand the direct correlation of DNA methylation and gene expression within single cells. Due to large cell-to-cell variability and the lack of direct measurements of transcriptome and methylome of the same cell, the association is still unclear. Results: Here, we describe a novel method (scMT-seq) that simultaneously profiles both DNA methylome and transcriptome from the same cell. In sensory neurons, we consistently identify transcriptome and methylome heterogeneity among single cells but the majority of the expression variance is not explained by proximal promoter methylation, with the exception of genes that do not contain CpG islands. By contrast, gene body methylation is positively associated with gene expression for only those genes that contain a CpG island promoter. Furthermore, using single nucleotide polymorphism patterns from our hybrid mouse model, we also find positive correlation of allelic gene body methylation with allelic expression. Conclusions: Our method can be used to detect transcriptome, methylome, and single nucleotide polymorphism information within single cells to dissect the mechanisms of epigenetic gene regulation.

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

A Span-Extraction Dataset for Chinese Machine Reading Comprehension

Author: Che Wanxiang
Chen Zhipeng
Cui Yiming
Hu Guoping
Liu Ting
Ma Wentao
Wang Shijin
Xiao Li
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2019

Machine Reading Comprehension (MRC) has become enormously popular recently and has attracted a lot of attention. However, the existing reading comprehension datasets are mostly in English. In this paper, we introduce a Span-Extraction dataset for Chinese machine reading comprehension to add language diversity in this area. The dataset is composed of nearly 20,000 real questions annotated on Wikipedia paragraphs by human experts. We also annotated a challenge set containing questions that need comprehensive understanding and multi-sentence inference throughout the context. We present several baseline systems as well as anonymous submissions to demonstrate the difficulty of this dataset. With the release of the dataset, we hosted the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018). We hope the release of the dataset will further accelerate Chinese machine reading comprehension research. Resources are available: https://github.com/ymcui/cmrc2018
Comment: 6 pages, accepted as a conference paper at EMNLP-IJCNLP 2019 (short paper

arXiv.org e-Print Archive

Crossref