Conceptualized Representation Learning for Chinese Biomedical Text Mining
Biomedical text mining is becoming increasingly important as the number of
biomedical documents and web data rapidly grows. Recently, word representation
models such as BERT have gained popularity among researchers. However, it is
difficult to estimate their performance on datasets containing biomedical texts
as the word distributions of general and biomedical corpora are quite
different. Moreover, the medical domain has long-tail concepts and
terminologies that are difficult for language models to learn. For Chinese
biomedical text, this is even more difficult because of its complex structure and
the variety of phrase combinations. In this paper, we investigate how the
recently introduced pre-trained language model BERT can be adapted for Chinese
biomedical corpora and propose a novel conceptualized representation learning
approach. We also release a new Chinese Biomedical Language Understanding
Evaluation benchmark (ChineseBLUE). We examine the effectiveness of
Chinese pre-trained models: BERT, BERT-wwm, RoBERTa, and our approach.
Experimental results on the benchmark show that our approach can bring
significant gains. We release the pre-trained model on GitHub:
https://github.com/alibaba-research/ChineseBLUE.
Comment: WSDM2020 Health Da
Learning from Very Few Samples: A Survey
Few sample learning (FSL) is significant and challenging in the field of
machine learning. The capability of learning and generalizing from very few
samples successfully is a noticeable demarcation separating artificial
intelligence and human intelligence since humans can readily establish their
cognition of novelty from just a single or a handful of examples, whereas
machine learning algorithms typically entail hundreds or thousands of
supervised samples to guarantee generalization ability. Despite the long
history dating back to the early 2000s and the widespread attention in recent
years with booming deep learning technologies, few surveys or reviews of
FSL have been available until now. In this context, we extensively review 300+ papers
of FSL spanning from the 2000s to 2019 and provide a timely and comprehensive
survey for FSL. In this survey, we review the history as well as the
current progress of FSL, categorize FSL approaches in principle into
generative-model-based and discriminative-model-based kinds, and place
particular emphasis on meta-learning-based FSL approaches. We also summarize
several recently emerging extensional topics of FSL and review the latest
advances on these topics. Furthermore, we highlight the important FSL
applications covering many research hotspots in computer vision, natural
language processing, audio and speech, reinforcement learning and robotics, data
analysis, etc. Finally, we conclude the survey with a discussion on promising
trends in the hope of providing guidance and insights to follow-up research.
Comment: 30 page