Chinese Named Entity Recognition Method for Domain-Specific Text
Chinese named entity recognition (NER) is a critical task in natural language processing that aims to identify and classify named entities in text. However, because domain texts are highly specific and large-scale labelled datasets are lacking, NER methods trained on public-domain corpora perform poorly on domain texts. In this paper, a named entity recognition method incorporating sentence semantic information is proposed. It adaptively incorporates sentence semantic information into character semantic information through an attention mechanism and a gating mechanism, enhancing the entity feature representation while attenuating the noise generated by irrelevant character information. In addition, to address the lack of large-scale labelled samples, we used data self-augmentation methods to expand the training samples. Furthermore, because the low-quality samples generated by data self-augmentation can have a negative impact on the model, we introduced a Weighted Strategy. Experiments on the TCM prescriptions corpus showed that our method achieved higher F1 values than the comparison methods.
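As a rough illustration of the attention-plus-gating idea described in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes a sentence vector obtained by attention pooling over character vectors and a per-dimension sigmoid gate with scalar weights; the names `sentence_vector`, `gated_fusion`, `w_char`, `w_sent`, and `bias` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_vector(chars):
    """Attention-pool character vectors into one sentence vector.

    Assumption: attention scores are dot products between each character
    vector and the mean character vector (a stand-in for a learned query).
    """
    dim = len(chars[0])
    mean = [sum(c[i] for c in chars) / len(chars) for i in range(dim)]
    weights = softmax([dot(c, mean) for c in chars])
    return [sum(w * c[i] for w, c in zip(weights, chars)) for i in range(dim)]


def gated_fusion(chars, w_char, w_sent, bias):
    """Mix the sentence vector into every character vector through a gate.

    Assumption: the gate uses scalar weights shared across dimensions;
    a trained model would learn weight matrices instead.
    """
    sent = sentence_vector(chars)
    fused = []
    for c in chars:
        gate = [sigmoid(w_char * ci + w_sent * si + bias)
                for ci, si in zip(c, sent)]
        # Convex combination: the gate decides how much character
        # information is kept versus replaced by sentence information.
        fused.append([g * ci + (1.0 - g) * si
                      for g, ci, si in zip(gate, c, sent)])
    return fused
```

Because the gate lies strictly between 0 and 1, each fused value falls between the character value and the sentence value, which is one simple way to let sentence context damp irrelevant character features.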
Fast and Accurate Recognition of Chinese Clinical Named Entities with Residual Dilated Convolutions
Clinical Named Entity Recognition (CNER) aims to identify and classify
clinical terms such as diseases, symptoms, treatments, exams, and body parts in
electronic health records, which is a fundamental and crucial task for clinical
and translation research. In recent years, deep learning methods have achieved
significant success in CNER tasks. However, these methods rely heavily on
Recurrent Neural Networks (RNNs), which maintain a vector of hidden activations
that is propagated through time, making the models slow to train.
In this paper, we propose a Residual Dilated Convolutional Neural Network with
Conditional Random Field (RD-CNN-CRF) to address this problem. Specifically, Chinese
characters and dictionary features are first projected into dense vector
representations, then they are fed into the residual dilated convolutional
neural network to capture contextual features. Finally, a conditional random
field is employed to capture dependencies between neighboring tags.
Computational results on the CCKS-2017 Task 2 benchmark dataset show that our
proposed RD-CNN-CRF method competes favorably with state-of-the-art RNN-based
methods both in terms of computational performance and training time.
Comment: 8 pages, 3 figures. Accepted as regular paper by 2018 IEEE
International Conference on Bioinformatics and Biomedicine. arXiv admin note:
text overlap with arXiv:1804.0501
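The residual dilated convolution mentioned in the abstract above can be sketched in a few lines. This is a simplified single-channel illustration under stated assumptions, not the RD-CNN-CRF model itself: it uses zero padding, a centered odd-length kernel, and a ReLU activation, and it omits the embedding layer and the CRF. The function names are hypothetical.

```python
def dilated_conv1d(seq, kernel, dilation):
    """Same-length 1D convolution with zero padding.

    The kernel is centered on each position and its taps are spaced
    `dilation` steps apart; taps that fall outside the sequence are skipped.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(seq):
                acc += w * seq[idx]
        out.append(acc)
    return out


def residual_dilated_block(seq, kernel, dilation):
    """Dilated convolution, ReLU, then a skip connection adding the input."""
    conv = dilated_conv1d(seq, kernel, dilation)
    activated = [max(0.0, v) for v in conv]
    return [x + a for x, a in zip(seq, activated)]


def receptive_field(kernel_size, dilations):
    """Number of input positions seen by one output after stacking layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

Stacking layers with growing dilations widens the context quickly: with a kernel of size 3 and dilations 1, 2, and 4, `receptive_field(3, [1, 2, 4])` gives 15 positions. Unlike an RNN, every position in a layer can be computed in parallel.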
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyze five semantic processing tasks, namely word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions.
Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN
1566-2535. The equal contribution mark is missed in the published version due
to the publication policies. Please contact Prof. Erik Cambria for detail
- …