Search CORE

20 research outputs found

Extraction of semantic concepts in medical diagnoses using machine learning

Author: Коваль Д. И.
Сушков И. В.
Тепляков А. Б.
Publication venue: Tomsk Polytechnic University
Publication date: 01/01/2020

Electronic archive of Tomsk Polytechnic University

NIL is not nothing: Recognition of Chinese network informal language expressions

Author: GAO Wei
WONG
XIA Yunqing
Publication date: 01/10/2005

Institutional Knowledge at Singapore Management University

NIL Is not nothing: Recognition of Chinese network informal language expressions

Author: GAO Wei
WONG Kam-Fai
XIA Yunqing
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2005

Informal language is actively used in network-mediated communication, e.g. chat room, BBS, email and text message. We refer to the anomalous terms used in such context as network informal language (NIL) expressions. For example, “偶(ou3)” is used to replace “我(wo3)” in Chinese ICQ. Without unconventional resource, knowledge and techniques, the existing natural language processing approaches exhibit less effectiveness in dealing with NIL text. We propose to study NIL expressions with a NIL corpus and investigate techniques in processing NIL expressions. Two methods for Chinese NIL expressio

CiteSeerX

Institutional Knowledge at Singapore Management University

Allocation of semantic concepts in medical diagnoses by means of machine learning

Author: Коваль Д. И.
Сушков И. В.
Тепляков А. Б.
Publication date: 01/01/2019

Biomedical named entity recognition (Bio-NER) is a fundamental task in handling biomedical text terms, such as RNA, protein, cell type, cell line, and DNA. Bio-NER is one of the most elementary and core tasks in biomedical knowledge discovery from texts. The system described here is developed by using the BioNLP/NLPBA 2004 shared task

Electronic archive of Tomsk Polytechnic University

Named Entity Recognition Only from Word Embeddings

Author: Luo Ying
Zhan Junlang
Zhao Hai
Publication date: 01/01/2020

Deep neural network models have helped named entity (NE) recognition achieve amazing performance without handcrafting features. However, existing systems require large amounts of human annotated training data. Efforts have been made to replace human annotations with external knowledge (e.g., NE dictionary, part-of-speech tags), while it is another challenge to obtain such effective resources. In this work, we propose a fully unsupervised NE recognition model which only needs to take informative clues from pre-trained word embeddings. We first apply Gaussian Hidden Markov Model and Deep Autoencoding Gaussian Mixture Model on word embeddings for entity span detection and type prediction, and then further design an instance selector based on reinforcement learning to distinguish positive sentences from noisy sentences and refine these coarse-grained annotations through neural networks. Extensive experiments on CoNLL benchmark datasets demonstrate that our proposed light NE recognition model achieves remarkable performance without using any annotated lexicon or corpus. Comment: Accepted by EMNLP202

arXiv.org e-Print Archive

Crossref

Text Classification of Cancer Clinical Trial Eligibility Criteria

Author: Jayaraj Soumya
Ludmir Ethan B
Roberts Kirk
Yang Yumeng
Publication date: 15/09/2023

Automatic identification of clinical trials for which a patient is eligible is complicated by the fact that trial eligibility is stated in natural language. A potential solution to this problem is to employ text classification methods for common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a pre-trained language model specifically for clinical trials, which yields the highest average performance across all criteria. Comment: AMIA Annual Symposium Proceedings 202

arXiv.org e-Print Archive

Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks

Publication venue: Hindawi Limited
Publication date: 01/01/2014

Crossref

NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition

Author: Dai Hong-Jie
Hsu Wen-Lian
Hung Hsieh-Chuan
Sung Cheng-Lung
Sung Ting-Yi
Tsai Richard Tzong-Han
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Biomedical named entity recognition (Bio-NER) is a challenging problem because, in general, biomedical named entities of the same category (e.g., proteins and genes) do not follow one standard nomenclature. They have many irregularities and sometimes appear in ambiguous contexts. In recent years, machine-learning (ML) approaches have become increasingly common and now represent the cutting edge of Bio-NER technology. This paper addresses three problems faced by ML-based Bio-NER systems. First, most ML approaches usually employ singleton features that comprise one linguistic property (e.g., the current word is capitalized) and at least one class tag (e.g., B-protein, the beginning of a protein name). However, such features may be insufficient in cases where multiple properties must be considered. Adding conjunction features that contain multiple properties can be beneficial, but it would be infeasible to include all conjunction features in an NER model since memory resources are limited and some features are ineffective. To resolve the problem, we use a sequential forward search algorithm to select an effective set of features. Second, variations in the numerical parts of biomedical terms (e.g., "2" in the biomedical term IL2) cause data sparseness and generate many redundant features. In this case, we apply numerical normalization, which solves the problem by replacing all numerals in a term with one representative numeral to help classify named entities. Third, the assignment of NE tags does not depend solely on the target word's closest neighbors, but may depend on words outside the context window (e.g., a context window of five consists of the current word plus two preceding and two subsequent words). We use global patterns generated by the Smith-Waterman local alignment algorithm to identify such structures and modify the results of our ML-based tagger. This is called pattern-based post-processing. 
RESULTS: To develop our ML-based Bio-NER system, we employ conditional random fields, which have performed effectively in several well-known tasks, as our underlying ML model. Adding selected conjunction features, applying numerical normalization, and employing pattern-based post-processing improve the F-scores by 1.67%, 1.04%, and 0.57%, respectively. The combined increase of 3.28% yields a total score of 72.98%, which is better than the baseline system that only uses singleton features.
CONCLUSION: We demonstrate the benefits of using the sequential forward search algorithm to select effective conjunction feature groups. In addition, we show that numerical normalization can effectively reduce the number of redundant and unseen features. Furthermore, the Smith-Waterman local alignment algorithm can help ML-based Bio-NER deal with difficult cases that need longer context windows.
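
Two of the ideas in the abstract above, numerical normalization and the five-token context window, can be illustrated with a minimal Python sketch. This is not the NERBio implementation: the function names are hypothetical, the choice of "1" as the representative numeral is an assumption (the abstract does not say which numeral is used), and replacing each digit individually rather than each run of digits is also an assumption.

```python
import re

def normalize_numerals(term: str, representative: str = "1") -> str:
    """Replace every ASCII digit in a term with one representative digit.

    Assumption: the representative digit "1" and per-digit replacement
    are illustrative choices, not taken from the paper.
    """
    return re.sub(r"[0-9]", representative, term)

def context_window(tokens: list[str], index: int, size: int = 5) -> list[str]:
    """Return the token at `index` plus its neighbours on each side.

    With size=5 this is the current word plus two preceding and two
    subsequent words, truncated at the sentence boundaries.
    """
    half = size // 2
    start = max(0, index - half)
    return tokens[start:index + half + 1]

# Terms that differ only in their digits collapse to the same form,
# so they no longer produce separate sparse features.
terms = ["IL2", "IL4", "IL12"]
normalized = [normalize_numerals(t) for t in terms]

tokens = ["the", "IL2", "gene", "is", "expressed", "here"]
window = context_window(tokens, 2)
```

Here "IL2" and "IL4" both normalize to the same string, which is the sparseness reduction the abstract describes, and the window around "gene" covers two tokens on either side.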

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central