20 research outputs found

    Korean Language Resources for Everyone

    Get PDF

    Extrinsic Factors Affecting the Accuracy of Biomedical NER

    Full text link
    Biomedical named entity recognition (NER) is a critical task that aims to identify structured information in clinical text, which is often replete with complex technical terms and a high degree of variability. Accurate and reliable NER can facilitate the extraction and analysis of important biomedical information, which can be used to improve downstream applications, including healthcare systems. However, NER in the biomedical domain is challenging due to limited data availability, as annotating such data requires a high level of expertise, time, and expense. In this paper, using this limited data, we explore various extrinsic factors, including the corpus annotation scheme, data augmentation techniques, semi-supervised learning, and Brill transformation, to improve the performance of an NER model on a clinical text dataset (i2b2 2012; Sun et al., 2013). Our experiments demonstrate that these approaches can significantly improve the model's F1 score from the original 73.74 to 77.55. Our findings suggest that considering different extrinsic factors and combining these techniques is a promising approach to improving NER performance in the biomedical domain, where the amount of data is limited.
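One of the extrinsic factors mentioned above is the corpus annotation scheme. A common scheme change in NER is converting BIO tags to the finer-grained BIOES scheme; the sketch below illustrates that conversion (an illustrative example only, not the paper's actual preprocessing code):

```python
def bio_to_bioes(tags):
    """Convert a BIO-tagged sequence to BIOES.

    B- becomes S- (singleton) when the entity does not continue;
    I- becomes E- (end) when the entity does not continue.
    """
    out = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag == "O":
            out.append(tag)
        elif tag.startswith("B-"):
            # entity continues only if the next tag is I- of the same type
            out.append(tag if nxt == "I-" + tag[2:] else "S-" + tag[2:])
        elif tag.startswith("I-"):
            out.append(tag if nxt == "I-" + tag[2:] else "E-" + tag[2:])
    return out
```

For example, `["B-PER", "I-PER", "O", "B-LOC"]` maps to `["B-PER", "E-PER", "O", "S-LOC"]`.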

    Recommending the Meanings of Newly Coined Words

    Get PDF
    In this paper, we investigate how to recommend the meanings of newly coined words, such as newly coined named entities and Internet jargon. Our approach automatically chooses a document explaining a given newly coined word among candidate documents from multiple web references using Probabilistic Latent Semantic Analysis [1]. Briefly, it involves finding the topic of a document containing the newly coined word and computing the conditional probability of that topic given each candidate document. We validate our methodology on two real datasets from MySpace forums and Twitter by referencing three web services, Google, Urbandictionary, and Wikipedia, and we show that we properly recommend the meanings of a set of given newly coined words with 69.5% and 80.5% accuracy, respectively, based on our three recommendations. Moreover, we compare our approach against three baselines, each referencing the result from one web service, and our approach outperforms all of them.
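The selection step described above reduces to choosing the candidate document that maximizes the conditional probability of the target topic. A minimal sketch, with toy topic distributions in place of actual PLSA output (the document ids and numbers here are assumptions for illustration):

```python
def recommend(topic, candidates):
    """Return the candidate document id whose topic distribution assigns
    the highest conditional probability P(topic | document).

    `candidates` maps a document id to a dict {topic: probability}.
    """
    return max(candidates, key=lambda d: candidates[d].get(topic, 0.0))

# Toy example: two candidate documents explaining a newly coined word.
candidates = {
    "urbandictionary-entry": {"slang": 0.7, "music": 0.1},
    "wikipedia-article": {"slang": 0.2, "music": 0.6},
}
```

With the target topic `"slang"`, `recommend("slang", candidates)` returns `"urbandictionary-entry"`, since that document gives the topic the higher conditional probability.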

    Yet Another Format of Universal Dependencies for Korean

    Full text link
    In this study, we propose a morpheme-based scheme for Korean dependency parsing and apply the proposed scheme to Universal Dependencies. We present the linguistic rationale that illustrates the motivation for and the necessity of adopting the morpheme-based format, and we develop scripts that automatically convert between the original format used by Universal Dependencies and the proposed morpheme-based format. The effectiveness of the proposed format for Korean dependency parsing is then verified with both statistical and neural models, including UDPipe and Stanza, using our carefully constructed morpheme-based word embeddings for Korean. morphUD outperforms parsing results for all Korean UD treebanks, and we also present detailed error analyses. Comment: COLING 2022, Poster.
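The core of a morpheme-based format is splitting each Korean word (eojeol) into morpheme-level tokens. A toy sketch of that split, assuming a Sejong-style '+'-separated morpheme/POS analysis string (this is not the paper's actual conversion script):

```python
def split_eojeol(analysis):
    """Split one eojeol's morpheme analysis into (morpheme, POS) tokens.

    `analysis` is a string such as '나/NP+는/JX', where morphemes are
    joined by '+' and each carries a '/'-separated POS tag.
    """
    tokens = []
    for part in analysis.split("+"):
        morph, _, pos = part.partition("/")
        tokens.append((morph, pos))
    return tokens
```

Each resulting (morpheme, POS) pair would become its own token row in a morpheme-based treebank, in place of the single word-level row.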

    Trance parser model for Korean: Sejong treebank

    No full text
    Trance parsing model + embedding vector.

    See https://github.com/tarowatanabe/trance for the parser and its usage. We also provide the parsing and learning scripts for the Trance parser that we used for the paper:

    1. parsing model: ptb_train.txt.model-d100.tar.gz
    2. embedding vector: embedding-d100.vec.gz
    3. Trance parser parsing script: trance-parsing.sh
    4. Trance parser (batch) learning script: trance-training-batch.sh
    5. test.txt (gold file) and test.txt.leaf are for the parser input.

    Jungyeul Park, A Note on Constituent Parsing for Korean Using the Sejong Treebank (submitted to TALLIP). October 2017.

    See https://github.com/jungyeul/tallip-sjtree-parsing for more detail.

    Universal Dependencies for Korean: Hani (ver1.0)

    No full text

    MaltParser model for Korean: Sejong treebank

    No full text
    Jungyeul Park, Jeen-Pyo Hong, and Jeong-Won Cha (2016). Korean Language Resources for Everyone. In Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation (PACLIC 30). Seoul, Korea. [pdf]

    @inproceedings{park-hong-cha:2016:PACLIC,
      address = {Seoul, Korea},
      author = {Park, Jungyeul and Hong, Jeen-Pyo and Cha, Jeong-Won},
      booktitle = {Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation (PACLIC 30)},
      pages = {49--58},
      title = {{Korean Language Resources for Everyone}},
      year = {2016}
    }

    It requires Espresso's POS tagging results for input. Espresso is available at https://zenodo.org/record/884606