21 research outputs found

    Korean Language Resources for Everyone


    Extrinsic Factors Affecting the Accuracy of Biomedical NER

    Biomedical named entity recognition (NER) is a critical task that aims to identify structured information in clinical text, which is often replete with complex technical terms and a high degree of variability. Accurate and reliable NER can facilitate the extraction and analysis of important biomedical information, which can improve downstream applications including healthcare systems. However, NER in the biomedical domain is challenging due to limited data availability, as high expertise, time, and expense are required to annotate the data. In this paper, using this limited data, we explore various extrinsic factors, including the corpus annotation scheme, data augmentation techniques, semi-supervised learning, and Brill transformation, to improve the performance of an NER model on a clinical text dataset (i2b2 2012; Sun, Rumshisky, and Uzuner, 2013). Our experiments demonstrate that these approaches can significantly improve the model's F1 score from the original 73.74 to 77.55. Our findings suggest that considering different extrinsic factors, and combining these techniques, is a promising approach for improving NER performance in the biomedical domain, where the size of the data is limited.
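    The F1 gains reported above are typically computed at the entity level, where a prediction counts only if both span and label match the gold annotation exactly. A minimal sketch of that metric (the span tuples and labels below are illustrative, not from the i2b2 data):

    ```python
    def entity_f1(gold, pred):
        """Entity-level F1 over sets of (start, end, label) spans."""
        gold, pred = set(gold), set(pred)
        tp = len(gold & pred)  # exact span-and-label matches
        precision = tp / len(pred) if pred else 0.0
        recall = tp / len(gold) if gold else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    gold = [(0, 2, "PROBLEM"), (5, 6, "TEST"), (8, 9, "TREATMENT")]
    pred = [(0, 2, "PROBLEM"), (5, 6, "TEST"), (10, 11, "TREATMENT")]
    print(round(entity_f1(gold, pred), 4))  # → 0.6667 (2 of 3 spans match)
    ```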

    Recommending the Meanings of Newly Coined Words

    Abstract: In this paper, we investigate how to recommend the meanings of newly coined words, such as newly coined named entities and Internet jargon. Our approach automatically chooses a document explaining a given newly coined word among candidate documents drawn from multiple web references, using Probabilistic Latent Semantic Analysis [1]. Briefly, it involves finding the topic of a document containing the newly coined word and computing the conditional probability of that topic given each candidate document. We validate our methodology on two real datasets, from MySpace forums and Twitter, by referencing three web services (Google, Urbandictionary, and Wikipedia), and we show that we properly recommend the meanings of a set of given newly coined words with 69.5% and 80.5% accuracy, respectively, based on our top three recommendations. Moreover, we compare our approach against three baselines, each referencing the result from a single web service, and our approach outperforms them all.
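    The selection step described above reduces to ranking candidate documents by the conditional probability of the target topic. A minimal sketch, assuming the per-candidate topic distributions have already been estimated by a trained PLSA model (all names and probabilities below are illustrative placeholders):

    ```python
    # Hypothetical P(topic | candidate) distributions for one newly coined word,
    # as a trained PLSA model might produce them.
    candidate_topics = {
        "urbandictionary": {"slang": 0.7, "music": 0.2, "tech": 0.1},
        "wikipedia":       {"slang": 0.2, "music": 0.5, "tech": 0.3},
        "google_snippet":  {"slang": 0.4, "music": 0.3, "tech": 0.3},
    }

    def recommend(target_topic, candidates, k=3):
        """Rank candidate documents by P(target_topic | candidate), best first."""
        ranked = sorted(candidates,
                        key=lambda c: candidates[c][target_topic],
                        reverse=True)
        return ranked[:k]

    print(recommend("slang", candidate_topics))
    # → ['urbandictionary', 'google_snippet', 'wikipedia']
    ```

    Accuracy is then measured by whether the correct explanation appears among the top-k recommendations, which matches the paper's "three recommendations" evaluation.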

    Yet Another Format of Universal Dependencies for Korean

    In this study, we propose a morpheme-based scheme for Korean dependency parsing and apply the proposed scheme to Universal Dependencies. We present the linguistic rationale that motivates the morpheme-based format and demonstrates its necessity, and we develop scripts that automatically convert between the original format used by Universal Dependencies and the proposed morpheme-based format. The effectiveness of the proposed format for Korean dependency parsing is then verified with both statistical and neural models, including UDPipe and Stanza, using our carefully constructed morpheme-based word embeddings for Korean. morphUD outperforms the parsing results for all Korean UD treebanks, and we also present detailed error analyses. (COLING 2022, Poster)
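    The core of the conversion is expanding each word-level token (eojeol) into one token per morpheme. A minimal sketch of that expansion; the segmentation here is supplied by hand with standard Sejong-style POS tags, whereas the paper's scripts derive it from the treebank's morphological analysis:

    ```python
    def to_morpheme_tokens(start_id, morphs):
        """Expand one eojeol into morpheme-level (id, form, tag) rows,
        continuing the running token index used in CoNLL-U."""
        return [(start_id + i, morph, tag)
                for i, (morph, tag) in enumerate(morphs)]

    # 학교에 ("at school") = 학교 (noun, NNG) + 에 (case marker, JKB)
    rows = to_morpheme_tokens(1, [("학교", "NNG"), ("에", "JKB")])
    for r in rows:
        print(r)
    # → (1, '학교', 'NNG')
    # → (2, '에', 'JKB')
    ```

    Dependency heads are then re-pointed at morpheme-level indices, which is what lets functional morphemes carry their own relations in the morpheme-based UD format.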

    Neural Automated Writing Evaluation with Corrective Feedback

    The utilization of technology in second language learning and teaching has become ubiquitous. For the assessment of writing specifically, automated writing evaluation (AWE) and grammatical error correction (GEC) have become immensely popular and effective methods for enhancing writing proficiency and delivering instant, individualized feedback to learners. By leveraging the power of natural language processing (NLP) and machine learning algorithms, AWE and GEC systems have been developed separately to provide language learners with automated corrective feedback and with scoring that is more accurate and unbiased than scoring left to human examiners. In this paper, we propose an integrated system for automated writing evaluation with corrective feedback as a means of bridging the gap between AWE and GEC results for second language learners. This system enables language learners to simulate essay writing tests: a student writes and submits an essay, and the system returns an assessment of the writing along with suggested grammatical error corrections. Given that automated scoring and grammatical correction are more efficient and cost-effective than human grading, this integrated system would also alleviate the burden of manually correcting innumerable essays. (Supported by the SoTL Seed Program at UB)
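    The submit-then-return loop described above can be sketched as a two-component pipeline. Both components below are toy stand-ins (a substitution list and a length-based placeholder score); the paper's actual AWE scorer and GEC model are neural systems:

    ```python
    def gec_correct(essay):
        """Stand-in GEC: apply a hand-written substitution list."""
        fixes = {"has went": "has gone", "more better": "better"}
        for wrong, right in fixes.items():
            essay = essay.replace(wrong, right)
        return essay

    def awe_score(essay):
        """Stand-in AWE scorer: length-based placeholder on a 0-5 scale."""
        return min(5.0, round(len(essay.split()) / 50, 1))

    def evaluate(essay):
        """Integrated loop: return a score plus corrected text as feedback."""
        return {"score": awe_score(essay), "feedback": gec_correct(essay)}

    result = evaluate("She has went to school. It is more better now.")
    print(result["feedback"])  # → She has gone to school. It is better now.
    ```

    The design point is that a single submission yields both outputs at once, rather than running an AWE tool and a GEC tool separately.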

    Trance parser model for Korean: Sejong treebank

    Trance parsing model + embedding vector.

    See https://github.com/tarowatanabe/trance for the parser and its usage. We also provide the parsing and learning scripts for the trance parser that we used for the paper:

    1/ parsing model: ptb_train.txt.model-d100.tar.gz
    2/ embedding vector: embedding-d100.vec.gz
    3/ trance parser parsing script: trance-parsing.sh
    4/ trance parser (batch) learning script: trance-training-batch.sh
    5/ test.txt (gold file) and test.txt.leaf are the parser input.

    Jungyeul Park, A Note on Constituent Parsing for Korean Using the Sejong Treebank (submitted to TALLIP). October 2017.

    See https://github.com/jungyeul/tallip-sjtree-parsing for more detail.

    Universal Dependencies for Korean: Hani (ver1.0)

    Universal Dependencies for Korean: Hani (ver1.0)