
    Building a semantically annotated corpus of clinical texts

    In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for the development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date; its value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.

    Analysing Lexical Semantic Change with Contextualised Word Representations

    This paper presents the first unsupervised approach to lexical semantic change that makes use of contextualised word representations. We propose a novel method that exploits the BERT neural language model to obtain representations of word usages, clusters these representations into usage types, and measures change over time with three proposed metrics. We create a new evaluation dataset and show that the model representations and the detected semantic shifts are positively correlated with human judgements. Our extensive qualitative analysis demonstrates that our method captures a variety of synchronic and diachronic linguistic phenomena. We expect our work to inspire further research in this direction. Comment: To appear in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).
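
    For illustration, the approach described above can be sketched as follows: embed each usage of a target word with BERT, cluster the usage vectors into usage types, and compare the usage-type distributions of two time periods. This is a rough approximation under assumptions of our own (model name, cluster count, token-matching heuristic, and Jensen-Shannon distance as the change metric), not the authors' actual pipeline.

    # Minimal sketch: BERT usage vectors -> k-means usage types -> distance between
    # the usage-type distributions of two time periods. Illustrative only; the model,
    # cluster count and distance metric are assumptions, not the paper's exact setup.
    import numpy as np
    import torch
    from transformers import AutoTokenizer, AutoModel
    from sklearn.cluster import KMeans
    from scipy.spatial.distance import jensenshannon

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    def usage_vector(sentence, target):
        """Contextual vector for one usage of `target` inside `sentence`."""
        enc = tokenizer(sentence, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]          # (seq_len, 768)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
        idx = [i for i, t in enumerate(tokens) if t == target.lower()]
        if not idx:                                  # target split into subwords:
            idx = list(range(1, len(tokens) - 1))    # crude fallback, average all tokens
        return hidden[idx].mean(dim=0).numpy()

    def semantic_change(usages_period1, usages_period2, target, k=4):
        """Jensen-Shannon distance between the usage-type distributions of two periods."""
        vecs = np.stack([usage_vector(s, target) for s in usages_period1 + usages_period2])
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vecs)
        n1 = len(usages_period1)
        p = np.bincount(labels[:n1], minlength=k) / n1
        q = np.bincount(labels[n1:], minlength=k) / len(usages_period2)
        return jensenshannon(p, q)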

    Towards a balanced named entity corpus for Dutch


    The impact of machine translation error types on post-editing effort indicators

    In this paper, we report on a post-editing study for general text types from English into Dutch, conducted with master's students of translation. We used a fine-grained machine translation (MT) quality assessment method with error weights that correspond to severity levels and are related to cognitive load. Linear mixed effects models are applied to analyze the impact of MT quality on potential post-editing effort indicators. The impact of MT quality is evaluated at three levels of increasing granularity. We find that MT quality is a significant predictor of all types of post-editing effort indicators and that different types of MT errors predict different post-editing effort indicators.
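
    For illustration, the kind of analysis mentioned above (a linear mixed effects model relating MT quality to a post-editing effort indicator) could look roughly like the sketch below. The column names (error_weight, pause_duration, participant) and the CSV file are hypothetical stand-ins, not the study's actual data.

    # Hypothetical sketch of a linear mixed effects analysis: fixed effect for the
    # weighted MT error score of a segment, random intercept per participant
    # (students differ in baseline speed). Column names are invented for illustration.
    import pandas as pd
    import statsmodels.formula.api as smf

    data = pd.read_csv("postediting_sessions.csv")   # hypothetical logging export

    model = smf.mixedlm("pause_duration ~ error_weight", data, groups=data["participant"])
    result = model.fit()
    print(result.summary())   # inspect whether error_weight significantly predicts effort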

    The Impact of Concept Representation in Interactive Concept Validation (ICV)

    Large-scale ideation has developed as a promising new way of obtaining large numbers of highly diverse ideas for a given challenge. However, due to the scale of these challenges, algorithmic support based on a computational understanding of the ideas is a crucial component of such systems. One promising solution is the use of knowledge graphs to provide meaning. A significant obstacle lies in word-sense disambiguation, which cannot be solved by automatic approaches. In previous work, we introduced Interactive Concept Validation (ICV) as an approach that enables ideators to disambiguate the terms used in their ideas. To test the impact of different ways of representing concepts (images of concepts, explanatory texts, or both), we conducted experiments comparing three representations. The results show that while the impact on ideation metrics was marginal, time/click effort was lowest in the images-only condition, while data quality was highest in the condition combining images and texts.

    Identifying the machine translation error types with the greatest impact on post-editing effort

    Translation Environment Tools make translators' work easier by providing them with term lists, translation memories and machine translation (MT) output. Ideally, such tools automatically predict whether it is more effortful to post-edit than to translate from scratch, and determine whether or not to provide translators with machine translation output. Current machine translation quality estimation systems rely heavily on automatic metrics, even though these do not accurately capture actual post-editing effort. In addition, these systems do not take translator experience into account, even though novices' translation processes differ from those of professional translators. In this paper, we report on the impact of machine translation errors on various types of post-editing effort indicators, for professional translators as well as student translators. We compare the impact of MT quality on a product effort indicator (HTER) with its impact on various process effort indicators. The translation and post-editing process of student translators and professional translators was logged with a combination of keystroke logging and eye-tracking, and the MT output was analyzed with a fine-grained translation quality assessment approach. We find that most post-editing effort indicators (product as well as process) are influenced by machine translation quality, but that different error types affect different post-editing effort indicators, confirming that a more fine-grained MT quality analysis is needed to correctly estimate actual post-editing effort. Coherence, meaning shifts, and structural issues are shown to be good indicators of post-editing effort. The additional impact of experience on these interactions between MT quality and post-editing effort is smaller than expected.
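
    As a side note on the product effort indicator mentioned above, HTER can be approximated as the word-level edit distance between the raw MT output and its human post-edited version, normalised by the length of the post-edited reference. The sketch below ignores the block-shift operation of full TER, so it is a simplification rather than the exact metric used in such studies.

    # Simplified HTER: word-level Levenshtein distance between MT output and its
    # post-edited version, divided by the post-edited length. Full TER also models
    # block shifts; this version counts only insertions, deletions and substitutions.
    def hter(mt_output, post_edited):
        hyp, ref = mt_output.split(), post_edited.split()
        d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
        for i in range(len(hyp) + 1):
            d[i][0] = i
        for j in range(len(ref) + 1):
            d[0][j] = j
        for i in range(1, len(hyp) + 1):
            for j in range(1, len(ref) + 1):
                cost = 0 if hyp[i - 1] == ref[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion from MT output
                              d[i][j - 1] + 1,         # insertion by the post-editor
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[len(hyp)][len(ref)] / max(len(ref), 1)

    # One inserted word against a 6-word post-edited reference -> 1/6 ≈ 0.17
    print(hter("the cat sat on mat", "the cat sat on the mat"))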