51 research outputs found

    Annotating a Parallel Monolingual Treebank with Semantic Similarity Relations

    Get PDF
    Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories. Editors: Koenraad De Smedt, Jan Hajič and Sandra Kübler. NEALT Proceedings Series, Vol. 1 (2007), 85-96. © 2007 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/4476

    Care episode retrieval: distributional semantic models for information retrieval in the clinical domain

    Get PDF
    Patients' health related information is stored in electronic health records (EHRs) by health service providers. These records include sequential documentation of care episodes in the form of clinical notes. EHRs are used throughout the health care sector by professionals, administrators and patients, primarily for clinical purposes, but also for secondary purposes such as decision support and research. The vast amounts of information in EHR systems complicate information management and increase the risk of information overload. Therefore, clinicians and researchers need new tools to manage the information stored in the EHRs. A common use case is, given a - possibly unfinished - care episode, to retrieve the most similar care episodes among the records. This paper presents several methods for information retrieval, focusing on care episode retrieval, based on textual similarity, where similarity is measured through domain-specific modelling of the distributional semantics of words. Models include variants of random indexing and the semantic neural network model word2vec. Two novel methods are introduced that utilize the ICD-10 codes attached to care episodes to better induce domain-specificity in the semantic model. We report on experimental evaluation of care episode retrieval that circumvents the lack of human judgements regarding episode relevance. Results suggest that several of the methods proposed outperform a state-of-the art search engine (Lucene) on the retrieval task

    Optionality In Evaluating Prosody Prediction

    No full text
    This paper concerns the evaluation of prosody prediction at the symbolic level, in particular the locations of pitch accents and intonational boundaries. One evaluation method is to ask an expert to annotate text prosodically, and to compare the system's predictions with this reference. However, this ignores the issue of optionality: there is usually more than one acceptable way to place accents and boundaries. Therefore, predictions that do not match the reference are not necessarily wrong. We propose dealing with this issue by means of a 3-class annotation which includes a class for optional accents/boundaries. We show, in a prosody prediction experiment using a memory-based learner, that evaluating against a 3-class annotation derived from multiple independent 2-class annotations allows us to identify the real prediction errors and to better estimate the real performance. Next, it is shown that a 3-class annotation produced directly by a single annotator yields a reasonable approximation of the more expensive 3-class annotation derived from multiple annotations. Finally, the results of a larger scale experiment confirm our findings

    CoNLL-X shared task on multilingual dependency parsing

    No full text
    Each year the Conference on Computational Natural Language Learning (CoNLL) 1 features a shared task, in which participants train and test their systems on exactly the same data sets, in order to better compare systems. The tenth CoNLL (CoNLL-X) saw a shared task on Multilingual Dependency Parsing. In this paper, we describe how treebanks for 13 languages were converted into the same dependency format and how parsing performance was measured. We also give an overview of the parsing approaches that participants took and the results that they achieved. Finally, we try to draw general conclusions about multi-lingual parsing: What makes a particular language, treebank or annotation scheme easier or harder to parse and which phenomena are challenging for any dependency parser? Acknowledgement Many thanks to Amit Dubey and Yuval Krymolowski, the other two organizers of the shared task, for discussions, converting treebanks, writing software and helping with the papers.

    Classification of semantic relations by humans and machines

    No full text
    This paper addresses the classification of semantic relations between pairs of sentences extracted from a Dutch parallel corpus at the word, phrase and sentence level

    Detecting semantic overlap : A parallel monolingual treebank for Dutch

    No full text
    This paper describes an ongoing effort to build a large-scale monolingual treebank of parallel/ comparable Dutch text, where nodes of syntax trees are aligned and labeled according to a small set of semantic similarity relations. Such a corpus has many potential uses and applications in e.g., multi-document summarization, question-answering and paraphrase extraction. We describe the text material, preprocessing, annotation, and alignment of sentences and syntax trees, both manual and automatic. Two new annotation tools are presented, as well as results from pilot experiments on inter-annotator agreement. On the basis of this resource, new automatic alignment software and NLP applications will be developed
    corecore