208 research outputs found

    A comparison of parsing technologies for the biomedical domain

    Get PDF
    This paper reports on a number of experiments which are designed to investigate the extent to which current nlp resources are able to syntactically and semantically analyse biomedical text. We address two tasks: parsing a real corpus with a hand-built widecoverage grammar, producing both syntactic analyses and logical forms; and automatically computing the interpretation of compound nouns where the head is a nominalisation (e.g., hospital arrival means an arrival at hospital, while patient arrival means an arrival of a patient). For the former task we demonstrate that exible and yet constrained `preprocessing ' techniques are crucial to success: these enable us to use part-of-speech tags to overcome inadequate lexical coverage, and to `package up' complex technical expressions prior to parsing so that they are blocked from creating misleading amounts of syntactic complexity. We argue that the xml-processing paradigm is ideally suited for automatically preparing the corpus for parsing. For the latter task, we compute interpretations of the compounds by exploiting surface cues and meaning paraphrases, which in turn are extracted from the parsed corpus. This provides an empirical setting in which we can compare the utility of a comparatively deep parser vs. a shallow one, exploring the trade-o between resolving attachment ambiguities on the one hand and generating errors in the parses on the other. We demonstrate that a model of the meaning of compound nominalisations is achievable with the aid of current broad-coverage parsers

    One, no one and one hundred thousand events: Defining and processing events in an inter-disciplinary perspective

    Get PDF
    We present an overview of event definition and processing spanning 25 years of research in NLP. We first provide linguistic background to the notion of event, and then present past attempts to formalize this concept in annotation standards to foster the development of benchmarks for event extraction systems. This ranges from MUC-3 in 1991 to the Time and Space Track challenge at SemEval 2015. Besides, we shed light on other disciplines in which the notion of event plays a crucial role, with a focus on the historical domain. Our goal is to provide a comprehensive study on event definitions and investigate which potential past efforts in the NLP community may have in a different research domain. We present the results of a questionnaire, where the notion of event for historians is put in relation to the NLP perspective

    Nominalization and Alternations in Biomedical Language

    Get PDF
    Background: This paper presents data on alternations in the argument structure of common domain-specific verbs and their associated verbal nominalizations in the PennBioIE corpus. Alternation is the term in theoretical linguistics for variations in the surface syntactic form of verbs, e.g. the different forms of stimulate in FSH stimulates follicular development and follicular development is stimulated by FSH. The data is used to assess the implications of alternations for biomedical text mining systems and to test the fit of the sublanguage model to biomedical texts. Methodology/Principal Findings: We examined 1,872 tokens of the ten most common domain-specific verbs or their zerorelated nouns in the PennBioIE corpus and labelled them for the presence or absence of three alternations. We then annotated the arguments of 746 tokens of the nominalizations related to these verbs and counted alternations related to the presence or absence of arguments and to the syntactic position of non-absent arguments. We found that alternations are quite common both for verbs and for nominalizations. We also found a previously undescribed alternation involving an adjectival present participle. Conclusions/Significance: We found that even in this semantically restricted domain, alternations are quite common, and alternations involving nominalizations are exceptionally diverse. Nonetheless, the sublanguage model applies to biomedica

    Closed-World Semantics for Query Answering in Temporal Description Logics

    Get PDF
    Ontology-mediated query answering is a popular paradigm for enriching answers to user queries with background knowledge. For querying the absence of information, however, there exist only few ontology-based approaches. Moreover, these proposals conflate the closed-domain and closed-world assumption, and therefore are not suited to deal with the anonymous objects that are common in ontological reasoning. Many real-world applications, like processing electronic health records (EHRs), also contain a temporal dimension, and require efficient reasoning algorithms. Moreover, since medical data is not recorded on a regular basis, reasoners must deal with sparse data with potentially large temporal gaps. Our contribution consists of three main parts: Firstly, we introduce a new closed-world semantics for answering conjunctive queries with negation over ontologies formulated in the description logic ELH⊥, which is based on the minimal universal model. We propose a rewriting strategy for dealing with negated query atoms, which shows that query answering is possible in polynomial time in data complexity. Secondly, we introduce a new temporal variant of ELH⊥ that features a convexity operator. We extend this minimal-world semantics for answering metric temporal conjunctive queries with negation over the logic and obtain similar rewritability and complexity results. Thirdly, apart from the theoretical results, we evaluate minimal-world semantics in practice by selecting patients, based their EHRs, that match given criteria

    A Mention-Ranking Model for Abstract Anaphora Resolution

    Full text link
    Resolving abstract anaphora is an important, but difficult task for text understanding. Yet, with recent advances in representation learning this task becomes a more tangible aim. A central property of abstract anaphora is that it establishes a relation between the anaphor embedded in the anaphoric sentence and its (typically non-nominal) antecedent. We propose a mention-ranking model that learns how abstract anaphors relate to their antecedents with an LSTM-Siamese Net. We overcome the lack of training data by generating artificial anaphoric sentence--antecedent pairs. Our model outperforms state-of-the-art results on shell noun resolution. We also report first benchmark results on an abstract anaphora subset of the ARRAU corpus. This corpus presents a greater challenge due to a mixture of nominal and pronominal anaphors and a greater range of confounders. We found model variants that outperform the baselines for nominal anaphors, without training on individual anaphor data, but still lag behind for pronominal anaphors. Our model selects syntactically plausible candidates and -- if disregarding syntax -- discriminates candidates using deeper features.Comment: In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP). Copenhagen, Denmar

    Action nominalizations in Early Modern scientific English

    Get PDF
    The present dissertation, Action nominalizations in Early Modern scientific English, was conceived as a contribution to the literature on nominalizations. Its point of departure was the assertion that action nominalizations are the result of a word-formation process which aims at filling gaps in the vocabulary of a particular language, English in this case. Action nominalizations are clear cases of grammatical metaphor (Halliday 2004 [1985]), since they are nouns, but they refer to actions as verbs do. For this reason, attention is given to the evolution and use of action nominalizations in the Early Modern English period (henceforth EModE), the time which sees the greatest increase of vocabulary in the history of the English language. Given that nouns prototypically refer to objects rather than actions, the question arises as to how they behave when they denote actions, and what the consequences of this use are
    corecore