14,310 research outputs found

    Molecular Interactions. On the Ambiguity of Ordinary Statements in Biomedical Literature

    Get PDF
    Statements about the behavior of biochemical entities (e.g., about the interaction between two proteins) abound in the literature on molecular biology and are increasingly becoming the targets of information extraction and text mining techniques. We show that an accurate analysis of the semantics of such statements reveals a number of ambiguities that have to be taken into account in the practice of biomedical ontology engineering: Such statements can not only be understood as event reporting statements, but also as ascriptions of dispositions or tendencies that may or may not refer to collectives of interacting molecules or even to collectives of interaction events

    Neurocognitive Informatics Manifesto.

    Get PDF
    Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper examples of neurocognitive inspirations and promising directions in this area are given

    Mining Large-scale Event Knowledge from Web Text

    Get PDF
    AbstractThis paper addresses the problem of automatic acquisition of semantic relations between events. While previous works on semantic relation automatic acquisition relied on annotated text corpus, it is still unclear how to develop more generic methods to meet the needs of identifying related event pairs and extracting event-arguments (especially the predicate, subject and object). Motivated by this limitation, we develop a three-phased approach that acquires causality from the Web text. First, we use explicit connective markers (such as ā€œbecauseā€) as linguistic cues to discover causal related events. Next, we extract the event-arguments based on local dependency parse trees of event expressions. At the last step, we propose a statistical model to measure the potential causal relations. The results of our empirical evaluations on a large-scale Web text corpus show that (a) the use of local dependency tree extensively improves both the accuracy and recall of event-arguments extraction task, and (b) our measure improves the traditional PMI method

    The biomedical discourse relation bank

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing. However, little effort has been made to develop such an annotated resource.</p> <p>Results</p> <p>We have developed the Biomedical Discourse Relation Bank (BioDRB), in which we have annotated explicit and implicit discourse relations in 24 open-access full-text biomedical articles from the GENIA corpus. Guidelines for the annotation were adapted from the Penn Discourse TreeBank (PDTB), which has discourse relations annotated over open-domain news articles. We introduced new conventions and modifications to the sense classification. We report reliable inter-annotator agreement of over 80% for all sub-tasks. Experiments for identifying the sense of explicit discourse connectives show the connective itself as a highly reliable indicator for coarse sense classification (accuracy 90.9% and F1 score 0.89). These results are comparable to results obtained with the same classifier on the PDTB data. With more refined sense classification, there is degradation in performance (accuracy 69.2% and F1 score 0.28), mainly due to sparsity in the data. The size of the corpus was found to be sufficient for identifying the sense of explicit connectives, with classifier performance stabilizing at about 1900 training instances. Finally, the classifier performs poorly when trained on PDTB and tested on BioDRB (accuracy 54.5% and F1 score 0.57).</p> <p>Conclusion</p> <p>Our work shows that discourse relations can be reliably annotated in biomedical text. Coarse sense disambiguation of explicit connectives can be done with high reliability by using just the connective as a feature, but more refined sense classification requires either richer features or more annotated data. The poor performance of a classifier trained in the open domain and tested in the biomedical domain suggests significant differences in the semantic usage of connectives across these domains, and provides robust evidence for a biomedical sublanguage for discourse and the need to develop a specialized biomedical discourse annotated corpus. The results of our cross-domain experiments are consistent with related work on identifying connectives in BioDRB.</p
    • ā€¦
    corecore