22,502 research outputs found

    Template Mining for Information Extraction from Digital Documents

    published or submitted for publication

    A text-mining system for extracting metabolic reactions from full-text articles

    Background: Increasingly, biological text-mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway, metabolic pathways, has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. This method extends an approach that has proved effective in the context of the extraction of protein–protein interactions. Results: When evaluated on a set of manually curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein–protein interaction extraction task. Conclusions: We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein–protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.
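The permutation-scoring idea in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' method: the keyword stems, the prefix-based stand-in for a stemmer, the weights, and the hypothetical function names `crude_stem`, `score_assignment`, and `best_reaction`. Entities are assumed to be already tagged and given as token positions.

```python
from itertools import permutations

# Hypothetical stemmed keywords; the abstract does not list the real ones.
KEYWORD_STEMS = ("catalys", "catalyz", "convert", "produc", "yield")


def crude_stem(token):
    """Rough stand-in for a real stemmer: lowercase, then match by prefix."""
    lowered = token.lower()
    for stem in KEYWORD_STEMS:
        if lowered.startswith(stem):
            return stem
    return None


def score_assignment(tokens, enzyme_idx, substrate_idx, product_idx):
    """Score one (enzyme, substrate, product) assignment.

    The weights are illustrative only. They reward keyword presence and
    keyword location relative to the assigned entities.
    """
    score = 0.0
    lo, hi = sorted((substrate_idx, product_idx))
    for i, tok in enumerate(tokens):
        if crude_stem(tok) is None:
            continue
        score += 1.0                      # a keyword is present
        if lo < i < hi:
            score += 1.0                  # keyword lies between the metabolites
        if substrate_idx < i < product_idx:
            score += 0.5                  # order: substrate, keyword, product
        score += 0.5 / (1 + abs(i - enzyme_idx))  # keyword close to the enzyme
    return score


def best_reaction(tokens, enzyme_positions, metabolite_positions):
    """Enumerate every assignment and return the highest-scoring one.

    Returns (score, enzyme_idx, substrate_idx, product_idx), or None when
    fewer than two metabolites or no enzyme are available.
    """
    candidates = [
        (score_assignment(tokens, e, s, p), e, s, p)
        for e in enzyme_positions
        for s, p in permutations(metabolite_positions, 2)
    ]
    return max(candidates, key=lambda c: c[0]) if candidates else None


tokens = "Glucose is converted to glucose-6-phosphate by hexokinase".split()
best = best_reaction(tokens, enzyme_positions=[6], metabolite_positions=[0, 4])
print(best)
```

In this toy sentence the assignment with "Glucose" as substrate and "glucose-6-phosphate" as product outranks the reversed permutation only because of the ordering bonus, which shows how keyword location, and not just keyword presence, separates competing assignments.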

    Historical Grassland Turboveg Database Project. 2067 Relevés recorded by Dr Austin O’Sullivan 1962–1982

    User Guide and CD of Database are available. End of project report. The more common grassland types occupy about 70% of the Irish landscape (O’Sullivan, 1982), but information on these vegetation types is rare. Generally, Irish grasslands are distinguished based on the intensity of their management (improved or semi-natural grasslands), and the drainage conditions and acidity of the soil (dry or wet, calcareous or acidic grassland types) (Fossitt, 2000). However, little is known about their floristic composition and the changes in floristic composition over time. The current knowledge on grassland vegetation is mostly based on a survey of Irish grasslands by Dr. Austin O’Sullivan completed in the 1960s and 1970s (O’Sullivan, 1982). In this survey O’Sullivan identified Irish grassland types in accordance with the classification of continental European grasslands based on the principles of the School of Phytosociology. O’Sullivan distinguished five main grassland types, introducing agricultural criteria as well as floristic criteria into grassland classification (O’Sullivan, 1982). In 1978, O’Sullivan made an attempt at mapping Ireland’s vegetation types, including the five grassland types distinguished in his later publication as well as two types of peatland vegetation (Figures 1 and 2). This map was completed using 1960s soil maps (National Soil Survey, Teagasc, Johnstown Castle) and a subsample of the dataset on the composition of Irish grasslands. Phytosociological classification of vegetation is based on the full floristic composition of the vegetation as determined by assessing the abundance and spatial structure of the plant species in a given area. The actual area of the survey (or relevé) is determined according to strict criteria, which include how representative the sample area is for the wider vegetation (i.e. how many of the species found in the wider area are also present in the survey area). National Parks and Wildlife Service of the Department of the Environment, Heritage and Local Government, Dublin, Ireland

    An annotated corpus with nanomedicine and pharmacokinetic parameters

    A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided.