
    Datasets for generic relation extraction

    A vast amount of usable electronic data is in the form of unstructured text. The relation extraction task aims to identify useful information in text (e.g. PersonW works for OrganisationX, GeneY encodes ProteinZ) and recode it in a format, such as a relational database or RDF triplestore, that can be used more effectively for querying and automated reasoning. A number of resources have been developed for training and evaluating automatic relation extraction systems in different domains. However, comparative evaluation is impeded by the fact that these corpora use different markup formats and different notions of what constitutes a relation. We describe the preparation of corpora for comparative evaluation of relation extraction across domains, based on the publicly available ACE 2004, ACE 2005 and BioInfer data sets. We present a common document type that uses token standoff and includes detailed linguistic markup, while maintaining all information in the original annotation. The subsequent reannotation process normalises the two data sets so that they comply with a notion of relation that is intuitive, simple and informed by the semantic web. For the ACE data, we describe an automatic process that converts many relations involving nested, nominal entity mentions into relations involving non-nested, named or pronominal entity mentions. For example, in the membership relation described in 'Amidu Berry, one half of PBS', the first entity is mapped from 'one' to 'Amidu Berry'. Moreover, we describe a comparably reannotated version of the BioInfer corpus that flattens nested relations, maps part-whole relations to part-part relations and maps n-ary relations to binary relations. Finally, we summarise experiments that compare approaches to generic relation extraction, a knowledge discovery task that uses minimally supervised techniques to achieve maximally portable extractors. These experiments illustrate the utility of the corpora.
    39 page(s)
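    The abstract does not say how the ACE conversion works internally. The sketch below is only one plausible illustration, assuming ACE-style mention types (NAM, NOM, PRO), coreference via a shared entity id, and token-standoff spans; the class names, the `normalise` helper and the "MEMBERSHIP" label are hypothetical. A nominal argument is swapped for a named (preferred) or pronominal mention of the same entity whose span is not nested with the other argument, reproducing the 'Amidu Berry, one half of PBS' example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    entity_id: str  # coreference chain (entity) this mention belongs to
    start: int      # index of the first token (token standoff)
    end: int        # index one past the last token
    mtype: str      # ACE mention type: "NAM" (name), "NOM" (nominal) or "PRO" (pronoun)
    text: str


@dataclass(frozen=True)
class Relation:
    rtype: str      # hypothetical relation label
    arg1: Mention
    arg2: Mention


def nested(a: Mention, b: Mention) -> bool:
    """True if either mention's token span lies inside the other's."""
    return (a.start <= b.start and b.end <= a.end) or (
        b.start <= a.start and a.end <= b.end
    )


def replace_nominal(arg: Mention, other: Mention, mentions: list[Mention]) -> Optional[Mention]:
    """Swap a nominal argument for a named (preferred) or pronominal mention of the
    same entity that is not nested with the other argument; None if there is none."""
    if arg.mtype != "NOM":
        return arg
    for wanted in ("NAM", "PRO"):
        for m in mentions:
            if m.entity_id == arg.entity_id and m.mtype == wanted and not nested(m, other):
                return m
    return None


def normalise(rel: Relation, mentions: list[Mention]) -> Optional[Relation]:
    """Return the relation with nominal arguments replaced, or None if it cannot be mapped."""
    a1 = replace_nominal(rel.arg1, rel.arg2, mentions)
    if a1 is None:
        return None
    a2 = replace_nominal(rel.arg2, a1, mentions)
    if a2 is None:
        return None
    return Relation(rel.rtype, a1, a2)


# Tokens: 0 Amidu, 1 Berry, 2 ",", 3 one, 4 half, 5 of, 6 PBS
berry = Mention("E1", 0, 2, "NAM", "Amidu Berry")
one = Mention("E1", 3, 7, "NOM", "one half of PBS")  # head word: 'one'
pbs = Mention("E2", 6, 7, "NAM", "PBS")
result = normalise(Relation("MEMBERSHIP", one, pbs), [berry, one, pbs])
print(result.arg1.text, "->", result.arg2.text)  # Amidu Berry -> PBS
```

    Returning None rather than keeping the nested relation reflects the abstract's wording that "many" (not all) such relations are converted; how the remaining cases were handled is not stated.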

    Chi-square-based scoring function for categorization of MEDLINE citations

    Objectives: Text categorization has been used in biomedical informatics to identify documents on relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood that a MEDLINE citation covers a genetically relevant topic. Methods: Our procedure requires the construction of a genetic and a nongenetic domain document corpus. We used the MeSH descriptors assigned to MEDLINE citations for this categorization task and compared the frequencies of MeSH descriptors between the two corpora using a chi-square test. A MeSH descriptor was considered a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest scores given to citations containing MeSH descriptors typical of the genetic domain. Results: Validation was done on a set of 734 manually annotated MEDLINE citations, on which the method achieved a predictive accuracy of 0.87, with 0.69 recall and 0.64 precision. We evaluated the method by comparing it with three machine learning algorithms (support vector machines, decision trees, naïve Bayes). Although the differences were not statistically significant, the results showed that chi-square scoring performs as well as the compared machine learning algorithms. Conclusions: We suggest that chi-square scoring is an effective way to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for the gene symbol disambiguation process.
    Comment: 34 pages, 2 figures
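    The abstract specifies a chi-square test on descriptor frequencies and a sign rule (positive indicator when the relative frequency is higher in the genetic corpus), but not the exact scoring function. The sketch below assumes document-level counts in a 2x2 table, an uncorrected Pearson chi-square, and a citation score equal to the sum of signed descriptor weights; those choices, the helper names and the toy descriptor sets are illustrative assumptions, not the published method.

```python
from collections import Counter


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]:
    rows = descriptor present/absent, columns = genetic/nongenetic corpus."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def descriptor_weights(genetic: list[set[str]], nongenetic: list[set[str]]) -> dict[str, float]:
    """Signed chi-square weight per MeSH descriptor (both corpora must be non-empty).
    Positive if the descriptor's relative frequency is higher in the genetic corpus."""
    n_gen, n_non = len(genetic), len(nongenetic)
    gen_counts = Counter(desc for doc in genetic for desc in doc)
    non_counts = Counter(desc for doc in nongenetic for desc in doc)
    weights = {}
    for desc in gen_counts.keys() | non_counts.keys():
        a, b = gen_counts[desc], non_counts[desc]  # citations carrying the descriptor
        chi2 = chi_square_2x2(a, b, n_gen - a, n_non - b)
        positive = a / n_gen > b / n_non
        weights[desc] = chi2 if positive else -chi2
    return weights


def score(citation: set[str], weights: dict[str, float]) -> float:
    """Hypothetical citation score: sum of the signed weights of its descriptors."""
    return sum(weights.get(desc, 0.0) for desc in citation)


# Toy corpora: each citation is represented by its set of MeSH descriptors.
genetic = [{"Genes", "Mutation"}, {"Genes", "Humans"}, {"Mutation", "Humans"}]
nongenetic = [{"Humans", "Hypertension"}, {"Humans", "Diet"}, {"Diet"}]
w = descriptor_weights(genetic, nongenetic)

# Made-up citation IDs, ranked from most to least "genetic".
candidates = {"pmid_a": {"Genes", "Humans"}, "pmid_b": {"Diet", "Hypertension"}}
ranking = sorted(candidates, key=lambda k: score(candidates[k], w), reverse=True)
print(ranking)  # ['pmid_a', 'pmid_b']
```

    A descriptor equally frequent in both corpora (here "Humans") gets a weight of zero, so it neither helps nor hurts a citation's score.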

    A quantitative study of adipokinetic hormone of the firebug, Pyrrhocoris apterus

    The development of an enzyme-linked immunoassay (ELISA) for the adipokinetic neuropeptide hormone Pya-AKH from the firebug Pyrrhocoris apterus L. is described. The ELISA detects as little as 20 fmol of Pya-AKH. Tested against a range of synthetic peptides, the assay shows high sensitivity for peptides containing the C-terminal motif FTPNWamide. Pya-AKH is present in only very small amounts in the brain, suboesophageal ganglia and fused thoracic and abdominal ganglionic mass; only the corpora cardiaca contain appreciable levels of the hormone (ca. 4 pmol per bug). Preliminary estimates of the persistence of the hormone in the haemolymph are consistent with values determined for AKHs in other insects and suggest that Pya-AKH has a rapid turnover, with a half-life of ca. 18 min. With the ELISA described here, circulating AKH titres in Pyrrhocoris can be measured only by using pooled haemolymph samples after a preliminary clean-up. The titre of Pya-AKH in resting reproductive female Pyrrhocoris is ca. 1 fmol/μl.
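    To make the ca. 18 min half-life concrete, the sketch below assumes simple first-order (exponential) clearance from the haemolymph; the abstract reports only the half-life, so the kinetic model and the helper names are assumptions for illustration.

```python
import math

HALF_LIFE_MIN = 18.0  # estimated half-life of Pya-AKH in haemolymph (ca. 18 min)


def elimination_rate(half_life: float = HALF_LIFE_MIN) -> float:
    """First-order rate constant k = ln 2 / t_half, in 1/min (assumed model)."""
    return math.log(2) / half_life


def fraction_remaining(t_min: float, half_life: float = HALF_LIFE_MIN) -> float:
    """Fraction of circulating hormone left after t_min minutes,
    assuming exponential decay with the given half-life."""
    return 0.5 ** (t_min / half_life)


for t in (18, 36, 60):
    print(f"{t:>3} min: {fraction_remaining(t):.2f}")
```

    Under this assumption the loop prints 0.50, 0.25 and 0.10, i.e. roughly 90% of a released dose would be cleared within an hour, which is what "rapid turnover" implies here.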