Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity
In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system's architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands).
A text-mining system for extracting metabolic reactions from full-text articles
Background: Increasingly biological text mining research is focusing on the extraction of complex relationships
relevant to the construction and curation of biological networks and pathways. However, one important category of
pathway (metabolic pathways) has been largely neglected.
Here we present a relatively simple method for extracting metabolic reaction information from free text that scores
different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence
and location of stemmed keywords. This method extends an approach that has proved effective in the context of the
extraction of protein-protein interactions.
Results: When evaluated on a set of manually-curated metabolic pathways using standard performance criteria, our
method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the
well-known protein-protein interaction extraction task.
Conclusions: We conclude that automated metabolic pathway construction is more tractable than has often been
assumed, and that (as in the case of protein-protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.
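The abstract above describes scoring permutations of assigned entities in a sentence by the presence and location of stemmed keywords. A minimal sketch of that general idea follows. The keyword stems, the scoring weights, and the function names `keyword_positions`, `score`, and `best_assignment` are all illustrative assumptions, not the authors' method, and only metabolite roles (substrate, product) are permuted here.

```python
from itertools import permutations

# Hypothetical keyword stems; the abstract does not list the real ones.
KEYWORD_STEMS = ("convert", "catalys", "produc")


def keyword_positions(tokens):
    """Return the indices of tokens that begin with a keyword stem."""
    return [i for i, tok in enumerate(tokens)
            if tok.lower().startswith(KEYWORD_STEMS)]


def score(tokens, substrate_pos, product_pos):
    """Toy score for one (substrate, product) role assignment.

    Assumed weights: +1 for each keyword present in the sentence,
    +2 more if that keyword lies between substrate and product.
    """
    total = 0
    for k in keyword_positions(tokens):
        total += 1
        if substrate_pos < k < product_pos:
            total += 2
    return total


def best_assignment(tokens, metabolite_positions):
    """Try every ordered pair of metabolite mentions and keep the best one."""
    candidates = permutations(metabolite_positions, 2)
    return max(candidates, key=lambda pair: score(tokens, *pair))


tokens = "Glucose is converted to glucose-6-phosphate by hexokinase".split()
# Metabolite mentions sit at token indices 0 and 4 (assumed to be pre-tagged).
print(best_assignment(tokens, [0, 4]))
```

In this toy sentence the assignment with "Glucose" as substrate wins because the keyword "converted" falls between the two metabolite mentions.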
A geo-temporal information extraction service for processing descriptive metadata in digital libraries
In the context of digital map libraries, resources are usually described by metadata records that define the relevant subject, location, time-span, format and keywords. For locations and time-spans, metadata records are often incomplete or provide information in a form that is not machine-understandable (e.g. textual descriptions). This paper presents techniques for extracting geotemporal information from text, using relatively simple text mining methods that leverage a Web gazetteer service. The idea is to go from human-made geotemporal referencing (i.e. using place and period names in textual expressions) to geo-spatial coordinates and time-spans. A prototype system implementing the proposed methods is described in detail. Experimental results demonstrate the efficiency and accuracy of the proposed approaches.
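The mapping from place and period names to coordinates and time-spans described above can be sketched as follows. This is only an illustration under stated assumptions: the in-memory `GAZETTEER` dictionary stands in for the Web gazetteer service, its coordinates are approximate, and the two regular expressions cover just century names and explicit year ranges. None of these names come from the paper.

```python
import re

# Hypothetical stand-in for a Web gazetteer service: name -> (lat, lon).
GAZETTEER = {
    "lisbon": (38.72, -9.14),
    "porto": (41.15, -8.61),
}

# Assumed patterns for two kinds of temporal expression.
CENTURY = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th) century\b", re.IGNORECASE)
YEAR_RANGE = re.compile(r"\b(\d{4})\s*-\s*(\d{4})\b")


def extract_places(text):
    """Look up each alphabetic word in the gazetteer and return matches."""
    found = {}
    for word in re.findall(r"[A-Za-z]+", text):
        coords = GAZETTEER.get(word.lower())
        if coords is not None:
            found[word] = coords
    return found


def extract_time_spans(text):
    """Return (start_year, end_year) pairs for year ranges and centuries."""
    spans = [(int(a), int(b)) for a, b in YEAR_RANGE.findall(text)]
    for match in CENTURY.finditer(text):
        n = int(match.group(1))
        # The n-th century runs from year (n-1)*100 + 1 to year n*100.
        spans.append(((n - 1) * 100 + 1, n * 100))
    return spans


record = "Map of Lisbon, 19th century, surveyed 1850-1870"
print(extract_places(record))
print(extract_time_spans(record))
```

A real service would also need to disambiguate place names that share a spelling, which this sketch does not attempt.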