29,751 research outputs found
From software APIs to web service ontologies: a semi-automatic extraction method
Successful employment of semantic web services depends on
the availability of high quality ontologies to describe the domains of these services. As always, building such ontologies is difficult and costly, thus hampering web service deployment. Our hypothesis is that since the functionality offered by a web service is reflected by the underlying software, domain ontologies could be built by analyzing the documentation of that software. We verify this hypothesis in the domain of RDF ontology storage tools.We implemented and fine-tuned a semi-automatic method to extract domain ontologies from software documentation. The quality of the extracted ontologies was verified against a high quality hand-built ontology of the same domain. Despite the low linguistic quality of the corpus, our method allows extracting a considerable amount
of information for a domain ontology
Extraction of Keyphrases from Text: Evaluation of Four Algorithms
This report presents an empirical evaluation of four algorithms for automatically extracting keywords and keyphrases from documents. The four algorithms are compared using five different collections of documents. For each document, we have a target set of keyphrases, which were generated by hand. The target keyphrases were generated for human readers; they were not tailored for any of the four keyphrase extraction algorithms. Each of the algorithms was evaluated by the degree to which the algorithmÂ’s keyphrases matched the manually generated keyphrases. The four algorithms were (1) the AutoSummarize feature in MicrosoftÂ’s Word 97, (2) an algorithm based on Eric BrillÂ’s part-of-speech tagger, (3) the Summarize feature in VerityÂ’s Search 97, and (4) NRCÂ’s Extractor algorithm. For all five document collections, NRCÂ’s Extractor yields the best match with the manually generated keyphrases
Context and Keyword Extraction in Plain Text Using a Graph Representation
Document indexation is an essential task achieved by archivists or automatic
indexing tools. To retrieve relevant documents to a query, keywords describing
this document have to be carefully chosen. Archivists have to find out the
right topic of a document before starting to extract the keywords. For an
archivist indexing specialized documents, experience plays an important role.
But indexing documents on different topics is much harder. This article
proposes an innovative method for an indexing support system. This system takes
as input an ontology and a plain text document and provides as output
contextualized keywords of the document. The method has been evaluated by
exploiting Wikipedia's category links as a termino-ontological resources
Thesaurus-based index term extraction for agricultural documents
This paper describes a new algorithm for automatically extracting index terms from documents relating to the domain of agriculture. The domain-specific Agrovoc thesaurus developed by the FAO is used both as a controlled vocabulary and as a knowledge base for semantic matching. The automatically assigned terms are evaluated against a manually indexed 200-item sample of the FAO’s document repository, and the performance of the new algorithm is compared with a state-of-the-art system for keyphrase extraction
- …