
    From software APIs to web service ontologies: a semi-automatic extraction method

    Successful employment of semantic web services depends on the availability of high-quality ontologies to describe the domains of these services. Building such ontologies is, however, difficult and costly, which hampers web service deployment. Our hypothesis is that, since the functionality offered by a web service is reflected by the underlying software, domain ontologies could be built by analyzing the documentation of that software. We verify this hypothesis in the domain of RDF ontology storage tools. We implemented and fine-tuned a semi-automatic method to extract domain ontologies from software documentation. The quality of the extracted ontologies was verified against a high-quality hand-built ontology of the same domain. Despite the low linguistic quality of the corpus, our method extracts a considerable amount of information for a domain ontology.

    Extraction of Keyphrases from Text: Evaluation of Four Algorithms

    This report presents an empirical evaluation of four algorithms for automatically extracting keywords and keyphrases from documents. The four algorithms are compared using five different collections of documents. For each document, we have a target set of keyphrases, which were generated by hand. The target keyphrases were generated for human readers; they were not tailored for any of the four keyphrase extraction algorithms. Each of the algorithms was evaluated by the degree to which the algorithm's keyphrases matched the manually generated keyphrases. The four algorithms were (1) the AutoSummarize feature in Microsoft's Word 97, (2) an algorithm based on Eric Brill's part-of-speech tagger, (3) the Summarize feature in Verity's Search 97, and (4) NRC's Extractor algorithm. For all five document collections, NRC's Extractor yields the best match with the manually generated keyphrases.
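    The abstract scores each algorithm by how well its keyphrases match the hand-generated ones but does not state the matching rule. A minimal sketch, assuming exact matching after lowercasing and whitespace normalization (the report may instead use stemming or partial matches), reports the number of matches plus precision and recall. The names `normalize` and `match_score` are hypothetical.

```python
def normalize(phrase: str) -> str:
    # Assumption: phrases match if they are identical after lowercasing
    # and collapsing internal whitespace (no stemming is applied).
    return " ".join(phrase.lower().split())


def match_score(extracted: list[str], manual: list[str]) -> dict:
    """Compare algorithm keyphrases against a manual target set."""
    ext = {normalize(p) for p in extracted}
    ref = {normalize(p) for p in manual}
    hits = ext & ref  # keyphrases found by both the algorithm and the human indexer
    precision = len(hits) / len(ext) if ext else 0.0
    recall = len(hits) / len(ref) if ref else 0.0
    return {"matches": len(hits), "precision": precision, "recall": recall}


if __name__ == "__main__":
    result = match_score(
        ["Keyphrase extraction", "documents", "part-of-speech tagger"],
        ["keyphrase extraction", "Part-of-Speech  Tagger", "evaluation", "summarization"],
    )
    print(result)  # matches=2, precision≈0.667, recall=0.5
```

    Averaging these scores over each collection would give one figure per algorithm per collection; a looser matching rule (for example, stemming both sides) would generally raise the match counts.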

    Context and Keyword Extraction in Plain Text Using a Graph Representation

    Document indexing is an essential task performed by archivists or by automatic indexing tools. To retrieve documents relevant to a query, the keywords describing each document have to be chosen carefully. Archivists must first identify the topic of a document before extracting its keywords. For an archivist indexing specialized documents, experience plays an important role, but indexing documents on many different topics is much harder. This article proposes an innovative method for an indexing support system. The system takes as input an ontology and a plain-text document and outputs contextualized keywords for the document. The method has been evaluated using Wikipedia's category links as a termino-ontological resource.

    Thesaurus-based index term extraction for agricultural documents

    This paper describes a new algorithm for automatically extracting index terms from documents relating to the domain of agriculture. The domain-specific Agrovoc thesaurus developed by the FAO is used both as a controlled vocabulary and as a knowledge base for semantic matching. The automatically assigned terms are evaluated against a manually indexed 200-item sample of the FAO's document repository, and the performance of the new algorithm is compared with a state-of-the-art system for keyphrase extraction.
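    The abstract gives Agrovoc two roles: controlled vocabulary and knowledge base for semantic matching. A minimal sketch of the first role only, assuming a toy in-memory thesaurus (`TOY_THESAURUS`, hypothetical and not taken from Agrovoc or any FAO interface) that maps surface forms, including non-preferred synonyms, to preferred descriptors. Word n-grams are looked up in the thesaurus and the resulting descriptors are ranked by frequency. This is not the paper's algorithm, which also exploits thesaurus relations; `extract_index_terms` is a hypothetical name.

```python
import re
from collections import Counter

# Hypothetical toy thesaurus: surface form -> preferred descriptor.
# Non-preferred synonyms ("corn") map to the preferred term ("maize").
TOY_THESAURUS: dict[str, str] = {
    "maize": "maize",
    "corn": "maize",
    "soil erosion": "soil erosion",
    "erosion of soil": "soil erosion",
    "irrigation": "irrigation",
}


def extract_index_terms(
    text: str, thesaurus: dict[str, str], max_ngram: int = 3, top_k: int = 5
) -> list[tuple[str, int]]:
    """Return up to top_k (descriptor, count) pairs, most frequent first."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts: Counter = Counter()
    # Try every word n-gram up to max_ngram words long against the vocabulary.
    # Simplification: overlapping matches are all counted.
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            descriptor = thesaurus.get(" ".join(tokens[i:i + n]))
            if descriptor is not None:
                counts[descriptor] += 1
    return counts.most_common(top_k)


if __name__ == "__main__":
    doc = ("Corn yields fell where soil erosion was severe. "
           "Irrigation reduced erosion of soil and raised maize output.")
    print(extract_index_terms(doc, TOY_THESAURUS))
```

    Mapping synonyms to one preferred descriptor is what lets "corn" and "maize" count toward the same index term; a real system would add the semantic-matching step on top of this lookup.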