
    Knowledge extraction from webpages

    http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/
    This article presents a system that extracts knowledge from webpages by producing semantic annotations. Taking into account semantic information from the domain to annotate an element in a webpage implies solving two problems: (1) identifying the syntactic structure of this element in the webpage and (2) identifying the most specific concept (in terms of subsumption) of the ontology that will be used to annotate this element. Our approach relies on a wrapper-based machine learning algorithm combined with reasoning that makes use of the formal structure of the ontology.

    Knowledge extraction from webpages

    Abstract. This article presents a system that extracts knowledge from webpages by producing semantic annotations. Taking into account semantic information from the domain to annotate an element in a webpage implies solving two problems: (1) identifying the syntactic structure of this element in the webpage and (2) identifying the most specific concept (in terms of subsumption) of the ontology that will be used to annotate this element. Our approach relies on a wrapper-based machine learning algorithm combined with reasoning that makes use of the formal structure of the ontology.

    1 Context of the research

    Our system aims to use information provided by research teams on their websites to generate knowledge about the European Research Community. In order to make this information machine-processable, a formal representation of the content of the webpages is needed, encoded with a well-defined syntax and semantics. This is the purpose of semantic annotation [1]. The system is provided with:

    – an ontology which represents the concepts of a domain and their relationships. The ontology, implemented in the Web Ontology Language (OWL), is based on Description Logics (DL), and thus reasoning mechanisms such as classification and subsumption are provided [2],
    – webpages from which data are extracted according to the ontology.

    For each data item in the document, the system generates an individual with the concept and roles it instantiates. Each individual is added to a Knowledge Base (KB). Two main tasks are dealt with. The first locates each data item in the provided documents and extracts it to generate a “raw” individual, which may not be specific enough. It is followed by a reasoning task which infers the most specific concept the individual is an instance of.
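
To make the notion of a "most specific concept (in terms of subsumption)" concrete, here is a minimal Python sketch. It is not the system described in the abstract: the toy hierarchy, the concept names, and the helper functions `ancestors` and `most_specific` are all hypothetical. It also assumes the candidate concepts of an individual are already known, whereas a DL reasoner would derive them from the individual's roles.

```python
# Toy subsumption hierarchy: each concept maps to its direct subsumers.
# All concept names are hypothetical and not taken from the paper's ontology.
PARENTS = {
    "Thing": [],
    "Person": ["Thing"],
    "Researcher": ["Person"],
    "PhDStudent": ["Researcher"],
    "Professor": ["Researcher"],
}


def ancestors(concept, parents=PARENTS):
    """Return every concept that strictly subsumes `concept`."""
    seen = set()
    stack = list(parents[concept])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def most_specific(concepts, parents=PARENTS):
    """Keep only the candidates that do not subsume another candidate."""
    candidates = set(concepts)
    subsumers = set()
    for concept in candidates:
        subsumers |= ancestors(concept, parents)
    return candidates - subsumers


# A "raw" individual known to be a Person, a Researcher and a PhDStudent
# is annotated with the most specific of the three.
print(most_specific({"Person", "Researcher", "PhDStudent"}))
```

The result is a set because two candidates may be incomparable: neither `Professor` nor `PhDStudent` subsumes the other in this toy hierarchy, so both would be kept.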