
    Contextual and Metadata-based Approach for the Semantic Annotation of Heterogeneous Documents

    ISSN 1613-0073. International audience. In this paper, we present SHIRI-Annot, an automatic, ontology-driven, and unsupervised approach for the semantic annotation of documents that contain more or less structured parts. The aim of this approach is to build an integration system called SHIRI, which gives the user access to documents related to a specific domain. In this system, the querying process is guided by a domain ontology, and, unlike keyword-based search engines, the answers consist only of the pertinent parts of the documents. The ontology is described in RDFS (Resource Description Framework Schema). The SHIRI-Annot approach consists of locating and then annotating concept instances and their semantic relations. The locating step combines existing annotation approaches in order to locate instances in the text. The annotation step exploits a set of metadata and a set of logical rule patterns that are automatically instantiated from the domain description. These metadata are taken from the ontology or are defined specifically for the annotation task. The resulting annotations are represented in RDF (Resource Description Framework). We show, through a preliminary study on a corpus of HTML documents, the usefulness of these specific metadata for representing the heterogeneity of documents. We also illustrate through examples how the SHIRI system exploits the metadata to approximate user queries in order to provide more pertinent answers.