    Generating adaptive hypertext content from the semantic web

    Accessing and extracting knowledge from online documents is crucial for the realisation of the Semantic Web and the provision of advanced knowledge services. The Artequakt project is an ongoing investigation tackling these issues to facilitate the creation of tailored biographies from information harvested from the web. In this paper we present the methods we currently use to model, consolidate and store knowledge extracted from the web so that it can be re-purposed as adaptive content. We look at how Semantic Web technology could be used within this process and also how such techniques might be used to provide content to be published via the Semantic Web.

    RDF(S)/XML Linguistic Annotation of Semantic Web Pages

    Although with the Semantic Web initiative much research on web page semantic annotation has already been done by AI researchers, linguistic text annotation, including the semantic one, was originally developed in Corpus Linguistics and its results have been somewhat neglected by AI. ..

    Web based knowledge extraction and consolidation for automatic ontology instantiation

    The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the returned documents and identifying the knowledge of interest is therefore left to the user. The Artequakt system aims to deploy natural language tools to automatically extract and consolidate knowledge from web documents and instantiate a given ontology, which dictates the type and form of knowledge to extract. Artequakt focuses on the domain of artists, and uses the harvested knowledge to generate tailored biographies. This paper describes the latest developments of the system and discusses the problem of knowledge consolidation.
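
    As an illustration of what knowledge consolidation can involve, the sketch below merges duplicate facts extracted from several documents. It is a minimal example under stated assumptions, not the Artequakt method: the function names `normalise` and `consolidate`, the fact layout (subject, predicate, value, source) and the sample data are all hypothetical.

```python
from collections import defaultdict


def normalise(name: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace (assumed normalisation)."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def consolidate(facts):
    """Merge duplicate (subject, predicate, value, source) facts.

    Returns a dict mapping each normalised (subject, predicate, value)
    triple to the set of source documents that support it.
    """
    merged = defaultdict(set)
    for subject, predicate, value, source in facts:
        key = (normalise(subject), predicate, normalise(value))
        merged[key].add(source)
    return dict(merged)


# Hypothetical extraction output from two web documents.
facts = [
    ("Rembrandt", "date_of_birth", "1606", "doc1"),
    ("rembrandt ", "date_of_birth", "1606", "doc2"),
    ("Rembrandt", "place_of_birth", "Leiden", "doc1"),
]
consolidated = consolidate(facts)
```

    Recording the supporting sources for each merged fact is one simple way to keep provenance while removing duplicates; a real system would also need to reconcile conflicting values.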

    Artequakt: Generating tailored biographies from automatically annotated fragments from the web

    The Artequakt project seeks to automatically generate narrative biographies of artists from knowledge that has been extracted from the Web and maintained in a knowledge base. An overview of the system architecture is presented here and the three key components of that architecture are explained in detail, namely knowledge extraction, information management and biography construction. Conclusions are drawn from the initial experiences of the project and future progress is detailed.

    Event recognition on news stories and semi-automatic population of an ontology

    This paper describes a system which recognizes events in news stories. Our system classifies stories and populates a hand-crafted ontology with new instances of classes defined in it. Currently, our system recognizes events which can be classified as belonging to a single category, and it also recognizes overlapping events within one article (more than one event is recognized). In each case, the system provides a confidence value associated with the suggested classification. Our system uses Information Extraction and Machine Learning technologies. The system was tested using a corpus of 200 news articles from an archive of electronic news stories describing the academic life of the Knowledge Media Institute (KMi). In particular, these news stories describe events such as a project award, publications, visits, etc.
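
    The idea of returning several overlapping event categories, each with a confidence value, can be sketched as follows. This is a toy keyword scorer standing in for the Information Extraction and Machine Learning components, which the abstract does not detail; the `classify` function, the keyword lists, the threshold and the sample story are assumptions made for illustration.

```python
def classify(story, keywords, threshold=0.3):
    """Return (category, confidence) pairs for every category above the threshold.

    Confidence here is simply the fraction of a category's keywords found
    in the story, which is an assumed stand-in for a learned score.
    """
    tokens = set(story.lower().split())
    scores = {}
    for category, words in keywords.items():
        hits = sum(1 for word in words if word in tokens)
        scores[category] = hits / len(words)
    accepted = [(c, s) for c, s in scores.items() if s >= threshold]
    # Highest confidence first; ties keep the order of the keyword table.
    return sorted(accepted, key=lambda pair: pair[1], reverse=True)


# Hypothetical keyword table and news story.
keywords = {
    "project-award": ["award", "grant", "funding"],
    "visit": ["visit", "visited", "guest"],
}
story = "The lab received a grant award during a visit by a guest speaker"
result = classify(story, keywords)
```

    Because every category is scored independently, a single article can be assigned to more than one event type, which mirrors the overlapping-events case described above.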

    Ontology-based Interoperation of Linguistic Tools for an Improved Lemma Annotation in Spanish

    In this paper, we present an ontology-based methodology and architecture for the comparison, assessment, combination (and, to some extent, also contrastive evaluation) of the results of different linguistic tools. More specifically, we describe an experiment aiming at the improvement of the correctness of lemma tagging for Spanish. This improvement was achieved by means of the standardisation and combination of the results of three different linguistic annotation tools (Bitext’s DataLexica, Connexor’s FDG Parser and LACELL’s POS tagger), using (1) ontologies, (2) a set of lemma tagging correction rules, determined empirically during the experiment, and (3) W3C standard languages, such as XML, RDF(S) and OWL. As we show in the results of the experiment, the interoperation of these tools by means of ontologies and the correction rules applied in the experiment significantly improved the quality of the resulting lemma tagging (when compared to the separate lemma tagging performed by each of the tools that we made interoperate).
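
    One simple way to combine the lemma proposals of several tools and then apply correction rules is sketched below. It uses plain majority voting as a stand-in for the ontology-based combination described in the abstract, so it should be read as an assumption rather than the authors' procedure; `combine_lemmas`, the rule table and the example lemmas are hypothetical.

```python
from collections import Counter


def combine_lemmas(candidates, correction_rules=None):
    """Pick one lemma from the proposals of several tools for a single token.

    candidates: one proposed lemma per tool, listed in priority order.
    correction_rules: optional mapping from a wrong lemma to its correction.
    """
    correction_rules = correction_rules or {}
    counts = Counter(candidates)
    best_count = max(counts.values())
    # Majority vote; on a tie, prefer the highest-priority tool's proposal.
    lemma = next(c for c in candidates if counts[c] == best_count)
    # Apply an empirically determined correction, if one exists.
    return correction_rules.get(lemma, lemma)


# Two of three hypothetical tools agree on the lemma.
agreed = combine_lemmas(["cantar", "cantar", "cantaba"])

# All three disagree; the first proposal wins and a rule then corrects it.
corrected = combine_lemmas(["cantaba", "canto", "cantar"], {"cantaba": "cantar"})
```

    Applying the correction rules after the vote keeps the two steps separate, so the rule set can be revised without touching the combination logic.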

    A Semantic web page linguistic annotation model

    Although with the Semantic Web initiative much research on web page semantic annotation has already been done by AI researchers, linguistic text annotation, including the semantic one, was originally developed in Corpus Linguistics and its results have been somewhat neglected by AI. The purpose of the research presented in this proposal is to prove that integration of results in both fields is not only possible, but also highly useful in order to make Semantic Web pages more machine-readable. A multi-level (possibly multi-purpose and multi-language) annotation model based on EAGLES standards and Ontological Semantics, implemented with latest-generation Semantic Web languages, is being developed to fit the needs of both communities.