1,125 research outputs found

    A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003

    A hybrid logic for XML reference constraints

    XML emerged as the (meta) mark-up language for representing, exchanging, and storing semistructured data. The structure of an XML document may be specified either through the DTD (Document Type Definition) language or through the dedicated language XML Schema. While XML Schema is expressive enough to specify both the structure of and the constraints on XML documents, DTD does not allow the specification of integrity constraints. On the other hand, DTD has a very compact notation, as opposed to the complex notation and syntax of XML Schema. It therefore becomes important to consider how to express further constraints on DTD-based XML documents while retaining the simplicity and succinctness of DTDs. Against this background, in this paper we focus on a logic that is as simple as possible, named XHyb, yet expressive enough to specify the most common integrity and reference constraints in XML documents. In particular, we focus on constraints on ID and IDREF(S) attributes, which are the common way of logically connecting parts of XML documents, besides the usual parent-child relationship of XML elements. Unlike previously proposed hybrid logics, XHyb makes IDREF(S) attributes explicitly expressible by means of suitable syntactic constructors. Moreover, we propose a refinement of the usual graph representation of XML documents in order to represent them in a formal and intuitive way without flattening accessibility through IDREF(S) into the usual parent-child relationship. Model checking algorithms are then proposed to verify that a given XML document satisfies the considered constraints.
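
To make the kind of reference constraint discussed in this abstract concrete, the following is a minimal sketch in Python. It is not the XHyb logic or the paper's model checking algorithm; it only checks two basic properties that ID and IDREF(S) attributes are expected to satisfy: ID values are unique, and every reference points to an existing ID. The attribute names `id`, `ref`, and `refs` are assumptions made for illustration, since without a DTD the parser cannot know which attributes are declared as ID or IDREF(S).

```python
import xml.etree.ElementTree as ET


def check_id_constraints(xml_text, id_attr="id",
                         idref_attrs=("ref",), idrefs_attrs=("refs",)):
    """Return (duplicate_ids, dangling_refs) for an XML document.

    The attribute names are hypothetical defaults; a real document
    declares its ID / IDREF / IDREFS attributes in its DTD.
    """
    root = ET.fromstring(xml_text)

    # First pass: collect every ID value and record repeated ones.
    seen = set()
    duplicates = []
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            if value in seen:
                duplicates.append(value)
            seen.add(value)

    # Second pass: every IDREF / IDREFS token must name a known ID.
    dangling = []
    for elem in root.iter():
        targets = []
        for name in idref_attrs:
            if elem.get(name) is not None:
                targets.append(elem.get(name))
        for name in idrefs_attrs:
            if elem.get(name) is not None:
                # IDREFS is a whitespace-separated list of references.
                targets.extend(elem.get(name).split())
        for target in targets:
            if target not in seen:
                dangling.append((elem.tag, target))

    return duplicates, dangling


if __name__ == "__main__":
    doc = '<lib><author id="a1"/><book ref="a1"/><book refs="a1 a2"/></lib>'
    print(check_id_constraints(doc))
```

Constraints richer than these two, for example requiring that a reference from one kind of element leads to a particular kind of element, are the sort of property a dedicated logic such as the one described above is meant to express.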

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report then proposes a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.