Search CORE

25,794 research outputs found

Semantic Interoperability in Archaeological Datasets: Data Mapping and Extraction Via the CIDOC CRM

Author: Binding Ceri
May Keith
Tudhope Douglas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

Abstract. Findings from a data mapping and extraction exercise undertaken as part of the STAR project are described and related to recent work in the area. The exercise was undertaken in conjunction with English Heritage and encompassed five differently structured relational databases containing various results of archaeological excavations. The aim of the exercise was to demonstrate the potential benefits in cross searching data expressed as RDF and conforming to a common overarching conceptual data structure schema- the English Heritage Centre for Archaeology ontological model (CRM-EH), an extension of the CIDOC Conceptual Reference Model (CRM). A semi-automatic mapping/extraction tool proved an essential component. The viability of the approach is demonstrated by web services and a client application on an integrated data and concept network

CiteSeerX

University of South Wales Research Explorer

Algorithms and implementation of functional dependency discovery in XML : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Sciences in Information Systems at Massey University

Author: Zhou Zheng
Publication venue: 'Massey University'
Publication date: 01/01/2006
Field of study

1.1 Background Following the advent of the web, there has been a great demand for data interchange between applications using internet infrastructure. XML (extensible Markup Language) provides a structured representation of data empowered by broad adoption and easy deployment. As a subset of SGML (Standard Generalized Markup Language), XML has been standardized by the World Wide Web Consortium (W3C) [Bray et al., 2004], XML is becoming the prevalent data exchange format on the World Wide Web and increasingly significant in storing semi-structured data. After its initial release in 1996, it has evolved and been applied extensively in all fields where the exchange of structured documents in electronic form is required. As with the growing popularity of XML, the issue of functional dependency in XML has recently received well deserved attention. The driving force for the study of dependencies in XML is it is as crucial to XML schema design, as to relational database(RDB) design [Abiteboul et al., 1995]

Massey Research Online

Extracting and Cleaning RDF Data

Author: Farid Mina
Publication venue: 'University of Waterloo'
Publication date: 01/05/2020
Field of study

The RDF data model has become a prevalent format to represent heterogeneous data because of its versatility. The capability of dismantling information from its native formats and representing it in triple format offers a simple yet powerful way of modelling data that is obtained from multiple sources. In addition, the triple format and schema constraints of the RDF model make the RDF data easy to process as labeled, directed graphs. This graph representation of RDF data supports higher-level analytics by enabling querying using different techniques and querying languages, e.g., SPARQL. Anlaytics that require structured data are supported by transforming the graph data on-the-fly to populate the target schema that is needed for downstream analysis. These target schemas are defined by downstream applications according to their information need. The flexibility of RDF data brings two main challenges. First, the extraction of RDF data is a complex task that may involve domain expertise about the information required to be extracted for different applications. Another significant aspect of analyzing RDF data is its quality, which depends on multiple factors including the reliability of data sources and the accuracy of the extraction systems. The quality of the analysis depends mainly on the quality of the underlying data. Therefore, evaluating and improving the quality of RDF data has a direct effect on the correctness of downstream analytics. This work presents multiple approaches related to the extraction and quality evaluation of RDF data. To cope with the large amounts of data that needs to be extracted, we present DSTLR, a scalable framework to extract RDF triples from semi-structured and unstructured data sources. For rare entities that fall on the long tail of information, there may not be enough signals to support high-confidence extraction. Towards this problem, we present an approach to estimate property values for long tail entities. We also present multiple algorithms and approaches that focus on the quality of RDF data. These include discovering quality constraints from RDF data, and utilizing machine learning techniques to repair errors in RDF data

University of Waterloo's Institutional Repository

Modeling views in the layered view model for XML using UML

Author: Abiteboul S.
Abiteboul S.
Abiteboul S.
Abiteboul S.
Abiteboul S.
AIIM
Augurusa E.
Balsters
Benjelloun O.
Braga D.
Chang E.
Chang E.
Chen Y. B.
Chen Y. B.
Cluet S.
Codd E. F.
Date C. J.
Dillon T. S.
Do H. H.
Doorn J. H.
Elizabeth Chang
Elmasri R.
Feng L.
Gaevic D.
Gardner W.
Goldman R.
Gopalkrishnan V.
Graham I.
Gupta A.
HyOntUse
ITEC
KAON
Ling Feng
Ludaescher B.
Ludaescher B.
Mohania M. K.
Munroe
Nassis V.
OMG
Path
Query
Rajugan E.
Rajugan R.
Rajugan R.
Rajugan R.
Rajugan R.
Rajugan Rajagopalapillai
SPARQL
Steele R.
Tharam S. Dillon
Trujillo J.
Uceda-Sosa R.
Volz R.
Volz R.
Warmer J. B.
Wouters C.
Wouters C.
Wouters C.
Wouters C.
Xyleme
Zhuge Y.
Publication venue: Troubador Publishing Ltd.
Publication date: 01/01/2006
Field of study

In data engineering, view formalisms are used to provide flexibility to users and user applications by allowing them to extract and elaborate data from the stored data sources. Conversely, since the introduction of Extensible Markup Language (XML), it is fast emerging as the dominant standard for storing, describing, and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user-defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, thus it proves to be a difficult task to support views in the case of semi-structured data models. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Here our work is three-fold; first we propose an approach to separate the implementation and conceptual aspects of the views that provides a clear separation of concerns, thus, allowing analysis and design of views to be separated from their implementation. Secondly, we define representations to express and construct these views at the conceptual level. Thirdly, we define a view transformation methodology for XML views in the LVM, which carries out automated transformation to a view schema and a view query expression in an appropriate query language. Also, to validate and apply the LVM concepts, methods and transformations developed, we propose a view-driven application development framework with the flexibility to develop web and database applications for XML, at varying levels of abstraction

CiteSeerX

Crossref

OPUS - University of Technology Sydney

University of Twente Research Information

espace@Curtin

A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003

Author: Price S
Publication venue: Department of Computer Science, University of Bristol
Publication date: 01/01/2004
Field of study

Explore Bristol Research