16,792 research outputs found
Schema matching for transforming structured documents
Structured document content reuse is the problem of restructuring and translating data structured under a source schema into an instance of a target schema. A notion closely tied with structured document reuse is that of structure transformations. Schema matching is a critical strep in structured document transformations. Manual matching is expensive and error-prone. It is therefore important to develop techniques to automate the matching process and thus the transformation process. In this paper, we contributed in both understanding the matching problem in the context of structured document transformations and developing matching methods those output serves as the basis for the automatic generation of transformation scripts
Data Model and Query Constructs for Versatile Web Query Languages
As the Semantic Web is gaining momentum, the need for
truly versatile query languages becomes increasingly apparent. A Web
query language is called versatile if it can access in the same query program
data in different formats (e.g. XML and RDF). Most query languages
are not versatile: they have not been specifically designed to cope
with both worlds, providing a uniform language and common constructs
to query and transform data in various formats. Moreover, most of them
do not provide a flexible data model that is powerful enough to naturally
convey both Semantic Web data formats (especially RDF and
Topic Maps) and XML. This article highlights challenges related to the
data model and language constructs for querying both standard Web
and Semantic Web data with an emphasis on facilitating sophisticated
reasoning. It is shown that Xcerpt’s data model and querying constructs
are particularly well-suited for the Semantic Web, but that some adjustments
of the Xcerpt syntax allow for even more effective and natural
querying of RDF and Topic Maps
A Progressive Clustering Algorithm to Group the XML Data by Structural and Semantic Similarity
Since the emergence in the popularity of XML for data representation and exchange over the Web, the distribution of XML documents has rapidly increased. It has become a challenge for researchers to turn these documents into a more useful information utility. In this paper, we introduce a novel clustering algorithm PCXSS that keeps the heterogeneous XML documents into various groups according to their similar structural and semantic representations. We develop a global criterion function CPSim that progressively measures the similarity between a XML document and existing clusters, ignoring the need to compute the similarity between two individual documents. The experimental analysis shows the method to be fast and accurate
Data integration through service-based mediation for web-enabled information systems
The Web and its underlying platform technologies have often been used to integrate existing software and information systems. Traditional techniques for data representation and transformations between documents are not sufficient to support a flexible and maintainable data integration solution that meets the requirements of modern complex Web-enabled software and information systems. The difficulty
arises from the high degree of complexity of data structures, for example in business and technology applications, and from the constant change of data and its
representation. In the Web context, where the Web platform is used to integrate different organisations or software systems, additionally the problem of heterogeneity
arises. We introduce a specific data integration solution for Web applications such as Web-enabled information systems. Our contribution is an integration technology
framework for Web-enabled information systems comprising, firstly, a data integration technique based on the declarative specification of transformation rules and the construction of connectors that handle the integration and, secondly, a mediator architecture based on information services and the constructed connectors to handle the integration process
XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as the data representation language, collections of the XML data are exploded in numbers. The methods are required to manage and discover the useful information from them for the improved document handling. We present a schema clustering process by organising the heterogeneous XML schemas into various groups. The methodology considers not only the linguistic and the context of the elements but also the hierarchical structural similarity. We support our findings with experiments and analysis
Mediated data integration and transformation for web service-based software architectures
Service-oriented architecture using XML-based web services has been widely accepted by many organisations as the standard infrastructure to integrate heterogeneous and autonomous data sources. As a result, many Web service providers are built up on top of the data sources to share the data by supporting provided and required interfaces and methods of data access in a unified manner. In the context of data integration, problems arise when Web services are assembled to deliver an integrated view of data, adaptable to the specific needs of individual clients and providers. Traditional approaches of data integration and transformation are not suitable to automate the construction of connectors dedicated to connect selected Web services to render integrated and tailored views of data. We propose a declarative approach that addresses the oftenneglected data integration and adaptivity aspects of serviceoriented
architecture
- …