A Progressive Clustering Algorithm to Group the XML Data by Structural and Semantic Similarity

Author: Nayak Richi
Tran Tien
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2007

As XML has grown in popularity for data representation and exchange over the Web, the number of XML documents has increased rapidly. It has become a challenge for researchers to turn these documents into a more useful information utility. In this paper, we introduce a novel clustering algorithm, PCXSS, that groups heterogeneous XML documents into clusters according to their structural and semantic similarity. We develop a global criterion function, CPSim, that progressively measures the similarity between an XML document and the existing clusters, avoiding the need to compute the similarity between each pair of individual documents. The experimental analysis shows the method to be fast and accurate.
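The document-to-cluster idea described in this abstract can be illustrated with a minimal sketch. This is not PCXSS or CPSim: the path-set representation, the Jaccard overlap used as a stand-in similarity, the `threshold` value, and all function names are assumptions made for illustration only.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two path sets, in [0, 1] (stand-in for a real criterion function)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def progressive_cluster(docs: List[Set[str]], threshold: float = 0.5) -> List[List[int]]:
    """Assign each document to the most similar existing cluster, or open a new one.

    Each document is compared only with cluster summaries, never with
    other individual documents.
    """
    cluster_paths: List[Set[str]] = []  # hypothetical summary: union of member paths
    members: List[List[int]] = []
    for i, paths in enumerate(docs):
        best, best_sim = -1, 0.0
        for c, summary in enumerate(cluster_paths):
            sim = jaccard(paths, summary)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            cluster_paths[best] |= paths  # update the summary in place
            members[best].append(i)
        else:
            cluster_paths.append(set(paths))
            members.append([i])
    return members


# Documents represented as sets of root-to-leaf tag paths (an assumed encoding).
docs = [
    {"book/title", "book/author"},
    {"book/title", "book/author", "book/year"},
    {"order/item", "order/price"},
]
print(progressive_cluster(docs))  # [[0, 1], [2]]
```

With n documents and k clusters, this pass performs on the order of n·k comparisons rather than the n² needed for a full pairwise similarity matrix, which is the kind of saving the abstract alludes to.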

CiteSeerX

Queensland University of Technology ePrints Archive

XML Schema Clustering with Semantic and Hierarchical Similarity Measures

Author: Iryadi Wina
Nayak Richi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

With the growing popularity of XML as a data representation language, collections of XML data have grown explosively. Methods are required to manage these collections and discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic aspects and the context of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.

Crossref

Queensland University of Technology ePrints Archive

XML Matchers: approaches and challenges

Author: Agreste Santa
De Meo Pasquale
Ferrara Emilio
Ursino Domenico
Publication venue: 'Elsevier BV'
Publication date: 10/07/2014

Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research areas for many years. In the past, it was investigated mainly for classical database models (e.g., E/R schemas, relational databases, etc.). However, in recent years, the widespread adoption of XML in the most disparate application fields has pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, aiming at finding semantic matchings between concepts defined in DTDs and XSDs. XML Matchers do not just take well-known techniques originally designed for other data models and apply them to DTDs/XSDs; they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs affect the Schema Matching task. Then we introduce a template, called XML Matcher Template, that describes the main components of an XML Matcher, their role and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers.

Comment: 34 pages, 8 tables, 7 figures

arXiv.org e-Print Archive

IRIS Università Politecnica delle Marche

Review implementation of linguistic approach in schema matching

Author: Martono Galih Hendro
SN Azhari
Publication venue: 'Universitas Ahmad Dahlan, Kampus 3'
Publication date: 01/03/2017

Research on schema matching has been under way since the last decade. A few schema matching approaches have been proposed using various methods, such as neural networks, feature selection, constraint-based, instance-based, and linguistic techniques. Fields such as e-commerce, e-business, and data warehousing use schema matching as a basic model. The linguistic approach itself has long been applied to various problems, such as calculating entity similarity values across two or more schemas. The purpose of this paper is to provide an overview of previous studies on the implementation of the linguistic approach in schema matching and to identify gaps for the development of existing methods. Furthermore, this paper focuses on the measurement of similarity in the linguistic approach to schema matching.
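As one concrete instance of a linguistic similarity measure between schema element names, the sketch below uses a normalised Levenshtein (edit) distance. The abstract does not say which measures the review covers, so this choice, the lowercasing step, and the function names are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = cur
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical names."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))     # 3
print(name_similarity("Author", "author"))  # 1.0
```

A purely string-based measure like this scores synonyms such as "price" and "cost" as dissimilar, which is why linguistic matchers are often combined with dictionaries or thesauri.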

International Journal of Advances in Intelligent Informatics

Crossref

Directory of Open Access Journals

International Journal of Advances in Intelligent Informatics (IJAIN)

Semantics-based approach for generating partial views from linked life-cycle highway project data

Author: Le Tuyen Thanh
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2017

The purpose of this dissertation is to develop methods that can assist data integration and extraction from heterogeneous sources generated throughout the life-cycle of a highway project. In the era of computerized technologies, project data is largely available in digital format. Due to the fragmented nature of the civil infrastructure sector, digital data are created and managed separately by different project actors in proprietary data warehouses. The differences in data structure and semantics greatly hinder the exchange and full reuse of digital project data. In order to address those issues, this dissertation carries out the following three individual studies. The first study aims to develop a framework for interconnecting heterogeneous life-cycle project data into a unified and linked data space. This is an ontology-based framework that consists of two phases: (1) translating proprietary datasets into homogeneous RDF data graphs; and (2) connecting separate data networks to each other. Three domain ontologies for the design, construction, and asset condition survey phases are developed to support data transformation. A merged ontology that integrates the domain ontologies is constructed to provide guidance on how to connect data nodes from domain graphs. The second study deals with the terminology inconsistency between data sources. An automated method is developed that employs Natural Language Processing (NLP) and machine learning techniques to support constructing a domain-specific lexicon from design manuals. The method utilizes pattern rules to extract technical terms from texts and learns their representation vectors using a neural-network-based word embedding approach. The study also includes the development of an integrated method of minimally supervised machine learning, clustering analysis, and word vectors for computing the term semantics and classifying the relations between terms in the target lexicon.
In the last study, a data retrieval technique for extracting subsets of an XML civil data schema is designed and tested. The algorithm takes keyword input from the end user and returns a ranked list of the most relevant XML branches. This study utilizes the lexicon of the highway domain generated in the second study to analyze the semantics of the end user's keywords. A context-based similarity measure is introduced to evaluate the relevance between a given branch in the source schema and the user query. The methods and algorithms resulting from this research were tested using case studies and empirical experiments. The results indicate that the study successfully addresses the heterogeneity in the structure and terminology of data and enables fast extraction of sub-models of data. The study is expected to enhance the efficiency of reusing digital data generated throughout the project life-cycle, and to contribute to the success of the transition from paper-based to digital project delivery for civil infrastructure projects.

Digital Repository @ Iowa State University (ISU)

Web service searching

Author: Jaghmani Ismail
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2005

With the growing number of Web services, it is no longer adequate to locate a Web service by searching its name or browsing a UDDI directory. An efficient Web services discovery mechanism is necessary for locating and selecting the required Web services. The search mechanism should be based on the Web service description rather than on keywords. In this work, we introduce a Web service searching prototype that can locate Web services by comparing all available information encoded in the Web service description, such as operation name, input and output types, the structure of the underlying XML schema, and the semantics of element names. Our approach combines information-retrieval techniques, a weighted bipartite graph matching algorithm, and a tree-matching algorithm. Given a query, represented as a set of keywords, a Web service description, or an operation description, an information retrieval technique is used to rank the candidate Web services based on their text-based similarity to the query. The ranked result can be further refined by computing their structural similarity. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .J34. Source: Masters Abstracts International, Volume: 44-03, page: 1403. Thesis (M.Sc.)--University of Windsor (Canada), 2005.
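To make the weighted bipartite matching step concrete, here is a minimal sketch that pairs query parameters with candidate parameters so that the total similarity is maximal. It is not the thesis's algorithm: it uses brute-force enumeration, which is only practical for a handful of parameters, and the weight matrix and function name are hypothetical.

```python
from itertools import permutations
from typing import List, Tuple


def best_matching(weights: List[List[float]]) -> Tuple[float, List[int]]:
    """Return (total weight, assignment), where assignment[i] is the column
    matched to row i.

    weights[i][j] is the similarity between query item i and candidate item j.
    Assumes a non-empty matrix with no more rows than columns.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    best_score, best_assign = float("-inf"), ()
    # Try every one-to-one assignment of rows to distinct columns.
    for cols in permutations(range(n_cols), n_rows):
        score = sum(weights[i][c] for i, c in enumerate(cols))
        if score > best_score:
            best_score, best_assign = score, cols
    return best_score, list(best_assign)


# Hypothetical name similarities: rows are query parameters, columns are candidates.
weights = [
    [0.90, 0.80],
    [0.85, 0.10],
]
score, assignment = best_matching(weights)
print(assignment)  # [1, 0]
```

A greedy strategy would pair row 0 with column 0 first (0.90) and be left with 0.10 for row 1; the optimal matching gives up the single best edge to obtain a higher total. For larger inputs, a polynomial-time method such as the Hungarian algorithm would replace the enumeration.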

Scholarship at UWindsor

Dealing with uncertain entities in ontology alignment using rough sets

Author: Alireza Mousavi
Hamed Al-Raweshidy
Man Qi
Maozhen Li
Sadaqat Jan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2012

This is the author's accepted manuscript. The final published article is available from the link below. Copyright © 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

Ontology alignment facilitates the exchange of knowledge among heterogeneous data sources. Many approaches to ontology alignment use multiple similarity measures to map entities between ontologies. However, dealing with uncertain entities, for which the employed similarity measures produce conflicting results, remains a key challenge. This paper presents OARS, a rough-set based approach to ontology alignment which achieves a high degree of accuracy in situations where uncertainty arises because of conflicting results generated by different similarity measures. OARS employs a combinational approach and considers both lexical and structural similarity measures. OARS is extensively evaluated with the benchmark ontologies of the Ontology Alignment Evaluation Initiative (OAEI) 2010, and performs best in terms of recall in comparison with a number of alignment systems, while achieving comparable precision.
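The notion of "uncertain entities" can be illustrated by separating candidate mappings on which all similarity measures agree from those on which they conflict, loosely in the spirit of the lower approximation and boundary region of rough set theory. This sketch is not OARS: the threshold, the vote rule, the example scores, and the function name are all assumptions.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def partition_by_agreement(
    scores: Dict[Pair, List[float]], threshold: float = 0.5
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Split candidate mappings into accepted, rejected, and uncertain.

    scores maps an entity pair to one score per similarity measure,
    for example [lexical, structural].
    """
    accepted, rejected, uncertain = [], [], []
    for pair, values in scores.items():
        votes = [v >= threshold for v in values]
        if all(votes):
            accepted.append(pair)    # every measure says "match"
        elif not any(votes):
            rejected.append(pair)    # every measure says "no match"
        else:
            uncertain.append(pair)   # measures conflict
    return accepted, rejected, uncertain


# Hypothetical [lexical, structural] scores for three candidate mappings.
scores = {
    ("Author", "Writer"): [0.2, 0.9],
    ("Title", "Title"): [1.0, 0.8],
    ("Price", "Chapter"): [0.1, 0.2],
}
accepted, rejected, uncertain = partition_by_agreement(scores)
print(uncertain)  # [('Author', 'Writer')]
```

Only the pairs in the uncertain group need further treatment; how an alignment system resolves them is exactly where approaches differ.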

Crossref

Brunel University Research Archive

Semantic information systems engineering: A query-based approach for semi-automatic annotation of web services

Author: Al-Asswad Mohammad Mourhaf
Publication venue: Brunel University, School of Information Systems, Computing and Mathematics
Publication date: 01/01/2011

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.

There has been an increasing interest in Semantic Web services (SWS) as a proposed solution to facilitate automatic discovery, composition and deployment of existing syntactic Web services. Successful implementation and wider adoption of SWS by research and industry, however, depend heavily on the existence of effective and easy-to-use methods for service semantic description. Unfortunately, Web service semantic annotation is currently performed by manual means. Manual annotation is a difficult, error-prone and time-consuming task, and few approaches exist that aim to semi-automate it. Existing approaches are difficult to use since they require ontology building. Moreover, these approaches employ ineffective matching methods and suffer from the Low Percentage Problem. The latter problem occurs when only a small number of service elements, in comparison to the total number of elements, are annotated in a given service. This research addresses the Web services annotation problem by developing a semi-automatic annotation approach that allows SWS developers to effectively and easily annotate their syntactic services. The proposed approach does not require application ontologies to model service semantics. Instead, a standard query template is used. This template is filled with data and semantics extracted from WSDL files in order to produce query instances. The input of the annotation approach is the WSDL file of a candidate service and a set of ontologies. The output is an annotated WSDL file. The proposed approach is composed of five phases: (1) concept extraction; (2) concept filtering and query filling; (3) query execution; (4) results assessment; and (5) SAWSDL annotation. The query execution engine makes use of name-based and structural matching techniques.
The name-based matching is carried out by CN-Match, a novel matching method and tool that is developed and evaluated in this research. The proposed annotation approach is evaluated using a set of existing Web services and ontologies. Precision (P), Recall (R), F-Measure (F) and Percentage of annotated elements are used as evaluation metrics. The evaluation reveals that the proposed approach is effective since, in relation to manual results, accurate and almost complete annotation results are obtained. In addition, a high percentage of annotated elements is achieved using the proposed approach because it makes use of effective ontology extension mechanisms.
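The evaluation metrics named in this abstract have standard definitions, sketched below for sets of annotations. The example data, the representation of an annotation as an (element, concept) pair, and the function names are hypothetical; the thesis may compute the percentage of annotated elements differently.

```python
from typing import Set, Tuple


def precision_recall_f(predicted: Set, reference: Set) -> Tuple[float, float, float]:
    """Precision, recall, and balanced F-measure of predicted vs. reference annotations."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def percentage_annotated(annotated_elements: int, total_elements: int) -> float:
    """Share of service elements that received any annotation, as a percentage."""
    return 100.0 * annotated_elements / total_elements if total_elements else 0.0


# Hypothetical automatic annotations compared against a manual reference.
predicted = {("getPrice", "Price"), ("isbn", "ISBN"), ("name", "Person"), ("qty", "Amount")}
reference = {("getPrice", "Price"), ("isbn", "ISBN"), ("name", "Title")}
p, r, f = precision_recall_f(predicted, reference)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.50 R=0.67 F=0.57
```

Here two of the four predicted annotations are correct (P = 2/4) and two of the three reference annotations are found (R = 2/3), giving F = 4/7.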

Brunel University Research Archive