
    Probabilistic Schema Covering

    Schema covering is the process of representing large and complex schemas by easily comprehensible common objects. The task consists of identifying a set of common concepts from a repository, called the concept repository, and generating a cover that describes the schema in terms of those concepts. The traditional schema covering approach has two shortcomings: it does not model the uncertainty in the covering process, and it requires the user to specify an ambiguity constraint, which is hard to define. We remedy these problems by incorporating a probabilistic model into schema covering to generate a probabilistic schema cover. The integrated probabilities not only improve the coverage of the resulting covers but also eliminate the need to define the ambiguity parameter. Both probabilistic and traditional schema covering run on top of a concept repository. Experiments on real datasets show the competitive performance of our approach.
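
    As a rough illustration of the idea described above, the sketch below greedily selects concepts from a repository so that the expected number of covered schema attributes is maximised. The concept names, the match probabilities, and the greedy strategy are assumptions for illustration only; the paper's actual probabilistic model is not reproduced here.

```python
# Hypothetical sketch: greedy probabilistic schema covering.
def probabilistic_cover(schema_attrs, concept_repo, min_gain=1e-6):
    """schema_attrs: set of attribute names.
    concept_repo: dict concept -> {attribute: P(concept covers attribute)}."""
    cover = []
    covered = {a: 0.0 for a in schema_attrs}   # best coverage probability so far
    while True:
        best, best_gain = None, min_gain
        for concept, probs in concept_repo.items():
            if concept in cover:
                continue
            # expected gain: added probability mass over still-uncovered attributes
            gain = sum(max(0.0, p - covered[a])
                       for a, p in probs.items() if a in covered)
            if gain > best_gain:
                best, best_gain = concept, gain
        if best is None:
            return cover
        cover.append(best)
        for a, p in concept_repo[best].items():
            if a in covered:
                covered[a] = max(covered[a], p)

schema = {"name", "street", "zip", "price"}
repo = {"Address": {"street": 0.9, "zip": 0.8},
        "Product": {"name": 0.7, "price": 0.95}}
print(probabilistic_cover(schema, repo))  # ['Address', 'Product']
```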

    Analyzing the Emergence of Semantic Agreement among Rational Agents

    Today's complex online applications often require the interaction of multiple services that potentially belong to different business entities. Interoperability is a core element of such an environment, yet not a straightforward one. In this paper, we argue that the emergence of interoperability is an economic process among rational agents and that, although interoperability can be mutually beneficial for the involved parties, it is also costly and may fail to emerge. As a sample scenario, we consider the emergence of semantic interoperability among rational service agents in service-oriented architectures (SOA) and analyze their individual economic incentives with respect to utility, risk and cost. We model this process as a positive-sum game and study its equilibrium and evolutionary dynamics. According to our analysis, which is also experimentally verified, the process is driven by certain conditions on the communication cost, the cost of technological adaptation, the expected mutual benefit from interoperability and the expected loss from isolation.
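
    A toy sketch of how such an adoption process can be studied with evolutionary dynamics is given below. The parameters b (mutual benefit), c (cost of technological adaptation), m (communication cost) and l (loss from isolation) and the payoff structure are illustrative assumptions, not the paper's exact model.

```python
# Illustrative sketch: replicator dynamics for adopting a shared semantics.
def replicator_share(b=3.0, c=1.0, m=0.2, l=0.5, x=0.1, steps=200, dt=0.05):
    """Evolve the fraction x of agents adopting the shared semantics."""
    for _ in range(steps):
        # payoff of adopting: benefit with other adopters, minus adaptation cost
        pay_adopt = x * (b - m) - c
        # payoff of staying isolated: loss when interacting with adopters
        pay_stay = -x * l
        avg = x * pay_adopt + (1 - x) * pay_stay
        x += dt * x * (pay_adopt - avg)       # replicator update
        x = min(max(x, 0.0), 1.0)
    return x

# adoption dies out below the critical mass c / (b - m + l) and spreads above it
print(round(replicator_share(x=0.1), 3), round(replicator_share(x=0.5), 3))
```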

    A classification of data quality assessment and improvement methods

    Data quality (DQ) assessment and improvement in larger information systems would often not be feasible without suitable “DQ methods”: algorithms that can be executed automatically by computer systems to detect and/or correct problems in datasets. These methods are already essential today, and they will become even more important as the quantity of data in organisational systems grows. This paper reviews existing methods for both DQ assessment and improvement and classifies them according to the DQ problem and the problem context. Six gaps have been identified in the classification where no current DQ methods exist; these indicate where new methods are required and serve as a guide for future research and DQ tool development. This is the accepted manuscript; it is currently embargoed pending publication by Inderscience.
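
    The snippet below sketches the kind of classification matrix such a survey builds: methods indexed by the DQ problem they address and the problem context, with gaps being cells for which no method exists. The problem names, contexts and methods are purely illustrative and are not the paper's actual taxonomy.

```python
# Hypothetical sketch of a DQ-method classification matrix with gap detection.
from itertools import product

problems = ["missing values", "duplicates", "outdated values"]
contexts = ["single source", "multiple sources"]

methods = {
    ("missing values", "single source"): ["imputation"],
    ("duplicates", "single source"): ["exact deduplication"],
    ("duplicates", "multiple sources"): ["record linkage"],
}

gaps = [cell for cell in product(problems, contexts) if cell not in methods]
for problem, context in gaps:
    print(f"gap: no DQ method for '{problem}' in context '{context}'")
```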

    Privacy-Preserving Schema Reuse

    As the number of schema repositories grows rapidly and several web-based platforms now support publishing schemas, schema reuse has become a new trend. Schema reuse is a methodology that allows users to create new schemas by copying and adapting existing ones. It reduces not only the effort of designing new schemas but also the heterogeneity between them. One of the biggest barriers to schema reuse is privacy concerns, which discourage schema owners from contributing their schemas. To address this problem, we develop a framework that enables privacy-preserving schema reuse. Our framework lets contributors define their own protection policies in the form of privacy constraints. Instead of exposing the original schemas, the framework returns an anonymized schema with maximal utility that satisfies these privacy constraints. To validate our approach, we empirically show the efficiency of different heuristics, the correctness of the proposed utility function, the computation time, and the trade-off between utility and privacy.
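
    A minimal sketch of the general setting is shown below: each privacy constraint is a set of attributes that must not all be exposed together, and utility is assumed to be a sum of per-attribute weights. The greedy repair strategy, attribute names and weights are assumptions; the paper's heuristics and utility function are not reproduced here.

```python
# Hypothetical sketch: anonymize a schema under set-based privacy constraints.
def anonymize(schema, constraints, utility):
    """schema: set of attributes; constraints: list of attribute sets that must
    not be fully exposed; utility: dict attribute -> weight."""
    exposed = set(schema)
    for constraint in constraints:
        while constraint <= exposed:          # constraint fully exposed -> violated
            # drop the lowest-utility attribute involved in the violation
            victim = min(constraint & exposed, key=lambda a: utility.get(a, 0.0))
            exposed.discard(victim)
    return exposed

schema = {"name", "birthdate", "zip", "diagnosis"}
constraints = [{"name", "diagnosis"}, {"birthdate", "zip", "diagnosis"}]
utility = {"name": 1.0, "birthdate": 0.4, "zip": 0.6, "diagnosis": 0.9}
print(sorted(anonymize(schema, constraints, utility)))  # ['birthdate', 'name', 'zip']
```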

    XML Matchers: approaches and challenges

    Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research for many years. In the past, it was investigated mainly for classical database models (e.g., E/R schemas, relational databases, etc.). In recent years, however, the widespread adoption of XML in the most disparate application fields has pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, which aim at finding semantic matches between concepts defined in DTDs and XSDs. XML Matchers do not simply take well-known techniques originally designed for other data models and apply them to DTDs/XSDs; they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is now a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs affect the Schema Matching task. Then we introduce a template, called the XML Matcher Template, that describes the main components of an XML Matcher, their role and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers.
    Comment: 34 pages, 8 tables, 7 figures
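
    As a rough illustration of the component structure an XML Matcher typically has, the sketch below combines a linguistic similarity over element names with a structural similarity over child elements. The component names, weights and combination scheme are assumptions for illustration and are not the template defined in the paper.

```python
# Hypothetical sketch of an XML matcher built from two similarity components.
from difflib import SequenceMatcher

def linguistic_sim(a, b):
    """Name similarity between two element labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def structural_sim(children_a, children_b):
    """Overlap of child-element names, exploiting the hierarchical structure."""
    sa, sb = set(children_a), set(children_b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def match(schema_a, schema_b, w_ling=0.6, w_struct=0.4, threshold=0.5):
    """schema_*: dict element -> list of child element names."""
    matches = []
    for ea, ca in schema_a.items():
        for eb, cb in schema_b.items():
            score = w_ling * linguistic_sim(ea, eb) + w_struct * structural_sim(ca, cb)
            if score >= threshold:
                matches.append((ea, eb, round(score, 2)))
    return matches

dtd_a = {"book": ["title", "author", "year"], "author": ["name"]}
xsd_b = {"Book": ["title", "writer", "year"], "Writer": ["name"]}
print(match(dtd_a, xsd_b))  # matches 'book'/'Book' and 'author'/'Writer'
```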

    AMC - A Framework for Modelling and Comparing Matching Systems as Matching Processes

    We present the Auto Mapping Core (AMC), a new framework that supports fast construction and tuning of schema matching approaches for specific domains such as ontology alignment, model matching or database-schema matching. Distinctive features of our framework are new visualisation techniques for modelling matching processes, stepwise tuning of parameters, intermediate result analysis and performance-oriented rewrites. Furthermore, existing matchers can be plugged into the framework to evaluate them comparatively in a common environment. This allows a deeper analysis of the behaviour and shortcomings of existing complex matching systems.
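
    A speculative sketch of the kind of matching process such a framework models is given below: pluggable matchers executed as a pipeline, with intermediate similarity results kept for stepwise analysis and a final aggregation step. The class name, operator names and aggregation are illustrative and are not AMC's real API.

```python
# Hypothetical sketch of a matching process with pluggable matchers.
class MatchingProcess:
    def __init__(self, matchers, aggregate):
        self.matchers = matchers        # list of (name, fn(src, tgt) -> {pair: score})
        self.aggregate = aggregate      # fn(list of result dicts) -> {pair: score}
        self.intermediate = {}          # kept for stepwise analysis and tuning

    def run(self, source, target):
        results = []
        for name, matcher in self.matchers:
            result = matcher(source, target)
            self.intermediate[name] = result   # inspectable intermediate result
            results.append(result)
        return self.aggregate(results)

# two toy pluggable matchers and an average-based aggregation
name_eq = lambda s, t: {(a, b): float(a.lower() == b.lower()) for a in s for b in t}
prefix = lambda s, t: {(a, b): float(b.lower().startswith(a[:3].lower())) for a in s for b in t}
avg = lambda rs: {p: sum(r[p] for r in rs) / len(rs) for p in rs[0]}

process = MatchingProcess([("name_eq", name_eq), ("prefix", prefix)], avg)
print(process.run(["price", "title"], ["Price", "TitleText"]))
```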