9,188 research outputs found
Rule-based information integration
In this report, we show the process of information integration. We specifically discuss the language used for integration. We show that integration consists of two phases, the schema mapping phase and the data integration phase. We formally define transformation rules, conversion, evolution and versioning. We further discuss the integration process from a data point of view
Composition and Inversion of Schema Mappings
In the recent years, a lot of attention has been paid to the development of
solid foundations for the composition and inversion of schema mappings. In this
paper, we review the proposals for the semantics of these crucial operators.
For each of these proposals, we concentrate on the three following problems:
the definition of the semantics of the operator, the language needed to express
the operator, and the algorithmic issues associated to the problem of computing
the operator. It should be pointed out that we primarily consider the
formalization of schema mappings introduced in the work on data exchange. In
particular, when studying the problem of computing the composition and inverse
of a schema mapping, we will be mostly interested in computing these operators
for mappings specified by source-to-target tuple-generating dependencies
Bioinformatics service reconciliation by heterogeneous schema transformation
This paper focuses on the problem of bioinformatics service reconciliation in a generic and scalable manner so as to enhance interoperability in a highly evolving field. Using XML as a common representation format, but also supporting existing flat-file representation formats, we propose an approach for the scalable semi-automatic reconciliation of services, possibly invoked from within a scientific workflows tool. Service reconciliation may use the AutoMed heterogeneous data integration system as an intermediary service, or may use AutoMed to produce services that mediate between services. We discuss the application of our approach for the reconciliation of services in an example bioinformatics workflow. The main contribution of this research is an architecture for the scalable reconciliation of bioinformatics services
Composition with Target Constraints
It is known that the composition of schema mappings, each specified by
source-to-target tgds (st-tgds), can be specified by a second-order tgd (SO
tgd). We consider the question of what happens when target constraints are
allowed. Specifically, we consider the question of specifying the composition
of standard schema mappings (those specified by st-tgds, target egds, and a
weakly acyclic set of target tgds). We show that SO tgds, even with the
assistance of arbitrary source constraints and target constraints, cannot
specify in general the composition of two standard schema mappings. Therefore,
we introduce source-to-target second-order dependencies (st-SO dependencies),
which are similar to SO tgds, but allow equations in the conclusion. We show
that st-SO dependencies (along with target egds and target tgds) are sufficient
to express the composition of every finite sequence of standard schema
mappings, and further, every st-SO dependency specifies such a composition. In
addition to this expressive power, we show that st-SO dependencies enjoy other
desirable properties. In particular, they have a polynomial-time chase that
generates a universal solution. This universal solution can be used to find the
certain answers to unions of conjunctive queries in polynomial time. It is easy
to show that the composition of an arbitrary number of standard schema mappings
is equivalent to the composition of only two standard schema mappings. We show
that surprisingly, the analogous result holds also for schema mappings
specified by just st-tgds (no target constraints). This is proven by showing
that every SO tgd is equivalent to an unnested SO tgd (one where there is no
nesting of function symbols). Similarly, we prove unnesting results for st-SO
dependencies, with the same types of consequences.Comment: This paper is an extended version of: M. Arenas, R. Fagin, and A.
Nash. Composition with Target Constraints. In 13th International Conference
on Database Theory (ICDT), pages 129-142, 201
Semantic processing of EHR data for clinical research
There is a growing need to semantically process and integrate clinical data
from different sources for clinical research. This paper presents an approach
to integrate EHRs from heterogeneous resources and generate integrated data in
different data formats or semantics to support various clinical research
applications. The proposed approach builds semantic data virtualization layers
on top of data sources, which generate data in the requested semantics or
formats on demand. This approach avoids upfront dumping to and synchronizing of
the data with various representations. Data from different EHR systems are
first mapped to RDF data with source semantics, and then converted to
representations with harmonized domain semantics where domain ontologies and
terminologies are used to improve reusability. It is also possible to further
convert data to application semantics and store the converted results in
clinical research databases, e.g. i2b2, OMOP, to support different clinical
research settings. Semantic conversions between different representations are
explicitly expressed using N3 rules and executed by an N3 Reasoner (EYE), which
can also generate proofs of the conversion processes. The solution presented in
this paper has been applied to real-world applications that process large scale
EHR data.Comment: Accepted for publication in Journal of Biomedical Informatics, 2015,
preprint versio
- …