3 research outputs found
Semantic Service Integration & Metropolitan Medical Network
A Thesis Submitted to the Faculty of Indiana University by Nikeshbhai Patel in Partial Fulfillment of the Requirements for the Degree of Master of Science, August 2005Medical health partners use heterogeneous data formats, legacy software
and strictly licensed vocabularies which make it hard to integrate their data and
work. Integration of services and data are the two main necessities. The current
architecture used provides partial solution by providing one-to-one mapping
wrappers. This thesis provides discussion on difficulties encountered by the coexistence
of so many medical vocabularies and efforts to provide interoperation.
Also other problems are listed which hinders the interoperation between health
partners.
Solution is proposed for some of these problems by forming semantic
network based on multi-agent technology. Service composition and integration
stages are shown to develop future advance health services. Middle layer is
implemented which performs integration and provides common platform for
sharing information, using global ontology and local domain ontology. Inferencebased
matchmaking algorithm proposed in this thesis helps in mapping and
achieving our goal. Six different filtering techniques are selected and used in
matchmaking algorithm. Analysis of these filtering techniques is provided to
understand the integration process. In the ending section an abstract idea is
proposed on basis of network architecture and matchmaking algorithm to
develop Open Terminological System
Flexible query processing of SPARQL queries
SPARQL is the predominant language for querying RDF data, which is the standard
model for representing web data and more specifically Linked Open Data (a
collection of heterogeneous connected data). Datasets in RDF form can be hard to
query by a user if she does not have a full knowledge of the structure of the dataset.
Moreover, many datasets in Linked Data are often extracted from actual web page
content which might lead to incomplete or inaccurate data.
We extend SPARQL 1.1 with two operators, APPROX and RELAX, previously
introduced in the context of regular path queries. Using these operators we are able
to support
exible querying over the property path queries of SPARQL 1.1. We call
this new language SPARQLAR.
Using SPARQLAR users are able to query RDF data without fully knowing the
structure of a dataset. APPROX and RELAX encapsulate different aspects of query flexibility: finding different answers and finding more answers, respectively. This
means that users can access complex and heterogeneous datasets without the need
to know precisely how the data is structured.
One of the open problems we address is how to combine the APPROX and
RELAX operators with a pragmatic language such as SPARQL. We also devise an
implementation of a system that evaluates SPARQLAR queries in order to study the
performance of the new language.
We begin by defining the semantics of SPARQLAR and the complexity of query
evaluation. We then present a query processing technique for evaluating SPARQLAR
queries based on a rewriting algorithm and prove its soundness and completeness.
During the evaluation of a SPARQLAR query we generate multiple SPARQL 1.1
queries that are evaluated against the dataset. Each such query will generate answers
with a cost that indicates their distance with respect to the exact form of the original
SPARQLAR query.
Our prototype implementation incorporates three optimisation techniques that
aim to enhance query execution performance: the first optimisation is a pre-computation
technique that caches the answers of parts of the queries generated by the rewriting
algorithm. These answers will then be reused to avoid the re-execution of those sub-queries. The second optimisation utilises a summary of the dataset to discard
queries that it is known will not return any answer. The third optimisation technique
uses the query containment concept to discard queries whose answers would
be returned by another query at the same or lower cost.
We conclude by conducting a performance study of the system on three different
RDF datasets: LUBM (Lehigh University Benchmark), YAGO and DBpedia
Flexible query processing of SPARQL queries
SPARQL is the predominant language for querying RDF data, which is the standard
model for representing web data and more specifically Linked Open Data (a
collection of heterogeneous connected data). Datasets in RDF form can be hard to
query by a user if she does not have a full knowledge of the structure of the dataset.
Moreover, many datasets in Linked Data are often extracted from actual web page
content which might lead to incomplete or inaccurate data.
We extend SPARQL 1.1 with two operators, APPROX and RELAX, previously
introduced in the context of regular path queries. Using these operators we are able
to support
exible querying over the property path queries of SPARQL 1.1. We call
this new language SPARQLAR.
Using SPARQLAR users are able to query RDF data without fully knowing the
structure of a dataset. APPROX and RELAX encapsulate different aspects of query flexibility: finding different answers and finding more answers, respectively. This
means that users can access complex and heterogeneous datasets without the need
to know precisely how the data is structured.
One of the open problems we address is how to combine the APPROX and
RELAX operators with a pragmatic language such as SPARQL. We also devise an
implementation of a system that evaluates SPARQLAR queries in order to study the
performance of the new language.
We begin by defining the semantics of SPARQLAR and the complexity of query
evaluation. We then present a query processing technique for evaluating SPARQLAR
queries based on a rewriting algorithm and prove its soundness and completeness.
During the evaluation of a SPARQLAR query we generate multiple SPARQL 1.1
queries that are evaluated against the dataset. Each such query will generate answers
with a cost that indicates their distance with respect to the exact form of the original
SPARQLAR query.
Our prototype implementation incorporates three optimisation techniques that
aim to enhance query execution performance: the first optimisation is a pre-computation
technique that caches the answers of parts of the queries generated by the rewriting
algorithm. These answers will then be reused to avoid the re-execution of those sub-queries. The second optimisation utilises a summary of the dataset to discard
queries that it is known will not return any answer. The third optimisation technique
uses the query containment concept to discard queries whose answers would
be returned by another query at the same or lower cost.
We conclude by conducting a performance study of the system on three different
RDF datasets: LUBM (Lehigh University Benchmark), YAGO and DBpedia