Peer Data Management
Peer Data Management (PDM) deals with the management of structured data in unstructured peer-to-peer (P2P) networks. Each peer can store data locally and define relationships (mappings) between its data and the data provided by other peers. Queries posed to any of the peers are then answered by also considering the information implied by these mappings.
The overall goal of PDM is to provide semantically well-founded integration and exchange of data from heterogeneous, distributed sources. Unlike traditional data integration systems, peer data management systems (PDMSs) allow for full autonomy of each member and need no central coordinator. The promise of such systems is to provide flexible data integration and exchange at low setup and maintenance costs.
However, building such systems raises many challenges. Besides the obvious scalability problem, choosing an appropriate semantics that can deal with arbitrary, even cyclic, topologies, data inconsistencies, or updates while still allowing for tractable reasoning has been an area of active research in the last decade. In this survey, we provide an overview of the different approaches suggested in the literature to tackle these problems, focusing on appropriate semantics for query answering and data exchange rather than on implementation-specific problems.
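To make the idea of answering queries through mappings concrete, here is a minimal Python sketch under strongly simplifying assumptions that are not taken from the survey: mappings are plain inclusions that copy (projected) tuples from one peer's relation into another's, there are no existential variables, and the fixpoint is computed in one place rather than by the peers themselves. Under these assumptions a least fixpoint always exists, even for cyclic topologies. The names `materialize` and `answer` and the mapping format are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, ...]
RelId = Tuple[str, str]                          # (peer name, relation name)
Mapping = Tuple[RelId, RelId, Tuple[int, ...]]   # source, target, projected columns


def materialize(local: Dict[RelId, Set[Row]], mappings: List[Mapping]) -> Dict[RelId, Set[Row]]:
    """Propagate tuples along inclusion mappings until nothing changes.

    Mappings only copy existing values (no existential variables), so the set
    of derivable tuples is finite and the loop terminates even when the
    mapping graph is cyclic.
    """
    data = {rel: set(rows) for rel, rows in local.items()}
    changed = True
    while changed:
        changed = False
        for src, dst, columns in mappings:
            target = data.setdefault(dst, set())
            # Copy the source rows first so src == dst cannot break iteration.
            for row in list(data.get(src, set())):
                projected = tuple(row[i] for i in columns)
                if projected not in target:
                    target.add(projected)
                    changed = True
    return data


def answer(data: Dict[RelId, Set[Row]], rel: RelId) -> Set[Row]:
    """Answer a query on one peer's relation, including mapped-in tuples."""
    return data.get(rel, set())


# Two peers whose relations map into each other (a cyclic topology).
local = {("A", "R"): {("alice",)}, ("B", "S"): {("bob",)}}
mappings = [
    (("A", "R"), ("B", "S"), (0,)),
    (("B", "S"), ("A", "R"), (0,)),
]
data = materialize(local, mappings)
print(sorted(answer(data, ("A", "R"))))  # [('alice',), ('bob',)]
```

With richer mapping languages, for example mappings that introduce existential values, this kind of naive propagation may no longer terminate on cyclic topologies, which is one reason the choice of semantics matters.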
Ontology Alignment: An annotated Bibliography
Ontology mapping, alignment, and translation have been an active research component of the general research on semantic integration and interoperability. In our talk, we gave our own classification of different topics in this research. We talked about types of heterogeneity between ontologies, various mapping representations, classified methods for discovering mappings both between ontology concepts and between data, and talked about various tasks where mappings are used. In this extended abstract of our talk, we provide an annotated bibliography for this area of research, giving readers brief pointers on representative papers in each of the topics mentioned above. We did not attempt to compile a comprehensive bibliography, and hence the list in this abstract is necessarily incomplete. Rather, we tried to sketch a map of the field, with some specific references to help interested readers in their exploration of the work to date.
Processing Rank-Aware Queries in Schema-Based P2P Systems
In recent years, there has been considerable research with respect to query
processing in data integration and P2P systems. Conventional data
integration systems consist of multiple sources with possibly different
schemas, adhere to a hierarchical structure, and have a central component
(the mediator) that manages a global schema. Queries are formulated against
this global schema, and the mediator processes them by retrieving relevant
data from the sources transparently to the user. Building on these systems,
Peer Data Management Systems (PDMSs), also called schema-based P2P systems,
have attracted attention. Peers participating in a PDMS can act both as
mediators and as data sources, are autonomous, and may join or leave the
network at will. As a result, peers often hold incomplete or erroneous data
sets and mappings. Moreover, the potentially huge amount of data available
in such a network often results in large query result sets that are hard to
manage, so retrieving the complete result set is in many cases difficult or
even impossible. Applying rank-aware query operators such as top-N and
skyline, possibly in conjunction with approximation techniques, remedies
these problems, as these operators select only those result records that
are most relevant to the user according to user-defined ranking functions.
Since in most cases only a small fraction of the complete result set is
actually output to the user, retrieving the complete set before evaluating
such operators is inefficient; only the part relevant to the final result
needs to be computed.
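The two rank-aware operators named above can be illustrated with a short in-memory Python sketch. It assumes records are numeric tuples in which smaller values are better, uses a hypothetical weighted cost as the user-defined ranking function for top-N, and computes the skyline with the standard block-nested-loops method. It does not show the distributed or approximate variants that the dissertation is concerned with.

```python
import heapq
from typing import Callable, List, Tuple

Record = Tuple[float, ...]


def top_n(records: List[Record], n: int, cost: Callable[[Record], float]) -> List[Record]:
    """Return the n records with the lowest cost under a user-defined ranking function."""
    return heapq.nsmallest(n, records, key=cost)


def dominates(a: Record, b: Record) -> bool:
    """a dominates b if it is no worse in every dimension and strictly better in
    at least one (here, smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(records: List[Record]) -> List[Record]:
    """Block-nested-loops skyline: keep a window of currently undominated records."""
    window: List[Record] = []
    for r in records:
        if any(dominates(w, r) for w in window):
            continue  # r is dominated, so it can never be in the skyline
        # r may dominate records that were kept earlier; drop them.
        window = [w for w in window if not dominates(r, w)]
        window.append(r)
    return window


# Hotels as (price, distance); both should be minimized.
hotels = [(50, 8), (80, 2), (60, 5), (90, 9), (55, 8)]
print(skyline(hotels))                              # [(50, 8), (80, 2), (60, 5)]
print(top_n(hotels, 2, lambda h: h[0] + 4 * h[1]))  # [(60, 5), (50, 8)]
```

Top-N needs a ranking function that combines the attributes into one score, while the skyline needs only the "smaller is better" preference per attribute, which is why the two operators can return different result sets over the same data.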
The questions this dissertation addresses are therefore how to process
such queries in PDMSs and how to do so efficiently. We propose strategies
for efficient query processing in PDMSs that exploit the characteristics of
rank-aware queries and optionally apply approximation techniques. A peer's
relevance is determined on two levels, the schema level and the data level,
and depending on its relevance the peer is either included in or excluded
from query processing. Because peers are heterogeneous, queries need to be
rewritten from one schema into another so that peers using different
schemas can cooperate. As existing query rewriting techniques mostly
consider only conjunctive queries, we present an extension that allows for
rewriting queries involving rank-aware query operators. As PDMSs are
dynamic systems and peers might update their local data, this dissertation
addresses not only how routing indexes can be used within a query
processing strategy to determine a peer's relevance on the data level but
also how they can be kept up to date. Finally, we provide a system-level
evaluation by presenting SmurfPDMS (SiMUlating enviRonment For Peer Data
Management Systems), a system developed in the context of this dissertation
that implements all presented techniques.
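The idea of including or excluding peers by data-level relevance can be sketched for a distributed top-N query. In this hypothetical Python example, each peer is summarized by a lower bound on the cost of its best record, a simple stand-in for a routing index (real routing indexes are usually compact, approximate summaries). Peers are contacted in order of that bound, and the search stops once no remaining peer can improve the current top-N. The functions `cost`, `build_summary`, and `distributed_top_n` are illustrative assumptions, not SmurfPDMS code.

```python
import heapq
from typing import Dict, List, Tuple

Record = Tuple[float, ...]


def cost(r: Record) -> float:
    """Hypothetical user-defined ranking function: lower is better."""
    return r[0] + 4 * r[1]


def build_summary(peers: Dict[str, List[Record]]) -> Dict[str, float]:
    """Hypothetical routing index: the best (lowest) cost each peer can contribute.

    Pruning stays correct only while this bound never overestimates a peer's
    best cost, so it must be refreshed when a peer updates its local data.
    """
    return {peer: min(map(cost, rows)) for peer, rows in peers.items() if rows}


def distributed_top_n(peers: Dict[str, List[Record]], summary: Dict[str, float],
                      n: int) -> Tuple[List[Record], List[str]]:
    """Contact peers in order of their bound; stop once no remaining peer
    can improve the current top-N."""
    best: List[Record] = []
    contacted: List[str] = []
    for peer in sorted(summary, key=summary.get):
        if len(best) == n and summary[peer] >= cost(best[-1]):
            break  # peers are sorted by bound, so all remaining peers are pruned too
        contacted.append(peer)
        best = heapq.nsmallest(n, best + peers[peer], key=cost)
    return best, contacted


peers = {
    "P1": [(50, 8), (90, 9)],
    "P2": [(60, 5)],
    "P3": [(80, 2), (55, 8)],
    "P4": [(200, 1), (150, 10)],
}
best, contacted = distributed_top_n(peers, build_summary(peers), 2)
print(best, contacted)  # [(60, 5), (50, 8)] ['P2', 'P1']
```

Here only two of the four peers are queried, yet the result equals the top-2 over all data. If P3 later inserted a cheaper record without updating its summary, P3 would be pruned wrongly, which illustrates why index maintenance matters in a dynamic PDMS.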
Collaborative Workspaces within Distributed Virtual Environments
In warfare, be it a training simulation or actual combat, a commander's time is one of the most valuable and fleeting resources of a military unit. Thus, it is natural for a unit to have a plethora of personnel to analyze and filter information for the decision-maker. This dynamic exchange of ideas between analyst and commander is currently not available within the distributed interactive simulation (DIS) community, which limits the usefulness of the DIS experience to the commander and his troops. This thesis addresses the commander's isolation problem by integrating a collaborative workspace into AFIT's Synthetic BattleBridge (SBB) as a technique to improve situational awareness. The SBB's Collaborative Workspace enhances battlespace awareness through communication technologies that enable CSCW (computer-supported cooperative work). It allows the user to interact with other SBB users through the transmission and reception of public bulletins, private email, real-time chat sessions, shared viewpoints, shared video, and shared annotations to the virtual environment. Collaborative communication between SBBs uses standard and experimental DIS-compliant protocol data units. The SBB's Collaborative Workspace gives the battlespace commander the widest range of communication options available within a DIS virtual environment today.
Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web
In a peer-to-peer inference system, each peer can reason locally but can also
solicit some of its acquaintances, which are peers sharing part of its
vocabulary. In this paper, we consider peer-to-peer inference systems in which
the local theory of each peer is a set of propositional clauses defined upon a
local vocabulary. An important characteristic of peer-to-peer inference systems
is that the global theory (the union of all peer theories) is not known (as
opposed to partition-based reasoning systems). The main contribution of this
paper is to provide the first consequence finding algorithm in a peer-to-peer
setting: DeCA. It is an anytime algorithm that computes consequences
gradually, starting from the solicited peer and moving to increasingly
distant peers. We exhibit a sufficient condition on the acquaintance graph
of the peer-to-peer inference system that guarantees the completeness of
this algorithm. Another important contribution is to apply this general
distributed reasoning setting to the Semantic Web through the Somewhere
semantic peer-to-peer data management system. The last contribution of
this paper is an experimental analysis of the scalability of the proposed
peer-to-peer infrastructure on large networks of 1000 peers.
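The anytime behaviour described above can be sketched in Python by growing the set of consulted peers one acquaintance hop at a time and reporting the consequences found so far. This is not DeCA: it gathers clauses centrally and uses simple input-linear resolution from the query literal, which is not complete for consequence finding in general, whereas DeCA is a genuinely distributed algorithm whose completeness the paper establishes under a condition on the acquaintance graph. The clause encoding (`"!x"` for the negation of `x`) and all function names are assumptions.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Clause = FrozenSet[str]  # a literal is "x" or its negation "!x"


def neg(lit: str) -> str:
    return lit[1:] if lit.startswith("!") else "!" + lit


def distances(graph: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first hop distance from the solicited peer in the acquaintance graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for other in graph.get(peer, []):
            if other not in dist:
                dist[other] = dist[peer] + 1
                queue.append(other)
    return dist


def linear_consequences(query: str, theory: List[Clause]) -> Set[Clause]:
    """Clauses derivable from the unit clause {query} by input-linear resolution."""
    start: Clause = frozenset([query])
    derived = {start}
    frontier = [start]
    while frontier:
        next_frontier = []
        for clause in frontier:
            for axiom in theory:
                for lit in clause:
                    if neg(lit) in axiom:
                        resolvent = (clause - {lit}) | (axiom - {neg(lit)})
                        if any(neg(x) in resolvent for x in resolvent):
                            continue  # skip tautologies
                        if resolvent not in derived:
                            derived.add(resolvent)
                            next_frontier.append(resolvent)
        frontier = next_frontier
    return derived - {start}


def anytime_consequences(query: str, peer: str, clauses: Dict[str, List[Clause]],
                         graph: Dict[str, List[str]]) -> Iterator[Tuple[int, Set[Clause]]]:
    """Yield (radius, consequences) using the theories of peers up to each hop distance."""
    dist = distances(graph, peer)
    for radius in range(max(dist.values()) + 1):
        theory = [c for p, d in dist.items() if d <= radius for c in clauses.get(p, [])]
        yield radius, linear_consequences(query, theory)


clauses = {"P1": [frozenset({"!a", "b"})], "P2": [frozenset({"!b", "c"})], "P3": [frozenset({"!c", "d"})]}
graph = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2"]}
for radius, found in anytime_consequences("a", "P1", clauses, graph):
    print(radius, sorted(sorted(c) for c in found))
# 0 [['b']], then 1 [['b'], ['c']], then 2 [['b'], ['c'], ['d']]
```

Each yielded result is usable on its own, so a caller can stop after any radius and keep the consequences found so far, which is the property the term "anytime" refers to.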
Query Processing in a P2P Network of Taxonomy-based Information Sources
In this study we address the problem of answering queries over a peer-to-peer system of taxonomy-based sources. A taxonomy states subsumption relationships between negation-free DNF formulas on terms and negation-free conjunctions of terms. To lay the foundations of our study, we first consider the centralized case, deriving the complexity of the decision problem and of query evaluation, and then present an algorithm that is efficient in data complexity and is based on hypergraphs. We then move to the distributed case and introduce a logical model of a network of taxonomy-based sources. On such a network, a distributed version of the centralized algorithm is then presented, based on a message-passing paradigm, and its correctness is proved. We finally discuss optimization issues and relate our work to the literature.
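Centralized query evaluation can be sketched in Python under an assumed reading of the abstract: each axiom states that a DNF formula is subsumed by a conjunction of terms, so it splits into rules of the form "a conjunction of terms implies one term", which can be seen as hyperedges from a set of terms to a single term. Each object's terms are closed under these rules by forward chaining, and the object answers a DNF query if its closure contains every term of some disjunct. This is a naive fixpoint rather than the paper's hypergraph algorithm, and the object index and function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

DNF = List[Set[str]]               # disjunction of conjunctions of terms
Axiom = Tuple[DNF, Set[str]]       # assumed reading: DNF formula subsumed by a conjunction
Rule = Tuple[FrozenSet[str], str]  # hyperedge: a set of terms implies one term


def normalize(axioms: List[Axiom]) -> List[Rule]:
    """(c1 or ... or ck) <= (t1 and ... and tm) splits into rules ci -> tj."""
    return [(frozenset(c), t) for dnf, conj in axioms for c in dnf for t in conj]


def closure(terms: Set[str], rules: List[Rule]) -> Set[str]:
    """Forward chaining: add a rule's head once all of its body terms are present."""
    closed = set(terms)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in closed and body <= closed:
                closed.add(head)
                changed = True
    return closed


def evaluate(query: DNF, index: Dict[str, Set[str]], rules: List[Rule]) -> Set[str]:
    """Objects whose closed term set satisfies at least one disjunct of the query."""
    answers = set()
    for obj, terms in index.items():
        closed = closure(terms, rules)
        if any(conj <= closed for conj in query):
            answers.add(obj)
    return answers


rules = normalize([
    ([{"Sedan"}], {"Car"}),
    ([{"Car", "Red"}], {"RedCar"}),
    ([{"Car"}, {"Truck"}], {"Vehicle"}),
])
index = {"o1": {"Sedan", "Red"}, "o2": {"Truck"}, "o3": {"Red"}}
print(sorted(evaluate([{"Vehicle"}], index, rules)))  # ['o1', 'o2']
```

Splitting axioms this way keeps every rule's head a single term, so the closure can only grow and the fixpoint is reached after at most as many additions as there are terms.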