682 research outputs found

    Data Sharing in P2P Systems

    Get PDF
    To appear in Springer's "Handbook of P2P Networking"In this chapter, we survey P2P data sharing systems. All along, we focus on the evolution from simple file-sharing systems, with limited functionalities, to Peer Data Management Systems (PDMS) that support advanced applications with more sophisticated data management techniques. Advanced P2P applications are dealing with semantically rich data (e.g. XML documents, relational tables), using a high-level SQL-like query language. We start our survey with an overview over the existing P2P network architectures, and the associated routing protocols. Then, we discuss data indexing techniques based on their distribution degree and the semantics they can capture from the underlying data. We also discuss schema management techniques which allow integrating heterogeneous data. We conclude by discussing the techniques proposed for processing complex queries (e.g. range and join queries). Complex query facilities are necessary for advanced applications which require a high level of search expressiveness. This last part shows the lack of querying techniques that allow for an approximate query answering

    スカイライン問合わせを利用した大規模データベースの情報選別

    Get PDF
    Conventional SQL queries take exact input and produce complete result set. However, with massive increase in data volume in different applications, the large result sets returned by traditional SQL queries are not well suited for the users to take effective decisions. Therefore, there is an increasing interest in queries like top-k queries and skyline queries those produce a more concise result set. Top-k queries rely on the scores of the objects to evaluate the usefulness of the objects. In this type of queries, users require to define their own scoring function by combining their interests. Based on the user defined scoring function, the system sorts the objects by their scores and outputs the top-k objects in the ranking list as the result. However, defining a scoring function by the users is a major draw of the top-k queries as in the large data sets where there are many conflicting criteria exist, it is very difficult for the users to define the scoring functions by themselves.……広島大学(Hiroshima University)博士(工学)Engineeringdoctora

    Processing Rank-Aware Queries in Schema-Based P2P Systems

    Get PDF
    Effiziente Anfragebearbeitung in Datenintegrationssystemen sowie in P2P-Systemen ist bereits seit einigen Jahren ein Aspekt aktueller Forschung. Konventionelle Datenintegrationssysteme bestehen aus mehreren Datenquellen mit ggf. unterschiedlichen Schemata, sind hierarchisch aufgebaut und besitzen eine zentrale Komponente: den Mediator, der ein globales Schema verwaltet. Anfragen an das System werden auf diesem globalen Schema formuliert und vom Mediator bearbeitet, indem relevante Daten von den Datenquellen transparent für den Benutzer angefragt werden. Aufbauend auf diesen Systemen entstanden schließlich Peer-Daten-Management-Systeme (PDMSs) bzw. schemabasierte P2P-Systeme. An einem PDMS teilnehmende Knoten (Peers) können einerseits als Mediatoren agieren andererseits jedoch ebenso als Datenquellen. Darüber hinaus sind diese Peers autonom und können das Netzwerk jederzeit verlassen bzw. betreten. Die potentiell riesige Datenmenge, die in einem derartigen Netzwerk verfügbar ist, führt zudem in der Regel zu sehr großen Anfrageergebnissen, die nur schwer zu bewältigen sind. Daher ist das Bestimmen einer vollständigen Ergebnismenge in vielen Fällen äußerst aufwändig oder sogar unmöglich. In diesen Fällen bietet sich die Anwendung von Top-N- und Skyline-Operatoren, ggf. in Verbindung mit Approximationstechniken, an, da diese Operatoren lediglich diejenigen Datensätze als Ergebnis ausgeben, die aufgrund nutzerdefinierter Ranking-Funktionen am relevantesten für den Benutzer sind. Da durch die Anwendung dieser Operatoren zumeist nur ein kleiner Teil des Ergebnisses tatsächlich dem Benutzer ausgegeben wird, muss nicht zwangsläufig die vollständige Ergebnismenge berechnet werden sondern nur der Teil, der tatsächlich relevant für das Endergebnis ist. Die Frage ist nun, wie man derartige Anfragen durch die Ausnutzung dieser Erkenntnis effizient in PDMSs bearbeiten kann. Die Beantwortung dieser Frage ist das Hauptanliegen dieser Dissertation. Zur Lösung dieser Problemstellung stellen wir effiziente Anfragebearbeitungsstrategien in PDMSs vor, die die charakteristischen Eigenschaften ranking-basierter Operatoren sowie Approximationstechniken ausnutzen. Peers werden dabei sowohl auf Schema- als auch auf Datenebene hinsichtlich der Relevanz ihrer Daten geprüft und dementsprechend in die Anfragebearbeitung einbezogen oder ausgeschlossen. Durch die Heterogenität der Peers werden Techniken zum Umschreiben einer Anfrage von einem Schema in ein anderes nötig. Da existierende Techniken zum Umschreiben von Anfragen zumeist nur konjunktive Anfragen betrachten, stellen wir eine Erweiterung dieser Techniken vor, die Anfragen mit ranking-basierten Anfrageoperatoren berücksichtigt. Da PDMSs dynamische Systeme sind und teilnehmende Peers jederzeit ihre Daten ändern können, betrachten wir in dieser Dissertation nicht nur wie Routing-Indexe verwendet werden, um die Relevanz eines Peers auf Datenebene zu bestimmen, sondern auch wie sie gepflegt werden können. Schließlich stellen wir SmurfPDMS (SiMUlating enviRonment For Peer Data Management Systems) vor, ein System, welches im Rahmen dieser Dissertation entwickelt wurde und alle vorgestellten Techniken implementiert.In recent years, there has been considerable research with respect to query processing in data integration and P2P systems. Conventional data integration systems consist of multiple sources with possibly different schemas, adhere to a hierarchical structure, and have a central component (mediator) that manages a global schema. Queries are formulated against this global schema and the mediator processes them by retrieving relevant data from the sources transparently to the user. Arising from these systems, eventually Peer Data Management Systems (PDMSs), or schema-based P2P systems respectively, have attracted attention. Peers participating in a PDMS can act both as a mediator and as a data source, are autonomous, and might leave or join the network at will. Due to these reasons peers often hold incomplete or erroneous data sets and mappings. The possibly huge amount of data available in such a network often results in large query result sets that are hard to manage. Due to these reasons, retrieving the complete result set is in most cases difficult or even impossible. Applying rank-aware query operators such as top-N and skyline, possibly in conjunction with approximation techniques, is a remedy to these problems as these operators select only those result records that are most relevant to the user. Being aware that in most cases only a small fraction of the complete result set is actually output to the user, retrieving the complete set before evaluating such operators is obviously inefficient. Therefore, the questions we want to answer in this dissertation are how to compute such queries in PDMSs and how to do that efficiently. We propose strategies for efficient query processing in PDMSs that exploit the characteristics of rank-aware queries and optionally apply approximation techniques. A peer's relevance is determined on two levels: on schema-level and on data-level. According to its relevance a peer is either considered for query processing or not. Because of heterogeneity queries need to be rewritten, enabling cooperation between peers that use different schemas. As existing query rewriting techniques mostly consider conjunctive queries only, we present an extension that allows for rewriting queries involving rank-aware query operators. As PDMSs are dynamic systems and peers might update their local data, this dissertation addresses not only the problem of considering such structures within a query processing strategy but also the problem of keeping them up-to-date. Finally, we provide a system-level evaluation by presenting SmurfPDMS (SiMUlating enviRonment For Peer Data Management Systems) -- a system created in the context of this dissertation implementing all presented techniques

    Multimedia

    Get PDF
    The nowadays ubiquitous and effortless digital data capture and processing capabilities offered by the majority of devices, lead to an unprecedented penetration of multimedia content in our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content requires constant re-evaluation and adaptation of multimedia methodologies, in order to meet the relentless change of requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that point out such important aspects. Some of the main topics that this book deals with include: multimedia management in peer-to-peer structures & wireless networks, security characteristics in multimedia, semantic gap bridging for multimedia content and novel multimedia applications

    IDEAS-1997-2021-Final-Programs

    Get PDF
    This document records the final program for each of the 26 meetings of the International Database and Engineering Application Symposium from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE(1997-2007) or ACM(2008-2021)

    Efficient Processing of Ranking Queries in Novel Applications

    Get PDF
    Ranking queries, which return only a subset of results matching a user query, have been studied extensively in the past decade due to their importance in a wide range of applications. In this thesis, we study ranking queries in novel environments and settings where they have not been considered so far. With the advancements in sensor technologies, these small devices are today present in all corners of human life. Millions of them are deployed in various places and are sending data on a continuous basis. These sensors which before mainly monitored environmental phenomena or production chains, have now found their way into our daily lives as well; health monitoring being a plausible example of how much we rely on continuous observation of measurements. As the Web technology evolves and facilitates data stream transmissions, sensors do not remain the sole producers of data in form of streams. The Web 2.0 has escalated the production of user-generated content which appear in form of annotated posts in a Weblog (blog), pictures and videos, or small textual snippets reflecting the current activity or status of users and can be regarded as natural items of a temporal stream. A major part of this thesis is devoted to developing novel methods which assist in keeping track of this ever increasing flow of information with continuous monitoring of ranking queries over them, particularly when traditional approaches fail to meet the newly raised requirements. We consider the ranking problem when the information flow is not synchronized among its sources. This is a recurring situation, since sensors are run by different organizations, measure moving entities, or are simply represented by users which are inherently not synchronizable. Our methods are in particular designed for handling unsynchronized streams, calculating an object's score based on both its currently observed contribution to the registered queries as well as the contribution it might have in future. While this uncertainty in score calculation causes linear growth in the space necessary for providing exact results, we are able to define criteria which allows for evicting unpromising objects as early as possible. We also leverage statistical properties that reflect the correlation between multiple streams to predict the future to provide better bounds for the best possible contribution of an object, consequently limiting the necessary storage dramatically. To achieve this, we make use of small statistical synopses that are periodically refreshed during runtime. Furthermore, we consider user generated queries in the context of Web 2.0 applications which aim at filtering data streams in forms of textual documents, based on personal interests. In this case, the dimensionality of the data, the large cardinality of the subscribed queries, as well as the desire for consuming recent information, raise new challenges. We develop new approaches which efficiently filter the information and provide real-time updates to the user subscribed queries. Our methods rely on a novel ordering of user queries in traditional inverted lists which allows the system to effectively prune those queries for which a new piece of information is of no interest. Finally, we investigate high quality search in user generated content in Web 2.0 applications in form of images or videos. These resources are inherently dispersed all over the globe, therefore can be best managed in a purely distributed peer-to-peer network which eliminates single points of failure. Search in such a huge repository of high dimensional data involves evaluating ranking queries in form of nearest neighbor queries. Therefore, we study ranking queries in high dimensional spaces, where the index of the objects is maintained in a purely distributed fashion. Our solution meets the two major requirements of a viable solution in distributing the index and evaluating ranking queries: the underlying peer-to-peer network remains load balanced, and efficient query evaluation is feasible as similar objects are assigned to nearby peers

    Recommendation Support for Multi-Attribute Databases

    Get PDF
    corecore