20 research outputs found

    Subquery Allocation Problem and Heuristics for Secret Sharing Distributed Database System

    Get PDF
    We discuss query optimization in a secure distributed database system, called the Secret Sharing Distributed DataBase System (SSDDBS). We have to consider not only subquery allocations to distributed servers and data transfer on the network but also decoding distributed shared data. At first, we formulated the subquery allocation problem as a constraints satisfaction problem. Since the subquery allocation problem is NP-complete in general, it is not easy to obtain the optimal solution in practical time. Secondly, we proposed a heuristic evaluation function for the best-first search. We constructed an optimization model on an available optimization software, and evaluated the proposed method. The results showed that feasible solutions could be obtained by using the proposed method in practical time, and that quality of the obtained solutions was good

    Models and algorithms for parallel text retrieval

    Get PDF
    Cataloged from PDF version of article.In the last decade, search engines became an integral part of our lives. The current state-of-the-art in search engine technology relies on parallel text retrieval. Basically, a parallel text retrieval system is composed of three components: a crawler, an indexer, and a query processor. The crawler component aims to locate, fetch, and store the Web pages in a local document repository. The indexer component converts the stored, unstructured text into a queryable form, most often an inverted index. Finally, the query processing component performs the search over the indexed content. In this thesis, we present models and algorithms for efficient Web crawling and query processing. First, for parallel Web crawling, we propose a hybrid model that aims to minimize the communication overhead among the processors while balancing the number of page download requests and storage loads of processors. Second, we propose models for documentand term-based inverted index partitioning. In the document-based partitioning model, the number of disk accesses incurred during query processing is minimized while the posting storage is balanced. In the term-based partitioning model, the total amount of communication is minimized while, again, the posting storage is balanced. Finally, we develop and evaluate a large number of algorithms for query processing in ranking-based text retrieval systems. We test the proposed algorithms over our experimental parallel text retrieval system, Skynet, currently running on a 48-node PC cluster. In the thesis, we also discuss the design and implementation details of another, somewhat untraditional, grid-enabled search engine, SE4SEE. Among our practical work, we present the Harbinger text classification system, used in SE4SEE for Web page classification, and the K-PaToH hypergraph partitioning toolkit, to be used in the proposed models.Cambazoğlu, Berkant BarlaPh.D

    Ensuring compliance with data privacy and usage policies in online services

    Get PDF
    Online services collect and process a variety of sensitive personal data that is subject to complex privacy and usage policies. Complying with the policies is critical, often legally binding for service providers, but it is challenging as applications are prone to many disclosure threats. We present two compliance systems, Qapla and Pacer, that ensure efficient policy compliance in the face of direct and side-channel disclosures, respectively. Qapla prevents direct disclosures in database-backed applications (e.g., personnel management systems), which are subject to complex access control, data linking, and aggregation policies. Conventional methods inline policy checks with application code. Qapla instead specifies policies directly on the database and enforces them in a database adapter, thus separating compliance from the application code. Pacer prevents network side-channel leaks in cloud applications. A tenant’s secrets may leak via its network traffic shape, which can be observed at shared network links (e.g., network cards, switches). Pacer implements a cloaked tunnel abstraction, which hides secret-dependent variation in tenant’s traffic shape, but allows variations based on non-secret information, enabling secure and efficient use of network resources in the cloud. Both systems require modest development efforts, and incur moderate performance overheads, thus demonstrating their usability.Onlinedienste sammeln und verarbeiten eine Vielzahl sensibler persönlicher Daten, die komplexen Datenschutzrichtlinien unterliegen. Die Einhaltung dieser Richtlinien ist häufig rechtlich bindend für Dienstanbieter und gleichzeitig eine Herausforderung, da Fehler in Anwendungsprogrammen zu einer unabsichtlichen Offenlegung führen können. Wir präsentieren zwei Compliance-Systeme, Qapla und Pacer, die Richtlinien effizient einhalten und gegen direkte und indirekte Offenlegungen durch Seitenkanäle schützen. Qapla verhindert direkte Offenlegungen in datenbankgestützten Anwendungen. Herkömmliche Methoden binden Richtlinienprüfungen in Anwendungscode ein. Stattdessen gibt Qapla Richtlinien direkt in der Datenbank an und setzt sie in einem Datenbankadapter durch. Die Konformität ist somit vom Anwendungscode getrennt. Pacer verhindert Netzwerkseitenkanaloffenlegungen in Cloud-Anwendungen. Geheimnisse eines Nutzers können über die Form des Netzwerkverkehr offengelegt werden, die bei gemeinsam genutzten Netzwerkelementen (z. B. Netzwerkkarten, Switches) beobachtet werden kann. Pacer implementiert eine Tunnelabstraktion, die Geheimnisse im Netzwerkverkehr des Nutzers verbirgt, jedoch Variationen basier- end auf nicht geheimen Informationen zulässt und eine sichere und effiziente Nutzung der Netzwerkressourcen in der Cloud ermöglicht. Beide Systeme erfordern geringen Entwicklungsaufwand und verursachen einen moderaten Leistungsaufwand, wodurch ihre Nützlichkeit demonstriert wird

    TLAD 2010 Proceedings:8th international workshop on teaching, learning and assesment of databases (TLAD)

    Get PDF
    This is the eighth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2010), which once again is held as a workshop of BNCOD 2010 - the 27th International Information Systems Conference. TLAD 2010 is held on the 28th June at the beautiful Dudhope Castle at the Abertay University, just before BNCOD, and hopes to be just as successful as its predecessors.The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year, the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and further the growing community amongst database academics. As well as attracting academics from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers.This year, the workshop includes an invited talk given by Richard Cooper (of the University of Glasgow) who will present a discussion and some results from the Database Disciplinary Commons which was held in the UK over the academic year. Due to the healthy number of high quality submissions this year, the workshop will also present seven peer reviewed papers, and six refereed poster papers. Of the seven presented papers, three will be presented as full papers and four as short papers. These papers and posters cover a number of themes, including: approaches to teaching databases, e.g. group centered and problem based learning; use of novel case studies, e.g. forensics and XML data; techniques and approaches for improving teaching and student learning processes; assessment techniques, e.g. peer review; methods for improving students abilities to develop database queries and develop E-R diagrams; and e-learning platforms for supporting teaching and learning

    TLAD 2010 Proceedings:8th international workshop on teaching, learning and assesment of databases (TLAD)

    Get PDF
    This is the eighth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2010), which once again is held as a workshop of BNCOD 2010 - the 27th International Information Systems Conference. TLAD 2010 is held on the 28th June at the beautiful Dudhope Castle at the Abertay University, just before BNCOD, and hopes to be just as successful as its predecessors.The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year, the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and further the growing community amongst database academics. As well as attracting academics from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers.This year, the workshop includes an invited talk given by Richard Cooper (of the University of Glasgow) who will present a discussion and some results from the Database Disciplinary Commons which was held in the UK over the academic year. Due to the healthy number of high quality submissions this year, the workshop will also present seven peer reviewed papers, and six refereed poster papers. Of the seven presented papers, three will be presented as full papers and four as short papers. These papers and posters cover a number of themes, including: approaches to teaching databases, e.g. group centered and problem based learning; use of novel case studies, e.g. forensics and XML data; techniques and approaches for improving teaching and student learning processes; assessment techniques, e.g. peer review; methods for improving students abilities to develop database queries and develop E-R diagrams; and e-learning platforms for supporting teaching and learning

    Attribute-Level Versioning: A Relational Mechanism for Version Storage and Retrieval

    Get PDF
    Data analysts today have at their disposal a seemingly endless supply of data and repositories hence, datasets from which to draw. New datasets become available daily thus making the choice of which dataset to use difficult. Furthermore, traditional data analysis has been conducted using structured data repositories such as relational database management systems (RDBMS). These systems, by their nature and design, prohibit duplication for indexed collections forcing analysts to choose one value for each of the available attributes for an item in the collection. Often analysts discover two or more datasets with information about the same entity. When combining this data and transforming it into a form that is usable in an RDBMS, analysts are forced to deconflict the collisions and choose a single value for each duplicated attribute containing differing values. This deconfliction is the source of a considerable amount of guesswork and speculation on the part of the analyst in the absence of professional intuition. One must consider what is lost by discarding those alternative values. Are there relationships between the conflicting datasets that have meaning? Is each dataset presenting a different and valid view of the entity or are the alternate values erroneous? If so, which values are erroneous? Is there a historical significance of the variances? The analysis of modern datasets requires the use of specialized algorithms and storage and retrieval mechanisms to identify, deconflict, and assimilate variances of attributes for each entity encountered. These variances, or versions of attribute values, contribute meaning to the evolution and analysis of the entity and its relationship to other entities. A new, distinct storage and retrieval mechanism will enable analysts to efficiently store, analyze, and retrieve the attribute versions without unnecessary complexity or additional alterations of the original or derived dataset schemas. This paper presents technologies and innovations that assist data analysts in discovering meaning within their data and preserving all of the original data for every entity in the RDBMS

    An Introduction to Database Systems

    Get PDF
    This textbook introduces the basic concepts of database systems. These concepts are presented through numerous examples in modeling and design. The material in this book is geared to an introductory course in database systems offered at the junior or senior level of Computer Science. It could also be used in a first year graduate course in database systems, focusing on a selection of the advanced topics in the latter chapters

    Earth Observation Open Science and Innovation

    Get PDF
    geospatial analytics; social observatory; big earth data; open data; citizen science; open innovation; earth system science; crowdsourced geospatial data; citizen science; science in society; data scienc
    corecore