738 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

    Diverse intrusion-tolerant database replication

    Get PDF
    Tese de mestrado em Segurança Informática, apresentada à Universidade de Lisboa, através da Faculdade de Ciências, 2012A combinação da replicação de bases de dados com mecanismos de tolerância a falhas bizantinas ainda é um campo de pesquisa recente com projetos a surgirem nestes últimos anos. No entanto, a maioria dos protótipos desenvolvidos ou se focam em problemas muito específicos, ou são baseados em suposições que são muito difíceis de garantir numa situação do mundo real, como por exemplo ter um componente confiável. Nesta tese apresentamos DivDB, um sistema de replicação de bases de dados diverso e tolerante a intrusões. O sistema está desenhado para ser incorporado dentro de um driver JDBC, o qual irá abstrair o utilizador de qualquer complexidade adicional dos mecanismos de tolerância a falhas bizantinas. O DivDB baseia-se na combinação de máquinas de estados replicadas com um algoritmo de processamento de transações, a fim de melhorar o seu desempenho. Para além disso, no DivDB é possível ligar cada réplica a um sistema de gestão de base de dados diferente, proporcionando assim diversidade ao sistema. Propusemos, resolvemos e implementamos três problemas em aberto, existentes na conceção de um sistema de gestão de base de dados replicado: autenticação, processamento de transações e transferência de estado. Estas características torna o DivDB exclusivo, pois é o único sistema que compreende essas três funcionalidades implementadas num sistema de base de dados replicado. A nossa implementação é suficientemente robusta para funcionar de forma segura num simples sistema de processamento de transações online. Para testar isso, utilizou-se o TPC-C, uma ferramenta de benchmarking que simula esse tipo de ambientes.The combination of database replication with Byzantine fault tolerance mechanism is a recent field of research with projects appearing in the last few years. However most of the prototypes produced are either focused on very specific problems or are based on assumptions that are very hard to accomplish in a real world scenario (e.g., trusted component). In this thesis we present DivDB, a Diverse Intrusion-Tolerant Database Replication system. It is designed to be incorporated inside a JDBC driver so that it abstracts the user from any added complexity from Byzantine Fault Tolerance mechanism. DivDB is based in State Machine Replication combined with a transaction handling algorithm in order to enhance its performance. DivDB is also able to have different database systems connected at each replica, enabling to achieve diversity. We proposed, solved and implemented three open problems in the design of a replicated database system: authentication, transaction handling and state-transfer. This makes DivDB unique since it is the only system that comprises all these three features in a single database replication system. Our implementation is robust enough to operate reliably in a simple Online Transaction Processing system. To test that, we used TPC-C, a benchmark tool that simulates that kind of environments

    A review of experiences with reliable multicast

    Get PDF

    Scalable Reliable SD Erlang Design

    Get PDF
    This technical report presents the design of Scalable Distributed (SD) Erlang: a set of language-level changes that aims to enable Distributed Erlang to scale for server applications on commodity hardware with at most 100,000 cores. We cover a number of aspects, specifically anticipated architecture, anticipated failures, scalable data structures, and scalable computation. Other two components that guided us in the design of SD Erlang are design principles and typical Erlang applications. The design principles summarise the type of modifications we aim to allow Erlang scalability. Erlang exemplars help us to identify the main Erlang scalability issues and hypothetically validate the SD Erlang design

    A comparative study of structured and un-structured remote data access in distributed computing systems

    Get PDF
    Recently, the use of distributed computing systems has been growing rapidly due to the result of cheap and advanced microelectronic technology. In addition to the decrease in hardware costs, the tremendous development in machine to machine communication interfaces, especially in local area networking, also favours the use of distributed systems. Distributed systems often require remote access to data stored at different sites. Generally, two models of access to remote data storage exist: the un structured and structured models. In the former, data is simply stored as row of bytes, whereas in the latter, data is stored along with the associated access codes. The objective of this thesis is to compare these two models and hence determines the tradeoffs of each model. First of all, an extended review of the field of distributed data access is provided which addressing key issues such as the basic design principles of distributed computing systems, the notions of abstract data types, data inheritance, data type system and data persistence. Secondly, a distributed system is implemented using the persistent programming language PS-algol and the high level language C in conjunction with the remote procedure call facilities available in Unix(^1) 4.2 BSD operating system. This distributed system makes extensive use of Unix's software tools and hence it is called DCSUNIX for Distributed Computing System on UNIX. Thirdly, two specific applications which employ the implemented system will be given so that a comparison can be made between the two remote data access models mentioned above. Finally, the implemented system is compared with the criteria established earlier in the thesis. keywords: abstract data types, class, database management, data persistence, information hiding, inheritance, object oriented programming, programming languages, remote procedure calls, transparency, and type checking

    AsterixDB: A Scalable, Open Source BDMS

    Full text link
    AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store. Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements

    Single-click to data insights: transaction replication and deployment automation made simple for the cloud age

    Get PDF
    In this report we present out initial work on making the MonetDB column-store analytical database ready for Cloud deployment. As we stand in the new space between research and industry we have tried to combine approaches from both worlds. We provide details how we utilize modern technologies and tools for automating building of virtual machine image for Cloud, datacentre and desktop use. We also explain our solution to asynchronous transaction replication MonetDB. The report concludes with how this all ties together with our efforts to make MonetDB ready for the age where high-performance data analytics is available in a single-click
    corecore