
    Advanced Databases

    This Grants Collection for Advanced Databases was created under a Round Two ALG Textbook Transformation Grant. Affordable Learning Georgia Grants Collections are intended to provide faculty with the frameworks to quickly implement or revise the same materials as a Textbook Transformation Grants team, along with the aims and lessons learned by project teams during the implementation process. Documents are in .pdf format, with a separate .docx (Word) version available for download. Each collection contains the following materials: a linked syllabus, the initial proposal, and the final report.

    High performance data processing

    Master's dissertation in Informatics Engineering. As applications reach a wider audience than ever before, they must process ever-larger volumes of requests. In addition, they must often serve users all over the globe, where network latencies have a significant negative impact on monolithic deployments. Distribution is therefore a sought-after solution for improving the performance of both the application and database layers. However, distributing data is not an easy task when strong consistency must be ensured. This leads many database systems to rely on expensive synchronization protocols such as two-phase commit, distributed consensus, and distributed locking, while other systems rely on weak consistency, which is unfeasible for some use cases.
    This thesis presents the design, implementation, and evaluation of two solutions aimed at reducing the cost of ensuring strong consistency guarantees in database systems, especially geo-distributed ones. The first is Primary Semi-Primary, a fully replicated distributed database architecture that lets replicas evolve independently, so that clients need not wait for preceding non-conflicting updates to propagate. Although replicas can process both reads and writes, improving scalability, the system still ensures strong consistency by relaying transaction certification to a central node. Its design is independent of the underlying data model, but its implementation can take advantage of the native concurrency control offered by some systems, as exemplified by an implementation using PostgreSQL and its Snapshot Isolation. The results show advantages in both throughput and response time over alternative architectures, in local as well as geo-distributed environments. The second solution is Multi-Record Values, a technique that dynamically partitions numeric values into multiple records, allowing concurrent writes to execute with a low probability of conflict, reducing abort rates and/or locking contention. Unlike many similar alternatives, this strategy preserves lower-limit guarantees, as required by objects such as account balances or stock levels. Its design is also data-model agnostic: its advantages apply to both SQL and NoSQL systems, whether centralized or distributed, as shown in the evaluation section.
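    The abstract gives no implementation details, so the following is a minimal sketch of the Multi-Record Values idea only, assuming an in-memory value split across lock-protected sub-records; all names here are illustrative, not the thesis's code:

```python
import random
import threading

class MultiRecordValue:
    """Sketch: a numeric value split across sub-records so that
    concurrent updates rarely touch the same record."""

    def __init__(self, total, shards=8):
        base, rem = divmod(total, shards)
        self._values = [base + (1 if i < rem else 0) for i in range(shards)]
        self._locks = [threading.Lock() for _ in range(shards)]

    def add(self, amount):
        # Additions can go to any sub-record; pick one at random.
        i = random.randrange(len(self._values))
        with self._locks[i]:
            self._values[i] += amount

    def subtract(self, amount):
        # Try sub-records in random order. Each stays >= 0 on its own,
        # so the overall total can never cross the lower limit.
        for i in random.sample(range(len(self._values)), len(self._values)):
            with self._locks[i]:
                if self._values[i] >= amount:
                    self._values[i] -= amount
                    return True
        return False  # would violate the lower bound; caller must handle

    def read(self):
        # Reading the total must visit every sub-record; this sketch
        # does not make the read atomic across shards.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._values[i]
        return total
```

    Note that a subtraction larger than any single share can fail here even when the total would cover it; a real implementation would rebalance or merge sub-records in that case, presumably the kind of dynamic repartitioning the abstract alludes to.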

    Priority-Driven Differentiated Performance for NoSQL Database-As-a-Service

    Designing data stores for native Cloud Computing services brings a number of challenges, especially if the Cloud Provider wants to offer database services capable of controlling the response time for specific customers. Requests may come from heterogeneous data-driven applications with conflicting responsiveness requirements. For instance, a batch-processing workload does not require the same level of responsiveness as a time-sensitive one, and their coexistence may interfere with the responsiveness of the time-sensitive workload, such as online video gaming, virtual reality, or cloud-based machine learning. This paper presents a modification to the popular MongoDB NoSQL database that enables differentiated per-user/request performance on a priority basis by leveraging CPU scheduling and synchronization mechanisms available within the Operating System. This is achieved with minimally invasive changes to the source code and without affecting the performance or behavior of the database when the new feature is not in use. The proposed extension has been integrated with the access-control model of MongoDB for secure and controlled access to the new capability. Extensive experimentation with realistic workloads demonstrates how the proposed solution reduces response times for high-priority users/requests, with respect to lower-priority ones, in scenarios with mixed-priority clients accessing the data store.
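    The paper's actual changes live inside MongoDB's source code and are not described in the abstract. Purely to illustrate the kind of OS-level scheduling mechanism it leverages, here is a Linux-only Python sketch in which dedicated worker threads serve requests at different nice values; the priority names, nice values, and queue setup are all assumptions:

```python
import os
import queue
import threading

# Linux-only sketch: per-thread 'nice' values shape CPU scheduling.
# Dedicated workers per priority level avoid re-raising a thread's
# priority, which unprivileged processes generally may not do.
LEVELS = {"high": 0, "normal": 5, "low": 10}   # lower nice = larger CPU share
queues = {name: queue.Queue() for name in LEVELS}

def worker(name, nice):
    # setpriority() with the native thread id affects only this thread.
    os.setpriority(os.PRIO_PROCESS, threading.get_native_id(), nice)
    while True:
        handle = queues[name].get()
        try:
            handle()   # serve the request under this thread's priority
        finally:
            queues[name].task_done()

for name, nice in LEVELS.items():
    threading.Thread(target=worker, args=(name, nice), daemon=True).start()

queues["high"].put(lambda: print("time-sensitive query"))
queues["low"].put(lambda: print("batch scan"))
for q in queues.values():
    q.join()
```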

    A general framework for blockchain analytics

    Modern cryptocurrencies exploit decentralised blockchains to record a public and unalterable history of transactions. Besides transactions, further information is stored for different, and often undisclosed, purposes, making blockchains a rich and ever-growing source of valuable information, parts of which are difficult to interpret. Many analytics tools have been developed, mostly based on purpose-built, ad-hoc engineered approaches. We propose a general-purpose framework that seamlessly supports data analytics on both Bitcoin and Ethereum, currently the two most prominent cryptocurrencies. The framework allows us to integrate relevant blockchain data with data from other sources, and to organise them in a database, either SQL or NoSQL. Our framework is released as an open-source Scala library. We illustrate the distinguishing features of our approach on a set of significant use cases, which allow us to empirically compare our framework with competing proposals and to evaluate the impact of the database choice on scalability.
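    The framework itself is a Scala library whose API the abstract does not show. As a rough, assumption-laden illustration of the general pattern (pull blockchain data, then store it in a database of choice), here is a Python sketch using web3.py and SQLite; the endpoint URL and table schema are invented for the example:

```python
import sqlite3
from web3 import Web3  # pip install web3

# Illustration only: the paper's framework is a Scala library.
# Endpoint URL and table schema here are assumptions.
w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))

db = sqlite3.connect("chain.db")
db.execute("""CREATE TABLE IF NOT EXISTS tx (
    block INTEGER, txhash TEXT PRIMARY KEY,
    sender TEXT, recipient TEXT, value_wei TEXT)""")

latest = w3.eth.block_number
for n in range(max(0, latest - 9), latest + 1):    # last ten blocks
    block = w3.eth.get_block(n, full_transactions=True)
    for tx in block.transactions:
        db.execute("INSERT OR IGNORE INTO tx VALUES (?, ?, ?, ?, ?)",
                   (n, tx["hash"].hex(), tx["from"], tx["to"], str(tx["value"])))
db.commit()
```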

    Ethereum blockchain as a decentralized and autonomous key server: storing and extracting public keys through smart contracts

    Ethereum is an open-source, public, blockchain-based distributed computing platform featuring smart contract functionality. It provides a decentralized Turing-complete virtual machine that can execute scripts using an international network of public nodes. The purpose of this thesis is to build a decentralized and autonomous key server using Ethereum smart contracts to store and retrieve information. We give an overall introduction to Bitcoin and Ethereum as background for the study. We then analyze the current problems of key discovery with traditional key servers and the web of trust. We designed, built, and tested an application that can verify contact cards (email address, PGP public key, domain address, Facebook account), link them to an Ethereum address, and store them in a public contract running on the Ethereum blockchain. Finally, we analyze the costs and limitations of such a solution and propose some future improvements. The results show that Ethereum is a good choice for storing public keys, thanks to the immutability and irreversibility of the blockchain.
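    The abstract does not reproduce the contract's interface, so the sketch below assumes a hypothetical contract exposing setKey(string) and getKey(address), accessed with web3.py against a local development node with an unlocked account; the contract address, ABI, and function names are all illustrative:

```python
from web3 import Web3  # pip install web3

# Hypothetical interface: the thesis's actual contract stores contact
# cards (email, PGP key, domain, Facebook account); here a single PGP
# key string stands in. Address, ABI, and function names are assumptions.
ABI = [
    {"name": "setKey", "type": "function", "stateMutability": "nonpayable",
     "inputs": [{"name": "pgpKey", "type": "string"}], "outputs": []},
    {"name": "getKey", "type": "function", "stateMutability": "view",
     "inputs": [{"name": "owner", "type": "address"}],
     "outputs": [{"name": "", "type": "string"}]},
]

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))
keyserver = w3.eth.contract(
    address="0x0000000000000000000000000000000000000000", abi=ABI)

# Publish: a signed transaction links the key to the sender's address
# (assumes a local dev node with an unlocked account).
tx_hash = keyserver.functions.setKey(
    "-----BEGIN PGP PUBLIC KEY BLOCK-----...").transact(
    {"from": w3.eth.accounts[0]})
w3.eth.wait_for_transaction_receipt(tx_hash)

# Look up: a read-only call, answered by any node, free of gas.
print(keyserver.functions.getKey(w3.eth.accounts[0]).call())
```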

    Metadata-driven Data Migration from Object-relational Database to NoSQL Document-oriented Database

    Object-relational databases (ORDBs) are powerful for managing complex data, but they suffer from scalability problems when managing large-scale data. The importance of migrating an ORDB to NoSQL therefore derives from the fact that large volumes of data can be handled best with high scalability and availability. This paper reports our metadata-driven approach for migrating an ORDB to a document-oriented NoSQL database. Our data migration approach involves three major stages: a preprocessing stage that extracts the data and the schema's components, a processing stage that performs the data transformation, and a post-processing stage that stores the migrated data as BSON documents. The approach maintains the benefits of the Oracle ORDB in NoSQL MongoDB by supporting integrity-constraint checking. To validate our approach, we developed the OR2DOD (Object Relational to Document-Oriented Databases) system, and the experimental results confirm the effectiveness of our proposal.
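    As a rough sketch of the post-processing stage alone, assuming rows have already been extracted from the ORDB and transformed into dictionaries; the collection name, document shape, and the use of a unique index to carry over a primary-key constraint are assumptions, not OR2DOD's actual output:

```python
from pymongo import MongoClient  # pip install pymongo

# Sketch of the post-processing stage only: rows are assumed to have been
# extracted from the ORDB and transformed into dicts already.
rows = [
    {"emp_id": 1, "name": "Ada", "address": {"city": "Tunis", "zip": "1001"}},
    {"emp_id": 2, "name": "Alan", "address": {"city": "Sfax", "zip": "3000"}},
]

client = MongoClient("mongodb://localhost:27017")
coll = client["migrated"]["employees"]

# A unique index stands in for the ORDB primary-key constraint, one way
# to carry an integrity constraint over to MongoDB.
coll.create_index("emp_id", unique=True)
coll.insert_many(rows)  # pymongo serializes each dict to a BSON document
```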

    Accelerated Data Delivery Architecture

    This paper introduces the Accelerated Data Delivery Architecture (ADDA). ADDA establishes a framework that distributes transactional data and controls consistency to achieve fast data access, distributed scalability, and non-blocking concurrency control through a clean declarative interface. It is designed to be used with web-based business applications. The framework combines a traditional Relational Database Management System (RDBMS) with a distributed Not Only SQL (NoSQL) database and a browser-based database. It uses a single physical and conceptual database schema designed for a standard RDBMS-driven application. The design allows the architect to assign consistency levels to entities, which determine the storage location and query methodology. The implementation of these levels is flexible and requires no database schema changes in order to change the level of an entity. A data leasing system enforces concurrency control in a non-blocking manner for critical data items. The system also ensures that all data is available for query from the RDBMS server. This means that the system can have the performance advantages of a distributed DBMS and the ACID qualities of a single-site RDBMS without the complex design considerations of traditional DDBMS systems.
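    The abstract does not spell out the leasing protocol. As a minimal sketch of the general idea of non-blocking data leases (a writer either obtains a time-limited lease immediately or is refused, rather than blocking), with every name and the expiry policy assumed:

```python
import threading
import time

class LeaseTable:
    """Sketch of time-limited data leases: a writer either gets the
    lease immediately or is refused, so no one ever blocks. All names
    and the expiry policy are assumptions, not ADDA's protocol."""

    def __init__(self, duration=5.0):
        self._duration = duration
        self._leases = {}               # item key -> (holder, expiry time)
        self._mutex = threading.Lock()  # protects the table, held only briefly

    def acquire(self, key, holder):
        now = time.monotonic()
        with self._mutex:
            lease = self._leases.get(key)
            if lease and lease[1] > now and lease[0] != holder:
                return False   # another live lease exists: refuse, don't wait
            self._leases[key] = (holder, now + self._duration)
            return True

    def release(self, key, holder):
        with self._mutex:
            if self._leases.get(key, (None,))[0] == holder:
                del self._leases[key]

leases = LeaseTable()
assert leases.acquire("order:42", "client-A")
assert not leases.acquire("order:42", "client-B")  # refused immediately
leases.release("order:42", "client-A")
assert leases.acquire("order:42", "client-B")      # lease now free
```

    Because a lease expires on its own, a crashed client cannot hold an item indefinitely, one reason leases suit distributed settings better than plain locks.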