577 research outputs found
Advanced Databases
This Grants Collection for Advanced Databases was created under a Round Two ALG Textbook Transformation Grant.
Affordable Learning Georgia Grants Collections are intended to provide faculty with the frameworks to quickly implement or revise the same materials as a Textbook Transformation Grants team, along with the aims and lessons learned from project teams during the implementation process.
Documents are in .pdf format, with a separate .docx (Word) version available for download. Each collection contains the following materials: Linked Syllabus, Initial Proposal, Final Report.
High performance data processing
Master's dissertation in Informatics Engineering.
As applications reach a wider audience than ever before, they must process ever larger amounts of requests. In addition, they often must serve users all over the globe, where network latencies have a significant negative impact on
monolithic deployments. Therefore, distribution is a well sought-after solution to
improve the performance of both the application and database layers. However, distributing
data is not an easy task if we want to ensure strong consistency guarantees. This leads
many database systems to rely on expensive synchronization protocols such
as two-phase commit, distributed consensus, and distributed locking, while
other systems rely on weak consistency, which is unfeasible for some use cases.
This thesis presents the design, implementation and evaluation of two solutions
aimed at reducing the impact of ensuring strong consistency guarantees on database
systems, especially geo-distributed ones. The first is the Primary Semi-Primary, a fully replicated distributed database architecture that allows different replicas to evolve
independently, so that clients do not have to wait for preceding non-conflicting updates
to propagate. Although replicas can process both reads and writes, improving scalability,
the system still ensures strong consistency guarantees by relaying transactions' certification
to a central node. Its design is independent of the underlying data model, but its
implementation can take advantage of the native concurrency control offered by some
systems, as exemplified by an implementation using PostgreSQL and its Snapshot
Isolation. The results show several advantages in both throughput and response time,
compared to alternative architectures, in both local and geo-distributed
environments. The second solution is Multi-Record Values, a technique that dynamically partitions numeric values into multiple records, allowing concurrent writes to
execute with a low conflict probability, reducing abort rates and/or locking contention.
Lower-limit guarantees, required by objects such as balances or stocks, are ensured by
this strategy, unlike in many similar alternatives. Its design is also data-model
agnostic, and its advantages can be found in both SQL and NoSQL systems, as well
as in both centralized and distributed databases, as presented in the evaluation section.
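To make the partitioning idea concrete, here is a minimal sketch of a Multi-Record Value over a plain in-memory list; the class and method names are illustrative, not the thesis's actual implementation, and real deployments would map each partition to a separate database record so writers touch different rows.

```python
import random

class MultiRecordValue:
    """A numeric value split across several records so that
    concurrent writers are likely to touch different records."""

    def __init__(self, total, partitions=4):
        base, rem = divmod(total, partitions)
        # Spread the initial total across the partitions.
        self.records = [base + (1 if i < rem else 0) for i in range(partitions)]

    def value(self):
        # Reading aggregates all records.
        return sum(self.records)

    def add(self, amount):
        # Additions can go to any record; pick one at random
        # so concurrent writers rarely collide.
        idx = random.randrange(len(self.records))
        self.records[idx] += amount

    def subtract(self, amount):
        # A subtraction only succeeds against a record that can
        # absorb it, which preserves the lower limit of zero
        # without coordinating across all records.
        candidates = list(range(len(self.records)))
        random.shuffle(candidates)
        for idx in candidates:
            if self.records[idx] >= amount:
                self.records[idx] -= amount
                return True
        return False  # the write would violate the lower bound

stock = MultiRecordValue(100, partitions=4)
stock.subtract(10)
stock.add(5)
print(stock.value())  # 95
```

Note the trade-off the sketch exposes: a large subtraction can fail even when the aggregate total would cover it, because no single partition can absorb it; handling that case (e.g., by merging partitions) is part of what a full design must address.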
Priority-Driven Differentiated Performance for NoSQL Database-As-a-Service
Designing data stores for native Cloud Computing services brings a number of challenges, especially if the Cloud Provider wants to offer database services capable of controlling the response time for specific customers. These requests may come from heterogeneous data-driven applications with conflicting responsiveness requirements. For instance, a batch processing workload does not require the same level of responsiveness as a time-sensitive one. Their coexistence may interfere with the responsiveness of the time-sensitive workload, such as online video gaming, virtual reality, and cloud-based machine learning. This paper presents a modification to the popular MongoDB NoSQL database to enable differentiated per-user/request performance on a priority basis by leveraging CPU scheduling and synchronization mechanisms available within the Operating System. This is achieved with minimally invasive changes to the source code and without affecting the performance and behavior of the database when the new feature is not in use. The proposed extension has been integrated with the access-control model of MongoDB for secure and controlled access to the new capability. Extensive experimentation with realistic workloads demonstrates how the proposed solution is able to reduce the response times for high-priority users/requests, with respect to lower-priority ones, in scenarios with mixed-priority clients accessing the data store.
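The per-request prioritisation described above can be illustrated with a small, hypothetical dispatcher (not MongoDB's actual scheduler): each request carries a priority, and the worker always serves the highest-priority pending request first, so batch traffic cannot delay time-sensitive clients.

```python
import heapq
import itertools

class PriorityDispatcher:
    """Toy request dispatcher: high-priority requests are served
    before low-priority ones, mirroring the paper's per-user/request
    priority idea (illustrative only, not MongoDB's code)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # FIFO tie-break within one priority

    def submit(self, priority, request):
        # Lower number = higher priority, as in OS scheduling policies.
        heapq.heappush(self._heap, (priority, next(self._order), request))

    def next_request(self):
        # Pop the highest-priority (lowest-numbered) pending request.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

d = PriorityDispatcher()
d.submit(5, "batch-analytics-query")
d.submit(1, "gaming-session-read")
d.submit(5, "batch-report-query")
print(d.next_request())  # gaming-session-read
```

In the paper's setting the analogous decision is made by the OS-level CPU scheduling and synchronization primitives rather than an application-level queue, but the ordering effect on response times is the same.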
Data trading based on seller preferences within blockchain smart contract
This thesis was submitted for the award of Master of Philosophy and was awarded by Brunel University London.
Online data trading has not focused on the necessary control of data selling
according to data seller preferences (DSP) using blockchain technology. This
research aims to explore DSP using a smart contract over a blockchain
within the domain of online data trading. Data trading has been carried out
for several decades, while cutting-edge technologies and cloud services have
grown dramatically worldwide. Industries benefit from accessing data that
enables them to perform mission-critical tasks, analysing the massively
available data and obtaining a higher return on investment (ROI).
This research aims to make online data trading possible only if the buyer
can satisfy the conditions predefined by the seller. For example, DSP can
restrict the data purchase if the participating buyer does business from
a specific geographic location, or further restrict a particular type and
size of business. Data trading is therefore controlled by smart contract
validation based on DSP; hence the novel DSP artefact has been achieved
and evaluated via a personal blockchain, Ganache, which is always set to
automatic mining. Even though the DSP Dapp artefact has been explored
with a limited scope of seller preferences and data volume, future
researchers may evolve the DSP Dapp artefact framework to achieve
complex seller preferences such as ethical selling (e.g., green credentials).
The smart contract serves as an automated contract, dependent on DSP,
between seller and buyer, without the involvement of any broker or third
party.
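The seller-preference check at the heart of this validation can be sketched in ordinary code; the field names below are hypothetical stand-ins for the contract's actual state, and a real DSP smart contract would run an equivalent predicate on-chain before releasing the data.

```python
from dataclasses import dataclass

@dataclass
class SellerPreferences:
    """Conditions a buyer must satisfy (illustrative fields only)."""
    allowed_regions: set
    allowed_business_types: set
    min_business_size: int  # e.g. number of employees

@dataclass
class Buyer:
    region: str
    business_type: str
    business_size: int

def validate_purchase(prefs, buyer):
    """Mimics the contract's DSP validation: the sale only
    proceeds when every seller-defined condition holds."""
    return (buyer.region in prefs.allowed_regions
            and buyer.business_type in prefs.allowed_business_types
            and buyer.business_size >= prefs.min_business_size)

prefs = SellerPreferences({"EU", "UK"}, {"research", "healthcare"}, 10)
print(validate_purchase(prefs, Buyer("UK", "research", 50)))  # True
print(validate_purchase(prefs, Buyer("US", "research", 50)))  # False
```

Because the predicate is evaluated by the contract itself, neither a broker nor a third party is needed to enforce the seller's terms.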
The first chapter's introduction set up the context for chapter two
to review the literature, present the research question, and set the aims
and objectives. Chapter three selected the DSR methodology for this
research and analysed the requirements to set the building blocks for
chapters four and five. Chapters four and five fulfilled objective two by
designing and developing the DSP artefact using a smart contract to control
data trading. Chapter six validated the DSP trading system to confirm the
novelty of this research, and finally, chapter seven summarised the contribution
and future research.
The research proposes a new approach to online data trading that controls
data selling according to DSP within a smart contract over a blockchain,
and opens new doors for researchers for future work in this area.
A general framework for blockchain analytics
Modern cryptocurrencies exploit decentralised blockchains to record a public and unalterable history of transactions. Besides transactions, further information is stored for different, and often undisclosed, purposes, making blockchains a rich and increasingly growing source of valuable information, part of which is difficult to interpret. Many data analytics tools have been developed, mostly based on specifically designed and ad-hoc engineered approaches. We propose a general-purpose framework, seamlessly supporting data analytics on both Bitcoin and Ethereum — currently the two most prominent cryptocurrencies. Such a framework allows us to integrate relevant blockchain data with data from other sources, and to organise them in a database, either SQL or NoSQL. Our framework is released as an open-source Scala library. We illustrate the distinguishing features of our approach on a set of significant use cases, which allow us to empirically compare our framework to other competing proposals, and to evaluate the impact of the database choice on scalability.
Ethereum blockchain as a decentralized and autonomous key server: storing and extracting public keys through smart contracts
Ethereum is an open-source, public, blockchain-based distributed computing platform featuring smart contract functionality. It provides a decentralized Turing-complete virtual machine which can execute scripts using an international network of public nodes.
The purpose of this thesis is to build a decentralized and autonomous key server using Ethereum smart contracts to store and retrieve information. We gave an overall introduction to Bitcoin and Ethereum to provide the background of the study. We then analyzed the current problems of key discovery with traditional key servers and the web of trust. We designed, built and tested an application that can verify contact cards (email address, PGP public key, domain address, Facebook account), link them to an Ethereum address and store them in a public contract running on the Ethereum blockchain. Finally, we analyzed the costs and limitations of such a solution and proposed some future improvements. The results show that Ethereum is a good choice for storing public keys, thanks to the immutability and irreversibility of the blockchain.
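The storage model such a contract exposes is essentially a per-address map of contact-card fields. The following toy in-memory analogue (names and fields are illustrative, not the thesis's actual contract) shows the publish/lookup pattern, with the sender argument playing the role that `msg.sender` plays on-chain.

```python
class KeyRegistry:
    """Toy in-memory analogue of a key-server contract: each
    address can publish contact-card entries that anyone can
    look up (field names are illustrative)."""

    def __init__(self):
        self._cards = {}  # address -> {field: value}

    def publish(self, sender, field, value):
        # On-chain, the contract would take the caller's address
        # from msg.sender, so only the owner can edit its own card.
        self._cards.setdefault(sender, {})[field] = value

    def lookup(self, address, field):
        # Reads are open to everyone, as with a public contract.
        return self._cards.get(address, {}).get(field)

reg = KeyRegistry()
reg.publish("0xAlice", "pgp", "pgp-key-fingerprint-ABCD")
print(reg.lookup("0xAlice", "pgp"))  # pgp-key-fingerprint-ABCD
```

On Ethereum the map would live in contract storage, so entries inherit the blockchain's immutability and availability guarantees that the thesis relies on.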
Metadata-driven Data Migration from Object-relational Database to NoSQL Document-oriented Database
Object-relational databases (ORDB) are powerful for managing complex data, but they suffer from problems of scalability and managing large-scale data. Therefore, the importance of migrating ORDB to NoSQL derives from the fact that large volumes of data can be handled best with high scalability and availability. This paper reports our metadata-driven approach for the migration of ORDB to a document-oriented NoSQL database. Our data migration approach involves three major stages: a preprocessing stage, to extract the data and the schema's components; a processing stage, to provide the data transformation; and a post-processing stage, to store the migrated data as BSON documents. The approach maintains the benefits of Oracle ORDB in NoSQL MongoDB by supporting integrity constraint checking. To validate our approach, we developed the OR2DOD (Object Relational to Document-Oriented Databases) system, and the experimental results confirm the effectiveness of our proposal.
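The three-stage pipeline can be sketched as follows; the table layout, column names and the nesting rule are hypothetical examples, not the OR2DOD system's actual transformation rules, and a plain list stands in for the MongoDB collection that would receive the BSON documents.

```python
def preprocess(schema, rows):
    """Pre-processing: extract the data and pair each row
    with the schema's column names."""
    return [dict(zip(schema["columns"], row)) for row in rows]

def process(records, nested_key, nested_columns):
    """Processing: fold flat relational fields into an embedded
    sub-document, the way object-relational attributes map to
    nested documents."""
    docs = []
    for rec in records:
        doc = {k: v for k, v in rec.items() if k not in nested_columns}
        doc[nested_key] = {k: rec[k] for k in nested_columns}
        docs.append(doc)
    return docs

def postprocess(docs, collection):
    """Post-processing: store the migrated documents (BSON
    insertion in MongoDB; appended to a list here)."""
    collection.extend(docs)
    return collection

schema = {"columns": ["id", "name", "street", "city"]}
rows = [(1, "Ada", "Main St", "London")]
collection = postprocess(
    process(preprocess(schema, rows), "address", ("street", "city")), [])
print(collection[0])
# {'id': 1, 'name': 'Ada', 'address': {'street': 'Main St', 'city': 'London'}}
```

Integrity-constraint checking, which the paper emphasises, would slot in as validation between the processing and post-processing stages.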
Accelerated Data Delivery Architecture
This paper introduces the Accelerated Data Delivery Architecture (ADDA). ADDA establishes a framework to distribute transactional data and control consistency to achieve fast access to data, distributed scalability and non-blocking concurrency control by using a clean declarative interface. It is designed to be used with web-based business applications. This framework uses a combination of a traditional Relational Database Management System (RDBMS) with a distributed Not Only SQL (NoSQL) database and a browser-based database. It uses a single physical and conceptual database schema designed for a standard RDBMS-driven application. The design allows the architect to assign consistency levels to entities, which determine the storage location and query methodology. The implementation of these levels is flexible and requires no database schema changes in order to change the level of an entity. Also, a data leasing system to enforce concurrency control in a non-blocking manner is employed for critical data items. The system also ensures that all data is available for query from the RDBMS server. This means that the system can have the performance advantages of a DDBMS system and the ACID qualities of a single-site RDBMS system without the complex design considerations of traditional DDBMS systems.
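The non-blocking data-leasing idea can be sketched with a small time-based lease manager; the class below is an illustrative reading of the concept, not ADDA's implementation: a writer acquires a time-limited lease on a critical item, and competing writers are refused immediately (rather than blocked) until the lease expires or is released.

```python
import time

class LeaseManager:
    """Illustrative data-leasing scheme: a writer holds a
    time-limited lease on an item instead of a blocking lock."""

    def __init__(self, duration=0.5):
        self.duration = duration
        self.leases = {}  # item -> (owner, expiry timestamp)

    def acquire(self, item, owner, now=None):
        now = time.monotonic() if now is None else now
        holder = self.leases.get(item)
        if holder is not None and holder[1] > now and holder[0] != owner:
            return False  # lease held by someone else: fail fast, never block
        self.leases[item] = (owner, now + self.duration)
        return True

    def release(self, item, owner):
        # Only the current holder may release its lease early.
        if self.leases.get(item, (None,))[0] == owner:
            del self.leases[item]

lm = LeaseManager(duration=0.5)
print(lm.acquire("order:42", "client-a", now=0.0))  # True
print(lm.acquire("order:42", "client-b", now=0.1))  # False (lease active)
print(lm.acquire("order:42", "client-b", now=1.0))  # True (lease expired)
```

Because a refused writer gets an immediate answer, the application can retry or degrade gracefully instead of stalling, which is the non-blocking property the paper claims for its leasing system.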