140 research outputs found

    Rancang Bangun Non Relational Database pada Enterprise Resource Planing Retail yang Berorientasikan Multitenancy dengan Shared Distributed Database

    Get PDF
    Tugas akhir ini merancang dan membangun sebuah aplikasi Enterprise Resource Planning (ERP) pada studi kasus Retail dengan menggunakan Non Relational Database (NoSQL). Aplikasi ini berorientasikan multitenancy dengan shared distributed database. Dalam aplikasi ini setiap tenant memiliki skema yang berbeda-beda sesuai kebutuhannya. Tantangan utama tugas akhir ini adalah membandingkan basis data NoSQL dengan basis data SQL dalam hal performa, fleksibilitas, dan skalabilitas. Setelah terbukti mana basis data yang lebih unggul, maka akan diterapkan pada aplikasi ERP Retail dengan tujuan aplikasi memiliki performa yang baik, fleksibel dalam penyimpanan data, juga mendukung dalam penyimpanan data yang terus berkembang seiring berjalannya waktu. Dalam uji coba yang telah dilakukan, basis data NoSQL terbukti memiliki kecepatan lebih unggul untuk penyimpanan data dalam bentuk CRUD daripada SQL. Juga mempunyai struktur penyimpanan data yang fleksibel dikarenakan bentuk datanya berupa BSON (Binary JSON). Dan memiliki kemampuan untuk skalabel dengan cara sharding. Maka dalam kasus ini basis data NoSQL akan lebih baik untuk diterapkan pada ERP Retail. ======================================================================================================================== This final project is to design and implementation an application of Enterprise Resource Planning (ERP in Retail study case using Non Relational Database (NoSQL). This application is multitenancy oriented with shared distributed database. In this application every tenat has different scheme as per its requirement. The main challenge of this final task is to compare the NoSQL database with SQL in terms of performance, flexibility, and scalability. After being proven which the right database, it will be applied to Retail ERP applications with the aim of the application has a good performance, flexible in data storage, also supports in data storage that continues to grow over time. In the trials that has been done, NoSQL database proved to have superior speed for data storage in the case of CRUD rather than SQL. Also has a flexible structure data storage because the model of data in the form of BSON (Binary JSON). And has the ability to be scalable by sharding. So in this case NoSQL database will be better to apply to ERP Retail

    PUBLIC BLOCKCHAIN SCALABILITY: ADVANCEMENTS, CHALLENGES AND THE FUTURE

    Get PDF
    In the last decade, blockchain has emerged as one of the most influential innovations in software architecture and technology. Ideally, blockchains are designed to be architecturally and politically decentralized, similar to the Internet. But recently, public and permissionless blockchains such as Bitcoin and Ethereum have faced stumbling blocks in the form of scalability. Both Bitcoin and Ethereum process fewer than 20 transactions per second, which is significantly lower than their centralized counterpart such as VISA that can process approximately 1,700 transactions per second. In realizing this hindrance in the wide range adoption of blockchains for building advanced and large scalable systems, the blockchain community has proposed first- and second-layer scaling solutions including Segregated Witness (Segwit), Sharding, and two-way pegged sidechains. Although these proposals are innovative, they still suffer from the blockchain trilemma of scalability, security, and decentralization. Moreover, at this time, little is known or discussed regarding factors related to design choices, feasibility, limitations and other issues in adopting the various first- and second-layer scaling solutions in public and permissionless blockchains. Hence, this thesis provides the first comprehensive review of the state-of-the-art first- and second-layer scaling solutions for public and permissionless blockchains, identifying current advancements and analyzing their impact from various viewpoints, highlighting their limitations and discussing possible remedies for the overall improvement of the blockchain domain

    Modern data analytics in the cloud era

    Get PDF
    Cloud Computing ist die dominante Technologie des letzten Jahrzehnts. Die Benutzerfreundlichkeit der verwalteten Umgebung in Kombination mit einer nahezu unbegrenzten Menge an Ressourcen und einem nutzungsabhängigen Preismodell ermöglicht eine schnelle und kosteneffiziente Projektrealisierung für ein breites Nutzerspektrum. Cloud Computing verändert auch die Art und Weise wie Software entwickelt, bereitgestellt und genutzt wird. Diese Arbeit konzentriert sich auf Datenbanksysteme, die in der Cloud-Umgebung eingesetzt werden. Wir identifizieren drei Hauptinteraktionspunkte der Datenbank-Engine mit der Umgebung, die veränderte Anforderungen im Vergleich zu traditionellen On-Premise-Data-Warehouse-Lösungen aufweisen. Der erste Interaktionspunkt ist die Interaktion mit elastischen Ressourcen. Systeme in der Cloud sollten Elastizität unterstützen, um den Lastanforderungen zu entsprechen und dabei kosteneffizient zu sein. Wir stellen einen elastischen Skalierungsmechanismus für verteilte Datenbank-Engines vor, kombiniert mit einem Partitionsmanager, der einen Lastausgleich bietet und gleichzeitig die Neuzuweisung von Partitionen im Falle einer elastischen Skalierung minimiert. Darüber hinaus führen wir eine Strategie zum initialen Befüllen von Puffern ein, die es ermöglicht, skalierte Ressourcen unmittelbar nach der Skalierung auszunutzen. Cloudbasierte Systeme sind von fast überall aus zugänglich und verfügbar. Daten werden häufig von zahlreichen Endpunkten aus eingespeist, was sich von ETL-Pipelines in einer herkömmlichen Data-Warehouse-Lösung unterscheidet. Viele Benutzer verzichten auf die Definition von strikten Schemaanforderungen, um Transaktionsabbrüche aufgrund von Konflikten zu vermeiden oder um den Ladeprozess von Daten zu beschleunigen. Wir führen das Konzept der PatchIndexe ein, die die Definition von unscharfen Constraints ermöglichen. PatchIndexe verwalten Ausnahmen zu diesen Constraints, machen sie für die Optimierung und Ausführung von Anfragen nutzbar und bieten effiziente Unterstützung bei Datenaktualisierungen. Das Konzept kann auf beliebige Constraints angewendet werden und wir geben Beispiele für unscharfe Eindeutigkeits- und Sortierconstraints. Darüber hinaus zeigen wir, wie PatchIndexe genutzt werden können, um fortgeschrittene Constraints wie eine unscharfe Multi-Key-Partitionierung zu definieren, die eine robuste Anfrageperformance bei Workloads mit unterschiedlichen Partitionsanforderungen bietet. Der dritte Interaktionspunkt ist die Nutzerinteraktion. Datengetriebene Anwendungen haben sich in den letzten Jahren verändert. Neben den traditionellen SQL-Anfragen für Business Intelligence sind heute auch datenwissenschaftliche Anwendungen von großer Bedeutung. In diesen Fällen fungiert das Datenbanksystem oft nur als Datenlieferant, während der Rechenaufwand in dedizierten Data-Science- oder Machine-Learning-Umgebungen stattfindet. Wir verfolgen das Ziel, fortgeschrittene Analysen in Richtung der Datenbank-Engine zu verlagern und stellen das Grizzly-Framework als DataFrame-zu-SQL-Transpiler vor. Auf dieser Grundlage identifizieren wir benutzerdefinierte Funktionen (UDFs) und maschinelles Lernen (ML) als wichtige Aufgaben, die von einer tieferen Integration in die Datenbank-Engine profitieren würden. Daher untersuchen und bewerten wir Ansätze für die datenbankinterne Ausführung von Python-UDFs und datenbankinterne ML-Inferenz.Cloud computing has been the groundbreaking technology of the last decade. The ease-of-use of the managed environment in combination with nearly infinite amount of resources and a pay-per-use price model enables fast and cost-efficient project realization for a broad range of users. Cloud computing also changes the way software is designed, deployed and used. This thesis focuses on database systems deployed in the cloud environment. We identify three major interaction points of the database engine with the environment that show changed requirements compared to traditional on-premise data warehouse solutions. First, software is deployed on elastic resources. Consequently, systems should support elasticity in order to match workload requirements and be cost-effective. We present an elastic scaling mechanism for distributed database engines, combined with a partition manager that provides load balancing while minimizing partition reassignments in the case of elastic scaling. Furthermore we introduce a buffer pre-heating strategy that allows to mitigate a cold start after scaling and leads to an immediate performance benefit using scaling. Second, cloud based systems are accessible and available from nearly everywhere. Consequently, data is frequently ingested from numerous endpoints, which differs from bulk loads or ETL pipelines in a traditional data warehouse solution. Many users do not define database constraints in order to avoid transaction aborts due to conflicts or to speed up data ingestion. To mitigate this issue we introduce the concept of PatchIndexes, which allow the definition of approximate constraints. PatchIndexes maintain exceptions to constraints, make them usable in query optimization and execution and offer efficient update support. The concept can be applied to arbitrary constraints and we provide examples of approximate uniqueness and approximate sorting constraints. Moreover, we show how PatchIndexes can be exploited to define advanced constraints like an approximate multi-key partitioning, which offers robust query performance over workloads with different partition key requirements. Third, data-centric workloads changed over the last decade. Besides traditional SQL workloads for business intelligence, data science workloads are of significant importance nowadays. For these cases the database system might only act as data delivery, while the computational effort takes place in data science or machine learning (ML) environments. As this workflow has several drawbacks, we follow the goal of pushing advanced analytics towards the database engine and introduce the Grizzly framework as a DataFrame-to-SQL transpiler. Based on this we identify user-defined functions (UDFs) and machine learning inference as important tasks that would benefit from a deeper engine integration and investigate approaches to push these operations towards the database engine

    A patient agent controlled customized blockchain based framework for internet of things

    Get PDF
    Although Blockchain implementations have emerged as revolutionary technologies for various industrial applications including cryptocurrencies, they have not been widely deployed to store data streaming from sensors to remote servers in architectures known as Internet of Things. New Blockchain for the Internet of Things models promise secure solutions for eHealth, smart cities, and other applications. These models pave the way for continuous monitoring of patient’s physiological signs with wearable sensors to augment traditional medical practice without recourse to storing data with a trusted authority. However, existing Blockchain algorithms cannot accommodate the huge volumes, security, and privacy requirements of health data. In this thesis, our first contribution is an End-to-End secure eHealth architecture that introduces an intelligent Patient Centric Agent. The Patient Centric Agent executing on dedicated hardware manages the storage and access of streams of sensors generated health data, into a customized Blockchain and other less secure repositories. As IoT devices cannot host Blockchain technology due to their limited memory, power, and computational resources, the Patient Centric Agent coordinates and communicates with a private customized Blockchain on behalf of the wearable devices. While the adoption of a Patient Centric Agent offers solutions for addressing continuous monitoring of patients’ health, dealing with storage, data privacy and network security issues, the architecture is vulnerable to Denial of Services(DoS) and single point of failure attacks. To address this issue, we advance a second contribution; a decentralised eHealth system in which the Patient Centric Agent is replicated at three levels: Sensing Layer, NEAR Processing Layer and FAR Processing Layer. The functionalities of the Patient Centric Agent are customized to manage the tasks of the three levels. Simulations confirm protection of the architecture against DoS attacks. Few patients require all their health data to be stored in Blockchain repositories but instead need to select an appropriate storage medium for each chunk of data by matching their personal needs and preferences with features of candidate storage mediums. Motivated by this context, we advance third contribution; a recommendation model for health data storage that can accommodate patient preferences and make storage decisions rapidly, in real-time, even with streamed data. The mapping between health data features and characteristics of each repository is learned using machine learning. The Blockchain’s capacity to make transactions and store records without central oversight enables its application for IoT networks outside health such as underwater IoT networks where the unattended nature of the nodes threatens their security and privacy. However, underwater IoT differs from ground IoT as acoustics signals are the communication media leading to high propagation delays, high error rates exacerbated by turbulent water currents. Our fourth contribution is a customized Blockchain leveraged framework with the model of Patient-Centric Agent renamed as Smart Agent for securely monitoring underwater IoT. Finally, the smart Agent has been investigated in developing an IoT smart home or cities monitoring framework. The key algorithms underpinning to each contribution have been implemented and analysed using simulators.Doctor of Philosoph

    FPGA-based Query Acceleration for Non-relational Databases

    Get PDF
    Database management systems are an integral part of today’s everyday life. Trends like smart applications, the internet of things, and business and social networks require applications to deal efficiently with data in various data models close to the underlying domain. Therefore, non-relational database systems provide a wide variety of database models, like graphs and documents. However, current non-relational database systems face performance challenges due to the end of Dennard scaling and therefore performance scaling of CPUs. In the meanwhile, FPGAs have gained traction as accelerators for data management. Our goal is to tackle the performance challenges of non-relational database systems with FPGA acceleration and, at the same time, address design challenges of FPGA acceleration itself. Therefore, we split this thesis up into two main lines of work: graph processing and flexible data processing. Because of the lacking benchmark practices for graph processing accelerators, we propose GraphSim. GraphSim is able to reproduce runtimes of these accelerators based on a memory access model of the approach. Through this simulation environment, we extract three performance-critical accelerator properties: asynchronous graph processing, compressed graph data structure, and multi-channel memory. Since these accelerator properties have not been combined in one system, we propose GraphScale. GraphScale is the first scalable, asynchronous graph processing accelerator working on a compressed graph and outperforms all state-of-the-art graph processing accelerators. Focusing on accelerator flexibility, we propose PipeJSON as the first FPGA-based JSON parser for arbitrary JSON documents. PipeJSON is able to achieve parsing at line-speed, outperforming the fastest, vectorized parsers for CPUs. Lastly, we propose the subgraph query processing accelerator GraphMatch which outperforms state-of-the-art CPU systems for subgraph query processing and is able to flexibly switch queries during runtime in a matter of clock cycles

    Towards Autonomous and Efficient Machine Learning Systems

    Get PDF
    Computation-intensive machine learning (ML) applications are becoming some of the most popular workloads running atop cloud infrastructure. While training ML applications, practitioners face the challenge of tuning various system-level parameters, such as the number of training nodes, communication topology during training, instance type, and the number of serving nodes, to meet the SLO requirements for bursty workload during the inference. Similarly, efficient resource utilization is another key challenge in cloud computing. This dissertation proposes high-performing and efficient ML systems to speed up training time and inference tasks while enabling automated and robust system management.To train an ML model in a distributed fashion we focus on strategies to mitigate the resource provisioning overhead and improve the training speed without impacting the model accuracy. More specifically, a system for autonomic and adaptive scheduling is built atop serverless computing that dynamically optimizes deployment and resource scaling for ML training tasks for cost-effectiveness and fast training. Similarly, a dynamic client selection framework is developed to address the stragglers problem caused by resource heterogeneity, data quality, and data quantity in a privacy-preserving Federated Learning (FL) environment without impacting the model accuracy.For serving bursty ML workloads we focus on developing highly scalable and adaptive strategies to serve the dynamically changing workload in a cost-effective manner in an autonomic fashion. We develop a framework that optimizes batching parameters on the fly using a lightweight profiler and an analytical model. We also devise strategies for serving ML workloads of varying sizes, leading to non-deterministic service time in a cost-effective manner. More specifically, we develop an SLO-aware framework that first analyzes the request size variations and workload variation to estimate the number of serving functions and intelligently route requests to multiple serving functions. Finally, resource utilization of burstable instances is optimized to benefit the cloud provider and end-user through a careful orchestration of resources (i.e., CPU, network, and I/O) using an analytical model and lightweight profiling, while complying with a user-defined SLO

    Survey of NoSQL Database Engines for Big Data

    Get PDF
    Cloud computing is a paradigm shift that provides computing over Internet. With growing outreach of Internet in the lives of people, everyday large volume of data is generated from different sources such as cellphones, electronic gadgets, e-commerce transactions, social media, and sensors. Eventually, the size of generated data is so large that it is also referred as Big Data. Companies harvesting business opportunities in digital world need to invest their budget and time to scale their IT infrastructure for the expansion of their businesses. The traditional relational databases have limitations in scaling for large Internet scale distributed systems. To store rapidly expanding high volume Big Data efficiently, NoSQL data stores have been developed as an alternative solution to the relational databases. The purpose of this thesis is to provide a holistic overview of different NoSQL data stores. We cover different fundamental principles supporting the NoSQL data store development. Many NoSQL data stores have specific and exclusive features and properties. They also differ in their architecture, data model, storage system, and fault tolerance abilities. This thesis describes different aspects of few NoSQL data stores in detail. The thesis also covers the experiments to evaluate and compare the performance of different NoSQL data stores on a distributed cluster. In the scope of this thesis, HBase, Cassandra, MongoDB, and Riak are four NoSQL data stores selected for the benchmarking experiments

    Efficient data reconfiguration for today's cloud systems

    Get PDF
    Performance of big data systems largely relies on efficient data reconfiguration techniques. Data reconfiguration operations deal with changing configuration parameters that affect data layout in a system. They could be user-initiated like changing shard key, block size in NoSQL databases, or system-initiated like changing replication in distributed interactive analytics engine. Current data reconfiguration schemes are heuristics at best and often do not scale well as data volume grows. As a result, system performance suffers. In this thesis, we show that {\it data reconfiguration mechanisms can be done in the background by using new optimal or near-optimal algorithms coupling them with performant system designs}. We explore four different data reconfiguration operations affecting three popular types of systems -- storage, real-time analytics and batch analytics. In NoSQL databases (storage), we explore new strategies for changing table-level configuration and for compaction as they improve read/write latencies. In distributed interactive analytics engines, a good replication algorithm can save costs by judiciously using memory that is sufficient to provide the highest throughput and low latency for queries. Finally, in batch processing systems, we explore prefetching and caching strategies that can improve the number of production jobs meeting their SLOs. All these operations happen in the background without affecting the fast path. Our contributions in each of the problems are two-fold -- 1) we model the problem and design algorithms inspired from well-known theoretical abstractions, 2) we design and build a system on top of popular open source systems used in companies today. Finally, using real-life workloads, we evaluate the efficacy of our solutions. Morphus and Parqua provide several 9s of availability while changing table level configuration parameters in databases. By halving memory usage in distributed interactive analytics engine, Getafix reduces cost of deploying the system by 10 million dollars annually and improves query throughput. We are the first to model the problem of compaction and provide formal bounds on their runtime. Finally, NetCachier helps 30\% more production jobs to meet their SLOs compared to existing state-of-the-art
    • …
    corecore