94 research outputs found

    Scalable data management for web applications

    Steen, M.R. van [Promotor]; Pierre, G.E.O. [Copromotor]; Chi, C.H. [Copromotor]

    Layout Optimization for Distributed Relational Databases Using Machine Learning

    A common problem when running Web-based applications is how to scale up the database. The solution usually involves having a skilled database administrator determine how to spread the database tables across computers that will work in parallel. Laying out database tables across multiple machines so that they act together as a single efficient database is hard. Automated methods are needed to reduce the time database administrators spend creating optimal configurations. We consider four operators that create a search space of possible database layouts: 1) denormalizing, 2) horizontally partitioning, 3) vertically partitioning, and 4) fully replicating. Textbooks offer general advice that is useful for extreme cases; for instance, you should fully replicate a table if the ratio of inserts to selects is close to zero. But even this seemingly obvious statement will not necessarily lead to a speedup once you take into account that some nodes might be a bottleneck. Complex interactions between the four operators make it even more difficult to predict the best course of action. Instead of relying on best practices for database layout, we need a system that collects empirical data on when these four operators are effective. We have implemented a state-based search technique that tries different operators, and we then used the empirically measured data to see whether any speedup occurred. We recognize that the cost of creating the physical database layout is potentially large, but it is necessary because we want to know the ground truth about what is effective and under what conditions. After creating a dataset in which these four operators have been applied to produce different databases, we can employ machine learning to induce rules that help govern the physical design of the database across an arbitrary number of computer nodes.
This learning process, in turn, would allow the database placement algorithm to improve over time as it trains on a set of examples. The algorithm tries to answer two questions: 1) What is a good database layout for a particular application given a query workload? and 2) Can the algorithm automatically improve its recommendations by using machine-learned rules to generalize when it makes sense to apply each of these operators? There has been considerable research on parallelizing databases in which large amounts of data are shipped from one node to another to answer a single query. Because the cost of shipping data back and forth can be high, in this work we assume that it may be more efficient to create a database layout in which each query can be answered by a single node. This assumption requires that all incoming query templates are known beforehand. The requirement is easily satisfied for a Web-based application, because users typically interact with the system through a web interface such as web forms. In this case, unseen queries are not necessarily answerable without first reconstructing the data on a single machine. Prior knowledge of the exact query templates allows us to select the best possible database table placements across multiple nodes. When trying to improve the efficiency of a Web-based application, a web site provider might be willing to accept the inconvenience of not being able to answer an arbitrary query if in return they are provided with a system that runs more efficiently.
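
The state-based search over the four operators can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy strategy, the function names, and the `toy_cost` function (a stand-in for empirically measured workload time) are all assumptions made for illustration.

```python
from itertools import product

# The four layout operators named in the abstract.
OPERATORS = (
    "denormalize",
    "horizontal_partition",
    "vertical_partition",
    "full_replicate",
)


def neighbours(layout, tables):
    """Yield every layout reachable by applying one more operator to one table."""
    for table, op in product(tables, OPERATORS):
        if (table, op) not in layout:
            yield layout | {(table, op)}


def greedy_search(tables, measure, max_steps=10):
    """Hill-climb from the empty layout.

    `measure` is a hypothetical callback that returns the measured cost
    of a layout (lower is better). A step is kept only if the cost drops.
    """
    current = frozenset()
    best = measure(current)
    for _ in range(max_steps):
        candidates = [(measure(n), n) for n in neighbours(current, tables)]
        if not candidates:
            break
        cost, layout = min(candidates, key=lambda c: c[0])
        if cost >= best:
            break  # no operator gives a further speedup
        current, best = layout, cost
    return current, best


def toy_cost(layout):
    """Made-up cost model standing in for real measurements (assumption)."""
    cost = 100.0
    if ("orders", "horizontal_partition") in layout:
        cost -= 30
    if ("users", "full_replicate") in layout:
        cost -= 20
    return cost + 5 * len(layout)  # every applied operator has some overhead


layout, cost = greedy_search(["orders", "users"], toy_cost)
```

In a real system the cost callback would build the physical layout and run the query workload, which is the expensive step the abstract refers to.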

    NoSQL storage and management of geospatial data with emphasis on serving geospatial data using standard geospatial web services

    Today, more geospatial data is being created, collected and used than ever before. The ever-increasing observations and measurements of geo-sensor networks, satellite imagery, point clouds from laser scanning, geospatial data of Location Based Services (LBS) and location-based social networks have become a serious challenge for data management and analysis systems. Traditionally, Relational Database Management Systems (RDBMS) were used to manage and, to some extent, analyse geospatial data. These systems can still be used in many scenarios, but in some situations they may not provide the required efficiency and effectiveness. In these situations, NoSQL solutions can provide the efficiency necessary for applications using geospatial data. It is important to differentiate between the physical way a NoSQL product is implemented and the interfaces, coding and access methods it uses for the abstraction of data. This paper provides an overview of the major types of NoSQL solutions, their advantages and disadvantages, and the challenges they present in managing geospatial data. The paper then elaborates on serving geospatial data using standard geospatial web services with a NoSQL database as a backend.

    CloudTPS: Scalable Transactions for Web Applications in the Cloud

    NoSQL cloud data services provide scalability and high availability for web applications, but at the same time they sacrifice data consistency. However, many applications cannot afford any data inconsistency. CloudTPS is a scalable transaction manager that allows cloud database services to execute the ACID transactions of web applications, even in the presence of server failures and network partitions. We implement this approach on top of the two main families of scalable data layers: Bigtable and SimpleDB. Performance evaluation on top of HBase (an open-source version of Bigtable) in our local cluster and Amazon SimpleDB in the Amazon cloud shows that our system scales linearly at least up to 40 nodes in our local cluster and 80 nodes in the Amazon cloud.

    Practitioners’ view on command query responsibility segregation

    Relational database management systems (RDBMS) have long been a predominant technology in information systems (IS). Today, however, the ever-changing technology landscape seems to be the proving ground for many alternative approaches. For instance, alternative databases are currently used in many cloud services that affect everyday life. Similarly, a novel way to design applications has come to fruition. It relies on two concepts: command query responsibility segregation (CQRS) and event sourcing. A combination of the concepts is suggested to mitigate some performance and design issues that commonly arise in traditional information systems development (ISD). However, this particular approach has not yet sparked interest from academia. This inquiry sets out to find opportunities and challenges that arise from adoption of one of the two concepts, namely CQRS. This is done in relative isolation from event sourcing. In total, five interviews were conducted with seven participants using open-ended interview questions derived from design patterns research. The results are five themes that provide guidance to IS professionals evaluating adoption. These are alignment between IT artifacts and business processes, simultaneous development, flexibility from specific database technology, modularization as a means of implementation, and risk of introducing complexity. The results indicate that several themes from domain-driven design are influential to the concept. Additionally, results indicate that CQRS may be a precursor to eventually consistent queries and aids fine-tuning of availability, consistency and partition tolerance considerations. It is concluded that CQRS may facilitate improved collaboration and ease distribution of work. Moreover, it is hoped that the results will help to contextualize CQRS and spark additional interest in the field of IS research. The inquiry suggests further inquiries in other areas.
These include, among others, extract-transform-load patterns, operational transforms, probabilistic bounded staleness and occasionally connected systems.
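
For readers unfamiliar with CQRS, the separation it names can be shown in a few lines. The following is a minimal sketch under stated assumptions, not taken from the thesis: the class names, the account example, and the synchronous projection step are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepositCommand:
    """A command expresses an intent to change state; it returns no data."""
    account: str
    amount: int


class ReadModel:
    """Query side: a separate store, shaped for how the data is read."""

    def __init__(self):
        self._view = {}

    def project(self, account, balance):
        # Keep a pre-formatted view so that queries do no computation.
        self._view[account] = f"{account}: {balance}"

    def summary(self, account):
        return self._view.get(account, f"{account}: 0")


class WriteModel:
    """Command side: validates commands and owns the authoritative state."""

    def __init__(self, read_model):
        self._balances = {}
        self._read_model = read_model

    def handle(self, cmd):
        if cmd.amount <= 0:
            raise ValueError("amount must be positive")
        balance = self._balances.get(cmd.account, 0) + cmd.amount
        self._balances[cmd.account] = balance
        # Synchronous here for simplicity; doing this step asynchronously
        # is what makes the query side eventually consistent.
        self._read_model.project(cmd.account, balance)


read_side = ReadModel()
write_side = WriteModel(read_side)
write_side.handle(DepositCommand("alice", 5))
write_side.handle(DepositCommand("alice", 7))
```

Because the two sides share no storage, each could in principle use a different database technology, which corresponds to the flexibility theme reported in the abstract.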

    Automatic Data Migration into the Cloud

    Relational databases have been used for decades to store data. When scaling up, relational databases require an ever bigger server with more CPUs, more memory, and more disk storage to hold all the tables and support more concurrent users. However, big servers tend to be highly complex, proprietary, and disproportionately expensive, unlike low-cost commodity hardware. It therefore becomes important to store data efficiently and to compute with massive amounts of data while providing high scalability, performance and availability at low cost. This led to the invention of cloud databases, for instance NoSQL databases. NoSQL databases have many advantages, such as reading and writing data quickly, supporting massive storage, and low cost. Cloud databases scale out, that is, they add multiple servers, and they store data in the form of key-value pairs. However, it can be a challenge for enterprises to migrate existing relational databases to highly scalable NoSQL databases in the cloud. In this thesis, we propose an automatic data migration model that assists enterprises in migrating their relational databases efficiently and transparently to cloud databases. We propose four migration methods that migrate data in four different ways. Each migration method is independent of the others and stores the migrated relational database in a different format in the cloud database. We design a system to implement the automatic data migration model. As a proof of concept, we successfully migrated a relational database from Microsoft SQL Server to the cloud database Amazon SimpleDB using the four migration methods. Furthermore, we have conducted extensive experiments on Amazon SimpleDB to evaluate the performance of our model in terms of computational time, storage cost, sharding and redundancy.
Based on these experiments and a detailed analysis of each migration method, our system allows enterprises to determine which method is suitable for their data migration. Furthermore, our experimental evaluation shows that our solution is promising and can migrate data from relational databases to cloud databases.
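
The abstract does not describe its four migration methods, so the sketch below shows only one plausible mapping from a relational row to a key-value item, as an illustration of the general idea. The function name, the `table:key` naming scheme, and the handling of NULLs are assumptions. Values are converted to strings because SimpleDB stores attribute values as strings.

```python
def row_to_item(table, primary_key, row):
    """Map one relational row (a dict of column -> value) to a key-value item.

    Returns (item_name, attributes). The item name combines the table name
    and primary-key value (a hypothetical scheme); NULL columns are simply
    omitted, since a key-value item has no fixed schema.
    """
    item_name = f"{table}:{row[primary_key]}"
    attributes = {col: str(val) for col, val in row.items() if val is not None}
    return item_name, attributes


name, attrs = row_to_item(
    "customer", "id", {"id": 7, "name": "Ada", "phone": None}
)
# name  is "customer:7"
# attrs is {"id": "7", "name": "Ada"}
```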

    Enabling technologies for a web-based urban street construction permit system

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2001. Includes bibliographical references (leaves 87-88). This thesis focuses on the enabling technologies for a web-based urban street construction permit system. The web-based application system can automatically verify the various constraints, issue the permit if the constraints are met, notify the relevant persons of the issuance of the permit, update the pavement status for the affected street, and prepare the billing report for further processing with the existing billing system. The web-based permit system is divided into two sub-systems: the External System and the Internal System. The external system is used by contractor/utility companies for permit applications, and the internal system is used solely by authorized internal users for maintenance of the system or for permit applications on behalf of contractor/utility companies when necessary. The two sub-systems share the same underlying database system. To develop this web-based permit system, the following J2EE technologies were used: Enterprise JavaBeans, JavaServer Pages, Servlet and the JDBC API. Other J2EE technologies such as Transaction, JNDI and XML are also discussed where appropriate. The following development environments supporting these technologies are also presented in this thesis: Red Hat Linux 7.0, Java 2 Platform, Tomcat Server 3.2.1, Database MySQL 2.1.4, and JDBC Driver 2.0.4 for MySQL. As an example, the Arlington permit system is used to demonstrate the design of an Entity-Relationship model and an Enterprise JavaBeans application. by Changxin Qi. M.Eng.

    The Forgotten Document-Oriented Database Management Systems: An Overview and Benchmark of Native XML DODBMSes in Comparison with JSON DODBMSes

    In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve the issue of rigid data structures present in relational databases by introducing semi-structured and flexible schema design. As current data generated by different sources and devices, especially IoT sensors and actuators, use either XML or JSON format depending on the application, database technologies that store and query semi-structured data in XML format are needed. Thus, Native XML Databases, which were initially designed to manipulate XML data using standardized querying languages, i.e., XQuery and XPath, were rebranded as NoSQL Document-Oriented Database Systems. Currently, the majority of these solutions have been replaced with the more modern JSON-based Database Management Systems. However, we believe that XML-based solutions can still deliver performance in executing complex queries on heterogeneous collections. Unfortunately, current research lacks a clear comparison of the scalability and performance of database technologies that store and query documents in XML versus the more modern JSON format. Moreover, to the best of our knowledge, there are no Big Data-compliant benchmarks for such database technologies. In this paper, we present a comparison of selected Document-Oriented Database Systems that use either the XML format to encode documents, i.e., BaseX, eXist-db, and Sedna, or the JSON format, i.e., MongoDB, CouchDB, and Couchbase. To underline the performance differences, we also propose a benchmark that uses a heterogeneous complex schema on a large DBLP corpus. Comment: 28 pages, 6 figures, 7 tables