2,015 research outputs found

    Designing and Implementing a Distributed Database for a Small Multi-Outlet Business

    Get PDF
    Data is a fundamental and necessary element for businesses. During their operations they generate a certain amount of data that they need to capture, store, and later on retrieve when required. Databases provide the means to store and effectively retrieve data. Such a database can help a business improve its services, be more competitive, and ultimately increase its profits. In this paper, the system requirements of a distributed database are researched for a movie rental and sale store that has at least two outlets in different locations besides the main one. This project investigates the different stages of such a database, namely, the planning, analysis, decision, implementation and testing

    The End of Slow Networks: It's Time for a Redesign

    Full text link
    Next generation high-performance RDMA-capable networks will require a fundamental rethinking of the design and architecture of modern distributed DBMSs. These systems are commonly designed and optimized under the assumption that the network is the bottleneck: the network is slow and "thin", and thus needs to be avoided as much as possible. Yet this assumption no longer holds true. With InfiniBand FDR 4x, the bandwidth available to transfer data across network is in the same ballpark as the bandwidth of one memory channel, and it increases even further with the most recent EDR standard. Moreover, with the increasing advances of RDMA, the latency improves similarly fast. In this paper, we first argue that the "old" distributed database design is not capable of taking full advantage of the network. Second, we propose architectural redesigns for OLTP, OLAP and advanced analytical frameworks to take better advantage of the improved bandwidth, latency and RDMA capabilities. Finally, for each of the workload categories, we show that remarkable performance improvements can be achieved

    PerfXplain: Debugging MapReduce Job Performance

    Full text link
    While users today have access to many tools that assist in performing large scale data analysis tasks, understanding the performance characteristics of their parallel computations, such as MapReduce jobs, remains difficult. We present PerfXplain, a system that enables users to ask questions about the relative performances (i.e., runtimes) of pairs of MapReduce jobs. PerfXplain provides a new query language for articulating performance queries and an algorithm for generating explanations from a log of past MapReduce job executions. We formally define the notion of an explanation together with three metrics, relevance, precision, and generality, that measure explanation quality. We present the explanation-generation algorithm based on techniques related to decision-tree building. We evaluate the approach on a log of past executions on Amazon EC2, and show that our approach can generate quality explanations, outperforming two naive explanation-generation methods.Comment: VLDB201

    Scalable mining for classification rules in relational databases

    Get PDF
    doi:10.1214/lnms/1196285404Data mining is a process of discovering useful patterns (knowledge) hidden in extremely large datasets. Classification is a fundamental data mining function, and some other functions can be reduced to it. In this paper we propose a novel classification algorithm (classifier) called MIND (MINing in Databases). MIND can be phrased in such a way that its implementation is very easy using the extended relational calculus SQL, and this in turn allows the classifier to be built into a relational database system directly. MIND is truly scalable with respect to I/O efficiency, which is important since scalability is a key requirement for any data mining algorithm. We have built a prototype of MIND in the relational database management system DB2 and have benchmarked its performance. We describe the working prototype and report the measured performance with respect to the previous method of choice. MIND scales not only with the size of datasets but also with the number of processors on an IBM SP2 computer system. Even on uniprocessors, MIND scales well beyond dataset sizes previously published for classifiers.We also give some insights that may have an impact on the evolution of the extended relational calculus SQL

    Business Analytics in (a) Blink

    Get PDF
    The Blink project’s ambitious goal is to answer all Business Intelligence (BI) queries in mere seconds, regardless of the database size, with an extremely low total cost of ownership. Blink is a new DBMS aimed primarily at read-mostly BI query processing that exploits scale-out of commodity multi-core processors and cheap DRAM to retain a (copy of a) data mart completely in main memory. Additionally, it exploits proprietary compression technology and cache-conscious algorithms that reduce memory bandwidth consumption and allow most SQL query processing to be performed on the compressed data. Blink always scans (portions of) the data mart in parallel on all nodes, without using any indexes or materialized views, and without any query optimizer to choose among them. The Blink technology has thus far been incorp
    • …
    corecore