116 research outputs found

    Data Structures & Algorithm Analysis in C++

    This is the textbook for CSIS 215 at Liberty University.

    Extending functional databases for use in text-intensive applications

    This thesis continues research exploring the benefits of using functional databases based around the functional data model for advanced database applications, particularly those supporting investigative systems. This is a growing generic application domain covering areas such as criminal and military intelligence, which are characterised by significant data complexity, large data sets and the need for high-performance, interactive use. An experimental functional database language was developed to provide the requisite semantic richness. However, heavy use in a practical context has shown that language extensions and implementation improvements are required, especially in the crucial areas of string matching and graph traversal. In addition, an implementation on multiprocessor, parallel architectures is essential to meet the performance needs arising from existing and projected database sizes in the chosen application area. [Continues.]

    Resource Sharing for Multi-Tenant Nosql Data Store in Cloud

    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2015. Multi-tenancy hosting of users in cloud NoSQL data stores is favored by cloud providers because it enables resource sharing at low operating cost. Multi-tenancy takes several forms depending on whether the back-end file system is a local file system (LFS) or a parallel file system (PFS), and on whether tenants are independent or share data across tenants. In this thesis I focus on and propose solutions to two cases: independent data on a local file system, and shared data on a parallel file system. In the independent-data, local-file-system case, resource contention occurs under certain conditions in Cassandra and HBase, two state-of-the-art NoSQL stores, so that one tenant degrades the performance of another. We investigate the interference and propose two approaches. The first provides a scheduling scheme that can approximate resource consumption, adapt to workload dynamics and work in a distributed fashion. The second introduces a workload-aware resource reservation approach to prevent interference. The approach relies on a performance model obtained offline and plans the reservation according to different workload resource demands. Results show the approaches together can prevent interference and adapt to dynamic workloads under multi-tenancy. In the shared-data, parallel-file-system case, it has been shown that running a distributed NoSQL store over PFS for shared data across tenants is not cost effective. Overheads arise because the NoSQL store is unaware of the PFS. This dissertation targets the key-value store (KVS), a specific form of NoSQL store, and proposes a lightweight KVS over a parallel file system to improve efficiency. The solution is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes, a capability that embedded KVSs are not designed for. Results show the proposed system outperforms Cassandra and Voldemort in several different workloads.

    Leveraging Emerging Hardware to Improve the Performance of Data Analytics Frameworks

    Department of Computer Science and Engineering. Data analytics frameworks have evolved along with the growing amount of data. There have been numerous efforts to improve the performance of data analytics frameworks, including MapReduce frameworks and NoSQL and NewSQL databases. These frameworks have various target workloads and their own characteristics; however, they share common ground as data analytics frameworks. Emerging hardware such as graphics processing units and persistent memory is expected to open up new opportunities for such commonality. The goal of this dissertation is to leverage emerging hardware to improve the performance of data analytics frameworks. First, we design and implement EclipseMR, a novel MapReduce framework that efficiently leverages an extensive amount of memory space distributed among the machines in a cluster. EclipseMR consists of a decentralized DHT-based file system layer and an in-memory cache layer. The in-memory cache layer is designed to store both local and remote data while balancing the load between the servers with the proposed Locality-Aware Fair (LAF) job scheduler. The design of EclipseMR is easily extensible with emerging hardware; it can adopt persistent memory as a primary storage layer or cache layer, or it can adopt GPUs to improve the performance of map and reduce functions. Our evaluation shows that EclipseMR outperforms Hadoop and Spark for various applications. Second, we propose B³-tree and Cache-Conscious Extendible Hashing (CCEH) for persistent memory. The fundamental challenge in designing a data structure for persistent memory is to guarantee consistent state transitions using fine-grained 8-byte atomic writes at minimum cost. B³-tree is a fully persistent hybrid indexing structure of binary tree and B+-tree that benefits from the strengths of both in-memory and block-based indexes, and CCEH is a variant of extendible hashing that introduces an intermediate layer between directory and buckets to fully benefit from a cache-sized bucket while minimizing the size of the directory. Both data structures show better performance than the corresponding state-of-the-art techniques. Third, we develop a data parallel tree traversal algorithm, Parallel Scan and Backtrack (PSB), for the k-nearest neighbor search problem on the GPU. Several studies have sought to improve query performance by using the GPU as an accelerator; however, most of them focus on brute-force algorithms. In this work, we overcome the challenges of traversing a multi-dimensional hierarchical indexing structure on the GPU, such as tiny shared memory and runtime stack, irregular memory access patterns, and warp divergence. Our evaluation shows that our data parallel PSB algorithm outperforms both the brute-force algorithm and the traditional branch and bound algorithm.

    Scalable Decentralized Fault-Tolerant MapReduce for Iterative Algorithms

    Query Optimization Technique in Relational Databases

    Computer Science

    Distributed computation in computer networks

    None provided

    Data storage hierarchy systems for data base computers.

    Thesis. 1979. Ph.D.--Massachusetts Institute of Technology. Alfred P. Sloan School of Management. MICROFICHE COPY AVAILABLE IN ARCHIVES AND DEWEY. Vita. Bibliography: p. 241-248. Ph.D.