280,305 research outputs found

    On the Design and Analysis of Parallel and Distributed Algorithms

    Full text link
    The arrival of multicore systems has created a new scenario in computing: parallel and distributed algorithms are rapidly replacing older sequential algorithms, and these techniques bring many challenges. Distributed algorithms provide distributed processing using distributed file systems and processing units, with the network modeled as a minimum-cost spanning tree. Parallel processing, on the other hand, involves choices among language platforms, programming models (data-parallel and otherwise), and GPUs. Processing units, memory elements, and storage are connected through dynamic distributed networks in the form of spanning trees. The article presents foundational algorithms, their analysis, and efficiency considerations.
    Comment: 9 pages
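
    Since the network here is modeled as a minimum-cost spanning tree, a minimal sketch of that construction may be useful. This uses Kruskal's algorithm with a union-find structure; the example graph and its link costs are illustrative, not taken from the article.

        # Kruskal's algorithm: build a minimum-cost spanning tree over a
        # weighted graph given as (weight, u, v) edges.

        def find(parent, x):
            while parent[x] != x:              # union-find with path halving
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        def minimum_spanning_tree(nodes, edges):
            parent = {n: n for n in nodes}
            mst, total = [], 0
            for w, u, v in sorted(edges):      # cheapest links first
                ru, rv = find(parent, u), find(parent, v)
                if ru != rv:                   # keep an edge only if it joins two components
                    parent[ru] = rv
                    mst.append((u, v, w))
                    total += w
            return mst, total

        # Illustrative 4-node network; weights are link costs.
        nodes = ["a", "b", "c", "d"]
        edges = [(1, "a", "b"), (4, "a", "c"), (2, "b", "c"),
                 (5, "c", "d"), (3, "b", "d")]
        print(minimum_spanning_tree(nodes, edges))
        # ([('a', 'b', 1), ('b', 'c', 2), ('b', 'd', 3)], 6)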

    Behrooz File System (BFS)

    Get PDF
    In this thesis, the Behrooz File System (BFS) is presented, which provides an in-memory distributed file system. BFS is a simple design that combines the best of in-memory and remote file systems. BFS stores data in the main memory of commodity servers and provides a shared, unified file system view over them. BFS utilizes backend storage to provide persistence and availability. Unlike most existing distributed in-memory storage systems, BFS supports a general-purpose POSIX-like file interface. BFS is built by grouping multiple servers' memory together; therefore, if applications and BFS servers are co-located, BFS is a highly efficient design because this architecture minimizes inter-node communication. This pattern is common in distributed computing environments and data analytics applications. A set of microbenchmarks and the SPEC SFS 2014 benchmark are used to evaluate different aspects of BFS, such as throughput, reliability, and scalability. The evaluation results indicate that the simple design of BFS succeeds in delivering the expected performance, while certain workloads reveal limitations of BFS in handling a large number of files. Addressing these limitations, as well as other potential improvements, is left as future work.
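
    To make the "grouping multiple servers' memory together" idea concrete, here is a hypothetical sketch (illustrative names only, not BFS's actual design or code) of a unified file view whose blocks are distributed across several servers' RAM:

        # Hypothetical sketch (not BFS's code): a unified file view over several
        # servers' in-memory block stores, with deterministic block placement.

        BLOCK = 4096  # bytes per block (illustrative)

        class MemoryServer:
            """Holds file blocks in RAM, keyed by (path, block_index)."""
            def __init__(self):
                self.blocks = {}

        class UnifiedFS:
            """Routes each block of a file to one server; POSIX-like read/write."""
            def __init__(self, servers):
                self.servers = servers

            def _server_for(self, path, idx):
                # Deterministic placement: hash the block id onto a server.
                return self.servers[hash((path, idx)) % len(self.servers)]

            def write(self, path, data):
                for pos in range(0, len(data), BLOCK):
                    srv = self._server_for(path, pos // BLOCK)
                    srv.blocks[(path, pos // BLOCK)] = data[pos:pos + BLOCK]

            def read(self, path, offset, size):
                out = b""
                while size > 0:
                    idx, off = divmod(offset, BLOCK)
                    chunk = self._server_for(path, idx).blocks[(path, idx)][off:off + size]
                    out += chunk
                    offset += len(chunk)
                    size -= len(chunk)
                return out

        fs = UnifiedFS([MemoryServer() for _ in range(3)])
        fs.write("/data/log", b"x" * 10000)
        assert fs.read("/data/log", 4000, 200) == b"x" * 200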

    SharkGraph: A Time Series Distributed Graph System

    Full text link
    Current graph systems can easily process billions of edges, but when the scale exceeds hundreds of billions, performance degrades dramatically; time series data is typically very large, so computation on time series graphs remains challenging. In this work we introduce SharkGraph, a distributed file system (DFS) based time series graph system that uses a novel storage structure, the Time Series Graph Data File (TGF). By reading a file stream to iterate graph computation, SharkGraph can execute batch graph queries, simulation, data mining, or clustering algorithms on industry graphs with more than a hundred billion edges. Well-defined experiments show that SharkGraph performs well on large-scale graph processing, supports time traversal for graphs, and can recover state at any position in the timeline. By repeating experiments reported for existing distributed systems such as GraphX, we demonstrate that SharkGraph easily handles hundreds of billions of edges, whereas GraphX runs into problems such as memory exhaustion and skewed distribution during graph traversal. Compared with other graph systems, SharkGraph uses less memory and processes the same graph more efficiently.
    Comment: 7 pages, 7 figures, 1 algorithm
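
    The central idea, iterating a graph computation over a file stream instead of holding the whole graph in memory, can be sketched as follows. The file layout and function names are illustrative; they are not the paper's TGF format.

        # Illustrative edge-streaming sketch (not the actual TGF format):
        # out-degree counting over a graph too large for memory, read as a
        # stream of "src dst timestamp" lines.

        from collections import defaultdict

        def stream_edges(path, t_min=None, t_max=None):
            """Yield (src, dst) edges, optionally restricted to a time window."""
            with open(path) as f:
                for line in f:
                    src, dst, ts = line.split()
                    ts = int(ts)
                    if (t_min is None or ts >= t_min) and (t_max is None or ts <= t_max):
                        yield src, dst

        def out_degrees(path, **window):
            degrees = defaultdict(int)
            for src, _dst in stream_edges(path, **window):
                degrees[src] += 1      # one pass; O(vertices) memory, not O(edges)
            return degrees

        # Time traversal: recompute the metric as of any point in the timeline.
        # degrees_2020 = out_degrees("edges.txt", t_max=1577836800)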

    Lazy Evaluation and Nondeterminism make Backus' FP-systems More Practical

    Get PDF
    Backus' FP-systems are made more practical by introducing lazy evaluation and nondeterminism into them. This is done in the framework of a concrete programming language called FP*. On the one hand, this language is almost as mathematical as FP-systems are. On the other hand, it makes it possible to manage secondary memory and to develop applications such as interactive and distributed file systems. Experimental versions of a compiler and an interpreter for the FP* language have been implemented.
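
    As a rough illustration of what lazy evaluation buys in this function-level style (a generic Python sketch, not FP* syntax), combinators can be composed over an infinite stream and only the demanded elements are ever computed:

        # Generic sketch of lazy evaluation in a function-level style
        # (not FP* syntax): only demanded stream elements are computed.

        import itertools

        def fmap(f):
            return lambda xs: (f(x) for x in xs)

        def keep(p):
            return lambda xs: (x for x in xs if p(x))

        def compose(*fns):
            def run(xs):
                for f in reversed(fns):    # right-to-left, like FP's composition
                    xs = f(xs)
                return xs
            return run

        naturals = itertools.count(1)      # infinite stream: 1, 2, 3, ...
        pipeline = compose(fmap(lambda x: x * x), keep(lambda x: x % 2 == 0))

        # Only five elements of the infinite input are ever demanded.
        print(list(itertools.islice(pipeline(naturals), 5)))   # [4, 16, 36, 64, 100]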

    Mapping Datasets to Object Storage System

    Full text link
    Access libraries such as ROOT and HDF5 allow users to interact with datasets using high-level abstractions, like coordinate systems and associated slicing operations. Unfortunately, the implementations of access libraries are based on outdated assumptions about storage system interfaces and are generally unable to fully benefit from modern fast storage devices. The situation is getting worse with rapidly evolving storage devices such as non-volatile memory and ever larger datasets. This project explores distributed dataset mapping infrastructures that can integrate and scale out existing access libraries using Ceph's extensible object model, avoiding re-implementation or even modification of these access libraries as much as possible. These programmable storage extensions, coupled with our distributed dataset mapping techniques, enable: 1) access library operations to be offloaded to storage system servers, 2) the independent evolution of access libraries and storage systems, and 3) full use of the existing load balancing, elasticity, and failure management of distributed storage systems like Ceph. They also create more opportunities for server-local optimizations. For example, storage servers might include local key/value stores combined with chunk stores that require different optimizations than a local file system. As storage servers evolve to support new storage devices like non-volatile memory, these server-local optimizations can be implemented while minimizing disruptions to applications. We will report progress on the means by which distributed dataset mapping can be abstracted over particular access libraries, including access libraries for ROOT data, and how we address some of the challenges revolving around data partitioning and composability of access operations.
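
    A hypothetical sketch of the offloading idea (illustrative classes only; this is not Ceph's object-class API): instead of fetching a whole object and slicing it client-side, the client ships a slice descriptor to the server, which runs the operation next to the data.

        # Hypothetical offload sketch (illustrative classes, not Ceph's API).
        # The server applies the slice next to the data, so only the selected
        # bytes cross the network.

        import numpy as np

        class StorageServer:
            """Holds named objects and runs a 'read_slice' operation locally."""
            def __init__(self):
                self.objects = {}

            def put(self, name, array):
                self.objects[name] = array

            def read_slice(self, name, start, stop):
                # Server-side slicing: the access-library operation runs here.
                return self.objects[name][start:stop]

        class Client:
            def __init__(self, server):
                self.server = server

            def read(self, name, start, stop):
                # Only the requested slice travels back, not the full object.
                return self.server.read_slice(name, start, stop)

        srv = StorageServer()
        srv.put("dataset/temperature", np.arange(1_000_000, dtype=np.float32))
        print(Client(srv).read("dataset/temperature", 10, 15))
        # [10. 11. 12. 13. 14.]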

    Approximation and Compression Techniques to Enhance Performance of Graphics Processing Units

    Get PDF
    A key challenge in modern computing systems is accessing data fast enough to fully utilize the computing elements in the chip. In Graphics Processing Units (GPUs), performance is often constrained by register file size, memory bandwidth, and the capacity of main memory. One important technique for alleviating this challenge is data compression: by reducing the amount of data that must be communicated or stored, memory resources crucial for performance can be used more efficiently.

    This thesis provides a set of approximation and compression techniques for GPUs, with the goal of efficiently utilizing the computational fabric and thereby increasing performance. The thesis shows that these techniques can substantially lower the amount of information the system has to process, and that they are thus important tools for meeting challenges in memory utilization.

    The thesis makes contributions within three areas: controlled floating-point precision reduction, lossless and lossy memory compression, and distributed training of neural networks. In the first area, the thesis shows that automated and controlled floating-point approximation lets the register file be utilized more efficiently. This is achieved through a framework that establishes a cross-layer connection between the application and the microarchitecture layer, and a novel register file organization capable of leveraging low-precision floating-point values and narrow integers for increased capacity and performance.

    Within the area of compression, this thesis aims to increase the effective bandwidth of GPUs by presenting a lossless and lossy memory compression algorithm that reduces the amount of transferred data. In contrast to state-of-the-art compression techniques such as Base-Delta-Immediate and Bitplane Compression, which use intra-block bases, the proposed algorithm leverages multiple global base values to reach a higher compression ratio. The algorithm includes an optional approximation step for floating-point values that offers a higher compression ratio at a given, low, error rate.

    Finally, within the area of distributed training of neural networks, this thesis proposes a subgraph approximation scheme for graph data that mitigates accuracy loss in a distributed setting. The scheme allows neural network models that use graphs as inputs to converge at single-machine accuracy, while minimizing synchronization overhead between the machines.
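
    A toy illustration of base-plus-delta compression with multiple global bases (a generic sketch of the idea, not the thesis's actual algorithm): each value near one of a small set of shared base values is stored as a base index plus a narrow delta, and everything else falls back to the raw encoding.

        # Toy sketch of base+delta compression with multiple global bases
        # (generic illustration, not the thesis's algorithm). A value near a
        # shared base becomes a small base index plus a narrow signed delta.

        def compress(values, bases, delta_bits=8):
            limit = 1 << (delta_bits - 1)          # signed delta range
            out = []
            for v in values:
                # Pick the global base with the smallest delta.
                i, b = min(enumerate(bases), key=lambda ib: abs(v - ib[1]))
                d = v - b
                if -limit <= d < limit:
                    out.append(("packed", i, d))   # index bits + delta_bits
                else:
                    out.append(("raw", v))         # fall back to the full value
            return out

        def decompress(encoded, bases):
            out = []
            for item in encoded:
                if item[0] == "packed":
                    _, i, d = item
                    out.append(bases[i] + d)
                else:
                    out.append(item[1])
            return out

        values = [1000, 1003, 998, 5000, 5021, 123456]
        bases = [1000, 5000]                       # shared globally, not per block
        enc = compress(values, bases)
        assert decompress(enc, bases) == values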