5 research outputs found

    Grid and P2P middleware for scientific computing systems

    Grid and P2P systems have achieved notable success in the domain of scientific and engineering applications, which commonly demand considerable amounts of computational resources. However, Grid and P2P systems remain difficult for domain scientists and engineers to use, owing to the inherent complexity of the corresponding middleware and the lack of adequate documentation. In this paper we survey recent developments in Grid and P2P middleware in the context of scientific computing systems. The differences in the approaches taken for Grid and P2P middleware, as well as the points the two paradigms have in common, are highlighted. In addition, we discuss the corresponding programming models, languages, and applications.

    Abundance of correctly folded RNA motifs in sequence space, calculated on computational grids

    Although functional RNA molecules are known to be biased in overall composition, the effects of background composition on the probability of finding a particular active site by chance have received little attention. The probability of finding a particular motif has important implications both for understanding the distribution of functional RNAs in ancient and modern organisms with varying genome compositions and for tuning SELEX pools to optimize the chance of finding specific functions. Here we develop a new method for calculating the probability of finding a modular motif containing base-paired regions, and use a computational grid to fold several hundred million random RNA sequences containing the core elements of the isoleucine aptamer and the hammerhead ribozyme, estimating the probability that a sequence containing these structural elements will fold correctly when isolated from background sequences of different compositions. We find that the two motifs are most likely to be found in distinct regions of compositional space, and that the regions of greatest abundance are influenced by the probability of finding the conserved bases, finding the flanking helices, and folding, in that order of importance. Additionally, we can refine our estimates of the number of random sequences required for a 50% probability of finding an example of each site in unbiased random pools of length 100 to 4.1 × 10⁹ for the isoleucine aptamer and 1.6 × 10¹⁰ for the hammerhead ribozyme. These figures are consistent with the facile recovery of these motifs from SELEX experiments.
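    The 50% pool-size figures follow from the standard independence calculation: if each random sequence contains a working motif with probability p, a pool of N sequences yields at least one hit with probability 1 − (1 − p)^N, so the pool size for even odds is N = ln(0.5)/ln(1 − p). A minimal sketch of that relationship (the per-sequence probability below is not taken from the paper; it is back-calculated from the reported aptamer pool size as p ≈ ln 2 / N):

        import math

        def pool_size_for_half_chance(p_per_seq):
            # Solve 1 - (1 - p)^N = 0.5 for N, assuming independent sequences.
            return math.log(0.5) / math.log1p(-p_per_seq)

        # Back-calculated from the isoleucine-aptamer figure (N ≈ 4.1e9),
        # purely to illustrate the relationship; not a value from the paper.
        p = math.log(2) / 4.1e9
        print(f"pool size for a 50% hit: {pool_size_for_half_chance(p):.2e}")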

    Enhancing Data Processing on Clouds with Hadoop/HBase

    In the current information age, large amounts of data are being generated and accumulated rapidly in various industrial and scientific domains. This imposes important demands on data processing capabilities that can extract sensible and valuable information from large amounts of data in a timely manner. Hadoop, the open source implementation of Google's data processing framework (MapReduce, the Google File System, and BigTable), is becoming increasingly popular and is being used to solve data processing problems in various application scenarios. However, because it was originally designed for handling very large data sets that can easily be divided into parts and processed independently with limited inter-task communication, Hadoop lacks applicability to a wider range of use cases. As a result, many projects are under way to enhance Hadoop for different application needs, such as data warehousing, machine learning, and data mining. This thesis is one such research effort. The goal of the thesis research is to design novel tools and techniques that extend and enhance the large-scale data processing capability of Hadoop/HBase on clouds, and to evaluate their effectiveness in performance tests on prototype implementations.

    Two main research contributions are described. The first is a light-weight computational workflow system for Hadoop called "CloudWF". The second is a client library called "HBaseSI" that supports transactional snapshot isolation (SI) in HBase, Hadoop's database component.

    CloudWF addresses the problem of automating the execution of scientific workflows composed of both MapReduce and legacy applications on clouds with Hadoop/HBase, and is the first computational workflow system built directly on Hadoop/HBase. It uses novel methods for decomposing workflow directed acyclic graphs, storing and querying dependencies in HBase sparse tables, staging files transparently, and managing workflow execution in a decentralized fashion, relying on the MapReduce framework for task scheduling and fault tolerance.

    HBaseSI addresses the problem of maintaining strong transactional data consistency in HBase tables, and is the first SI mechanism developed for HBase. It uses novel methods that let individual clients handle distributed transaction management autonomously. These methods greatly simplify the design of HBaseSI and can be generalized to other column-oriented stores with an architecture similar to that of HBase. As a result of this simplicity, HBaseSI adds little overhead to HBase performance and directly inherits many of HBase's desirable properties. HBaseSI is non-intrusive to existing HBase installations and user data, and is designed to scale with the cloud in both data size and number of nodes.
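    The abstract does not spell out HBaseSI's client-side protocol, but the snapshot-isolation semantics it guarantees are standard: each transaction reads from a snapshot taken at its start, writes are buffered, and a transaction aborts at commit time if another transaction has committed to one of its written keys since the snapshot (first-committer-wins). A toy in-memory sketch of these semantics only; the class and its timestamp scheme are invented here and do not reflect HBaseSI's actual implementation:

        import itertools

        class SnapshotStore:
            # Toy multiversion store illustrating snapshot isolation.
            def __init__(self):
                self.versions = {}             # key -> list of (commit_ts, value)
                self.clock = itertools.count(1)

            def begin(self):
                return {"start_ts": next(self.clock), "writes": {}}

            def read(self, txn, key):
                # Latest version committed no later than the snapshot.
                seen = [(ts, v) for ts, v in self.versions.get(key, [])
                        if ts <= txn["start_ts"]]
                return max(seen)[1] if seen else None

            def write(self, txn, key, value):
                txn["writes"][key] = value     # buffered until commit

            def commit(self, txn):
                # First-committer-wins: abort on a write-write conflict.
                for key in txn["writes"]:
                    if any(ts > txn["start_ts"]
                           for ts, _ in self.versions.get(key, [])):
                        raise RuntimeError(f"abort: conflict on {key!r}")
                commit_ts = next(self.clock)
                for key, value in txn["writes"].items():
                    self.versions.setdefault(key, []).append((commit_ts, value))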

    A fault tolerant, peer-to-peer based scheduler for home grids

    This thesis presents a fault-tolerant, Peer-to-Peer (P2P) based grid scheduling system for highly dynamic and highly heterogeneous environments, such as home networks, where a variety of devices (laptops, PCs, game consoles, etc.) and networks can be found. The number of devices in a house capable of processing data has been increasing in recent years. However, being able to process data does not mean that these devices are powerful, and in a home environment there will be demand for applications that need significant computing resources beyond the capabilities of a single domestic device such as a set-top box (examples include TV recommender systems, image processing, and photo indexing). A computational grid is a possible solution to this problem, but the constrained home environment makes it difficult to use conventional grid scheduling technologies, which demand a powerful infrastructure. Our solution is based on distributing the matchmaking task among providers, leaving the final allocation decision to a central scheduler that can run on a limited device without a significant loss in performance. We evaluate our solution by simulating different scenarios and configurations against the Opportunistic Load Balancing (OLB) scheduling heuristic, which we found to be the best option for home grids among the existing solutions we analysed. The results show that our solution performs similarly to or better than OLB. Furthermore, our solution also provides fault tolerance, which OLB does not, and we have formally verified the behaviour of our solution against two cases of network partition failure.
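    The OLB baseline is simple enough to sketch: each task is assigned to the machine expected to become available earliest, without regard to how long the task would actually run there. A minimal version (the task names, machine names, and runtime table are hypothetical):

        import heapq

        def olb_schedule(tasks, machines):
            # tasks: task -> {machine: runtime}; returns task -> machine.
            ready = [(0.0, m) for m in machines]  # (time machine frees up, machine)
            heapq.heapify(ready)
            assignment = {}
            for task, runtimes in tasks.items():
                free_at, machine = heapq.heappop(ready)  # earliest-available machine
                assignment[task] = machine
                heapq.heappush(ready, (free_at + runtimes[machine], machine))
            return assignment

        # Hypothetical home grid: a set-top box and a PC.
        tasks = {"index_photos": {"stb": 40.0, "pc": 8.0},
                 "recommend_tv": {"stb": 25.0, "pc": 5.0}}
        print(olb_schedule(tasks, ["stb", "pc"]))

    Because OLB ignores execution-time estimates, it keeps every device busy but can place heavy tasks on weak devices such as a set-top box.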