5 research outputs found

    Grid and P2P middleware for scientific computing systems

    Grid and P2P systems have achieved notable success in the domain of scientific and engineering applications, which commonly demand considerable amounts of computational resources. However, Grid and P2P systems remain difficult for domain scientists and engineers to use, owing to the inherent complexity of the corresponding middleware and the lack of adequate documentation. In this paper we survey recent developments in Grid and P2P middleware in the context of scientific computing systems. The differences in the approaches taken for Grid and P2P middleware, as well as the points the two paradigms have in common, are highlighted. In addition, we discuss the corresponding programming models, languages, and applications.

    Abundance of correctly folded RNA motifs in sequence space, calculated on computational grids

    Although functional RNA molecules are known to be biased in overall composition, the effects of background composition on the probability of finding a particular active site by chance have received little attention. The probability of finding a particular motif has important implications both for understanding the distribution of functional RNAs in ancient and modern organisms with varying genome compositions and for tuning SELEX pools to optimize the chance of finding specific functions. Here we develop a new method for calculating the probability of finding a modular motif containing base-paired regions, and use a computational grid to fold several hundred million random RNA sequences containing the core elements of the isoleucine aptamer and the hammerhead ribozyme, estimating the probability that a sequence containing these structural elements will fold correctly when isolated from background sequences of different compositions. We find that the two motifs are most likely to be found in distinct regions of compositional space, and that the regions of greatest abundance are influenced by the probability of finding the conserved bases, finding the flanking helices, and folding, in that order of importance. Additionally, we can refine our estimates of the number of random sequences required for a 50% probability of finding an example of each site in unbiased random pools of length 100 to 4.1 × 10⁹ for the isoleucine aptamer and 1.6 × 10¹⁰ for the hammerhead ribozyme. These figures are consistent with the facile recovery of these motifs from SELEX experiments.
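    The 50% pool-size figures follow from the standard independence calculation: if each random sequence contains a working motif with probability p, a pool of N sequences yields at least one hit with probability 1 − (1 − p)^N, so the pool size for even odds is N = ln(0.5)/ln(1 − p). A minimal sketch of that relationship (the per-sequence probability below is not taken from the paper; it is back-calculated from the reported aptamer pool size as p ≈ ln 2 / N):

        import math

        def pool_size_for_half_chance(p_per_seq):
            # Solve 1 - (1 - p)^N = 0.5 for N, assuming independent sequences.
            return math.log(0.5) / math.log1p(-p_per_seq)

        # Back-calculated from the isoleucine-aptamer figure (N ≈ 4.1e9),
        # purely to illustrate the relationship; not a value from the paper.
        p = math.log(2) / 4.1e9
        print(f"pool size for a 50% hit: {pool_size_for_half_chance(p):.2e}")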

    Enhancing Data Processing on Clouds with Hadoop/HBase

    In the current information age, large amounts of data are being generated and accumulated rapidly in various industrial and scientific domains. This imposes important demands on data processing capabilities that can extract sensible and valuable information from large amounts of data in a timely manner. Hadoop, the open source implementation of Google's data processing framework (MapReduce, the Google File System, and BigTable), is becoming increasingly popular and is being used to solve data processing problems in various application scenarios. However, because it was originally designed for handling very large data sets that can easily be divided into parts and processed independently with limited inter-task communication, Hadoop lacks applicability to a wider range of use cases. As a result, many projects are under way to enhance Hadoop for different application needs, such as data warehousing, machine learning, and data mining. This thesis is one such research effort. The goal of the thesis research is to design novel tools and techniques that extend and enhance the large-scale data processing capability of Hadoop/HBase on clouds, and to evaluate their effectiveness in performance tests on prototype implementations.

    Two main research contributions are described. The first is a light-weight computational workflow system for Hadoop called "CloudWF". The second is a client library called "HBaseSI" that supports transactional snapshot isolation (SI) in HBase, Hadoop's database component.

    CloudWF addresses the problem of automating the execution of scientific workflows composed of both MapReduce and legacy applications on clouds with Hadoop/HBase, and is the first computational workflow system built directly on Hadoop/HBase. It uses novel methods for decomposing workflow directed acyclic graphs, storing and querying dependencies in HBase sparse tables, staging files transparently, and managing workflow execution in a decentralized fashion, relying on the MapReduce framework for task scheduling and fault tolerance.

    HBaseSI addresses the problem of maintaining strong transactional data consistency in HBase tables, and is the first SI mechanism developed for HBase. It uses novel methods that let individual clients handle distributed transaction management autonomously. These methods greatly simplify the design of HBaseSI and can be generalized to other column-oriented stores with an architecture similar to that of HBase. As a result of this simplicity, HBaseSI adds little overhead to HBase performance and directly inherits many of HBase's desirable properties. HBaseSI is non-intrusive to existing HBase installations and user data, and is designed to scale with the cloud in both data size and number of nodes.
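    The abstract does not spell out HBaseSI's client-side protocol, but the snapshot-isolation semantics it guarantees are standard: each transaction reads from a snapshot taken at its start, writes are buffered, and a transaction aborts at commit time if another transaction has committed to one of its written keys since the snapshot (first-committer-wins). A toy in-memory sketch of these semantics only; the class and its timestamp scheme are invented here and do not reflect HBaseSI's actual implementation:

        import itertools

        class SnapshotStore:
            # Toy multiversion store illustrating snapshot isolation.
            def __init__(self):
                self.versions = {}             # key -> list of (commit_ts, value)
                self.clock = itertools.count(1)

            def begin(self):
                return {"start_ts": next(self.clock), "writes": {}}

            def read(self, txn, key):
                # Latest version committed no later than the snapshot.
                seen = [(ts, v) for ts, v in self.versions.get(key, [])
                        if ts <= txn["start_ts"]]
                return max(seen)[1] if seen else None

            def write(self, txn, key, value):
                txn["writes"][key] = value     # buffered until commit

            def commit(self, txn):
                # First-committer-wins: abort on a write-write conflict.
                for key in txn["writes"]:
                    if any(ts > txn["start_ts"]
                           for ts, _ in self.versions.get(key, [])):
                        raise RuntimeError(f"abort: conflict on {key!r}")
                commit_ts = next(self.clock)
                for key, value in txn["writes"].items():
                    self.versions.setdefault(key, []).append((commit_ts, value))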

    A fault tolerant, peer-to-peer based scheduler for home grids

    This thesis presents a fault-tolerant, Peer-to-Peer (P2P) based grid scheduling system for highly dynamic and highly heterogeneous environments, such as home networks, where a variety of devices (laptops, PCs, game consoles, etc.) and networks can be found. The number of devices in a house capable of processing data has been increasing in recent years. However, being able to process data does not mean that these devices are powerful, and in a home environment there will be demand for applications that need significant computing resources beyond the capabilities of a single domestic device such as a set-top box (examples include TV recommender systems, image processing, and photo indexing). A computational grid is a possible solution to this problem, but the constrained home environment makes it difficult to use conventional grid scheduling technologies, which demand a powerful infrastructure. Our solution is based on distributing the matchmaking task among providers, leaving the final allocation decision to a central scheduler that can run on a limited device without a significant loss in performance. We evaluate our solution by simulating different scenarios and configurations against the Opportunistic Load Balancing (OLB) scheduling heuristic, which we found to be the best option for home grids among the existing solutions we analysed. The results show that our solution performs similarly to or better than OLB. Furthermore, our solution also provides fault tolerance, which OLB does not, and we have formally verified the behaviour of our solution against two cases of network partition failure.
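    The OLB baseline is simple enough to sketch: each task is assigned to the machine expected to become available earliest, without regard to how long the task would actually run there. A minimal version (the task names, machine names, and runtime table are hypothetical):

        import heapq

        def olb_schedule(tasks, machines):
            # tasks: task -> {machine: runtime}; returns task -> machine.
            ready = [(0.0, m) for m in machines]  # (time machine frees up, machine)
            heapq.heapify(ready)
            assignment = {}
            for task, runtimes in tasks.items():
                free_at, machine = heapq.heappop(ready)  # earliest-available machine
                assignment[task] = machine
                heapq.heappush(ready, (free_at + runtimes[machine], machine))
            return assignment

        # Hypothetical home grid: a set-top box and a PC.
        tasks = {"index_photos": {"stb": 40.0, "pc": 8.0},
                 "recommend_tv": {"stb": 25.0, "pc": 5.0}}
        print(olb_schedule(tasks, ["stb", "pc"]))

    Because OLB ignores execution-time estimates, it keeps every device busy but can place heavy tasks on weak devices such as a set-top box.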