1,073 research outputs found
MOON: MapReduce On Opportunistic eNvironments
AbstractâMapReduce offers a ïŹexible programming model for processing and generating large data sets on dedicated resources, where only a small fraction of such resources are every unavailable at any given time. In contrast, when MapReduce is run on volunteer computing systems, which opportunistically harness idle desktop computers via frameworks like Condor, it results in poor performance due to the volatility of the resources, in particular, the high rate of node unavailability. Specifically, the data and task replication scheme adopted by existing MapReduce implementations is woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. The adaptive task and data scheduling algorithms in MOON distinguish between (1) different types of MapReduce data and (2) different types of node outages in order to strategically place tasks and data on both volatile and dedicated nodes. Our tests demonstrate that MOON can deliver a 3-fold performance improvement to Hadoop in volatile, volunteer computing environments
Implementing and Running a Workflow Application on Cloud Resources
Scientist need to run applications that are time and resource consuming, but, not all of them, have the requires knowledge to run this applications in a parallel manner, by using grid, cluster or cloud resources. In the past few years many workflow building frameworks were developed in order to help scientist take a better advantage of computing resources, by designing workflows based on their applications and executing them on heterogeneous resources. This paper presents a case study of implementing and running a workflow for an E-bay data retrieval application. The workflow was designed using Askalon framework and executed on the cloud resources. The purpose of this paper is to demonstrate how workflows and cloud resources can be used by scientists in order to achieve speedup for their application without the need of spending large amounts of money on computational resources.Workflow, Cloud Resource
Pando: Personal Volunteer Computing in Browsers
The large penetration and continued growth in ownership of personal
electronic devices represents a freely available and largely untapped source of
computing power. To leverage those, we present Pando, a new volunteer computing
tool based on a declarative concurrent programming model and implemented using
JavaScript, WebRTC, and WebSockets. This tool enables a dynamically varying
number of failure-prone personal devices contributed by volunteers to
parallelize the application of a function on a stream of values, by using the
devices' browsers. We show that Pando can provide throughput improvements
compared to a single personal device, on a variety of compute-bound
applications including animation rendering and image processing. We also show
the flexibility of our approach by deploying Pando on personal devices
connected over a local network, on Grid5000, a French-wide computing grid in a
virtual private network, and seven PlanetLab nodes distributed in a wide area
network over Europe.Comment: 14 pages, 12 figures, 2 table
Recommended from our members
Heterogeneous Cloud Systems Based on Broadband Embedded Computing
Computing systems continue to evolve from homogeneous systems of commodity-based servers within a single data-center towards modern Cloud systems that consist of numerous data-center clusters virtualized at the infrastructure and application layers to provide scalable, cost-effective and elastic services to devices connected over the Internet. There is an emerging trend towards heterogeneous Cloud systems driven from growth in wired as well as wireless devices that incorporate the potential of millions, and soon billions, of embedded devices enabling new forms of computation and service delivery. Service providers such as broadband cable operators continue to contribute towards this expansion with growing Cloud system infrastructures combined with deployments of increasingly powerful embedded devices across broadband networks. Broadband networks enable access to service provider Cloud data-centers and the Internet from numerous devices. These include home computers, smart-phones, tablets, game-consoles, sensor-networks, and set-top box devices. With these trends in mind, I propose the concept of broadband embedded computing as the utilization of a broadband network of embedded devices for collective computation in conjunction with centralized Cloud infrastructures. I claim that this form of distributed computing results in a new class of heterogeneous Cloud systems, service delivery and application enablement. To support these claims, I present a collection of research contributions in adapting distributed software platforms that include MPI and MapReduce to support simultaneous application execution across centralized data-center blade servers and resource-constrained embedded devices. Leveraging these contributions, I develop two complete prototype system implementations to demonstrate an architecture for heterogeneous Cloud systems based on broadband embedded computing. Each system is validated by executing experiments with applications taken from bioinformatics and image processing as well as communication and computational benchmarks. This vision, however, is not without challenges. The questions on how to adapt standard distributed computing paradigms such as MPI and MapReduce for implementation on potentially resource-constrained embedded devices, and how to adapt cluster computing runtime environments to enable heterogeneous process execution across millions of devices remain open-ended. This dissertation presents methods to begin addressing these open-ended questions through the development and testing of both experimental broadband embedded computing systems and in-depth characterization of broadband network behavior. I present experimental results and comparative analysis that offer potential solutions for optimal scalability and performance for constructing broadband embedded computing systems. I also present a number of contributions enabling practical implementation of both heterogeneous Cloud systems and novel application services based on broadband embedded computing
Distributed Computing in a Pandemic
The current COVID-19 global pandemic caused by the SARS-CoV-2 betacoronavirus has resulted in over a million deaths and is having a grave socio-economic impact, hence there is an urgency to find solutions to key research challenges. Much of this COVID-19 research depends on distributed computing. In this article, I review distributed architectures -- various types of clusters, grids and clouds -- that can be leveraged to perform these tasks at scale, at high-throughput, with a high degree of parallelism, and which can also be used to work collaboratively. High-performance computing (HPC) clusters will be used to carry out much of this work. Several bigdata processing tasks used in reducing the spread of SARS-CoV-2 require high-throughput approaches, and a variety of tools, which Hadoop and Spark offer, even using commodity hardware. Extremely large-scale COVID-19 research has also utilised some of the world's fastest supercomputers, such as IBM's SUMMIT -- for ensemble docking high-throughput screening against SARS-CoV-2 targets for drug-repurposing, and high-throughput gene analysis -- and Sentinel, an XPE-Cray based system used to explore natural products. Grid computing has facilitated the formation of the world's first Exascale grid computer. This has accelerated COVID-19 research in molecular dynamics simulations of SARS-CoV-2 spike protein interactions through massively-parallel computation and was performed with over 1 million volunteer computing devices using the Folding@home platform. Grids and clouds both can also be used for international collaboration by enabling access to important datasets and providing services that allow researchers to focus on research rather than on time-consuming data-management tasks
Molecular simulations and visualization: introduction and overview
Here we provide an introduction and overview of current progress in the field of molecular simulation and visualization, touching on the following topics: (1) virtual and augmented reality for immersive molecular simulations; (2) advanced visualization and visual analytic techniques; (3) new developments in high performance computing; and (4) applications and model building
- âŠ