1,963 research outputs found

Extending and Implementing the Self-adaptive Virtual Processor for Distributed Memory Architectures

Authors: Koivisto Juha; van Tol Michiel W.
Publication date: 01/01/2011

Many-core architectures of the future are likely to have distributed memory organizations and will need fine-grained concurrency management to be used effectively. The Self-adaptive Virtual Processor (SVP) is an abstract concurrent programming model that can provide this, but the model and its current implementations assume shared memory with a single address space. We investigate and extend SVP to handle distributed environments, and discuss a prototype SVP implementation that transparently supports execution on heterogeneous distributed-memory clusters over TCP/IP connections while retaining the original SVP programming model.

arXiv.org e-Print Archive

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Cluster Computing with Single Thread Space

Authors: Cheung B; Lau FCM; Ma MJM; Wang CL
Publication date: 01/01/2000

postprint

HKU Scholars Hub

MATLAB*G: A Grid-Based Parallel MATLAB

Authors: Chen Ying; Tan Suan Fong
Publication date: 01/01/2004

This paper describes the design and implementation of MATLAB*G, a parallel MATLAB on the ALiCE Grid. ALiCE (Adaptive and scaLable internet-based Computing Engine), developed at NUS, is a lightweight grid-computing middleware. Grid applications in ALiCE are written in Java and use the distributed shared memory programming model. Building on existing MATLAB functions, MATLAB*G provides distributed matrix computation to the user through a set of simple commands. Two forms of parallelism for distributed matrix computation are currently implemented: task parallelism and job parallelism. Experiments investigate the performance of MATLAB*G for each type of parallelism. Results indicate that for large matrix sizes MATLAB*G can be a faster alternative to sequential MATLAB. Singapore-MIT Alliance (SMA)

DSpace@MIT

Vcluster: A Portable Virtual Computing Library For Cluster Computing

Author: Zhang Hua
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2008

Message passing has been the dominant parallel programming model in cluster computing, and libraries like Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) have proven their novelty and efficiency through numerous applications in diverse areas. However, as clusters of Symmetric Multi-Processor (SMP) and heterogeneous machines become popular, conventional message passing models must be adapted to support this new kind of cluster efficiently. In addition, the Java programming language, with features such as an object-oriented architecture, platform-independent bytecode, and native support for multithreading, is an alternative language for cluster computing. This research presents a new parallel programming model and a library called VCluster that implements this model on top of a Java Virtual Machine (JVM). The programming model is based on virtual migrating threads to support clusters of heterogeneous SMP machines efficiently. VCluster is implemented in 100% Java, using the portability of Java to address the problems of heterogeneous machines. VCluster virtualizes computational and communication resources such as threads, computation states, and communication channels across multiple separate JVMs, which makes a mobile thread possible. With virtual migrating threads, it is feasible to balance the load of computing resources dynamically. Several large-scale parallel applications have been developed using VCluster to compare its performance and usage with other libraries. The results of the experiments show that VCluster makes it easier to develop multithreaded parallel applications than conventional libraries like MPI. At the same time, the performance of VCluster is comparable to that of MPICH, a widely used MPI library, combined with popular threading libraries like POSIX Threads and OpenMP.
In the next phase of our work, we implemented thread groups and thread migration to demonstrate the feasibility of dynamic load balancing in VCluster. We carried out experiments to show that the load can be dynamically balanced in VCluster, resulting in better performance. Thread groups also make it possible to implement collective communication functions between threads, which have proved useful in process-based libraries.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Adaptive sampling-based profiling techniques for optimizing the distributed JVM runtime

Authors: Lam KT; Luo Y; Wang CL
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

Extending the standard Java virtual machine (JVM) for cluster-awareness is a transparent approach to scaling out multithreaded Java applications. While this clustering solution has been gaining momentum in recent years, efficient runtime support for fine-grained object sharing over the distributed JVM remains a challenge. System efficiency is strongly connected to the global object sharing profile, which determines the overall communication cost. Once the sharing or correlation between threads is known, access locality can be optimized by collocating highly correlated threads via dynamic thread migrations. Although correlation tracking techniques have been studied in some page-based software DSM systems, they would entail prohibitively high overheads and low accuracy when ported to fine-grained object-based systems. In this paper, we propose a lightweight sampling-based profiling technique for tracking inter-thread sharing. To preserve locality across migrations, we also propose a stack sampling mechanism for profiling the set of objects that are tightly coupled with a migrant thread. Sampling rates in both techniques can vary adaptively to strike a balance between precision and overhead. Such adaptive techniques are particularly useful for applications whose sharing patterns may change dynamically. The profiling results can be exploited for effective thread-to-core placement and dynamic load balancing in a distributed object sharing environment. We present the design and preliminary performance results of our distributed JVM with the profiling implemented. Experimental results show that the profiling obtains global sharing profiles that are over 95% accurate at a cost of only a few percent increase in execution time for fine- to medium-grained applications. © 2010 IEEE. The 24th IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2010), Atlanta, GA, 19-23 April 2010. In Proceedings of the 24th IPDPS, 2010, p. 1-1

HKU Scholars Hub

Reparallelization and Migration of OpenMP Programs

Authors: Matthias Bezold; Michael Klemm; Ronald Veldema; Stefan Gabriel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Typical computational grid users target only a single cluster and have to estimate the runtime of their jobs. Job schedulers prefer short-running jobs to maintain high system utilization. If the user underestimates the runtime, premature termination causes computation loss; overestimation is penalized by long queue times. As a solution, we present automatic reparallelization and migration of OpenMP applications. A reparallelization is dynamically computed for an OpenMP work distribution when the number of CPUs changes. The application can be migrated between clusters when an allocated time slice is exceeded. Migration is based on a coordinated, heterogeneous checkpointing algorithm. Both reparallelization and migration enable the user to freely use computing time at more than a single point of the grid. Our demo applications successfully adapt to the changed CPU setting and smoothly migrate between, for example, clusters in Erlangen, Germany, and Amsterdam, the Netherlands, that use different processors. Benchmarks show that reparallelization and migration impose average overheads of about 4% and 2%, respectively.

CiteSeerX

Crossref

PicoGrid: A Web-Based Distributed Computing Framework for Heterogeneous Networks Using Java

Authors: Panichevaluk Apikrit; Wittayasooporn Nipun; Zhao Yan
Publication venue: 'Faculty of Engineering, Chulalongkorn University'
Publication date: 01/01/2014

We propose a framework for distributed computing applications in heterogeneous networks. The system is simple to deploy and can run on any operating system that supports the Java Virtual Machine. With our system, idle computing power in an organization can be harvested to perform computing tasks. Agent computers can enter and leave the computation at any time, which makes our system very flexible and easily scalable. Our system also does not affect the normal use of client machines, so as to guarantee a satisfactory user experience. System tests show that the system's performance is comparable to the theoretical case and that computation time is significantly reduced by using multiple computers on the network.

CiteSeerX

Engineering Journal (Faculty of Engineering, Chulalongkorn University, Bangkok)

Model-driven Scheduling for Distributed Stream Processing Systems

Authors: Shukla Anshu; Simmhan Yogesh
Publication venue: 'Elsevier BV'
Publication date: 06/02/2017

Distributed Stream Processing frameworks are increasingly used as the Internet of Things (IoT) evolves. These frameworks are designed to adapt to a dynamic input message rate by scaling in and out. Apache Storm, originally developed by Twitter, is a widely used stream processing engine; others include Flink and Spark Streaming. Running streaming applications successfully requires knowing the optimal resource requirement, since over-estimating resources adds extra cost. We therefore need a strategy for determining the optimal resource requirement for a given streaming application. In this article, we propose a model-driven approach for scheduling streaming applications that effectively uses a priori knowledge of the applications to provide predictable scheduling behavior. Specifically, we use application performance models to offer reliable estimates of the resource allocation required. Further, this intuition also drives resource mapping, and helps narrow the gap between estimated and actual dataflow performance and resource utilization. Together, this model-driven scheduling approach gives predictable application performance and resource utilization behavior for executing a given DSPS application at a target input stream rate on distributed resources. Comment: 54 pages

arXiv.org e-Print Archive

Open Access Repository of IISc Research Publications