Search CORE

26,022 research outputs found

A unified modulo scheduling and register allocation technique for clustered processors

Author: Codina Viñas Josep M.
González Colás Antonio María
Sánchez Navarro F. Jesús
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001
Field of study

This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more effective than traditional approaches based on sequentially performing some (or all) of the three steps, since it allows optimizing the global code generation problem instead of searching for optimal solutions to each individual step. Besides, it avoids the iterative nature of traditional approaches, which require repeated applications of the three steps until a valid solution is found. The proposed framework includes a mechanism to insert spill code on-the-fly and heuristics to evaluate the quality of partial schedules considering simultaneously inter-cluster communications, memory pressure and register pressure. Transformations that allow trading pressure on a type of resource for another resource are also included. We show that the proposed technique outperforms previously proposed techniques. For instance, the average speed-up for the SPECfp95 is 36% for a 4-cluster configuration.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Derandomization of Online Assignment Algorithms for Dynamic Graphs

Author: Sahai Ankur
Publication venue
Publication date: 01/05/2011
Field of study

This paper analyzes different online algorithms for the problem of assigning weights to edges in a fully-connected bipartite graph that minimizes the overall cost while satisfying constraints. Edges in this graph may disappear and reappear over time. Performance of these algorithms is measured using simulations. This paper also attempts to derandomize the randomized online algorithm for this problem

arXiv.org e-Print Archive

CiteSeerX

The assignment problem in distributed computing

Author: Medepalli Anand
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1992
Field of study

This dissertation focuses on the problem of assigning the modules of a program to the processors in a distributed system with the goal of minimizing the overall cost of running the program. The cost depends on the execution times of the modules on the processors and on the cost of communication between modules. This module allocation problem arises in a variety of situations where one is interested in making optimum use of available computer resources. The general module allocation problem is intractable; however it becomes polynomially-solvable when the communication graph is restricted. In this dissertation, we restrict our attention to k-trees;As the first problem, we study parametric module allocation on partial k-trees. We allow the costs, both execution and communication, to vary linearly as functions of a real parameter t. We show that if the number of processors is fixed, the sequence of optimum assignments that are obtained, as t varies from zero to infinity, can be constructed in polynomial time. As an auxiliary result, we develop a linear-time algorithm to find a separator in a k-tree. We discuss the implications of our results for parametric versions of the weighted vertex cover, independent set, and 0-1 quadratic programming problems on partial k-trees;Next, we consider two variants of the assignment problem. The first problem is to find a minimum-cost assignment when one of the processors has a limited memory. The second is to find an assignment that minimizes the maximum processor load. We present exact dynamic programming algorithms for both problems, which lead to approximation schemes for the case where the communication graph is a partial k-tree. Faster algorithms are presented for trees with uniform costs. In contrast to these results, we show that, for arbitrary graphs, no fully polynomial time approximation schemes exist unless P = NP. Both dynamic programming algorithms have been implemented. The implementation details and our experimental results are presented

Digital Repository @ Iowa State University (ISU)

Improvement of Data-Intensive Applications Running on Cloud Computing Clusters

Author: Ibrahim Ibrahim Adel
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2019
Field of study

MapReduce, designed by Google, is widely used as the most popular distributed programming model in cloud environments. Hadoop, an open-source implementation of MapReduce, is a data management framework on large cluster of commodity machines to handle data-intensive applications. Many famous enterprises including Facebook, Twitter, and Adobe have been using Hadoop for their data-intensive processing needs. Task stragglers in MapReduce jobs dramatically impede job execution on massive datasets in cloud computing systems. This impedance is due to the uneven distribution of input data and computation load among cluster nodes, heterogeneous data nodes, data skew in reduce phase, resource contention situations, and network configurations. All these reasons may cause delay failure and the violation of job completion time. One of the key issues that can significantly affect the performance of cloud computing is the computation load balancing among cluster nodes. Replica placement in Hadoop distributed file system plays a significant role in data availability and the balanced utilization of clusters. In the current replica placement policy (RPP) of Hadoop distributed file system (HDFS), the replicas of data blocks cannot be evenly distributed across cluster\u27s nodes. The current HDFS must rely on a load balancing utility for balancing the distribution of replicas, which results in extra overhead for time and resources. This dissertation addresses data load balancing problem and presents an innovative replica placement policy for HDFS. It can perfectly balance the data load among cluster\u27s nodes. The heterogeneity of cluster nodes exacerbates the issue of computational load balancing; therefore, another replica placement algorithm has been proposed in this dissertation for heterogeneous cluster environments. The timing of identifying the straggler map task is very important for straggler mitigation in data-intensive cloud computing. To mitigate the straggler map task, Present progress and Feedback based Speculative Execution (PFSE) algorithm has been proposed in this dissertation. PFSE is a new straggler identification scheme to identify the straggler map tasks based on the feedback information received from completed tasks beside the progress of the current running task. Straggler reduce task aggravates the violation of MapReduce job completion time. Straggler reduce task is typically the result of bad data partitioning during the reduce phase. The Hash partitioner employed by Hadoop may cause intermediate data skew, which results in straggler reduce task. In this dissertation a new partitioning scheme, named Balanced Data Clusters Partitioner (BDCP), is proposed to mitigate straggler reduce tasks. BDCP is based on sampling of input data and feedback information about the current processing task. BDCP can assist in straggler mitigation during the reduce phase and minimize the job completion time in MapReduce jobs. The results of extensive experiments corroborate that the algorithms and policies proposed in this dissertation can improve the performance of data-intensive applications running on cloud platforms

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

An intelligent processing environment for real-time simulation

Author: Carroll Chester C.
Wells Buren Earl, Jr.
Publication venue
Publication date
Field of study

The development of a highly efficient and thus truly intelligent processing environment for real-time general purpose simulation of continuous systems is described. Such an environment can be created by mapping the simulation process directly onto the University of Alamba's OPERA architecture. To facilitate this effort, the field of continuous simulation is explored, highlighting areas in which efficiency can be improved. Areas in which parallel processing can be applied are also identified, and several general OPERA type hardware configurations that support improved simulation are investigated. Three direct execution parallel processing environments are introduced, each of which greatly improves efficiency by exploiting distinct areas of the simulation process. These suggested environments are candidate architectures around which a highly intelligent real-time simulation configuration can be developed

NASA Technical Reports Server