Final Report: Performance Modeling Activities in PERC2
Progress in Performance Modeling for PERC2 resulted in:
• Automated modeling tools that are robust, able to characterize large applications running at scale while simultaneously simulating the memory hierarchies of multiple machines in parallel.
• Porting of the requisite tracer tools to multiple platforms.
• Improved performance models, obtained by using higher-resolution memory models than ever before.
• Adding control-flow and data dependency analysis to the tracers used in performance tools.
• Exploring and developing several new modeling methodologies.
• Using modeling tools to develop performance models for strategic codes.
• Application of the modeling methodology to make a large number of “blind” performance predictions on certain mission partner applications, targeting most currently available system architectures.
• Error analysis to correct some systematic biases encountered as part of the large-scale blind prediction exercises.
• Addition of instrumentation capabilities for communication libraries other than MPI.
• Dissemination of the tools and modeling methods to several mission partners, including DoD HPCMO and two DARPA HPCS vendors (Cray and IBM), as well as to the wider HPC community via a series of tutorials.
Modulated Branching Processes, Origins of Power Laws and Queueing Duality
Power law distributions have been repeatedly observed in a wide variety of
socioeconomic, biological and technological areas. In many of the observations,
e.g., city populations and sizes of living organisms, the objects of interest
evolve due to the replication of their many independent components, e.g.,
births-deaths of individuals and replications of cells. Furthermore, the rates
of the replication are often controlled by exogenous parameters causing periods
of expansion and contraction, e.g., baby booms and busts, economic booms and
recessions, etc. In addition, the sizes of these objects often have reflective
lower boundaries, e.g., cities do not fall below a certain size, low-income
individuals are subsidized by the government, companies are protected by
bankruptcy laws, etc.
Hence, it is natural to propose reflected modulated branching processes as
generic models for many of the preceding observations. Indeed, our main results
show that the proposed mathematical models result in power law distributions
under quite general polynomial Gartner-Ellis conditions, the generality of
which could explain the ubiquitous nature of power law distributions. In
addition, on a logarithmic scale, we establish an asymptotic equivalence
between the reflected branching processes and the corresponding multiplicative
ones. The latter, as recognized by Goldie (1991), is known to be dual to
queueing/additive processes. We emphasize this duality further in the
generality of stationary and ergodic processes.
Comment: 36 pages, 2 figures; added references; a new theorem in Subsection 4.
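The link between multiplicative growth with a reflecting lower boundary and power-law tails can be illustrated with a small simulation. The sketch below is not the paper's model: it assumes i.i.d. lognormal multipliers (no modulation), and all function and parameter names are hypothetical. On a logarithmic scale the process is a reflected random walk, i.e. a queueing-style Lindley recursion, and the classical Cramér-Lundberg argument gives a tail exponent alpha solving E[A^alpha] = 1, which for lognormal multipliers with parameters mu < 0 and sigma is alpha = -2*mu/sigma^2.

```python
import math
import random


def simulate_reflected_multiplicative(n_steps, mu=-0.1, sigma=0.5, floor=1.0, seed=0):
    """Simulate X_{n+1} = max(floor, X_n * A_n) with lognormal A_n.

    Assumption for illustration: multipliers are i.i.d., log A_n ~ Normal(mu, sigma^2),
    with mu < 0 so that the process has a stationary regime.
    """
    rng = random.Random(seed)
    x = floor
    path = []
    for _ in range(n_steps):
        x = max(floor, x * math.exp(rng.gauss(mu, sigma)))
        path.append(x)
    return path


def predicted_tail_exponent(mu, sigma):
    """Solve E[A^alpha] = exp(alpha*mu + alpha^2*sigma^2/2) = 1 for alpha > 0."""
    return -2.0 * mu / sigma ** 2


def hill_estimator(samples, k):
    """Hill estimate of the tail exponent from the k largest samples."""
    top = sorted(samples, reverse=True)[: k + 1]
    threshold = top[-1]  # the (k+1)-th largest value
    return k / sum(math.log(v / threshold) for v in top[:-1])


if __name__ == "__main__":
    path = simulate_reflected_multiplicative(200_000)
    print("predicted exponent:", predicted_tail_exponent(-0.1, 0.5))
    print("Hill estimate     :", hill_estimator(path, 500))
```

Because successive values of the simulated path are correlated, the Hill estimate is noisy and should only be read as a rough check against the predicted exponent.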
Provably Efficient Adaptive Scheduling for Parallel Jobs
Scheduling competing jobs on multiprocessors has always been an important issue for parallel and distributed systems. The challenge is to ensure global, system-wide efficiency while offering a level of fairness to user jobs. Varying degrees of success have been achieved over the years. However, few existing schemes address both efficiency and fairness over a wide range of workloads. Moreover, in order to obtain analytical results, most of them require prior information about jobs, which may be difficult to obtain in real applications.
This paper presents two novel adaptive scheduling algorithms -- GRAD for centralized scheduling, and WRAD for distributed scheduling. Both GRAD and WRAD ensure fair allocation under all levels of workload, and they offer provable efficiency without requiring prior information about a job's parallelism. Moreover, they provide effective control over the scheduling overhead and ensure efficient utilization of processors. To the best of our knowledge, they are the first non-clairvoyant scheduling algorithms that offer such guarantees. We also believe that our new approach, a resource request-allotment protocol, deserves further exploration.
Specifically, both GRAD and WRAD are O(1)-competitive with respect to mean response time for batched jobs, and O(1)-competitive with respect to makespan for non-batched jobs with arbitrary release times. The simulation results show that, for non-batched jobs, the makespan produced by GRAD is no more than 1.39 times the optimal on average, and it never exceeds 4.5 times the optimal. For batched jobs, the mean response time produced by GRAD is no more than 2.37 times the optimal on average, and it never exceeds 5.5 times the optimal.
Singapore-MIT Alliance (SMA
Towards ServMark, an Architecture for Testing Grid Services
Technical University of Delft - Technical Report ServMark-2006-002, July 2006
Grid computing provides a natural way to aggregate resources from different administrative domains for building large-scale distributed environments. The Web Services paradigm proposes a way by which virtual services can be seamlessly integrated into global-scale solutions to complex problems. While the usage of Grid technology ranges from academia and research to the business world and production, two issues must be considered: that the promised functionality can be accurately quantified, and that the performance can be evaluated based on well-defined means. Without adequate functionality demonstrators, systems cannot be tuned or adequately configured, and Web services cannot be stressed adequately in a production environment. Without performance evaluation systems, the system design and procurement processes are hampered, and the performance of Web Services in production cannot be assessed. In this paper, we present ServMark, a carefully researched tool for Grid performance evaluation. While we acknowledge that a lot of ground must be covered to fulfill the requirements of a system for testing Grid environments, and Web (and Grid) Services, we believe that ServMark addresses the minimal set of critical issues.
Real-Time Divisible Load Scheduling with Different Processor Available Times
Providing QoS and performance guarantees to arbitrarily divisible loads has become a significant problem for many cluster-based research computing facilities. While progress is being made in scheduling arbitrarily divisible loads, some of the proposed approaches may cause Inserted Idle Times (IITs) that are detrimental to system performance. In this paper we propose a new approach that utilizes IITs and thus enhances the system performance. The novelty of our approach is that, to simplify the analysis, a homogeneous system with IITs is transformed to an equivalent heterogeneous system, and that our algorithms can schedule real-time divisible loads with different processor available times. Extensive simulations show that the new approach outperforms the previous approach in all configurations. We also compare the performance of our algorithm to the current practice of manually splitting workloads by users. Simulation results validate the advantages of our approach.
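To make the idea of scheduling a divisible load over processors with different available times concrete, here is a minimal sketch. It is not the algorithm from the paper: it assumes identical processor speeds, ignores data transfer costs (which divisible load models normally account for), and uses a hypothetical function name. The load is split so that every processor that receives work finishes at the same time, and a processor that becomes available too late simply receives nothing.

```python
def split_divisible_load(total_work, available_times, speed=1.0):
    """Split a divisible load across equal-speed processors.

    Processor i can start at available_times[i]. Returns (finish_time, shares),
    where shares[i] is the amount of work given to processor i and all
    processors that receive work finish at finish_time.
    Assumptions for illustration: no communication cost, non-empty input.
    """
    times = sorted(available_times)
    finish = None
    for k in range(1, len(times) + 1):
        # If exactly the k earliest processors are used, then
        # sum_{i<k} (T - r_i) * speed = total_work, so T is:
        t = (total_work / speed + sum(times[:k])) / k
        next_available = times[k] if k < len(times) else float("inf")
        if t <= next_available:
            # The next processor would start too late to help.
            finish = t
            break
    shares = [max(0.0, finish - r) * speed for r in available_times]
    return finish, shares


if __name__ == "__main__":
    finish, shares = split_divisible_load(10.0, [0.0, 2.0, 4.0])
    print(finish, shares)
```

Viewed from a common start time, a processor that becomes available later behaves like a slower one, which gives some intuition for why a homogeneous system with idle gaps can be analyzed as a heterogeneous system.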