Algorithmic support for commodity-based parallel computing systems.
The Computational Plant or Cplant is a commodity-based distributed-memory supercomputer under development at Sandia National Laboratories. Distributed-memory supercomputers run many parallel programs simultaneously. Users submit their programs to a job queue. When a job is scheduled to run, it is assigned to a set of available processors. Job runtime depends not only on the number of processors but also on the particular set of processors assigned to it. Jobs should be allocated to localized clusters of processors to minimize communication costs and to avoid bandwidth contention caused by overlapping jobs. This report introduces new allocation strategies and performance metrics based on space-filling curves and one-dimensional allocation strategies. These algorithms are general and simple. Preliminary simulations and Cplant experiments indicate that both space-filling curves and one-dimensional packing improve processor locality compared to the sorted free list strategy previously used on Cplant. These new allocation strategies are implemented in Release 2.0 of the Cplant System Software that was phased into the Cplant systems at Sandia by May 2002. Experimental results then demonstrated that the average number of communication hops between the processors allocated to a job strongly correlates with the job's completion time. This report also gives processor-allocation algorithms for minimizing the average number of communication hops between the assigned processors for grid architectures. The associated clustering problem is as follows: Given n points in R^d, find k points that minimize their average pairwise L1 distance. Exact and approximate algorithms are given for these optimization problems. One of these algorithms has been implemented on Cplant and will be included in Cplant System Software, Version 2.1, to be released. In more preliminary work, we suggest improvements to the scheduler separate from the allocator.
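The clustering objective and the idea of allocating along a one-dimensional ordering can be illustrated with a small sketch. This is not the report's implementation: the function names are hypothetical, a simple boustrophedon ("snake") ordering stands in for whichever space-filling curve the report uses, the one-dimensional strategy shown is just "take the first k free processors in curve order", and the exhaustive search is only practical for tiny inputs.

```python
from itertools import combinations


def avg_pairwise_l1(points):
    """Average L1 (Manhattan) distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    total = sum(sum(abs(a - b) for a, b in zip(p, q)) for p, q in pairs)
    return total / len(pairs)


def best_k_subset(free, k):
    """Exhaustive search for the k free processors with the smallest
    average pairwise L1 distance (exponential cost; illustration only)."""
    return min(combinations(free, k), key=avg_pairwise_l1)


def snake_order(width, height):
    """Boustrophedon ordering of a width x height mesh: left to right on
    even rows, right to left on odd rows (an assumed stand-in curve)."""
    order = []
    for y in range(height):
        xs = range(width) if y % 2 == 0 else range(width - 1, -1, -1)
        order.extend((x, y) for x in xs)
    return order


def allocate_along_curve(order, busy, k):
    """Assumed one-dimensional strategy: the first k free processors in
    curve order, or None if fewer than k processors are free."""
    free = [p for p in order if p not in busy]
    return free[:k] if len(free) >= k else None
```

Comparing `avg_pairwise_l1` for an allocation from `allocate_along_curve` against the result of `best_k_subset` on a small mesh gives a rough sense of how much locality a curve-based strategy gives up relative to the optimum.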
Throughput Analysis of CSMA Wireless Networks with Finite Offered-load
This paper proposes an approximate method, equivalent access intensity (EAI),
for the throughput analysis of CSMA wireless networks in which links have
finite offered-load and their MAC-layer transmit buffers may be empty from time
to time. Unlike prior work, which mainly considered saturated networks, our
analysis accounts for the impact of empty transmit buffers on the
interactions and dependencies among links, a situation that is more common
in practice. It is known that the empty transmit buffer incurs
extra waiting time for a link to compete for the channel airtime usage, since
when it has no packet waiting for transmission, the link will not perform
channel competition. The basic idea behind EAI is that this extra waiting time
can be mapped to an equivalent "longer" backoff countdown time for the
unsaturated link, yielding a lower link access intensity that is defined as the
mean packet transmission time divided by the mean backoff countdown time. That
is, we can compute the "equivalent access intensity" of an unsaturated link to
incorporate the effects of the empty transmit buffer on its behavior of channel
competition. The existing saturated ideal CSMA network (ICN) model can then be
adopted for link throughput computation. Specifically, we propose an iterative
algorithm, "Compute-and-Compare", to identify which links are unsaturated under
current offered-load and protocol settings, compute their "equivalent access
intensities" and calculate link throughputs. Simulation shows that our
algorithm has high accuracy under various offered-load and protocol settings.
We believe the ability to identify unsaturated links and compute link
throughputs established in this paper will serve as an important first step
toward the design and optimization of general CSMA wireless networks with
offered-load control.
Comment: 6 pages. arXiv admin note: text overlap with arXiv:1007.5255 by other
author
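As a rough illustration of the quantities named in this abstract, the sketch below computes a link's access intensity (mean packet transmission time divided by mean backoff countdown time) and a lower "equivalent" intensity for an unsaturated link. Adding the extra waiting time directly to the backoff time is an assumption made here for illustration; the abstract does not give the paper's actual mapping. The throughput function uses the product-form stationary distribution commonly associated with the saturated ICN model, in which each feasible set of simultaneously transmitting links has probability proportional to the product of their access intensities. All function names are hypothetical.

```python
from itertools import combinations


def access_intensity(mean_tx_time, mean_backoff_time):
    """Access intensity as defined in the abstract."""
    return mean_tx_time / mean_backoff_time


def equivalent_access_intensity(mean_tx_time, mean_backoff_time, mean_extra_wait):
    """Assumed mapping: empty-buffer waiting time simply lengthens the backoff."""
    return mean_tx_time / (mean_backoff_time + mean_extra_wait)


def feasible_states(n_links, conflicts):
    """All sets of links that may transmit together, i.e. independent sets
    of the contention graph given as a set of conflicting (i, j) pairs."""
    states = []
    for size in range(n_links + 1):
        for state in combinations(range(n_links), size):
            if all((a, b) not in conflicts and (b, a) not in conflicts
                   for a, b in combinations(state, 2)):
                states.append(state)
    return states


def icn_throughputs(rho, conflicts):
    """Normalized throughput per link under the assumed product-form model:
    the fraction of time the link is in a transmitting state."""
    states = feasible_states(len(rho), conflicts)
    weights = []
    for state in states:
        weight = 1.0
        for link in state:
            weight *= rho[link]
        weights.append(weight)
    total = sum(weights)
    return [sum(w for s, w in zip(states, weights) if link in s) / total
            for link in range(len(rho))]
```

For two mutually conflicting links with intensities `rho = [1.0, 1.0]`, the feasible states are the empty set and each link alone, so each link is active one third of the time.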
Methodology for modeling high performance distributed and parallel systems
Performance modeling of distributed and parallel systems is of considerable importance to the high performance computing community. To achieve high performance, proper task or process assignment and data or file allocation among processing sites is essential. This dissertation describes an elegant approach to model distributed and parallel systems, which combines the optimal static solutions for data allocation with dynamic policies for task assignment. A performance-efficient system model is developed using analytical tools and techniques.
The system model is accomplished in three steps. First, the basic client-server model which allows only data transfer is evaluated. A prediction and evaluation method is developed to examine the system behavior and estimate performance measures. The method is based on known product form queueing networks. The next step extends the model so that each site of the system behaves as both client and server. A data-allocation strategy is designed at this stage which optimally assigns the data to the processing sites. The strategy is based on flow deviation technique in queueing models. The third stage considers process-migration policies. A novel on-line adaptive load-balancing algorithm is proposed which dynamically migrates processes and transfers data among different sites to minimize the job execution cost. The gradient-descent rule is used to optimize the cost function, which expresses the cost of process execution at different processing sites.
The accuracy of the prediction method and the effectiveness of the analytical techniques are established by simulation. The modeling procedure described here is general and applicable to any message-passing distributed and parallel system. The proposed techniques and tools can be easily utilized in other related areas such as networking and operating systems. This work contributes significantly towards the design of distributed and parallel systems where performance is critical.
Supercomputer Emulation For Evaluating Scheduling Algorithms
Scheduling algorithms have a significant impact on the optimal
utilization of HPC facilities, yet the vast majority of the
research in this area is done using simulations. Simulations
leave out many factors that affect a real scheduler, such as its
scheduling processing time, communication latencies, and the
intrinsic complexity of the scheduler implementation. As a result,
despite theoretical improvements reported in several articles,
practically no new algorithms proposed have been implemented in
real schedulers, with HPC facilities still using the basic
first-come-first-served (FCFS) with Backfill policy scheduling
algorithm.
A better approach could be, therefore, the use of real schedulers
in an emulation environment to evaluate new algorithms.
This thesis investigates two related challenges in emulations:
computational cost and faithfulness of the results to real
scheduling environments.
It finds that the sampling, shrinking and shuffling of a trace
must be done carefully to keep the classical metrics invariant or
linearly variant with respect to the size and times of the original
workload. This is accomplished by the careful control of the
submission period and the consideration of drifts in the
submission period and trace duration.
This methodology can help researchers to better evaluate their
scheduling algorithms and help HPC administrators to optimize the
parameters of production schedulers.
In order to assess the proposed methodology, we evaluated both
the FCFS with Backfill and Suspend/Resume scheduling algorithms.
The results strongly suggest that Suspend/Resume leads to a
better utilization of a supercomputer when high priorities are
given to big jobs
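For readers unfamiliar with the baseline policy named above, the following is a minimal sketch of one scheduling pass of FCFS with backfilling. It assumes the common EASY-style variant, in which only the job at the head of the queue holds a reservation; the abstract does not say which variant the thesis evaluates. It also assumes user-supplied runtime estimates are exact and that no job requests more processors than the machine has. The function name and tuple layouts are hypothetical.

```python
def easy_backfill_pass(now, total_procs, running, queue):
    """Return the ids of jobs started at time `now`.

    running: list of (end_time, procs) for jobs already executing.
    queue:   list of (job_id, procs, runtime) in arrival (FCFS) order.
    """
    running = list(running)
    queue = list(queue)
    free = total_procs - sum(procs for _, procs in running)
    started = []

    # FCFS phase: start jobs from the head of the queue while they fit.
    while queue and queue[0][1] <= free:
        job_id, procs, runtime = queue.pop(0)
        running.append((now + runtime, procs))
        free -= procs
        started.append(job_id)
    if not queue:
        return started

    # Reservation for the blocked head job: the earliest time ("shadow
    # time") at which enough processors will have been released.
    head_procs = queue[0][1]
    avail = free
    shadow_time = now
    for end_time, procs in sorted(running):
        avail += procs
        if avail >= head_procs:
            shadow_time = end_time
            break
    extra = avail - head_procs  # processors still spare at the shadow time

    # Backfill phase: a later job may start now if it fits and either
    # finishes before the shadow time or uses only spare processors.
    for job_id, procs, runtime in queue[1:]:
        if procs > free:
            continue
        if now + runtime > shadow_time:
            if procs > extra:
                continue
            extra -= procs
        free -= procs
        started.append(job_id)
    return started
```

A real scheduler would also handle inaccurate runtime estimates, priorities, and job arrivals between passes, which is part of what the abstract argues simulations tend to omit.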
Parallel and Distributed Computing
The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing.
Low Power Dynamic Scheduling for Computing Systems
This paper considers energy-aware control for a computing system with two
states: "active" and "idle." In the active state, the controller chooses to
perform a single task using one of multiple task processing modes. The
controller then saves energy by choosing an amount of time for the system to be
idle. These decisions affect processing time, energy expenditure, and an
abstract attribute vector that can be used to model other criteria of interest
(such as processing quality or distortion). The goal is to optimize time
average system performance. Applications of this model include a smart phone
that makes energy-efficient computation and transmission decisions, a computer
that processes tasks subject to rate, quality, and power constraints, and a
smart grid energy manager that allocates resources in reaction to a time
varying energy price. The solution methodology of this paper uses the theory of
optimization for renewal systems developed in our previous work. This paper is
written in tutorial form and develops the main concepts of the theory using
several detailed examples. It also highlights the relationship between online
dynamic optimization and linear fractional programming. Finally, it provides
exercises to help the reader learn the main concepts and apply them to their
own optimizations. This paper is an arxiv technical report, and is a
preliminary version of material that will appear as a book chapter in an
upcoming book on green communications and networking.
Comment: 26 pages, 10 figures, single spac
Optimizing for a Many-Core Architecture without Compromising Ease-of-Programming
Faced with nearly stagnant clock speed advances, chip manufacturers have turned to parallelism as the source for continuing performance improvements. But even though numerous parallel architectures have already been brought to market, a universally accepted methodology for programming them for general purpose applications has yet to emerge. Existing solutions tend to be hardware-specific, rendering them difficult to use for the majority of application programmers and domain experts, and not providing scalability guarantees for future generations of the hardware.
This dissertation advances the validation of the following thesis: it is possible to develop efficient general-purpose programs for a many-core platform using a model recognized for its simplicity. To prove this thesis, we refer to the eXplicit Multi-Threading (XMT) architecture designed and built at the University of Maryland. XMT is an attempt at re-inventing parallel computing with a solid theoretical foundation and an aggressive scalable design. Algorithmically, XMT is inspired by the PRAM (Parallel Random Access Machine) model and the architecture design is focused on reducing inter-task communication and synchronization overheads and providing an easy-to-program parallel model.
This thesis builds upon the existing XMT infrastructure to improve support for efficient execution with a focus on ease-of-programming. Our contributions aim at reducing the programmer's effort in developing XMT applications and improving the overall performance. More concretely, we: (1) present a work-flow guiding programmers to produce efficient parallel solutions starting from a high-level problem; (2) introduce an analytical performance model for XMT programs and provide a methodology to project running time from an implementation; (3) propose and evaluate RAP -- an improved resource-aware compiler loop prefetching algorithm targeted at fine-grained many-core architectures; we demonstrate performance improvements of up to 34.79% on average over the GCC loop prefetching implementation and up to 24.61% on average over a simple hardware prefetching scheme; and (4) implement a number of parallel benchmarks and evaluate the overall performance of XMT relative to existing serial and parallel solutions, showing speedups of up to 13.89x vs. a serial processor and 8.10x vs. parallel code optimized for an existing many-core (GPU). We also discuss the implementation and optimization of the Max-Flow algorithm on XMT, a problem which is among the more advanced in terms of complexity, benchmarking and research interest in the parallel algorithms community. We demonstrate better speed-ups compared to a best serial solution than previous attempts on other parallel platforms.
Stochastic optimization of staffing for multiskill call centers
In this thesis, we study the staffing optimization problem in multiskill call centers, in which we aim at minimizing the operating cost while delivering a high quality of service (QoS) to customers. We also introduce the use of chance constraints which require that the QoS targets are met with a given probability. These constraints are adequate when the performance is measured over a short time interval, as QoS measures are random variables in a given time period. The proposed staffing problems are challenging in the sense that the stochastic constraints have no closed form and need to be approximated by simulation. In addition, the QoS functions are typically non-linear and non-convex. We consider staffing optimization problems in different settings and study the proposed models in both theoretical and practical aspects. The methodologies developed are general, in the sense that they can be adapted and applied to other staffing/scheduling problems in queuing-based systems.
The thesis consists of three articles dealing with different challenges in modeling and solving staffing optimization problems in multiskill call centers.
The first and second articles concern a two-stage staffing optimization problem under uncertainty. While in the first one, we study a general two-stage discrete stochastic programming model to provide a theoretical guarantee for the consistency of the sample average approximation (SAA) when the sample sizes go to infinity, the second one applies the SAA approach to solve the two-stage staffing optimization problem under arrival rate uncertainty. Both papers indicate the viability of the SAA approach in our context, in both theoretical and practical aspects.
To be more precise, in the first article, we consider a general two-stage discrete stochastic problem with expected value constraints. We formulate its SAA with nested sampling. We show that under some assumptions that hold in call center examples, one can obtain the optimal solutions of the original problem by solving its SAA with large enough sample sizes. Moreover, we show that the probability that the optimal solution of the sample problem is an optimal solution of the original problem approaches one exponentially fast as we increase the sample sizes. These theoretical findings are important, not only for call center applications, but also for other decision-making problems with discrete decision variables.
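The idea of replacing an expected-value constraint with a sample average can be sketched on a much simpler problem than the one the thesis treats. The example below is single-stage and single-skill, and it evaluates the service level with the closed-form Erlang C formula for an M/M/s queue instead of simulation; these are all simplifying assumptions made here, and the function names are hypothetical. Given sampled arrival rates, it searches for the smallest number of agents whose sample-average service level meets a target.

```python
import math


def erlang_c(servers, load):
    """Probability that an arriving call must wait in an M/M/s queue
    (requires servers > load). Uses the Erlang B recursion for stability."""
    b = 1.0
    for k in range(1, servers + 1):
        b = load * b / (k + load * b)
    return servers * b / (servers - load * (1.0 - b))


def service_level(servers, arrival_rate, service_rate, threshold):
    """Fraction of calls answered within `threshold` time units."""
    load = arrival_rate / service_rate
    if servers <= load:
        return 0.0  # unstable queue: treat the service level as zero
    wait_prob = erlang_c(servers, load)
    return 1.0 - wait_prob * math.exp(
        -(servers * service_rate - arrival_rate) * threshold)


def sample_average_sl(servers, arrival_samples, service_rate, threshold):
    """Sample-average estimate of the expected service level."""
    return sum(service_level(servers, lam, service_rate, threshold)
               for lam in arrival_samples) / len(arrival_samples)


def saa_staffing(arrival_samples, service_rate, threshold, target,
                 max_servers=500):
    """Smallest staffing level whose sample-average service level reaches
    `target`, or None if none is found up to `max_servers`."""
    for servers in range(1, max_servers + 1):
        if sample_average_sl(servers, arrival_samples,
                             service_rate, threshold) >= target:
            return servers
    return None
```

In practice the arrival-rate samples would be drawn from a fitted distribution, and the answer returned by `saa_staffing` can change with the sample, which is why consistency results of the kind described in the abstract matter.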
The second article concerns solution methods to solve a two-stage staffing problem under arrival rate uncertainty. It is motivated by the fact that the SAA version of the two-stage staffing problem becomes expensive to solve with a large number of scenarios, as for each scenario, one needs to use simulation to approximate the QoS constraints. We develop an algorithm that combines simulation, cut generation, cut strengthening and Benders decomposition to solve the SAA problem. We show the efficiency of the approach, especially when the number of scenarios is large.
In the last article, we consider problems with chance constraints on the service level measures. The methodology proposed in this article is motivated by the fact that the QoS functions generally display "S-shaped" curves and might be well approximated by appropriate sigmoid functions. Based on this idea, we develop a novel approach that combines non-linear regression, simulation and trust region local search to efficiently solve large-scale staffing problems. The main advantage of the approach is that the optimization procedure can be formulated as a sequence of steps of performing simulation and solving linear programming models. Numerical results based on real-life call center examples show the practical viability of our approach.
The methodologies developed in this thesis can be applied in many other settings, e.g., staffing and scheduling problems in other queuing-based systems with other types of QoS constraints. They also raise several research directions that might be interesting to investigate, for example a clustering approach to mitigate the cost of solving the two-stage staffing models, or a distributionally robust optimization version to better deal with data uncertainty.