359 research outputs found

    Master/worker parallel discrete event simulation

    The execution of parallel discrete event simulation across metacomputing infrastructures is examined. A master/worker architecture for parallel discrete event simulation is proposed providing robust executions under a dynamic set of services with system-level support for fault tolerance, semi-automated client-directed load balancing, portability across heterogeneous machines, and the ability to run codes on idle or time-sharing clients without significant interaction by users. Research questions and challenges associated with issues and limitations with the work distribution paradigm, targeted computational domain, performance metrics, and the intended class of applications to be used in this context are analyzed and discussed. A portable web services approach to master/worker parallel discrete event simulation is proposed and evaluated with subsequent optimizations to increase the efficiency of large-scale simulation execution through distributed master service design and intrinsic overhead reduction. New techniques for addressing challenges associated with optimistic parallel discrete event simulation across metacomputing such as rollbacks and message unsending with an inherently different computation paradigm utilizing master services and time windows are proposed and examined. Results indicate that a master/worker approach utilizing loosely coupled resources is a viable means for high throughput parallel discrete event simulation by enhancing existing computational capacity or providing alternate execution capability for less time-critical codes. Ph.D. Committee Chair: Fujimoto, Richard; Committee Member: Bader, David; Committee Member: Perumalla, Kalyan; Committee Member: Riley, George; Committee Member: Vuduc, Richar

    Queuing with future information

    We study an admissions control problem, where a queue with service rate $1-p$ receives incoming jobs at rate $\lambda\in(1-p,1)$, and the decision maker is allowed to redirect away jobs up to a rate of $p$, with the objective of minimizing the time-average queue length. We show that the amount of information about the future has a significant impact on system performance in the heavy-traffic regime. When the future is unknown, the optimal average queue length diverges at rate $\sim\log_{1/(1-p)}\frac{1}{1-\lambda}$ as $\lambda\to 1$. In sharp contrast, when all future arrival and service times are revealed beforehand, the optimal average queue length converges to a finite constant, $(1-p)/p$, as $\lambda\to 1$. We further show that the finite limit of $(1-p)/p$ can be achieved using only a finite lookahead window starting from the current time frame, whose length scales as $\mathcal{O}(\log\frac{1}{1-\lambda})$ as $\lambda\to 1$. This leads to the conjecture of an interesting duality between queuing delay and the amount of information about the future. Comment: Published at http://dx.doi.org/10.1214/13-AAP973 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org
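
To get a feel for the gap between the two regimes in the abstract above, the leading-order expressions can be evaluated numerically. This is only a sketch: the function names are hypothetical, and the no-future expression is an asymptotic growth rate that ignores constants and lower-order terms, so the numbers indicate scale rather than exact queue lengths.

```python
import math


def no_future_scaling(p: float, lam: float) -> float:
    """Leading-order growth log_{1/(1-p)}(1/(1-lam)) when the future is unknown."""
    if not (0.0 < p < 1.0 and 1.0 - p < lam < 1.0):
        raise ValueError("need 0 < p < 1 and 1 - p < lam < 1")
    return math.log(1.0 / (1.0 - lam)) / math.log(1.0 / (1.0 - p))


def full_future_limit(p: float) -> float:
    """Heavy-traffic limit (1-p)/p when all future arrivals and services are known."""
    if not 0.0 < p < 1.0:
        raise ValueError("need 0 < p < 1")
    return (1.0 - p) / p


def lookahead_window_scaling(lam: float) -> float:
    """Order of the lookahead window, log(1/(1-lam)), up to a constant factor."""
    if not 0.0 < lam < 1.0:
        raise ValueError("need 0 < lam < 1")
    return math.log(1.0 / (1.0 - lam))


if __name__ == "__main__":
    p = 0.1
    for lam in (0.95, 0.99, 0.999):
        print(lam, no_future_scaling(p, lam), full_future_limit(p),
              lookahead_window_scaling(lam))
```

With $p = 0.1$ and $\lambda = 0.99$, the no-future expression is roughly 43.7, while the full-information limit stays at 9 no matter how close $\lambda$ gets to 1.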

    Lookahead scheduling in a real-time context: Models, algorithms, and analysis

    Our research considers job scheduling, a special type of resource assignment problem. For example, at a cross-docking facility trucks must be assigned to doors where they will be unloaded. The cargo on each truck has various destinations within the facility, and the unloading time for a truck is dependent on the distance from the assigned door to these destinations. The goal is to assign the trucks to doors while minimizing the amount of time to unload all trucks. We study scheduling algorithms for problems like the cross-docking example that differ from traditional algorithms in two ways. First, they operate in real time: the algorithm executes while the jobs are being handled. Because the time used by the algorithm to make decisions cannot be used to complete a job, these decisions must be made quickly. Second, our algorithms utilize lookahead, or partial knowledge of jobs that will arrive in the future. The three goals of this research were to demonstrate that lookahead algorithms can be implemented effectively in a real-time context, to measure the amount of improvement gained by utilizing lookahead, and to explore the conditions in which lookahead is beneficial. We present a model suitable for representing problems that include lookahead in a real-time context. Using this model, we develop lookahead algorithms for two important job scheduling systems and argue that these algorithms make decisions efficiently. We then study the performance of lookahead algorithms using mathematical analysis and simulation. Our results provide a detailed picture of the behavior of lookahead algorithms in a real-time context. Our analytical study shows that lookahead algorithms produce schedules that are significantly better than those without lookahead. We also found that utilizing Lookahead-1, or knowledge of the next arriving job, produces substantial improvement while requiring the least effort to design. 
When more lookahead information is used, the solutions are better, but the amount of improvement is not significantly larger than that of a Lookahead-1 algorithm. Further, algorithms utilizing more lookahead are more complex to design, implement, and analyze. We conclude that Lookahead-1 algorithms offer the best balance between improvement and design effort

    Scheduling of data-intensive workloads in a brokered virtualized environment

    Providing performance predictability guarantees is increasingly important in cloud platforms, especially for data-intensive applications, for which performance depends greatly on the available rates of data transfer between the various computing/storage hosts underlying the virtualized resources assigned to the application. With the increased prevalence of brokerage services in cloud platforms, there is a need for resource management solutions that consider the brokered nature of these workloads, as well as the special demands of their intra-dependent components. In this paper, we present an offline mechanism for scheduling batches of brokered data-intensive workloads, which can be extended to an online setting. The objective of the mechanism is to decide on a packing of the workloads in a batch that minimizes the broker's incurred costs. Moreover, considering the brokered nature of such workloads, we define a payment model that provides incentives to these workloads to be scheduled as part of a batch, which we analyze theoretically. Finally, we evaluate the proposed scheduling algorithm, and exemplify the fairness of the payment model in practical settings via trace-based experiments

    Framework for sustainable TVET-Teacher Education Program in Malaysia Public Universities

    Studies have stated that educational aspects such as teaching and learning receive too little attention in plans to improve the TVET system. In the 21st-century context, the current teaching paradigm for TVET educators has also been reported to be fatal and in need of a shift. These shortcomings are reported to hinder the country from achieving the 5th strategy in the Strategic Plan for Vocational Education Transformation, which is to transform the TVET system as a whole. Therefore, this study aims to develop a framework for a sustainable TVET Teacher Education (TVET-TE) program in Malaysia. The study adopted an exploratory sequential mixed-methods design involving semi-structured interviews (phase one) and a survey (phase two). Nine experts, chosen by purposive sampling, took part in phase one. In phase two, 118 TVET-TE program lecturers were selected as the survey sample through random sampling. After data analysis in phase one (thematic analysis) and phase two (Principal Component Analysis), eight domains and 22 elements were identified for the framework. The framework was found to embed the elements of 21st Century Education, thus filling the gap in this research. The findings also indicate that the developed framework is unidimensional and valid for development and research on the TVET-TE program in Malaysia. Lastly, it is hoped that this research can guide the nation in producing quality TVET teachers in the future

    Backfilling with fairness and slack for parallel job scheduling

    Parallel jobs have different runtimes and numbers of threads/processes. Thus, scheduling parallel jobs involves a packing problem. If jobs are packed as tightly as possible, utilization will be improved. Otherwise, some resources have to stay idle. The common solution to deal with idle resources is backfilling, which schedules smaller jobs submitted later to execute earlier as long as they do not postpone the first job or all the previous jobs in the waiting queue. Traditionally, backfilling uses first fit for idle resources, according to the submission order. However, in this case, a better packing of jobs could be missed. Hence, we propose an algorithm which looks further ahead if doing so significantly improves utilization. At the same time, this could be unfair to some jobs ahead in the queue, so we use a delay factor as a constraint to limit unfairness. We propose a branch and bound algorithm which selects jobs for backfilling that keep utilization high, while trying to stay close to First-Come-First-Served (FCFS). We evaluate relative response time and utilization and compare to other backfilling approaches. The selection of jobs for backfilling to optimize for high utilization and low delay is implemented as an extension of the existing Scojo-PECT preemptive scheduler
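
The traditional first-fit backfilling that the abstract above uses as its baseline can be sketched as follows. This is not the paper's branch-and-bound algorithm or the Scojo-PECT extension; it is a minimal illustration of the common variant that protects only the first job in the queue. The `Job` type, the `easy_backfill` function, and its parameters are hypothetical names, and runtimes are assumed to be known estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    procs: int    # processors requested
    runtime: int  # estimated runtime


def easy_backfill(now, free, running, queue):
    """Return the names of jobs to start at time `now`.

    running: list of (end_time, procs) for jobs already executing.
    queue:   waiting jobs in submission order.
    """
    started = []
    queue = list(queue)
    running = list(running)

    # Plain FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0].procs <= free:
        job = queue.pop(0)
        free -= job.procs
        running.append((now + job.runtime, job.procs))
        started.append(job.name)
    if not queue:
        return started

    # Reservation for the blocked head job: the earliest time ("shadow time")
    # at which enough processors will have been released.
    head = queue[0]
    avail, shadow = free, None
    for end, procs in sorted(running):
        avail += procs
        if avail >= head.procs:
            shadow = end
            break
    if shadow is None:
        return started  # head job is larger than the whole machine
    extra = avail - head.procs  # processors the head job will not need

    # First fit in submission order: a later job may start now if it either
    # finishes before the shadow time or only uses the extra processors.
    for job in queue[1:]:
        if job.procs > free:
            continue
        if now + job.runtime <= shadow:
            free -= job.procs
            started.append(job.name)
        elif job.procs <= extra:
            free -= job.procs
            extra -= job.procs
            started.append(job.name)
    return started
```

Because the loop takes the first job that fits, it can miss a later combination of jobs that would pack the idle processors more tightly, which is the weakness the abstract sets out to address.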

    PRODUCTION SEQUENCING AND STABILITY ANALYSIS OF A JUST-IN-TIME SYSTEM WITH SEQUENCE DEPENDENT SETUPS

    Just-In-Time (JIT) production systems are a popular research area, but real-world issues such as sequence dependent setups are often overlooked. This research investigates an approach for determining stability and an approach for mixed product sequencing in production systems with sequence dependent setups and buffer thresholds which signal replenishment of a given buffer. Production systems in this research operate under JIT pull production principles by producing only when demand exists and idling when no demand exists. In the first approach, an iterative method is presented to determine stability for a multi-product production system that operates with replenishment signals and may have sequence dependent setups. In this method, a network of nodes representing machine states and arcs representing the buffer inventory levels is used to find a stable trajectory for the production system via an iterative procedure. The method determines suitable buffer levels for the production system that ensure that a trajectory originating from any point within a buffer region will always map to a point contained on another buffer region for all future mappings. This iterative method for determining the stability of a production system was implemented using an algorithm to calculate the buffer inventory regions for all arcs in a given arc-node network. The algorithm showed favorable results for two and three product systems in which sequence dependent setups may exist. In the second approach, a product sequencing algorithm determines a product sequence for a production system based on system parameters – setup times, buffer levels, usage rates, production rates, etc. The algorithm selects a product by evaluating the goodness of each product that has reached the replenishment threshold at the current time. The algorithm also incorporates a lookahead function that calculates the goodness for some time interval into the future. 
The lookahead function considers all branches of the tree of potential sequences to prevent the sequence from travelling down a dead-end branch in which the system will be unable to avoid a depleted buffer. The sequencing algorithm allows the user to weight the five terms of the goodness equations (current and lookahead) to control the behavior of the sequence

    The virtual time machine

    Journal Article. Existing multiprocessors and multicomputers require the programmer or compiler to perform data dependence analysis at compile time. We propose a parallel computer that performs this task at runtime. In particular, the Virtual Time Machine (VTM) detects violations of data dependence constraints as they occur, and automatically recovers from them. A sophisticated memory system that is addressed using both a spatial and a temporal coordinate is used to efficiently implement this mechanism. Initially targeted for discrete event simulation applications, many of the ideas used in the machine architecture have direct application in the more general realm of parallel computation. The long term goal of this work is to develop a general purpose parallel computer that will support a wide range of parallel programming paradigms. This paper outlines the motivations behind the VTM architecture, the underlying computation model, a proposed implementation, and initial performance results. A recurring theme that pervades the entire paper is our contention that existing shared memory and message-based machines do not pay adequate attention to the dimension of time. We argue that this architectural deficiency is the underlying reason behind many difficult problems in parallel computation today

    G-LOMARC-TS: Lookahead group matchmaking for time/space sharing on multi-core parallel machines

    Parallel machines with multi-core nodes are becoming increasingly popular. The performances of applications running on these machines are improved gradually due to the resource competition in each node. Research has found that coscheduling different applications with complementary resource characteristics on the same set of nodes (semi time sharing) may improve performance. We propose a scheduling algorithm, G-LOMARC-TS, which incorporates both space and semi time sharing scheduling methods and matches groups of jobs for coscheduling where possible. Since matchmaking may select jobs further down the waiting queue and thereby delay the jobs at the front of the queue, fairness for each individual job is monitored and the delay is kept within a limited bound. Several heuristics are used to solve the NP-complete problem of forming groups. Our experimental results show both utilization gains and average relative response time improvements for G-LOMARC-TS over several other scheduling policies

    Algorithms incorporating concurrency and caching

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 189-203). This thesis describes provably good algorithms for modern large-scale computer systems, including today's multicores. Designing efficient algorithms for these systems involves overcoming many challenges, including concurrency (dealing with parallel accesses to the same data) and caching (achieving good memory performance). This thesis includes two parallel algorithms that focus on testing for atomicity violations in a parallel fork-join program. These algorithms augment a parallel program with a data structure that answers queries about the program's structure, on the fly. Specifically, one data structure, called SP-ordered-bags, maintains the series-parallel relationships among threads, which is vital for uncovering race conditions (bugs) in the program. Another data structure, called XConflict, aids in detecting conflicts in a transactional-memory system with nested parallel transactions. For a program with work $T_1$ and span $T_\infty$, maintaining either data structure adds an overhead of $PT_\infty$ to the running time of the parallel program when executed on $P$ processors using an efficient scheduler, yielding a total runtime of $O(T_1/P + PT_\infty)$. For each of these data structures, queries can be answered in $O(1)$ time. This thesis also introduces the compressed sparse blocks (CSB) storage format for sparse matrices, which allows both $Ax$ and $A^{T}x$ to be computed efficiently in parallel, where $A$ is an $n \times n$ sparse matrix with $nnz > n$ nonzeros and $x$ is a dense $n$-vector. The parallel multiplication algorithm uses $\Theta(nnz)$ work and ... span, yielding a parallelism of ... , which is amply high for virtually any large matrix. Also addressing concurrency, this thesis considers two scheduling problems. 
The first scheduling problem, motivated by transactional memory, considers randomized backoff when jobs have different lengths. I give an analysis showing that binary exponential backoff achieves makespan $V2^{\Theta(\sqrt{\lg n})}$ with high probability, where $V$ is the total length of all $n$ contending jobs. This bound is significantly larger than when jobs are all the same size. A variant of exponential backoff, however, achieves makespan of ... with high probability. I also present the size-hashed backoff protocol, specifically designed for jobs having different lengths, that achieves makespan ... with high probability. The second scheduling problem considers scheduling $n$ unit-length jobs on $m$ unrelated machines, where each job may fail probabilistically. Specifically, an input consists of a set of $n$ jobs, a directed acyclic graph $G$ describing the precedence constraints among jobs, and a failure probability $q_{ij}$ for each job $j$ and machine $i$. The goal is to find a schedule that minimizes the expected makespan. I give an $O(\log \log(\min\{m, n\}))$-approximation for the case of independent jobs (when there are no precedence constraints) and an $O(\log(n + m) \log \log(\min\{m, n\}))$-approximation algorithm when precedence constraints form disjoint chains. This chain algorithm can be extended into one that supports precedence constraints that are trees, which worsens the approximation by another $\log(n)$ factor. To address caching, this thesis includes several new variants of cache-oblivious dynamic dictionaries. A cache-oblivious dictionary fills the same niche as a classic B-tree, but it does so without tuning for particular memory parameters. Thus, cache-oblivious dictionaries optimize for all levels of a multilevel hierarchy and are more portable than traditional B-trees. I describe how to add concurrency to several previously existing cache-oblivious dictionaries. 
I also describe two new data structures that achieve significantly cheaper insertions with a small overhead on searches. The cache-oblivious lookahead array (COLA) supports insertions/deletions and searches in $O((1/B) \log N)$ and $O(\log N)$ memory transfers, respectively, where $B$ is the block size, $M$ is the memory size, and $N$ is the number of elements in the data structure. The xDict supports these operations in $O((1/(\epsilon B^{1-\epsilon})) \log_B(N/M))$ and $O((1/\epsilon) \log_B(N/M))$ memory transfers, respectively, where $0 < \epsilon < 1$ is a tunable parameter. Also on caching, this thesis answers the question: what is the worst possible page-replacement strategy? The goal of this whimsical chapter is to devise an online strategy that achieves the highest possible fraction of page faults / cache misses as compared to the worst offline strategy. I show that there is no deterministic strategy that is competitive with the worst offline. I also give a randomized strategy based on the most recently used heuristic and show that it is the worst possible page-replacement policy. On a more serious note, I also show that direct mapping is, in some sense, a worst possible page-replacement policy. Finally, this thesis includes a new algorithm, following a new approach, for the problem of maintaining a topological ordering of a dag as edges are dynamically inserted. The main result included here is an $O(n^2 \log n)$ algorithm for maintaining a topological ordering in the presence of up to $m < n(n - 1)/2$ edge insertions. In contrast, the previously best algorithm has a total running time of $O(\min\{m^{3/2}, n^{5/2}\})$. Although these algorithms are not parallel and do not exhibit particularly good locality, some of the data structural techniques employed in my solution are similar to others in this thesis. By Jeremy T. Fineman. Ph.D
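
The insertion behaviour behind the lookahead array mentioned above can be illustrated with a simplified sketch. It assumes the basic layout in which level k is either empty or holds exactly 2^k sorted keys, and it omits the lookahead pointers entirely, so a search here does one binary search per level instead of achieving the $O(\log N)$ transfer bound quoted in the abstract. The class name `BasicCOLA` and its methods are hypothetical, and deletions are not handled.

```python
from bisect import bisect_left
from heapq import merge


class BasicCOLA:
    """Simplified lookahead-array sketch without lookahead pointers."""

    def __init__(self):
        # levels[k] is either [] or a sorted list of exactly 2**k keys.
        self.levels = []

    def insert(self, key):
        # Works like incrementing a binary counter: merge full levels
        # downward until an empty level can absorb the carried run.
        carry = [key]
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])
            if not self.levels[k]:
                self.levels[k] = carry
                return
            carry = list(merge(self.levels[k], carry))
            self.levels[k] = []
            k += 1

    def __contains__(self, key):
        # One binary search per level; the full structure speeds this up
        # with pointers between adjacent levels, which are omitted here.
        for level in self.levels:
            i = bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False
```

Each key is merged into a larger level at most once per level, and every merge is a sequential scan, which is why insertions are cheap in terms of memory transfers.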