
    A hybrid Markov chain modeling architecture for workload on parallel computers

    This paper proposes a comprehensive modeling architecture for workloads on parallel computers using Markov chains in combination with state-dependent empirical distribution functions. This hybrid approach is based on the requirements of scheduling algorithms: the model considers the four essential job attributes of submission time, number of required processors, estimated processing time, and actual processing time. Assessing the goodness-of-fit of a workload model requires capturing the similarity between sequences of real jobs and jobs generated from the model. We propose to reduce the complexity of this task by instead comparing the results of a widely used scheduling algorithm on both. This approach is demonstrated with commonly used scheduling objectives such as the Average Weighted Response Time and total Utilization. We compare their outcomes on workload traces simulated from our model with those on an original workload trace from a real Massively Parallel Processing system installation. To verify this new evaluation technique, standard criteria for assessing the goodness-of-fit of workload models are additionally applied.
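    The hybrid idea (a Markov chain picks a job state; a state-dependent empirical distribution then supplies the attribute value) can be sketched as follows. The states, transition probabilities, and sample pools here are invented for illustration, not taken from the paper:

```python
import random

# Hypothetical two-state chain over job classes; each state carries its own
# empirical pool of "number of required processors" samples.
TRANSITIONS = {"small": {"small": 0.7, "large": 0.3},
               "large": {"small": 0.4, "large": 0.6}}
EMPIRICAL_PROCS = {"small": [1, 2, 2, 4], "large": [64, 128, 256]}

def generate_jobs(n, start="small", seed=0):
    rng = random.Random(seed)
    state, jobs = start, []
    for _ in range(n):
        # Draw the next state from the transition row of the current state.
        r, acc = rng.random(), 0.0
        for nxt, p in TRANSITIONS[state].items():
            acc += p
            if r < acc:
                state = nxt
                break
        # Sample the attribute from the state's empirical distribution.
        jobs.append((state, rng.choice(EMPIRICAL_PROCS[state])))
    return jobs

jobs = generate_jobs(5)
```

The same pattern extends to the remaining job attributes by attaching one empirical pool per attribute to each chain state.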

    Experimental analysis of computer system dependability

    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERRARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion involves several important issues studied in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
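    The survey's mention of importance sampling (accelerating Monte Carlo simulation of rare events) can be illustrated with a toy tail-probability estimate rather than any dependability model from the paper; the distributions and rates below are assumptions for the example:

```python
import math
import random

def importance_estimate(threshold=10.0, n=100_000, rate_g=0.2, seed=1):
    # Estimate the rare event P(X > threshold) for X ~ Exp(1) by sampling
    # from a heavier-tailed proposal Exp(rate_g) and reweighting each hit
    # by the likelihood ratio f(x)/g(x).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(rate_g)          # draw from the proposal g
        if x > threshold:
            # f(x)/g(x) = exp(-x) / (rate_g * exp(-rate_g * x))
            total += math.exp(-x) / (rate_g * math.exp(-rate_g * x))
    return total / n

est = importance_estimate()
exact = math.exp(-10.0)   # true tail probability, about 4.5e-5
```

Plain Monte Carlo would see roughly 4 or 5 hits in 100,000 samples here; the reweighted proposal makes most samples informative, which is why the technique matters for simulating low-probability failures.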

    Stochastic Modeling of Hybrid Cache Systems

    In recent years there has been an increasing demand for big-memory systems to perform large-scale data analytics. Since DRAM is expensive, some researchers suggest using other memory technologies, such as non-volatile memory (NVM), to build large-memory computing systems. However, whether NVM technology can be a viable alternative (both economically and technically) to DRAM remains an open question. To answer this question, it is important to consider how to design a memory system from a "system perspective", that is, incorporating the different performance characteristics and price ratios of hybrid memory devices. This paper presents an analytical model of a "hybrid page cache system" to explore the diverse design space and performance impact of a hybrid cache system. We consider (1) various architectural choices, (2) design strategies, and (3) configurations of different memory devices. Using this model, we provide guidelines on how to design a hybrid page cache to reach a good trade-off between high system throughput (in I/Os per second, or IOPS) and fast cache reactivity, defined as the time to fill the cache. We also show how one can configure the DRAM and NVM capacities under a fixed budget. We pick PCM as an example of NVM and conduct a numerical analysis. Our analysis indicates that incorporating PCM in a page cache system significantly improves system performance, and that in some cases allocating more PCM to the page cache yields a larger benefit. Moreover, for common performance-price ratios of PCM, the "flat architecture" is the better choice, but the "layered architecture" outperforms it if PCM write performance can be significantly improved in the future.
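    The flavor of such a model can be conveyed by a back-of-envelope latency/IOPS calculation for a flat DRAM+PCM cache. All latency numbers and hit ratios below are illustrative assumptions, not the paper's parameters or results:

```python
# Assumed per-access latencies (seconds): DRAM, PCM, and the backing disk.
LAT_DRAM, LAT_PCM, LAT_DISK = 0.1e-6, 1.0e-6, 100e-6

def flat_cache_iops(hit_dram, hit_pcm):
    # Flat architecture: a request hits DRAM, hits PCM, or misses to disk;
    # throughput is the reciprocal of the average access latency.
    miss = 1.0 - hit_dram - hit_pcm
    avg_latency = (hit_dram * LAT_DRAM + hit_pcm * LAT_PCM
                   + miss * LAT_DISK)
    return 1.0 / avg_latency

iops_dram_only = flat_cache_iops(0.6, 0.0)   # DRAM-only cache
iops_hybrid = flat_cache_iops(0.6, 0.3)      # added PCM absorbs more hits
```

Even though PCM is assumed ten times slower than DRAM here, converting disk misses into PCM hits dominates the average latency, which is the intuition behind the paper's finding that adding PCM to the page cache improves throughput.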

    Order Acceptance and Scheduling: A Taxonomy and Review

    Over the past 20 years, the topic of order acceptance has attracted considerable attention from those who study scheduling and those who practice it. In a firm that strives to align its functions so that profit is maximized, coordinating capacity with demand may require that business sometimes be turned away. In particular, there is a trade-off between the revenue brought in by a particular order and all of its associated processing costs. The present study focuses on the body of research that approaches this trade-off by considering two decisions: which orders to accept for processing, and how to schedule them. This paper presents a taxonomy and review of this literature, catalogs its contributions, and suggests opportunities for future research in this area.
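    The joint accept-and-schedule trade-off can be made concrete with a toy single-machine instance; the revenues, processing times, due dates, and tardiness penalties are invented for the example. Exhaustive search over acceptance subsets and sequences shows that rejecting an order can raise total profit:

```python
from itertools import combinations, permutations

# Each order: (revenue, processing_time, due_date, penalty_per_unit_late).
ORDERS = [(10, 4, 4, 3), (8, 3, 5, 2), (6, 5, 6, 4)]

def profit(sequence):
    # Revenue of accepted orders minus weighted tardiness on one machine.
    t, total = 0, 0.0
    for rev, p, due, w in sequence:
        t += p
        total += rev - w * max(0, t - due)
    return total

def best_accept_and_schedule(orders):
    # Brute force over accept-subsets and sequences (fine for tiny instances).
    best = (0.0, ())
    for k in range(len(orders) + 1):
        for subset in combinations(orders, k):
            for seq in permutations(subset):
                best = max(best, (profit(seq), seq))
    return best

best_profit, best_seq = best_accept_and_schedule(ORDERS)
# Best plan accepts only two of the three orders: crowding in the third
# order makes every schedule worse than turning it away.
```

The literature reviewed in the paper replaces this brute force with exact and heuristic methods that scale, but the underlying two-level decision is the same.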

    Optimization of a parallel Monte Carlo method for linear algebra problems

    Many problems in science and engineering can be represented by Systems of Linear Algebraic Equations (SLAEs). Numerical methods, either direct or iterative, are used to solve these kinds of systems. Depending on their size and other characteristics, such systems can be very difficult to solve even for iterative methods, requiring long times and large amounts of computational resources. In these cases a preconditioning approach should be applied. Preconditioning is a technique used to transform a SLAE into an equivalent but simpler system that requires less time and effort to solve. The matrix that performs such a transformation is called the preconditioner [7]. There are preconditioners for both direct and iterative methods, but they are more commonly used with the latter. In the general case, a preconditioned system requires less effort to solve than the original one. For example, when an iterative method is used, fewer iterations are required or each iteration takes less time, depending on the quality and efficiency of the preconditioner. There are different classes of preconditioners, but we focus only on those based on the SParse Approximate Inverse (SPAI) approach. These algorithms exploit the fact that an approximate inverse of a given SLAE matrix can be used to approximate its solution or to reduce its complexity. Monte Carlo methods are probabilistic methods that use random numbers either to simulate stochastic behaviour or to estimate the solution of a problem. They are good candidates for parallelization because many independent samples are used to estimate the solution; these samples can be calculated in parallel, thereby speeding up the solution process [27]. There has been substantial research on the use of Monte Carlo methods to calculate SPAI preconditioners [1] [27] [10].
    In this work we present the implementation of a SPAI preconditioner based on a Monte Carlo method. The algorithm calculates the matrix inverse by sampling a random variable that approximates the Neumann series expansion: the inverse of a matrix A can be computed by summing consecutive powers of (I − A), provided the series converges. Given the stochastic nature of the Monte Carlo algorithm, the computational effort required to find an element of the inverse matrix is independent of the size of the matrix. This makes it possible to target systems that, due to their size, are prohibitive for common deterministic approaches [27]. A great part of this work focuses on enhancing this algorithm. First, existing errors in the implementation were fixed, enabling the algorithm to target larger systems. Then multiple optimizations were applied at different stages of the implementation, making better use of resources and improving the performance of the algorithm. Four optimizations, each yielding consistent improvements, were performed:
    1. An inefficient implementation of the realloc function within the MPI library caused the application to rapidly run out of memory. This function was replaced by malloc, together with slight modifications to estimate the size of matrix A.
    2. A coordinate format (COO) was introduced in the algorithm's core to make more efficient use of memory, avoiding several unnecessary memory accesses.
    3. An alternative method to produce an intermediate matrix P was shown to give results similar to the default one, while reducing P to a single vector and therefore requiring less data. Since this data is broadcast, shrinking it translates into a reduction of the broadcast time.
    4. Four individual procedures that each accessed the entire initial matrix in memory were merged into two, reducing the number of memory accesses.
    For each optimization, a comparison was performed to show the improvement achieved. A set of different matrices, representing different SLAEs, was used to show the consistency of these improvements. To provide insight into the scalability of the algorithm, two further approaches are presented:
    1. Since the original version of the algorithm was designed for a cluster of single-core machines, a hybrid MPI + OpenMP approach was proposed to target today's multi-core architectures. Surprisingly, this new approach did not show any improvement, but it revealed a scalability problem related to the random memory-access pattern.
    2. Common MPI implementations of the broadcast operation do not take into account the different latencies of inter-node and intra-node communication [25]. We therefore implemented the broadcast in two steps: first reaching a single process on each compute node, then using those processes to perform a local broadcast within their nodes. Results showed that this method can lead to improvements for very large systems.
    Finally, a comparison is carried out between the optimized version of the Monte Carlo algorithm and the state-of-the-art Modified SPAI (MSPAI). Four metrics are used:
    1. The time needed for the preconditioner construction.
    2. The time needed by the solver to calculate the solution of the preconditioned system.
    3. The sum of the previous two metrics, which gives an overview of the quality and efficiency of the preconditioner.
    4. The number of cores used in the preconditioner construction, which gives an idea of the energy efficiency of the algorithm.
    The comparison showed that the Monte Carlo algorithm can deal with both symmetric and nonsymmetric matrices, while MSPAI only performs well with nonsymmetric ones. Furthermore, the Monte Carlo algorithm is always faster in the preconditioner construction and, most of the time, also in the solver phase. This means that Monte Carlo produces preconditioners of quality equal to or better than MSPAI's. Finally, the number of cores used in the Monte Carlo approach is always equal to or smaller than in the case of MSPAI.
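    The Neumann-series idea behind the preconditioner can be sketched on the related problem of solving A x = b: with C = I − A and a convergent series, x equals the sum over k of C^k b, and each component can be estimated by independent random walks. This is a minimal illustration of the principle, not the thesis implementation; the toy matrix and walk parameters are invented:

```python
import random

A = [[0.8, -0.1], [-0.2, 0.9]]   # toy diagonally dominant system
b = [1.0, 1.0]

def neumann_walk_solve(A, b, walks=5000, max_steps=50, seed=2):
    # Estimate x = sum_k C^k b with C = I - A by uniform random walks:
    # each transition has probability 1/n, and the walk weight carries the
    # correction factor n * C[s][t], so the estimator stays unbiased.
    n = len(A)
    C = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    x = [0.0] * n
    for i in range(n):
        acc = 0.0
        for _ in range(walks):
            s, w = i, 1.0
            acc += w * b[s]                  # k = 0 term of the series
            for _ in range(max_steps):
                t = rng.randrange(n)         # uniform transition
                w *= n * C[s][t]             # importance-weight correction
                s = t
                if w == 0.0:
                    break
                acc += w * b[s]              # contributes the k-th term
        x[i] = acc / walks
    return x

x_mc = neumann_walk_solve(A, b)   # exact solution is (10/7, 10/7)
```

Each component's walks are independent of the others, which is the property that makes the method embarrassingly parallel and lets the cost per matrix element stay independent of the system size, as the abstract notes.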

    A Variable Neighborhood MOEA/D for Multiobjective Test Task Scheduling Problem

    The test task scheduling problem (TTSP) is a typical combinatorial optimization scheduling problem. This paper proposes a variable neighborhood MOEA/D (VNM) to solve the multiobjective TTSP. Two minimization objectives, the maximal completion time (makespan) and the mean workload, are considered together. A variable neighborhood strategy is adopted to bring the obtained solutions closer to the true Pareto front, and the variable neighborhood approach keeps the crossover span reasonable. Additionally, because the search space of the TTSP is so large that many duplicate solutions and local optima exist, a starting mutation is applied to prevent solutions from becoming trapped in local optima. It is proved, using a Markov chain and its transition matrix, that the solutions obtained by VNM converge to the global optimum. Experimental comparisons of VNM, MOEA/D, and CNSGA (chaotic nondominated sorting genetic algorithm) indicate that VNM outperforms both MOEA/D and CNSGA in solving the TTSP. The results demonstrate that the proposed VNM algorithm is an efficient approach to solving the multiobjective TTSP.
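    The variable neighborhood principle (return to the smallest neighborhood after an improvement, widen it when the search stalls) can be shown in isolation. This toy single-objective minimizer only sketches that principle; it is not VNM, MOEA/D, or the TTSP encoding, and all functions and step sizes are invented:

```python
import random

def vns_minimize(f, x0, neighborhoods, iters=500, seed=3):
    # Generic variable-neighborhood descent: on improvement restart at the
    # smallest neighborhood, on failure escalate to a wider one.
    rng = random.Random(seed)
    x, k = x0, 0
    for _ in range(iters):
        cand = neighborhoods[k](x, rng)
        if f(cand) < f(x):
            x, k = cand, 0                          # success: shrink
        else:
            k = min(k + 1, len(neighborhoods) - 1)  # stall: widen
    return x

def jiggle(step):
    # Neighborhood of radius `step` around the current point.
    return lambda x, rng: x + rng.uniform(-step, step)

best = vns_minimize(lambda x: (x - 3.0) ** 2, 10.0,
                    [jiggle(0.1), jiggle(1.0), jiggle(5.0)])
```

The widening step plays the same role as the paper's starting mutation: it gives a stalled search a chance to leave a local optimum instead of resampling the same neighborhood forever.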