A hybrid Markov chain modeling architecture for workload on parallel computers
This paper proposes a comprehensive modeling architecture for workloads on parallel computers, using Markov chains in combination with state-dependent empirical distribution functions. This hybrid approach is driven by the requirements of scheduling algorithms: the model considers the four essential job attributes: submission time, number of required processors, estimated processing time, and actual processing time. To assess the goodness of fit of a workload model, the similarity between sequences of real jobs and jobs generated from the model must be captured. We propose to reduce the complexity of this task and to evaluate the model by comparing the results of a widely used scheduling algorithm instead. This approach is demonstrated with commonly used scheduling objectives such as the Average Weighted Response Time and total Utilization. We compare their outcomes on workload traces simulated from our model with those on an original workload trace from a real Massively Parallel Processing system installation. To verify this new evaluation technique, standard goodness-of-fit criteria for workload models are additionally applied.
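The hybrid idea of the paper (a Markov chain selecting a regime, with state-dependent empirical distributions supplying the job attributes) can be sketched roughly as follows. This is a minimal illustration with made-up states, transition probabilities, and empirical samples, not the paper's fitted model:

```python
import random

def generate_jobs(n_jobs, seed=0):
    # Hypothetical two-state chain ("light"/"heavy" load regime); in each
    # regime a state-dependent empirical distribution supplies the four
    # attributes: interarrival gap, processors, estimated and actual time.
    rng = random.Random(seed)
    transition = {"light": [("light", 0.8), ("heavy", 0.2)],
                  "heavy": [("light", 0.3), ("heavy", 0.7)]}
    # made-up empirical samples: (gap s, procs, estimated s, actual s)
    empirical = {"light": [(30.0, 1, 600.0, 420.0), (45.0, 2, 300.0, 250.0)],
                 "heavy": [(5.0, 64, 7200.0, 6800.0), (8.0, 128, 3600.0, 3000.0)]}
    state, t, jobs = "light", 0.0, []
    for _ in range(n_jobs):
        gap, procs, est, actual = rng.choice(empirical[state])
        t += gap
        jobs.append({"submit": t, "procs": procs,
                     "estimated": est, "actual": actual})
        r, acc = rng.random(), 0.0          # advance the Markov chain
        for nxt, p in transition[state]:
            acc += p
            if r < acc:
                state = nxt
                break
    return jobs
```

A generated trace can then be fed to a scheduler simulator and compared, via objectives such as response time, against the real trace.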
Experimental analysis of computer system dependability
This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by a discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERRARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion involves several important issues studied in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
Stochastic Modeling of Hybrid Cache Systems
In recent years there has been an increasing demand for big-memory systems to perform large-scale data analytics. Since DRAM is expensive, some researchers suggest using other memory technologies, such as non-volatile memory (NVM), to build large-memory computing systems. However, whether NVM technology can be a viable alternative (both economically and technically) to DRAM remains an open question. To answer this question, it is important to design the memory system from a "system perspective", that is, by incorporating the different performance characteristics and price ratios of hybrid memory devices.
This paper presents an analytical model of a "hybrid page cache system" to explore the diverse design space and performance impact of a hybrid cache system. We consider (1) various architectural choices, (2) design strategies, and (3) configurations of different memory devices. Using this model, we provide guidelines on how to design a hybrid page cache that reaches a good trade-off between high system throughput (in I/Os per second, or IOPS) and fast cache reactivity, defined as the time to fill the cache. We also show how to configure the DRAM and NVM capacities under a fixed budget. We pick PCM as an example NVM and conduct a numerical analysis. Our analysis indicates that incorporating PCM in a page cache system significantly improves system performance, and that allocating more PCM to the page cache yields larger benefits in some cases. Moreover, for the common performance-price ratio of PCM, the "flat architecture" is the better choice, but the "layered architecture" outperforms it if PCM write performance can be significantly improved in the future.
Order Acceptance and Scheduling: A Taxonomy and Review
Over the past 20 years, the topic of order acceptance has attracted considerable attention from those who study scheduling and those who practice it. In a firm that strives to align its functions so that profit is maximized, the coordination of capacity with demand may require that business sometimes be turned away. In particular, there is a trade-off between the revenue brought in by a particular order and all of its associated processing costs. The present study focuses on the body of research that approaches this trade-off by considering two decisions: which orders to accept for processing, and how to schedule them. This paper presents a taxonomy and a review of this literature, catalogs its contributions, and suggests opportunities for future research in this area.
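The joint accept-and-schedule trade-off can be illustrated with a toy single-machine instance (this is a generic illustration, not a method from the review): each order carries a revenue, a processing time, a due date, and a tardiness weight, and profit is revenue minus weighted tardiness. Exhaustive enumeration is exponential and only viable for tiny instances:

```python
from itertools import combinations, permutations

def best_plan(orders):
    # orders: list of (revenue, processing_time, due_date, tardiness_weight).
    # Enumerate every accept/reject subset and every sequence of the
    # accepted orders; return the most profitable (profit, subset, sequence).
    best = (0.0, (), ())          # rejecting everything earns zero
    idx = range(len(orders))
    for r in range(len(orders) + 1):
        for subset in combinations(idx, r):
            for seq in permutations(subset):
                t, profit = 0.0, 0.0
                for k in seq:
                    rev, p, due, w = orders[k]
                    t += p        # completion time on the single machine
                    profit += rev - w * max(0.0, t - due)
                if profit > best[0]:
                    best = (profit, subset, seq)
    return best
```

On a three-order instance, the optimum may reject a low-revenue, long order entirely, capturing the core idea that accepting every order can destroy profit.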
Optimization of a parallel Monte Carlo method for linear algebra problems
Many problems in science and engineering can be represented by Systems of Linear Algebraic Equations (SLAEs). Numerical methods, either direct or iterative, are used to solve such systems. Depending on their size and other characteristics, these systems can be very difficult to solve even for iterative methods, requiring long times and large amounts of computational resources. In these cases a preconditioning approach should be applied.
Preconditioning is a technique used to transform an SLAE into an equivalent but simpler system that requires less time and effort to solve. The matrix that performs this transformation is called the preconditioner [7]. There are preconditioners for both direct and iterative methods, but they are more commonly used with the latter.
In general, a preconditioned system requires less effort to solve than the original one. For example, when an iterative method is used, fewer iterations are needed, or each iteration requires less time, depending on the quality and efficiency of the preconditioner. There are different classes of preconditioners, but we focus only on those based on the SParse Approximate Inverse (SPAI) approach. These algorithms exploit the fact that an approximate inverse of the SLAE matrix can be used to approximate its solution or to reduce the system's complexity.
Monte Carlo methods are probabilistic methods that use random numbers either to simulate stochastic behaviour or to estimate the solution of a problem. They are good candidates for parallelization because many independent samples are used to estimate the solution; these samples can be computed in parallel, thereby speeding up the solution process [27].
In the past there has been a lot of research on the use of Monte Carlo methods to calculate SPAI preconditioners [1, 27, 10]. In this work we present the implementation of a SPAI preconditioner based on a Monte Carlo method. The algorithm calculates the matrix inverse by sampling a random variable that approximates the Neumann series expansion. Using the Neumann series, the inverse of a matrix A can be computed by consecutively adding the powers of (I − A), as given by the series expansion of (I − A)^{-1}.
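The deterministic form of this series can be sketched in a few lines. This is a minimal illustration of the mathematical identity A^{-1} = Σ_{k≥0} (I − A)^k (convergent when the spectral radius of I − A is below 1), not the thesis's parallel implementation:

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    n = len(X)
    return [[X[i][j] + Y[i][j] for j in range(n)] for i in range(n)]

def neumann_inverse(A, terms=30):
    # Truncated Neumann series: A^{-1} ~= sum_{k=0}^{terms-1} (I - A)^k,
    # valid when the spectral radius of M = I - A is below 1.
    n = len(A)
    M = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)]
         for i in range(n)]                  # M = I - A
    partial = identity(n)                    # k = 0 term
    power = identity(n)
    for _ in range(1, terms):
        power = mat_mul(power, M)            # M^k
        partial = mat_add(partial, power)
    return partial
```

For a matrix close to the identity the truncated sum multiplied by A recovers the identity to high accuracy after a handful of terms.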
Owing to the stochastic nature of the Monte Carlo algorithm, the computational effort required to find one element of the inverse matrix is independent of the matrix size. This makes it possible to target systems that, due to their size, are prohibitive for common deterministic approaches [27].
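The single-element property can be sketched with a generic Ulam–von-Neumann-style random-walk estimator of the Neumann series (an illustrative sampler with made-up matrix values, not the thesis's actual implementation):

```python
import random

def mc_inverse_entry(A, i, j, samples=100_000, p_stop=0.5, seed=1):
    # Estimate (A^{-1})_{ij} = sum_k ((I - A)^k)_{ij} by random walks:
    # each walk starts at row i, hops to a uniformly chosen column,
    # corrects its weight by importance sampling, and contributes its
    # weight whenever it visits state j.  Cost does not depend on which
    # single entry (i, j) is requested.
    rng = random.Random(seed)
    n = len(A)
    M = [[(1.0 if r == c else 0.0) - A[r][c] for c in range(n)]
         for r in range(n)]                      # M = I - A
    total = 0.0
    for _ in range(samples):
        state, weight = i, 1.0
        if state == j:
            total += weight                      # k = 0 term
        while rng.random() >= p_stop:            # survive with prob 1-p_stop
            nxt = rng.randrange(n)               # uniform column proposal
            weight *= M[state][nxt] / ((1.0 - p_stop) * (1.0 / n))
            state = nxt
            if state == j:
                total += weight
            if weight == 0.0:
                break
    return total / samples
```

Because each walk is independent, the samples parallelize trivially, which is exactly the property exploited by the parallel implementation described in this work.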
A great part of this work focuses on the enhancement of this algorithm. First, existing errors in the implementation were fixed, enabling the algorithm to target larger systems. Then multiple optimizations were applied at different stages of the implementation, making better use of the resources and improving the performance of the algorithm. Four optimizations, each yielding consistent improvements, were performed:
1. An inefficient implementation of the realloc function within the MPI library caused the application to run out of memory rapidly. This function was replaced by malloc, together with slight modifications to estimate the size of matrix A.
2. A coordinate format (COO) was introduced in the algorithm's core to use memory more efficiently, avoiding several unnecessary memory accesses.
3. A method for producing the intermediate matrix P was shown to give results similar to the default one while reducing P to a single vector, thus requiring less data. Since this data is broadcast, shrinking it translates directly into a reduction of the broadcast time.
4. Four individual procedures that each accessed the whole initial matrix in memory were merged into two, thereby reducing the number of memory accesses.
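The COO format mentioned in optimization 2 stores only the nonzeros as three parallel arrays, so operations such as a matrix-vector product touch far less memory than a dense layout. A minimal sketch (illustrative only, not the thesis's C/MPI code):

```python
def dense_to_coo(A):
    # COO storage: three parallel arrays (row index, column index, value),
    # one entry per nonzero, instead of an n*m dense array.
    rows, cols, vals = [], [], []
    for i, row in enumerate(A):
        for j, v in enumerate(row):
            if v != 0.0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals

def coo_matvec(rows, cols, vals, x, n):
    # y = A x, touching only the stored nonzeros.
    y = [0.0] * n
    for i, j, v in zip(rows, cols, vals):
        y[i] += v * x[j]
    return y
```

For the sparse matrices typical of large SLAEs, the number of stored entries, and hence memory traffic, drops from n*m to the nonzero count.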
For each optimization applied, a comparison was performed to show the particular improvement achieved. A set of different matrices, representing different SLAEs, was used to show the consistency of these improvements.
To provide insight into the scalability issues of the algorithm, further approaches are presented that highlight the particularities of the algorithm's scalability:
1. Given that the original version of this algorithm was designed for a cluster of single-core machines, a hybrid MPI + OpenMP approach was proposed to target today's multi-core architectures. Surprisingly, this new approach did not show any improvement, but it was useful in revealing a scalability problem related to the random pattern used to access the memory.
2. Common MPI implementations of the broadcast operation do not take into account the different latencies of inter-node and intra-node communication [25]. We therefore implemented the broadcast in two steps: first reaching a single process on each of the compute nodes, and then using those processes to perform a local broadcast within their compute nodes. Results showed that this method can lead to improvements when very large systems are used.
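The two-step scheme can be illustrated with a simplified message-counting simulation (pure Python, not real MPI; node layout and root are made up). Only step 1 crosses the slow inter-node links, while step 2 stays on the fast intra-node paths:

```python
def two_step_broadcast(ranks_by_node, root):
    # ranks_by_node: list of lists, the MPI ranks hosted on each node.
    # Step 1: root sends to one leader per node (inter-node messages).
    # Step 2: each leader broadcasts within its node (intra-node messages).
    inter, intra = 0, 0
    received = {root}
    leaders = [root if root in node else node[0] for node in ranks_by_node]
    for leader in leaders:               # step 1: root -> remote leaders
        if leader not in received:
            received.add(leader)
            inter += 1
    for node, leader in zip(ranks_by_node, leaders):
        for r in node:                   # step 2: leader -> local ranks
            if r not in received:
                received.add(r)
                intra += 1
    return inter, intra, received
```

With two 4-core nodes and the root on node 0, only one message crosses the inter-node link, versus four if the root contacted every remote rank directly; the gap widens with node count, which is consistent with the benefit observed on very large systems.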
Finally, a comparison is carried out between the optimized version of the Monte Carlo algorithm and the state-of-the-art Modified SPAI (MSPAI). Four metrics are used to compare these approaches:
1. The time needed to construct the preconditioner.
2. The time needed by the solver to compute the solution of the preconditioned system.
3. The sum of the previous two metrics, which gives an overview of the quality and efficiency of the preconditioner.
4. The number of cores used in the preconditioner construction, which gives an idea of the energy efficiency of the algorithm.
The results of this comparison showed that the Monte Carlo algorithm can deal with both symmetric and nonsymmetric matrices, while MSPAI only performs well on the nonsymmetric ones. Furthermore, the Monte Carlo algorithm is always faster in the preconditioner construction, and most of the time also in the solver calculation. This means that Monte Carlo produces preconditioners of equal or better quality than MSPAI. Finally, the number of cores used in the Monte Carlo approach is always equal to or smaller than in the case of MSPAI.
A Variable Neighborhood MOEA/D for Multiobjective Test Task Scheduling Problem
Test task scheduling problem (TTSP) is a typical combinatorial optimization scheduling problem. This paper proposes a variable neighborhood MOEA/D (VNM) to solve the multiobjective TTSP. Two minimization objectives, the maximal completion time (makespan) and the mean workload, are considered together. To bring the obtained solutions closer to the true Pareto front, a variable neighborhood strategy is adopted; the variable neighborhood approach keeps the crossover span reasonable. Additionally, because the search space of the TTSP is so large that many duplicate solutions and local optima exist, a Starting Mutation is applied to prevent solutions from becoming trapped in local optima. It is proved, using a Markov chain and its transition matrix, that the solutions obtained by VNM converge to the global optimum. Experiments comparing VNM, MOEA/D, and CNSGA (chaotic nondominated sorting genetic algorithm) indicate that VNM performs better than MOEA/D and CNSGA in solving the TTSP. The results demonstrate that the proposed VNM is an efficient approach to the multiobjective TTSP.