Lattice QCD Thermodynamics on the Grid
We describe how we simultaneously used nodes of the
EGEE Grid, accumulating ca. 300 CPU-years in 2-3 months, to determine an
important property of Quantum Chromodynamics. We explain how Grid resources
were exploited efficiently and with ease, using a user-level overlay based on
the Ganga and DIANE tools above the standard Grid software stack. Application-specific
scheduling and resource selection based on simple but powerful heuristics
allowed us to improve the processing efficiency and obtain the desired scientific
results by a specified deadline. This is also a demonstration of combined use
of supercomputers, to calculate the initial state of the QCD system, and Grids,
to perform the subsequent massively distributed simulations. The QCD simulation
was performed on a lattice. Keeping the strange quark mass at
its physical value, we reduced the masses of the up and down quarks until,
under an increase of temperature, the system underwent a second-order phase
transition to a quark-gluon plasma. Then we measured the response of this
system to an increase in the quark density. We find that the transition is
smoothened rather than sharpened. If confirmed on a finer lattice, this finding
makes it unlikely for ongoing experimental searches to find a QCD critical
point at small chemical potential.
Discovering Job Preemptions in the Open Science Grid
The Open Science Grid (OSG) is a world-wide computing system which facilitates
distributed computing for scientific research. It can distribute a
computationally intensive job to geo-distributed clusters and process the job's
tasks in parallel. For compute clusters on the OSG, physical resources may be
shared between OSG jobs and the cluster's local user-submitted jobs, with local jobs
preempting OSG-based ones. As a result, job preemptions occur frequently in
OSG, sometimes significantly delaying job completion time.
We have collected job data from OSG over a period of more than 80 days. We
present an analysis of the data, characterizing the preemption patterns and
different types of jobs. Based on observations, we have grouped OSG jobs into 5
categories and analyze the runtime statistics for each category. We further
choose different statistical distributions to estimate the probability density
function of job runtime for different classes. Comment: 8 pages
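As a rough illustration of choosing a distribution for job runtimes, the sketch below fits two candidate families by maximum likelihood and keeps the one with the higher log-likelihood. The abstract does not say which distributions the authors used; the exponential and lognormal candidates, the function names, and the sample runtimes are all assumptions made for this example.

```python
import math
from statistics import fmean, pstdev


def fit_exponential(runtimes):
    """MLE for an exponential distribution: rate = 1 / sample mean."""
    rate = 1.0 / fmean(runtimes)
    loglik = sum(math.log(rate) - rate * x for x in runtimes)
    return {"rate": rate}, loglik


def fit_lognormal(runtimes):
    """MLE for a lognormal distribution: mean and population std of log(x).

    Requires strictly positive runtimes that are not all identical
    (otherwise sigma is zero and the density is undefined).
    """
    logs = [math.log(x) for x in runtimes]
    mu = fmean(logs)
    sigma = pstdev(logs)
    loglik = sum(
        -math.log(x * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in runtimes
    )
    return {"mu": mu, "sigma": sigma}, loglik


def best_fit(runtimes):
    """Return (family name, parameters) of the candidate with the highest log-likelihood."""
    candidates = {
        "exponential": fit_exponential(runtimes),
        "lognormal": fit_lognormal(runtimes),
    }
    name = max(candidates, key=lambda n: candidates[n][1])
    return name, candidates[name][0]


# Hypothetical runtimes (seconds) for one job class, tightly clustered around 100 s.
sample_runtimes = [100.0, 110.0, 95.0, 105.0, 98.0, 102.0]
family, params = best_fit(sample_runtimes)
```

Both candidates here have at most two parameters, so comparing raw log-likelihoods is a reasonable first pass; with families of different complexity, a penalized criterion such as AIC would be the more careful choice.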
Enhancing Job Scheduling of an Atmospheric Intensive Data Application
Nowadays, e-Science applications involve a great deal of data to support more accurate analysis. One such application domain is Radio Occultation, which manages satellite data. Grid Processing Management is a geographically distributed physical infrastructure based on Grid computing, implemented for the overall processing of Radio Occultation analysis. After a brief description of the algorithms adopted to characterize atmospheric profiles, the paper presents an improvement of job scheduling in order to decrease processing time and optimize resource utilization. The grid computing capacity is extended by virtual machines in the existing physical Grid in order to satisfy temporary job requests. Scheduling, which also plays an important role in the infrastructure, is handled by a couple of schedulers developed to manage data automatically
Libra: An Economy driven Job Scheduling System for Clusters
Clusters of computers have emerged as mainstream parallel and distributed
platforms for high-performance, high-throughput and high-availability
computing. To enable effective resource management on clusters, numerous
cluster management systems and schedulers have been designed. However, their
focus has essentially been on maximizing CPU performance, but not on improving
the value of utility delivered to the user and quality of services. This paper
presents a new computational economy driven scheduling system called Libra,
which has been designed to support allocation of resources based on the users'
quality of service (QoS) requirements. It is intended to work as an add-on to
the existing queuing and resource management system. The first version has been
implemented as a plugin scheduler to the PBS (Portable Batch System) system.
The scheduler offers market-based economy driven service for managing batch
jobs on clusters by scheduling CPU time according to user utility as determined
by their budget and deadline rather than system performance considerations. The
Libra scheduler ensures that both these constraints are met within an O(n)
run-time. The Libra scheduler has been simulated using the GridSim toolkit to
carry out a detailed performance analysis. Results show that the deadline and
budget based proportional resource allocation strategy improves the utility of
the system and user satisfaction as compared to system-centric scheduling
strategies. Comment: 13 pages
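A minimal sketch of deadline- and budget-based proportional allocation, in the spirit of the abstract above, follows. It is not the Libra implementation: it assumes that a job needs a CPU share equal to its estimated runtime divided by its deadline, that cost is a flat price per CPU-second, and that a job is admitted to the first node with enough uncommitted share. The `Job`, `Node`, and `admit` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    name: str
    est_runtime: float  # estimated CPU seconds on a dedicated node (assumed known)
    deadline: float     # seconds from now by which the job must finish
    budget: float       # most the user is willing to pay (arbitrary units)


@dataclass
class Node:
    name: str
    committed_share: float = 0.0           # fraction of the CPU already promised
    jobs: List[str] = field(default_factory=list)


def required_share(job: Job) -> float:
    """CPU fraction needed to finish exactly at the deadline (assumed model)."""
    return job.est_runtime / job.deadline


def admit(job: Job, nodes: List[Node], price_per_cpu_second: float = 1.0) -> Optional[Node]:
    """Place the job on the first node that can honour its deadline; None if rejected.

    One pass over the nodes, so the admission check is linear in the number of nodes.
    """
    if job.est_runtime * price_per_cpu_second > job.budget:
        return None  # budget constraint cannot be met
    share = required_share(job)
    if share > 1.0:
        return None  # deadline is infeasible even on a dedicated node
    for node in nodes:
        if node.committed_share + share <= 1.0 + 1e-12:
            node.committed_share += share
            node.jobs.append(job.name)
            return node
    return None  # no node has enough spare share


# Hypothetical two-node cluster and job stream.
cluster = [Node("n0"), Node("n1")]
placed_a = admit(Job("a", est_runtime=60, deadline=100, budget=100), cluster)
placed_b = admit(Job("b", est_runtime=50, deadline=100, budget=100), cluster)
placed_c = admit(Job("c", est_runtime=30, deadline=100, budget=100), cluster)
placed_d = admit(Job("d", est_runtime=10, deadline=100, budget=5), cluster)
placed_e = admit(Job("e", est_runtime=200, deadline=100, budget=1000), cluster)
```

In this example, job `b` goes to the second node because the first cannot spare half of its CPU, while `d` is rejected on budget and `e` on deadline. The point of the sketch is that allocation follows each user's stated constraints instead of a system-wide throughput goal.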
Compositional competitiveness for distributed algorithms
We define a measure of competitive performance for distributed algorithms
based on throughput, the number of tasks that an algorithm can carry out in a
fixed amount of work. This new measure complements the latency measure of Ajtai
et al., which measures how quickly an algorithm can finish tasks that start at
specified times. The novel feature of the throughput measure, which
distinguishes it from the latency measure, is that it is compositional: it
supports a notion of algorithms that are competitive relative to a class of
subroutines, with the property that an algorithm that is k-competitive relative
to a class of subroutines, combined with an l-competitive member of that class,
gives a combined algorithm that is kl-competitive.
In particular, we prove the throughput-competitiveness of a class of
algorithms for collect operations, in which each of a group of n processes
obtains all values stored in an array of n registers. Collects are a
fundamental building block of a wide variety of shared-memory distributed
algorithms, and we show that several such algorithms are competitive relative
to collects. Inserting a competitive collect in these algorithms gives the
first examples of competitive distributed algorithms obtained by composition
using a general construction. Comment: 33 pages, 2 figures; full version of STOC 96 paper titled "Modular
competitiveness for distributed algorithms."
Checkpointing as a Service in Heterogeneous Cloud Environments
A non-invasive, cloud-agnostic approach is demonstrated for extending
existing cloud platforms to include checkpoint-restart capability. Most cloud
platforms currently rely on each application to provide its own fault
tolerance. A uniform mechanism within the cloud itself serves two purposes: (a)
direct support for long-running jobs, which would otherwise require a custom
fault-tolerant mechanism for each application; and (b) the administrative
capability to manage an over-subscribed cloud by temporarily swapping out jobs
when higher priority jobs arrive. An advantage of this uniform approach is that
it also supports parallel and distributed computations, over both TCP and
InfiniBand, thus allowing traditional HPC applications to take advantage of an
existing cloud infrastructure. Additionally, an integrated health-monitoring
mechanism detects when long-running jobs either fail or incur exceptionally low
performance, perhaps due to resource starvation, and proactively suspends the
job. The cloud-agnostic feature is demonstrated by applying the implementation
to two very different cloud platforms: Snooze and OpenStack. The use of a
cloud-agnostic architecture also enables, for the first time, migration of
applications from one cloud platform to another. Comment: 20 pages, 11 figures, appears in CCGrid, 201
Performance analysis of downlink shared channels in a UMTS network
In light of the expected growth in wireless data communications and the commonly anticipated up/downlink asymmetry, we present a performance analysis of downlink data transfer over Downlink Shared Channels (DSCHs), arguably the most efficient UMTS transport channel for medium-to-large data transfers. It is our objective to provide qualitative insight in the different aspects that influence the data Quality of Service (QoS). As a most principal factor, the data traffic load affects the data QoS in two distinct manners: (i) a heavier data traffic load implies a greater competition for DSCH resources and thus longer transfer delays; and (ii) since each data call served on a DSCH must maintain an Associated Dedicated Channel (A-DCH) for signalling purposes, a heavier data traffic load implies a higher interference level, a higher frame error rate and thus a lower effective aggregate DSCH throughput: the greater the demand for service, the smaller the aggregate service capacity. The latter effect is further amplified in a multicellular scenario, where a DSCH experiences additional interference from the DSCHs and A-DCHs in surrounding cells, causing a further degradation of its effective throughput. Following an insightful two-stage performance evaluation approach, which segregates the interference aspects from the traffic dynamics, a set of numerical experiments is executed in order to demonstrate these effects and obtain qualitative insight in the impact of various system aspects on the data QoS.