6,242 research outputs found

    Lattice QCD Thermodynamics on the Grid

    Full text link
    We describe how we have used simultaneously O(103){\cal O}(10^3) nodes of the EGEE Grid, accumulating ca. 300 CPU-years in 2-3 months, to determine an important property of Quantum Chromodynamics. We explain how Grid resources were exploited efficiently and with ease, using user-level overlay based on Ganga and DIANE tools above standard Grid software stack. Application-specific scheduling and resource selection based on simple but powerful heuristics allowed to improve efficiency of the processing to obtain desired scientific results by a specified deadline. This is also a demonstration of combined use of supercomputers, to calculate the initial state of the QCD system, and Grids, to perform the subsequent massively distributed simulations. The QCD simulation was performed on a 163×416^3\times 4 lattice. Keeping the strange quark mass at its physical value, we reduced the masses of the up and down quarks until, under an increase of temperature, the system underwent a second-order phase transition to a quark-gluon plasma. Then we measured the response of this system to an increase in the quark density. We find that the transition is smoothened rather than sharpened. If confirmed on a finer lattice, this finding makes it unlikely for ongoing experimental searches to find a QCD critical point at small chemical potential

    Discovering Job Preemptions in the Open Science Grid

    Full text link
    The Open Science Grid(OSG) is a world-wide computing system which facilitates distributed computing for scientific research. It can distribute a computationally intensive job to geo-distributed clusters and process job's tasks in parallel. For compute clusters on the OSG, physical resources may be shared between OSG and cluster's local user-submitted jobs, with local jobs preempting OSG-based ones. As a result, job preemptions occur frequently in OSG, sometimes significantly delaying job completion time. We have collected job data from OSG over a period of more than 80 days. We present an analysis of the data, characterizing the preemption patterns and different types of jobs. Based on observations, we have grouped OSG jobs into 5 categories and analyze the runtime statistics for each category. we further choose different statistical distributions to estimate probability density function of job runtime for different classes.Comment: 8 page

    Enhancing Job Scheduling of an Atmospheric Intensive Data Application

    Get PDF
    Nowadays, e-Science applications involve great deal of data to have more accurate analysis. One of its application domains is the Radio Occultation which manages satellite data. Grid Processing Management is a physical infrastructure geographically distributed based on Grid Computing, that is implemented for the overall processing Radio Occultation analysis. After a brief description of algorithms adopted to characterize atmospheric profiles, the paper presents an improvement of job scheduling in order to decrease processing time and optimize resource utilization. Extension of grid computing capacity is implemented by virtual machines in existing physical Grid in order to satisfy temporary job requests. Also scheduling plays an important role in the infrastructure that is handled by a couple of schedulers which are developed to manage data automaticall

    Libra: An Economy driven Job Scheduling System for Clusters

    Full text link
    Clusters of computers have emerged as mainstream parallel and distributed platforms for high-performance, high-throughput and high-availability computing. To enable effective resource management on clusters, numerous cluster managements systems and schedulers have been designed. However, their focus has essentially been on maximizing CPU performance, but not on improving the value of utility delivered to the user and quality of services. This paper presents a new computational economy driven scheduling system called Libra, which has been designed to support allocation of resources based on the users? quality of service (QoS) requirements. It is intended to work as an add-on to the existing queuing and resource management system. The first version has been implemented as a plugin scheduler to the PBS (Portable Batch System) system. The scheduler offers market-based economy driven service for managing batch jobs on clusters by scheduling CPU time according to user utility as determined by their budget and deadline rather than system performance considerations. The Libra scheduler ensures that both these constraints are met within an O(n) run-time. The Libra scheduler has been simulated using the GridSim toolkit to carry out a detailed performance analysis. Results show that the deadline and budget based proportional resource allocation strategy improves the utility of the system and user satisfaction as compared to system-centric scheduling strategies.Comment: 13 page

    Compositional competitiveness for distributed algorithms

    Full text link
    We define a measure of competitive performance for distributed algorithms based on throughput, the number of tasks that an algorithm can carry out in a fixed amount of work. This new measure complements the latency measure of Ajtai et al., which measures how quickly an algorithm can finish tasks that start at specified times. The novel feature of the throughput measure, which distinguishes it from the latency measure, is that it is compositional: it supports a notion of algorithms that are competitive relative to a class of subroutines, with the property that an algorithm that is k-competitive relative to a class of subroutines, combined with an l-competitive member of that class, gives a combined algorithm that is kl-competitive. In particular, we prove the throughput-competitiveness of a class of algorithms for collect operations, in which each of a group of n processes obtains all values stored in an array of n registers. Collects are a fundamental building block of a wide variety of shared-memory distributed algorithms, and we show that several such algorithms are competitive relative to collects. Inserting a competitive collect in these algorithms gives the first examples of competitive distributed algorithms obtained by composition using a general construction.Comment: 33 pages, 2 figures; full version of STOC 96 paper titled "Modular competitiveness for distributed algorithms.

    Checkpointing as a Service in Heterogeneous Cloud Environments

    Get PDF
    A non-invasive, cloud-agnostic approach is demonstrated for extending existing cloud platforms to include checkpoint-restart capability. Most cloud platforms currently rely on each application to provide its own fault tolerance. A uniform mechanism within the cloud itself serves two purposes: (a) direct support for long-running jobs, which would otherwise require a custom fault-tolerant mechanism for each application; and (b) the administrative capability to manage an over-subscribed cloud by temporarily swapping out jobs when higher priority jobs arrive. An advantage of this uniform approach is that it also supports parallel and distributed computations, over both TCP and InfiniBand, thus allowing traditional HPC applications to take advantage of an existing cloud infrastructure. Additionally, an integrated health-monitoring mechanism detects when long-running jobs either fail or incur exceptionally low performance, perhaps due to resource starvation, and proactively suspends the job. The cloud-agnostic feature is demonstrated by applying the implementation to two very different cloud platforms: Snooze and OpenStack. The use of a cloud-agnostic architecture also enables, for the first time, migration of applications from one cloud platform to another.Comment: 20 pages, 11 figures, appears in CCGrid, 201

    Performance analysis of downlink shared channels in a UMTS network

    Get PDF
    In light of the expected growth in wireless data communications and the commonly anticipated up/downlink asymmetry, we present a performance analysis of downlink data transfer over \textsc{d}ownlink \textsc{s}hared \textsc{ch}annels (\textsc{dsch}s), arguably the most efficient \textsc{umts} transport channel for medium-to-large data transfers. It is our objective to provide qualitative insight in the different aspects that influence the data \textsc{q}uality \textsc{o}f \textsc{s}ervice (\textsc{qos}). As a most principal factor, the data traffic load affects the data \textsc{qos} in two distinct manners: {\em (i)} a heavier data traffic load implies a greater competition for \textsc{dsch} resources and thus longer transfer delays; and {\em (ii)} since each data call served on a \textsc{dsch} must maintain an \textsc{a}ssociated \textsc{d}edicated \textsc{ch}annel (\textsc{a}-\textsc{dch}) for signalling purposes, a heavier data traffic load implies a higher interference level, a higher frame error rate and thus a lower effective aggregate \textsc{dsch} throughput: {\em the greater the demand for service, the smaller the aggregate service capacity.} The latter effect is further amplified in a multicellular scenario, where a \textsc{dsch} experiences additional interference from the \textsc{dsch}s and \textsc{a}-\textsc{dch}s in surrounding cells, causing a further degradation of its effective throughput. Following an insightful two-stage performance evaluation approach, which segregates the interference aspects from the traffic dynamics, a set of numerical experiments is executed in order to demonstrate these effects and obtain qualitative insight in the impact of various system aspects on the data \textsc{qos}
    • 

    corecore