3D analytical modelling and iterative solution for high performance computing clusters
Mobile Cloud Computing enables the migration of services to the edge of the Internet, and high-performance computing clusters are widely deployed to improve the computational capabilities of such environments. However, these clusters are prone to failures and need analytical models to predict their behaviour in order to deliver the desired quality-of-service and quality-of-experience to mobile users. This paper proposes a 3D analytical model and a problem-solving approach for the sustainability evaluation of high-performance computing clusters. The proposed solution uses an iterative approach to obtain performance measures and overcome the state space explosion problem. The availability modelling and evaluation of master and computing nodes are performed using a multi-repairman approach. The optimum number of repairmen is also obtained to get realistic results and reduce the overall cost. The proposed model is validated using discrete event simulation; the analytical approach is much faster and in good agreement with the simulations. The analysis focuses on mean queue length, throughput, and mean response time. The maximum differences between analytical and simulation results in the considered scenarios of up to a billion states are less than 1.149%, 3.82%, and 3.76%, respectively. These differences are well within the 5% confidence interval of the simulation and the proposed model.
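The three performance measures named in the abstract (mean queue length, throughput, mean response time) are standard queueing-theoretic quantities. As a minimal illustration only, assuming a plain M/M/c queue rather than the paper's 3D failure/repair model, they can be computed with the Erlang C formula and Little's law:

```python
from math import factorial

def mmc_metrics(lam, mu, c):
    """Mean queue length, throughput, and mean response time of an M/M/c queue.
    Illustrative sketch only: the paper's 3D model additionally captures
    failures, repairs, and multiple repairmen, which this does not."""
    rho = lam / (c * mu)          # server utilisation
    assert rho < 1, "queue must be stable"
    a = lam / mu                  # offered load in Erlangs
    # Erlang C: probability that an arriving job has to wait
    p0 = 1.0 / (sum(a**k / factorial(k) for k in range(c))
                + a**c / (factorial(c) * (1 - rho)))
    erlang_c = (a**c / (factorial(c) * (1 - rho))) * p0
    lq = erlang_c * rho / (1 - rho)   # mean queue length (jobs waiting)
    wq = lq / lam                     # mean waiting time (Little's law)
    w = wq + 1.0 / mu                 # mean response time (wait + service)
    throughput = lam                  # stable queue: all arrivals are served
    return lq, throughput, w
```

For c = 1 this reduces to the familiar M/M/1 results, e.g. a mean response time of 1/(mu - lam).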
Architecture of Job scheduling simulator for demand response based resource provisioning
We study a new service model based on Demand Response (DR) resource provisioning at High Performance Computing (HPC) centers. This DR-based resource provisioning model allows administrators of HPC centers to provide computing services with incentives that compensate users for the performance loss due to power-saving operations. In a power conservation mode, a job’s performance may degrade, both in terms of waiting time and execution time. With DR-based resource provisioning, submitted jobs are divided into two categories, allowed jobs and disallowed jobs, depending on the user’s tolerance for performance degradation. The allowed jobs, if indeed affected by the power-saving operations, receive compensation in accordance with an incentive system that determines the reward to the user. To design an appropriate demand response model, we need to focus on the increase in a job’s execution time and waiting time, and the corresponding decrease in power consumption; these are the key factors in deriving an incentive system. Currently, no existing approaches can reliably quantify the effectiveness and contribution of these factors in HPC job scheduling and resource provisioning. In this paper, we propose a newly developed job scheduling simulator that can evaluate the DR-based resource provisioning approach under various operating conditions. We designed and implemented the job scheduling simulator for HPC demand-response resource provisioning using a general-purpose discrete-event simulator. Our experiments show that the job scheduling simulator can properly represent demand-response resource provisioning under different job scheduling scenarios.
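The core mechanics described above, allowed vs. disallowed jobs, a power-saving slowdown, and compensation proportional to the induced delay, can be sketched in a few lines. This is a toy single-node FCFS model, not the paper's simulator; the `SLOWDOWN` and `REWARD_RATE` constants are hypothetical parameters:

```python
SLOWDOWN = 1.5      # assumed execution-time stretch under power saving
REWARD_RATE = 0.1   # assumed incentive per second of extra runtime (hypothetical)

def simulate(jobs, power_saving=True):
    """jobs: iterable of (submit_time, runtime, allowed) tuples.
    Single-server FCFS toy model: allowed jobs may be slowed down under
    power saving and are compensated for the extra runtime; disallowed
    jobs always run at full speed. Returns (finish_time, compensation)
    per job in dispatch order."""
    free_at = 0.0
    results = []
    for submit, runtime, allowed in sorted(jobs):
        stretch = SLOWDOWN if (power_saving and allowed) else 1.0
        start = max(free_at, submit)
        finish = start + runtime * stretch
        extra = runtime * (stretch - 1.0)     # delay caused by power saving
        comp = REWARD_RATE * extra if allowed else 0.0
        results.append((finish, comp))
        free_at = finish
    return results
```

A real DR simulator would add backfilling, multiple nodes, and a measured power model; the point here is only the incentive bookkeeping tied to the two job categories.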
Optimised access to user analysis data using the gLite DPM
The ScotGrid distributed Tier-2 now provides more than 4 MSI2K and 500 TB for LHC computing, spread across three sites at Durham, Edinburgh and Glasgow. Tier-2 sites have a dual role to play in the computing models of the LHC VOs. Firstly, their CPU resources are used for the generation of Monte Carlo event data. Secondly, end-user analysis data is distributed across the grid to the sites' storage systems and held on disk, ready for processing by physicists' analysis jobs. In this paper we show how we have designed the ScotGrid storage and data management resources in order to optimise access by physicists to LHC data. Within ScotGrid, all sites use the gLite DPM storage manager middleware. Using the EGEE grid to submit real ATLAS analysis code to process VO data stored on the ScotGrid sites, we present an analysis of the performance of the architecture at one site, and procedures that may be undertaken to improve it. The results are presented from the point of view of the end user (in terms of number of events processed per second) and from the point of view of the site, which wishes to minimise load and the impact that analysis activity has on other users of the system.
CMS Monte Carlo production in the WLCG computing Grid
Monte Carlo production in CMS has received a major boost in performance and scale since the past CHEP06 conference. The production system has been re-engineered in order to incorporate the experience gained in running the previous system and to integrate production with the new CMS event data model, data management system and data processing framework. The system is interfaced to the two major computing Grids used by CMS, the LHC Computing Grid (LCG) and the Open Science Grid (OSG). Operational experience and integration aspects of the new CMS Monte Carlo production system are presented together with an analysis of production statistics. The new system automatically handles job submission, resource monitoring, job queuing, job distribution according to the available resources, data merging, and registration of data into the data bookkeeping, data location, data transfer and placement systems. Compared to the previous production system, automation, reliability and performance have been considerably improved. A more efficient use of computing resources and a better handling of the inherent Grid unreliability have resulted in an increase of production scale by about an order of magnitude, capable of running on the order of ten thousand jobs in parallel and yielding more than two million events per day.
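The abstract describes an automated pipeline carrying each job from submission through merging and registration, with resubmission absorbing Grid unreliability. One common way to make such automation robust is an explicit job-lifecycle state machine; the sketch below is illustrative, with hypothetical stage names that are not CMS's actual internal states:

```python
from enum import Enum, auto

class Stage(Enum):
    """Simplified job lifecycle (illustrative; not the real CMS state set)."""
    SUBMITTED = auto()
    QUEUED = auto()
    RUNNING = auto()
    MERGING = auto()     # output merging before catalogue registration
    REGISTERED = auto()  # terminal: data registered in bookkeeping/placement
    FAILED = auto()

# Allowed transitions; FAILED -> QUEUED models automatic resubmission,
# which is how the system copes with inherent Grid unreliability.
TRANSITIONS = {
    Stage.SUBMITTED:  {Stage.QUEUED, Stage.FAILED},
    Stage.QUEUED:     {Stage.RUNNING, Stage.FAILED},
    Stage.RUNNING:    {Stage.MERGING, Stage.FAILED},
    Stage.MERGING:    {Stage.REGISTERED, Stage.FAILED},
    Stage.FAILED:     {Stage.QUEUED},
    Stage.REGISTERED: set(),
}

def advance(state, new_state):
    """Move a job to new_state, rejecting transitions the pipeline forbids."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

Constraining transitions this way lets the automation retry failed jobs without ever, say, registering data for a job that never finished merging.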
Computing infrastructure issues in distributed communications systems : a survey of operating system transport system architectures
The performance of distributed applications (such as file transfer, remote login, tele-conferencing, full-motion video, and scientific visualization) is influenced by several factors that interact in complex ways. In particular, application performance is significantly affected both by communication infrastructure factors and by computing infrastructure factors. Communication infrastructure factors include channel speed, bit-error rate, and congestion at intermediate switching nodes. Computing infrastructure factors include (among other things) both protocol processing activities (such as connection management, flow control, error detection, and retransmission) and general operating system factors (such as memory latency, CPU speed, interrupt and context switching overhead, process architecture, and message buffering). Due to increases of several orders of magnitude in network channel speeds and growing application diversity, performance bottlenecks are shifting from the network factors to the transport system factors. This paper defines an abstraction called an "Operating System Transport System Architecture" (OSTSA) that is used to classify the major components and services in the computing infrastructure. End-to-end network protocols such as TCP, TP4, VMTP, XTP, and Delta-t typically run on general-purpose computers, where they utilize various operating system resources such as processors, virtual memory, and network controllers. The OSTSA provides services that integrate these resources to support distributed applications running on local and wide area networks. A taxonomy is presented to evaluate OSTSAs in terms of their support for protocol processing activities. We use this taxonomy to compare and contrast five general-purpose commercial and experimental operating systems: System V UNIX, BSD UNIX, the x-kernel, Choices, and Xinu.
Mobile Computing in Physics Analysis - An Indicator for eScience
This paper presents the design and implementation of a Grid-enabled physics
analysis environment for handheld and other resource-limited computing devices
as one example of the use of mobile devices in eScience. Handheld devices offer
great potential because they provide ubiquitous access to data and
round-the-clock connectivity over wireless links. Our solution aims to provide
users of handheld devices the capability to launch heavy computational tasks on
computational and data Grids, monitor the status of their jobs during execution, and
retrieve results after job completion. Users carry their jobs on their handheld
devices in the form of executables (and associated libraries). Users can
transparently view the status of their jobs and get back their outputs without
having to know where they are being executed. In this way, our system is able
to act as a high-throughput computing environment where devices ranging from
powerful desktop machines to small handhelds can employ the power of the Grid.
The results shown in this paper are readily applicable to the wider eScience community.
Comment: 8 pages, 7 figures. Presented at the 3rd Int. Conf. on Mobile Computing & Ubiquitous Networking (ICMU06), London, October 200
A parallel grid-based implementation for real time processing of event log data in collaborative applications
Collaborative applications usually register user interaction in the form of semi-structured plain-text event log data. Extracting and structuring this data is a prerequisite for later key processes such as the analysis of interactions, assessment of group activity, or the provision of awareness and feedback. Yet, in real situations of online collaborative activity, the processing of log data is usually done offline, since structuring event log data is, in general, a computationally costly process and the amount of log data tends to be very large. Techniques to speed and scale up the structuring and processing of log data with minimal impact on the performance of the collaborative application are thus needed to process log data in real time. In this paper, we present a parallel grid-based implementation for processing in real time the event log data generated in collaborative applications. Our results show the feasibility of using grid middleware to speed and scale up the process of structuring and processing semi-structured event log data. The Grid prototype follows the Master-Worker (MW) paradigm. It is implemented using the Globus Toolkit (GT) and is tested on the PlanetLab platform.
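The Master-Worker structuring step described above, a master splitting semi-structured log lines among workers that each parse their share, can be sketched locally. The log-line regex is an assumed format for illustration, and a thread pool stands in for the Globus-managed grid workers of the actual prototype:

```python
import re
from multiprocessing.pool import ThreadPool

# Assumed log-line shape "<timestamp> <user> <action>"; real collaborative
# applications each have their own semi-structured event format.
EVENT_RE = re.compile(r"(?P<ts>\d+) (?P<user>\w+) (?P<action>\w+)")

def structure(line):
    """Worker task: parse one log line into a structured record, or None."""
    m = EVENT_RE.match(line)
    return m.groupdict() if m else None

def master(lines, workers=4):
    """Master: distribute lines to workers and collect structured events.
    A thread pool stands in for grid workers here; the MW prototype in the
    paper dispatches these tasks via Globus Toolkit middleware instead."""
    with ThreadPool(workers) as pool:
        return [event for event in pool.map(structure, lines) if event]
```

Because each line is parsed independently, the task is embarrassingly parallel, which is what makes the MW decomposition over grid nodes effective for large logs.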