
    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also help to provide an easy way for new practitioners to understand this complex area of research.
    Comment: 46 pages, 16 figures, Technical Report
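    The mapping exercise the abstract describes can be pictured as a small classification structure. The sketch below is purely illustrative: the dimension and category names are placeholders standing in for the paper's actual taxonomy, and `DataGridSystem` and `gap_analysis` are hypothetical helpers showing how a taxonomy-to-system mapping could surface uncovered categories ("gaps").

    ```python
    # Illustrative only: dimensions/categories are placeholders, not the paper's taxonomy.
    from dataclasses import dataclass, field

    TAXONOMY = {
        "architecture": {"hierarchical", "federated", "hybrid"},
        "data_transport": {"ftp_based", "overlay", "parallel_streams"},
        "replication": {"static", "dynamic"},
        "scheduling": {"data_aware", "compute_only"},
    }

    @dataclass
    class DataGridSystem:
        name: str
        categories: dict = field(default_factory=dict)  # dimension -> chosen category

    def gap_analysis(systems: list) -> dict:
        """Return, per dimension, the taxonomy categories no surveyed system covers."""
        covered = {dim: set() for dim in TAXONOMY}
        for s in systems:
            for dim, cat in s.categories.items():
                covered[dim].add(cat)
        return {dim: TAXONOMY[dim] - covered[dim] for dim in TAXONOMY}

    systems = [DataGridSystem("ExampleGrid", {"architecture": "hierarchical",
                                              "replication": "dynamic"})]
    print(gap_analysis(systems))  # everything not yet covered = candidate research gaps
    ```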

    CMS Monte Carlo production in the WLCG computing Grid

    Monte Carlo production in CMS has received a major boost in performance and scale since the past CHEP06 conference. The production system has been re-engineered in order to incorporate the experience gained in running the previous system and to integrate production with the new CMS event data model, data management system and data processing framework. The system is interfaced to the two major computing Grids used by CMS, the LHC Computing Grid (LCG) and the Open Science Grid (OSG). Operational experience and integration aspects of the new CMS Monte Carlo production system are presented, together with an analysis of production statistics. The new system automatically handles job submission, resource monitoring, job queuing, job distribution according to the available resources, data merging, and registration of data into the data bookkeeping, data location, data transfer and placement systems. Compared to the previous production system, automation, reliability and performance have been considerably improved. A more efficient use of computing resources and a better handling of the inherent Grid unreliability have resulted in an increase of production scale by about an order of magnitude: the system is capable of running on the order of ten thousand jobs in parallel and yielding more than two million events per day
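    As a rough illustration of the automation loop described above (submission, resource-aware distribution, merging, registration, and resubmission on failure), here is a toy sketch. All names are hypothetical and the policies deliberately simplistic; the real CMS production system is far more elaborate.

    ```python
    # Toy sketch of an automated production cycle; not the actual CMS system.
    import random

    class GridSite:
        def __init__(self, name, free_slots):
            self.name, self.free_slots = name, free_slots

    def distribute(jobs, sites):
        """Assign each job to the site with the most free slots (toy policy)."""
        assignments = []
        for job in jobs:
            site = max(sites, key=lambda s: s.free_slots)
            if site.free_slots == 0:
                break  # remaining jobs stay queued until slots free up
            site.free_slots -= 1
            assignments.append((job, site.name))
        return assignments

    def production_cycle(pending_jobs, sites, bookkeeping):
        done, failed = [], []
        for job, site in distribute(pending_jobs, sites):
            # simulate the inherent Grid unreliability with a 10% failure rate
            (done if random.random() > 0.1 else failed).append(job)
        bookkeeping.append(f"merged-{len(done)}-outputs")  # merge + register outputs
        return failed  # failed jobs are resubmitted automatically next cycle

    sites = [GridSite("LCG-T2", 3), GridSite("OSG-T2", 2)]
    log = []
    to_retry = production_cycle([f"job{i}" for i in range(6)], sites, log)
    print(log, to_retry)
    ```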

    Any Data, Any Time, Anywhere: Global Data Access for Science

    Data access is key to science driven by distributed high-throughput computing (DHTC), an essential technology for many major research projects such as High Energy Physics (HEP) experiments. However, achieving efficient data access becomes quite difficult when many independent storage sites are involved, because users are burdened with learning the intricacies of accessing each system and keeping careful track of data location. We present an alternate approach: the Any Data, Any Time, Anywhere (AAA) infrastructure. Combining several existing software products, AAA presents a global, unified view of storage systems - a "data federation" - along with a global filesystem for software delivery and a workflow management system. We present how one HEP experiment, the Compact Muon Solenoid (CMS), is utilizing the AAA infrastructure, along with some simple performance metrics.
    Comment: 9 pages, 6 figures, submitted to 2nd IEEE/ACM International Symposium on Big Data Computing (BDC) 2015
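    The core "data federation" idea is that a user opens one logical filename and a redirector finds a site that actually serves the file. The real AAA infrastructure builds this on XRootD redirection; the sketch below is a toy stand-in, and none of the names correspond to actual AAA components.

    ```python
    # Toy data-federation lookup: one global namespace over independent sites.
    SITE_CATALOGS = {
        "site_a.example.org": {"/store/data/run1.root"},
        "site_b.example.org": {"/store/data/run1.root", "/store/mc/sample2.root"},
    }

    def redirect(logical_name: str) -> str:
        """Return a URL at the first site advertising the requested file."""
        for site, catalog in SITE_CATALOGS.items():
            if logical_name in catalog:
                return f"root://{site}/{logical_name}"
        raise FileNotFoundError(logical_name)

    # The user never tracks data location by hand:
    print(redirect("/store/mc/sample2.root"))
    # -> root://site_b.example.org//store/mc/sample2.root
    ```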

    LDM: Lineage-Aware Data Management in Multi-tier Storage Systems

    We design and develop LDM, a novel data management solution to cater to the needs of applications exhibiting the lineage property, i.e. in which the current writes are future reads. In such a class of applications, slow writes significantly hurt the overall performance of jobs, since current writes determine the fate of the next reads. We believe that in a large-scale shared production cluster, the issues associated with data management can be mitigated at a much higher layer of the I/O path, even before data access requests are made. In contrast to current data management solutions, which are mostly reactive and/or based on heuristics, LDM is both deterministic and pro-active. We develop block-graphs, which enable LDM to capture complete time-based data-task dependency associations and use them to perform life-cycle management through tiering of data blocks. LDM amalgamates information from the entire data center ecosystem, from application code to file system mappings and the compute and storage device topology, to make oracle-like deterministic data management decisions. In trace-driven experiments, LDM achieves a 29–52% reduction in overall data center workload execution time. Moreover, deploying LDM with extensive pre-processing creates efficient data consumption pipelines, which also significantly reduces write and read delays
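    The block-graph idea can be sketched as follows: because lineage makes future reads known at write time, placement can be decided deterministically up front rather than reactively. The `Block` fields and the fixed "hot window" threshold below are illustrative assumptions, not LDM's actual policy.

    ```python
    # Minimal sketch of lineage-aware tiering; the threshold policy is illustrative.
    from dataclasses import dataclass

    @dataclass
    class Block:
        block_id: str
        write_time: float
        next_read_time: float  # known in advance from data-task lineage

    def tier_for(block: Block, now: float, hot_window: float = 60.0) -> str:
        """Pro-active, deterministic placement: blocks read soon go to the fast tier."""
        return "ssd" if block.next_read_time - now <= hot_window else "hdd"

    blocks = [
        Block("b1", write_time=0.0, next_read_time=30.0),   # imminent read -> fast tier
        Block("b2", write_time=0.0, next_read_time=900.0),  # cold for a while -> slow tier
    ]
    print({b.block_id: tier_for(b, now=0.0) for b in blocks})  # {'b1': 'ssd', 'b2': 'hdd'}
    ```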

    Evolutionary Game Theoretic Multi-Objective Optimization Algorithms and Their Applications

    Multi-objective optimization problems require two or more objective functions to be optimized simultaneously. They are widely applied in many science fields, including engineering, economics and logistics, where optimal decisions need to be taken in the presence of trade-offs between two or more conflicting objectives. Most real-world multi-objective optimization problems are NP-hard. It may be too computationally costly to find an exact solution, but sometimes a near-optimal solution is sufficient. In these cases, Multi-Objective Evolutionary Algorithms (MOEAs) provide good approximate solutions to problems that cannot be solved easily using other techniques. However, evolutionary algorithms are not stable due to their random nature; they may produce very different results every time they run. This dissertation proposes an Evolutionary Game Theory (EGT) framework-based algorithm (EGTMOA) that provides optimality and stability at the same time. EGTMOA combines the notion of stability from EGT and optimality from MOEAs to form a novel and promising algorithm for solving multi-objective optimization problems. This dissertation studies three different multi-objective optimization applications, Cloud Virtual Machine Placement, Body Sensor Networks, and Multi-Hub Molecular Communication, along with their proposed EGTMOA framework-based algorithms. Experimental results show that EGTMOAs outperform many well-known multi-objective evolutionary algorithms in stability, performance and runtime
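    To make the combination concrete, here is a generic sketch of the two ingredients: Pareto dominance (the optimality notion used by MOEAs) and a replicator-dynamics update (whose fixed points relate to evolutionarily stable states in EGT). It is not the EGTMOA algorithm itself; the fitness assignment and step size are illustrative choices.

    ```python
    # Generic Pareto-dominance + replicator-dynamics sketch; not EGTMOA itself.
    import numpy as np

    def dominates(a, b):
        """a Pareto-dominates b: no worse in every objective, better in one (minimization)."""
        return bool(np.all(a <= b) and np.any(a < b))

    def replicator_step(shares, fitness, dt=0.1):
        """Strategies with above-average fitness grow; fixed points are candidate stable states."""
        avg = shares @ fitness
        shares = shares + dt * shares * (fitness - avg)
        return shares / shares.sum()

    objs = np.array([[1.0, 2.0], [2.0, 2.5], [3.0, 1.0]])  # objective vectors to minimize
    # Illustrative fitness: 1 + number of population members each solution dominates.
    fitness = 1.0 + np.array([sum(dominates(objs[i], objs[j]) for j in range(3))
                              for i in range(3)], dtype=float)
    shares = np.full(3, 1 / 3)
    for _ in range(200):
        shares = replicator_step(shares, fitness)
    print(shares.round(3))  # share concentrates on the fittest (most-dominating) solution
    ```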

    Grid Computing in CMS


    Two levels autonomic resource management in virtualized IaaS

    Virtualized cloud infrastructures are very popular as they allow resource pooling and therefore cost reduction. For cloud providers, minimizing the number of resources in use is one of the main guarantees that such environments must provide. Cloud customers are also concerned with minimizing the resources they use in the cloud, since they want to reduce their bill. Thus, resource management in the cloud should be considered by the cloud provider at the virtualization level and by cloud customers at the application level. Many research works investigate resource management strategies at these two levels. Most of them study virtual machine consolidation (according to the virtualized infrastructure's utilization rate) at the virtualization level and dynamic application sizing (according to the application's workload) at the application level. However, these strategies have been studied separately. In this article, we show that virtual machine consolidation and dynamic application sizing are complementary. We show the efficiency of combining these two strategies in reducing resource usage while keeping an application's Quality of Service. We demonstrate this by evaluating three resource management strategies (implemented at the virtualization level only, at the application level only, or at both levels in combination) in a private cloud infrastructure hosting typical JEE web applications, evaluated with the RUBiS benchmark
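    A minimal sketch of the two complementary levels follows: an application-level controller sizes the application to its current workload, and an infrastructure-level controller consolidates the resulting virtual machines onto as few hosts as possible. The per-VM capacity and the first-fit-decreasing packing are illustrative assumptions, not the strategies evaluated in the article.

    ```python
    # Two cooperating levels, sketched with illustrative policies.
    import math

    def size_application(load_req_per_s, capacity_per_vm=100.0):
        """Application level: provision just enough VM instances for the workload."""
        return max(1, math.ceil(load_req_per_s / capacity_per_vm))

    def consolidate(vm_cpu_demands, host_capacity=1.0):
        """Virtualization level: first-fit-decreasing packing of VMs onto hosts."""
        hosts = []
        for demand in sorted(vm_cpu_demands, reverse=True):
            for host in hosts:
                if sum(host) + demand <= host_capacity:
                    host.append(demand)
                    break
            else:
                hosts.append([demand])  # power on a new host only when needed
        return hosts

    n_vms = size_application(350)        # application sizing -> 4 VMs
    hosts = consolidate([0.3] * n_vms)   # consolidation -> 2 hosts instead of 4
    print(n_vms, len(hosts))
    ```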

    Cost and Performance-Based Resource Selection Scheme for Asynchronous Replicated System in Utility-Based Computing Environment

    A resource selection problem for asynchronous replicated systems in a utility-based computing environment is addressed in this paper. The need for special attention to this problem lies in the fact that most existing replication schemes in such systems either implicitly support synchronous replication or consider only read-only jobs. The problem is complex to solve because two main issues must be addressed simultaneously: 1) the difficulty of predicting the performance of resources in terms of job response time, and 2) the need for an efficient mechanism to measure the trade-off between performance and the monetary cost incurred on resources, so that cost is minimized while keeping job response time low. Therefore, a simple yet efficient algorithm that deals with the complexity of the resource selection problem in utility-based computing systems is proposed in this paper. The problem is formulated as a Multi Criteria Decision Making (MCDM) problem. The advantages of the algorithm are two-fold. On the one hand, it hides the complexity of the resource selection process without neglecting the components that affect job response time: the difficulty of estimating job response time is captured by representing it as different QoS criteria levels at each resource. On the other hand, this representation further relaxes the complexity of measuring the trade-off between performance and the monetary cost incurred on resources. Experiments show that the proposed resource selection scheme achieves appealing results, with good system performance and low monetary cost compared to existing algorithms
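    As a rough illustration of the MCDM formulation, the sketch below scores each resource by its QoS criteria level (a proxy for job response time) against its normalized monetary cost, using simple additive weighting. The criteria, weights, and aggregation rule are illustrative assumptions; the paper's actual scheme may differ.

    ```python
    # Illustrative additive-weighting MCDM selection; not the paper's exact scheme.
    RESOURCES = {
        # name: (QoS level in [0, 1], higher is better; monetary cost per job)
        "r1": (0.9, 0.08),
        "r2": (0.6, 0.03),
        "r3": (0.8, 0.05),
    }

    def select(resources, w_qos=0.6, w_cost=0.4):
        """Pick the resource maximizing weighted QoS minus normalized cost."""
        max_cost = max(cost for _, cost in resources.values())
        def score(item):
            qos, cost = item[1]
            return w_qos * qos - w_cost * (cost / max_cost)
        return max(resources.items(), key=score)[0]

    print(select(RESOURCES))  # 'r3': low response time at moderate cost
    ```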