Search CORE

64,368 research outputs found

Many-Task Computing and Blue Waters

Author: Armstrong Timothy G.
Katz Daniel S.
Wilde Michael
Wozniak Justin M.
Zhang Zhao
Publication venue
Publication date: 01/01/2012
Field of study

This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters systems, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware

arXiv.org e-Print Archive

CiteSeerX

Recommended from our members

Leveraging simulation practice in industry through use of desktop grid middleware

Author: Mustafee N
Taylor S J E
Publication venue: 'IGI Global'
Publication date: 01/01/2009
Field of study

This chapter focuses on the collaborative use of computing resources to support decision making in industry. Through the use of middleware for desktop grid computing, the idle CPU cycles available on existing computing resources can be harvested and used for speeding-up the execution of applications that have “non-trivial” processing requirements. This chapter focuses on the desktop grid middleware BOINC and Condor, and discusses the integration of commercial simulation software together with free-to-download grid middleware so as to offer competitive advantage to organizations that opt for this technology. It is expected that the low-intervention integration approach presented in this chapter (meaning no changes to source code required) will appeal to both simulation practitioners (as simulations can be executed faster, which in turn would mean that more replications and optimization is possible in the same amount of time) and the management (as it can potentially increase the return on investment on existing resources)

Brunel University Research Archive

DualTable: A Hybrid Storage Model for Update Optimization in Hive

Author: Hu Songlin
Huang Shuo
Jacobsen Hans-Arno
Liang Ying
Liu Wantao
Pei Xubin
Rabl Tilmann
Wang Jiye
Xiao Zheng
Publication venue
Publication date: 01/12/2014
Field of study

Hive is the most mature and prevalent data warehouse tool providing SQL-like interface in the Hadoop ecosystem. It is successfully used in many Internet companies and shows its value for big data processing in traditional industries. However, enterprise big data processing systems as in Smart Grid applications usually require complicated business logics and involve many data manipulation operations like updates and deletes. Hive cannot offer sufficient support for these while preserving high query performance. Hive using the Hadoop Distributed File System (HDFS) for storage cannot implement data manipulation efficiently and Hive on HBase suffers from poor query performance even though it can support faster data manipulation.There is a project based on Hive issue Hive-5317 to support update operations, but it has not been finished in Hive's latest version. Since this ACID compliant extension adopts same data storage format on HDFS, the update performance problem is not solved. In this paper, we propose a hybrid storage model called DualTable, which combines the efficient streaming reads of HDFS and the random write capability of HBase. Hive on DualTable provides better data manipulation support and preserves query performance at the same time. Experiments on a TPC-H data set and on a real smart grid data set show that Hive on DualTable is up to 10 times faster than Hive when executing update and delete operations.Comment: accepted by industry session of ICDE201

arXiv.org e-Print Archive

Crossref

Next-Generation EU DataGrid Data Management Services

Author: Akos Frohner
David Cameron
Diana Bosio
Domenici A
Erwin Laure
Federico Dicarlo
Floriano Zini
Gavin Mccance
Gianluca Volpato
Heinz Stockinger
Joni Hahkala
Kurt Stockinger
Leanne Guy
Levi Lucio
Livio Salconi
Mika Sil
Niklas Karlsson
Olle Mulmo
Paul Millar
Peter Kunszt
Ruben Carvajal-Schiaffino
Sophie Lemaitre
Ville Nenonen
William Bell
Publication venue
Publication date: 01/01/2003
Field of study

We describe the architecture and initial implementation of the next-generation of Grid Data Management Middleware in the EU DataGrid (EDG) project. The new architecture stems out of our experience and the users requirements gathered during the two years of running our initial set of Grid Data Management Services. All of our new services are based on the Web Service technology paradigm, very much in line with the emerging Open Grid Services Architecture (OGSA). We have modularized our components and invested a great amount of effort towards a secure, extensible and robust service, starting from the design but also using a streamlined build and testing framework. Our service components are: Replica Location Service, Replica Metadata Service, Replica Optimization Service, Replica Subscription and high-level replica management. The service security infrastructure is fully GSI-enabled, hence compatible with the existing Globus Toolkit 2-based services; moreover, it allows for fine-grained authorization mechanisms that can be adjusted depending on the service semantics.Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla,Ca, USA, March 2003 8 pages, LaTeX, the file contains all LaTeX sources - figures are in the directory "figures

arXiv.org e-Print Archive

CiteSeerX

Archivio della Ricerca - Università di Pisa

CERN Document Server

Enhancing Job Scheduling of an Atmospheric Intensive Data Application

Author: Caragnano G.
Goga K.
Mossucca L.
Ruiu Pietro
Terzo Olivier
Publication venue: International Journal On Advances in Telecommunications, IARIA
Publication date: 01/01/2012
Field of study

Nowadays, e-Science applications involve great deal of data to have more accurate analysis. One of its application domains is the Radio Occultation which manages satellite data. Grid Processing Management is a physical infrastructure geographically distributed based on Grid Computing, that is implemented for the overall processing Radio Occultation analysis. After a brief description of algorithms adopted to characterize atmospheric profiles, the paper presents an improvement of job scheduling in order to decrease processing time and optimize resource utilization. Extension of grid computing capacity is implemented by virtual machines in existing physical Grid in order to satisfy temporary job requests. Also scheduling plays an important role in the infrastructure that is handled by a couple of schedulers which are developed to manage data automaticall

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino