Search CORE

Investigating grid computing technologies for use with commercial simulation packages

Author: Mustafee N
Taylor SJE
Publication venue: 'The Operational Research Society'
Publication date: 01/01/2008

As simulation experimentation in industry becomes more computationally demanding, grid computing can be seen as a promising technology with the potential to bind together the computational resources needed to execute such simulations quickly. To investigate how this might be possible, this paper reviews the grid technologies that can be used together with the commercial off-the-shelf simulation packages (CSPs) used in industry. The paper identifies two specific forms of grid computing (Public Resource Computing and Enterprise-wide Desktop Grid Computing) and the middleware associated with them (BOINC and Condor) as suitable for grid-enabling existing CSPs. It further proposes three different CSP-grid integration approaches and identifies one of them as the most appropriate. It is hoped that this research will encourage simulation practitioners to consider grid computing as a technologically viable means of executing CSP-based experiments faster.

CiteSeerX

Brunel University Research Archive
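The abstract does not spell out the three CSP-grid integration approaches, so the following only illustrates the general property that makes desktop grids attractive for simulation experiments: independent replications can be farmed out to separate workers and collected afterwards. It is a minimal Python sketch under stated assumptions: a thread pool stands in for BOINC or Condor worker nodes (neither is called), and `run_replication` is a hypothetical stand-in for one CSP run, here a toy single-server queue simulated with Lindley's recursion.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def run_replication(seed: int, customers: int = 1000) -> float:
    """Hypothetical stand-in for one CSP run: an M/M/1 queue (arrival
    rate 0.8, service rate 1.0) simulated with Lindley's recursion.
    Returns the mean waiting time in queue."""
    rng = random.Random(seed)  # private random stream per replication
    wait = 0.0                 # waiting time of the current customer
    total = 0.0
    for _ in range(customers):
        total += wait
        service = rng.expovariate(1.0)
        interarrival = rng.expovariate(0.8)
        # W_{n+1} = max(0, W_n + S_n - A_{n+1})
        wait = max(0.0, wait + service - interarrival)
    return total / customers


def run_experiment(seeds, workers: int = 4) -> list:
    # Replications share no state, so a pool of workers (standing in for
    # grid nodes) may run them in any order; map() returns results in
    # the order of the seeds.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_replication, seeds))


results = run_experiment(range(8))
print(f"mean wait over {len(results)} replications: {statistics.mean(results):.3f}")
```

Giving each replication its own seeded generator keeps results reproducible no matter which worker executes it, which is what allows the scheduling to be left to the middleware.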

Quantum Split Neural Network Learning using Cross-Channel Pooling

Author: Baek Hankyul
Kim Joongheon
Yun Won Joon
Publication date: 08/04/2023

In recent years, the field of quantum science has attracted significant interest across various disciplines, including quantum machine learning, quantum communication, and quantum computing. Among these emerging areas, quantum federated learning (QFL) has gained particular attention due to the integration of quantum neural networks (QNNs) with traditional federated learning (FL) techniques. In this study, a novel approach entitled quantum split learning (QSL) is presented, which represents an advanced extension of classical split learning. Previous research in classical computing has demonstrated numerous advantages of split learning, such as accelerated convergence, reduced communication costs, and enhanced privacy protection. To maximize the potential of QSL, cross-channel pooling is introduced, a technique that capitalizes on the distinctive properties of quantum state tomography facilitated by QNNs. Through rigorous numerical analysis, evidence is provided that QSL not only achieves a 1.64% higher top-1 accuracy than QFL but also demonstrates robust privacy preservation on the MNIST classification task.

arXiv.org e-Print Archive
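QSL and cross-channel pooling rely on quantum neural networks and cannot be reproduced in a few lines of classical code. As a point of reference only, the sketch below shows the classical split-learning pattern that the abstract says QSL extends: the client runs the layers up to a cut layer on its private data and sends only the resulting activations to the server, which completes the forward pass. Layer sizes, the random initialisation and all class and function names are hypothetical, and training (backpropagating gradients across the cut) is omitted.

```python
import math
import random


def dense(weights, bias, x):
    # Fully connected layer: one output per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, t) for t in v]


def random_layer(rng, n_out, n_in):
    # Hypothetical initialisation: uniform weights, zero biases.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class Client:
    """Holds the raw data and the layers before the cut."""

    def __init__(self, rng):
        self.layer = random_layer(rng, 4, 3)

    def forward(self, x):
        # Only these cut-layer activations ("smashed data") leave the client.
        return relu(dense(*self.layer, x))


class Server:
    """Holds the layers after the cut and never sees the raw input."""

    def __init__(self, rng):
        self.layer = random_layer(rng, 2, 4)

    def forward(self, h):
        logits = dense(*self.layer, h)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        s = sum(exps)
        return [e / s for e in exps]


rng = random.Random(0)
client, server = Client(rng), Server(rng)
smashed = client.forward([0.2, -0.1, 0.7])  # what would be sent over the network
probs = server.forward(smashed)             # class probabilities
print(probs)
```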

NASA high performance computing and communications program

Author: Holcomb Lee
Hunter Paul
Smith Paul

The National Aeronautics and Space Administration's HPCC program is part of a new Presidential initiative aimed at producing a 1000-fold increase in supercomputing speed and a 100-fold improvement in available communications capability by 1997. As more advanced technologies are developed under the HPCC program, they will be used to solve NASA's 'Grand Challenge' problems, which include improving the design and simulation of advanced aerospace vehicles, allowing people at remote locations to communicate more effectively and share information, increasing scientists' ability to model the Earth's climate and forecast global environmental trends, and improving the development of advanced spacecraft. NASA's HPCC program is organized into three projects that are unique to the agency's mission: the Computational Aerosciences (CAS) project, the Earth and Space Sciences (ESS) project, and the Remote Exploration and Experimentation (REE) project. An additional project, the Basic Research and Human Resources (BRHR) project, exists to promote long-term research in computer science and engineering and to increase the pool of trained personnel in a variety of scientific disciplines. This document presents an overview of the objectives and organization of these projects as well as summaries of individual research and development programs within each project.

NASA Technical Reports Server

CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance

Author: Hager Georg
Kreutzer Moritz
Shahzad Faisal
Thies Jonas
Wellein Gerhard
Zeiser Thomas
Publication date: 07/08/2017

To use future generations of supercomputers efficiently, the High Performance Computing (HPC) community anticipates fault tolerance and power consumption as two of the prime challenges. Checkpoint/Restart (CR) has been, and still is, the most widely used technique for dealing with hard failures. Application-level CR is the most effective CR technique in terms of overhead, but it requires considerable implementation effort. This work presents the implementation of our C++ library CRAFT (Checkpoint-Restart and Automatic Fault Tolerance), which serves two purposes. First, it provides an extendable library that significantly eases the implementation of application-level checkpointing. The most basic and frequently used checkpoint data types are already part of CRAFT and can be used directly out of the box, and the library can easily be extended with more data types. To reduce overhead, the library offers a built-in asynchronous checkpointing mechanism and also supports the Scalable Checkpoint/Restart (SCR) library for node-level checkpointing. Second, CRAFT provides an easier interface for User-Level Failure Mitigation (ULFM) based dynamic process recovery, which significantly reduces the complexity and effort of the failure detection and communication recovery mechanisms. By using both functionalities together, applications can write application-level checkpoints and recover dynamically from process failures with very limited programming effort. This work presents the design and use of our library in detail. The associated overheads are thoroughly analyzed using several benchmarks.

arXiv.org e-Print Archive

Institute of Transport Research:Publications
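The kind of application-level checkpointing that CRAFT aims to simplify can be illustrated with a minimal Python sketch: the application itself decides which state matters (here an iteration counter and one value), saves it every few iterations, and resumes from the last checkpoint after a failure. This does not use CRAFT's C++ API; the function names, the JSON file format and the `fail_at` hook used to simulate a hard failure are all hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    # Write to a temporary file in the same directory, then rename it over
    # the old checkpoint, so a crash mid-write never leaves a truncated file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    # Return the last saved state, or None on a fresh start.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return None


def solve(path, n_iters=100, every=10, fail_at=None):
    """Hypothetical application: Newton iteration for sqrt(2)."""
    state = load_checkpoint(path) or {"iter": 0, "x": 1.0}
    while state["iter"] < n_iters:
        if state["iter"] == fail_at:
            raise RuntimeError("simulated hard failure")
        state["x"] = 0.5 * (state["x"] + 2.0 / state["x"])
        state["iter"] += 1
        if state["iter"] % every == 0:
            save_checkpoint(path, state)  # application-chosen state only
    return state
```

As a usage example, calling `solve(path, fail_at=35)` raises after the checkpoint for iteration 30 has been written; a second call `solve(path)` loads that checkpoint and repeats only iterations 31 to 100. Asynchronous checkpointing, as CRAFT offers, would move `save_checkpoint` off the critical path, which this sketch does not attempt.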

A survey of techniques and technologies for web-based real-time interactive rendering

Author: Pacheco Filipe
Tovar Eduardo
Publication venue: IPP-Hurray Group
Publication date: 01/01/2001

When exploring a virtual environment, realism depends mainly on two factors: realistic images and real-time feedback (motion, behaviour, etc.). In this context, the photorealism and physical validity of computer-generated images required by emerging applications, such as advanced e-commerce, still pose major challenges for rendering research, while the complexity of lighting phenomena further requires powerful and predictable computing if time constraints are to be met. In this technical report we review the state of the art in rendering, focusing on approaches, techniques and technologies that might enable real-time interactive web-based client-server rendering systems. The focus is on the end systems, not on the networking technologies used to interconnect client(s) and server(s).
Siemens; Bertelsmann mediaSystems GmbH; Eptron Multimedia; Instituto Politécnico do Porto - ISEP-IPP; Institute Laboratory for Mixed Realities at the Academy of Media Arts Cologne, LMR; Mälardalen Real-Time Research Centre (MRTC) at Mälardalen University in Västerås; Q-Systems

Repositório Científico do Instituto Politécnico do Porto

Checkpointing of parallel applications in a Grid environment

Author: Sajadah K.
Publication date: 01/01/2011

The Grid environment is generic, heterogeneous, and dynamic, with many unreliable resources, which makes it highly vulnerable to failures. The environment is unreliable because it is geographically dispersed, involves multiple autonomous administrative domains, and is composed of a large number of components. Examples of failures in the Grid environment include application crashes, Grid node crashes, network failures, and Grid system component failures. These types of failures can affect the execution of parallel/distributed applications in the Grid environment, so protection against these faults is crucial. It is therefore essential to develop efficient fault-tolerant mechanisms that allow users to execute Grid applications successfully. One of the research challenges in Grid computing is to develop a fault-tolerant solution that ensures Grid applications are executed reliably with minimum overhead. While checkpointing is the most common method of achieving fault tolerance, much work remains to be done to improve the efficiency of the mechanism. This thesis provides an in-depth description of a novel solution for checkpointing parallel applications executed on a Grid. The implemented mechanism checkpoints an application at regions where no interprocess communication is involved, thereby reducing both the checkpointing overhead and the checkpoint size.

WestminsterResearch
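The thesis's central idea, taking checkpoints only at regions where no interprocess communication is in progress, can be illustrated with a toy sketch. Because no message is in flight at those points, each rank's local state forms a consistent snapshot on its own and no channel contents need to be saved. The ranks are simulated in a single Python process with a ring exchange; the phase structure, function names and checkpoint interval are hypothetical and do not reproduce the thesis's Grid implementation.

```python
def step(values):
    """One iteration for all ranks: a communication region, then a
    purely local compute region. Returns the new values and the number
    of messages still in flight at the end of the step."""
    n = len(values)
    in_flight = []  # (destination rank, payload) sent but not yet received
    # Communication region: every rank sends its value to its right neighbour.
    for rank, v in enumerate(values):
        in_flight.append(((rank + 1) % n, v))
    new = list(values)
    while in_flight:  # every rank receives exactly one message
        dest, v = in_flight.pop()
        new[dest] = 0.5 * (new[dest] + v)
    # Compute region: local work only, no messages outstanding.
    new = [x + 1.0 for x in new]
    return new, len(in_flight)


def run(values, first_step, last_step, checkpoint_every=2):
    checkpoints = {}
    for s in range(first_step, last_step + 1):
        values, pending = step(values)
        # Checkpoint only at a communication-free point: with pending == 0
        # the snapshot needs local state only, not channel contents.
        if s % checkpoint_every == 0 and pending == 0:
            checkpoints[s] = list(values)
    return values, checkpoints


final, ckpts = run([0.0, 1.0, 2.0, 3.0], 1, 6)
resumed, _ = run(ckpts[4], 5, 6)  # restart from the step-4 checkpoint
print(final == resumed)  # True
```

Restarting from the saved local values alone reproduces the uninterrupted run exactly, which is the property that lets such checkpoints skip saving message state.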

Automating Fault Tolerance in High-Performance Computational Biological Jobs Using Multi-Agent Approaches

Author: Alexandrov Vassil
McKee Gerard
Varghese Blesson
Publication venue: 'Elsevier BV'
Publication date: 03/03/2014

Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more computing cores on which they execute fail. This imposes not only a cost on the maintenance of the job but also a cost in the time taken to reinstate the job, along with the risk of losing data and the execution accomplished by the job before it failed. Approaches that can proactively detect computing core failures and relocate the failing core's job onto reliable cores can make a significant step towards automating fault tolerance. Method: This paper describes an experimental investigation into the use of multi-agent approaches for fault tolerance. Two approaches are studied, the first at the job level and the second at the core level. The approaches are investigated for single-core failure scenarios that can occur in the execution of parallel reduction algorithms on computer clusters. A third approach is proposed that incorporates multi-agent technology at both the job and core level. Experiments are pursued in the context of genome searching, a popular computational biology application. Result: The key conclusion is that the proposed approaches are feasible for automating fault tolerance in high-performance computing systems with minimal human intervention. In a typical experiment in which fault tolerance is studied, centralised and decentralised checkpointing approaches on average add 90% to the actual time for executing the job. In the same experiment, by contrast, the multi-agent approaches add only 10% to the overall execution time. Comment: Computers in Biology and Medicine

arXiv.org e-Print Archive

Queen's University Belfast Research Portal

University of St. Andrews - Pure

St Andrews Research Repository
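The core-level idea, detecting that a core has failed and relocating only its share of the work rather than restarting the whole job, can be sketched for a parallel sum reduction. This is a sequential Python simulation, not the paper's multi-agent system: "cores" are loop iterations, a failure is simulated by raising an exception, and the "agent" is a handler that reassigns the failed core's chunk to a healthy core. All names are hypothetical.

```python
class CoreFailure(Exception):
    """Raised when a (simulated) core fails while processing its chunk."""


def core_compute(core, chunk, failed):
    # Hypothetical worker: a failed core raises instead of returning a result.
    if core in failed:
        raise CoreFailure(core)
    return sum(chunk)


def reduce_with_agent(data, n_cores=4, failed=frozenset()):
    healthy = [c for c in range(n_cores) if c not in failed]
    if not healthy:
        raise RuntimeError("no healthy cores left")
    chunks = {c: data[c::n_cores] for c in range(n_cores)}
    partials, relocations = {}, {}
    for core, chunk in chunks.items():
        try:
            partials[core] = core_compute(core, chunk, failed)
        except CoreFailure:
            # Agent action: relocate only this chunk to a healthy core,
            # keeping every partial result that was already computed.
            target = healthy[core % len(healthy)]
            partials[core] = core_compute(target, chunk, failed)
            relocations[core] = target
    return sum(partials.values()), relocations


total, moved = reduce_with_agent(list(range(100)), failed={2})
print(total, moved)  # 4950 {2: 3}
```

Only the failed core's chunk is recomputed, which is why this style of recovery can cost far less than rolling the whole job back to a checkpoint.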

A hybrid framework of iterative MapReduce and MPI for molecular dynamics applications

Author: Bai Shuju
Publication venue: LSU Digital Commons
Publication date: 01/01/2013

Developing platforms for large-scale data processing has been of great interest to scientists. Hadoop is a widely used computational platform that provides fault-tolerant distributed data storage through HDFS (Hadoop Distributed File System) and fault-tolerant parallel data processing through the MapReduce framework. Real computations often require multiple MapReduce cycles, which means chaining MapReduce jobs. However, Hadoop's design handles problems with iterative structures poorly. In many iterative problems, some invariant data is required by every MapReduce cycle. The same data is uploaded to the Hadoop file system in every cycle, causing repeated data delivery and unnecessary time spent transferring it. In addition, although Hadoop can process data in parallel, it does not support MPI: the computation inside any Map/Reduce task must be serial. This makes scientific computations wrapped in Map/Reduce tasks inefficient, because the computation cannot be distributed over a Hadoop cluster, especially a Hadoop cluster running on a traditional high-performance computing cluster. Computational technologies have been extensively investigated for application in many domains. Since the advent of Hadoop, scientists have applied the MapReduce framework to biological sciences, chemistry, medical sciences, and other areas to process huge data sets efficiently. In our research, we proposed a hybrid framework of iterative MapReduce and MPI for molecular dynamics applications, and we carried out molecular dynamics simulations with the implemented framework. We improved the capability and performance of Hadoop by adding an MPI module to it. The MPI module enables Hadoop to monitor and manage the resources of the Hadoop cluster so that the computations inside Map/Reduce tasks can be performed in parallel.
We also applied a local caching mechanism to avoid redundant data delivery and make computation more efficient. Our hybrid framework inherits the features of Hadoop and improves its computing efficiency. The target application domain of our research is molecular dynamics simulation; however, the potential use of our iterative MapReduce framework with MPI is broad. It can be used by any application that contains one or more MapReduce iterations and invokes serial or parallel (MPI) computations in the Map or Reduce phase of Hadoop.

Louisiana State University
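The local caching idea can be illustrated by counting how often invariant input is shipped to worker nodes across MapReduce cycles. The sketch swaps in one-dimensional k-means, a common iterative MapReduce workload, for molecular dynamics, and simulates the nodes in a single Python process; no Hadoop, HDFS or MPI is involved, and every class and function name is hypothetical.

```python
from collections import defaultdict


class InvariantStore:
    """Hypothetical stand-in for invariant input shipped to a node."""

    def __init__(self, points):
        self._points = list(points)
        self.transfers = 0  # how many times this partition was shipped

    def fetch(self):
        self.transfers += 1
        return list(self._points)


class Node:
    def __init__(self, store, cache):
        self.store = store
        self.cache = cache
        self._local = None

    def points(self):
        # With caching, the invariant partition is fetched once and reused;
        # without it, every MapReduce cycle pays the transfer again.
        if not self.cache or self._local is None:
            self._local = self.store.fetch()
        return self._local

    def map(self, centroids):
        # Emit (nearest-centroid index, point) pairs for this partition.
        for p in self.points():
            k = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            yield k, p


def reduce_phase(pairs, centroids):
    groups = defaultdict(list)
    for k, p in pairs:
        groups[k].append(p)
    # New centroid = mean of its points; keep the old one if a cluster is empty.
    return [sum(groups[k]) / len(groups[k]) if groups[k] else c
            for k, c in enumerate(centroids)]


def iterative_kmeans(partitions, centroids, iterations, cache=True):
    stores = [InvariantStore(part) for part in partitions]
    nodes = [Node(s, cache) for s in stores]
    for _ in range(iterations):  # one MapReduce cycle per iteration
        pairs = [kv for node in nodes for kv in node.map(centroids)]
        centroids = reduce_phase(pairs, centroids)
    return centroids, sum(s.transfers for s in stores)
```

With partitions `[[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]`, initial centroids `[0.0, 5.0]` and three iterations, both variants converge to `[2.0, 11.0]`, but the cached run ships each partition once (2 transfers) while the uncached run ships it in every cycle (6 transfers).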