MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface
Application development for distributed computing "Grids" can benefit from
tools that variously hide or enable application-level management of critical
aspects of the heterogeneous environment. As part of an investigation of these
issues, we have developed MPICH-G2, a Grid-enabled implementation of the
Message Passing Interface (MPI) that allows a user to run MPI programs across
multiple computers, at the same or different sites, using the same commands
that would be used on a parallel computer. This library extends the Argonne
MPICH implementation of MPI to use services provided by the Globus Toolkit for
authentication, authorization, resource allocation, executable staging, and
I/O, as well as for process creation, monitoring, and control. Various
performance-critical operations, including startup and collective operations,
are configured to exploit network topology information. The library also
exploits MPI constructs for performance management; for example, the MPI
communicator construct is used for application-level discovery of, and
adaptation to, both network topology and network quality-of-service mechanisms.
We describe the MPICH-G2 design and implementation, present performance
results, and review application experiences, including record-setting
distributed simulations.
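As a hedged illustration of the communicator-based adaptation described above, the C sketch below splits MPI_COMM_WORLD into per-site communicators so that collectives can be staged over the fast local network first. The SITE_ID environment variable is a hypothetical stand-in for the topology information that MPICH-G2 itself derives and attaches to communicators.

```c
/* Minimal sketch: application-level topology adaptation with MPI
 * communicators, in the spirit of MPICH-G2's topology-aware design.
 * The site colour is an assumption here (read from a hypothetical
 * SITE_ID variable); MPICH-G2 supplies real topology information. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int world_rank, world_size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Hypothetical: each site sets SITE_ID in the job environment. */
    const char *site = getenv("SITE_ID");
    int site_color = site ? atoi(site) : 0;

    /* Split the world communicator so that processes at the same site
     * share a communicator; intra-site collectives can then run over
     * the local network before crossing the wide-area links. */
    MPI_Comm site_comm;
    MPI_Comm_split(MPI_COMM_WORLD, site_color, world_rank, &site_comm);

    int site_rank, site_size;
    MPI_Comm_rank(site_comm, &site_rank);
    MPI_Comm_size(site_comm, &site_size);
    printf("world %d/%d -> site %d, local %d/%d\n",
           world_rank, world_size, site_color, site_rank, site_size);

    MPI_Comm_free(&site_comm);
    MPI_Finalize();
    return 0;
}
```

MPICH-G2 exposes richer, multi-level topology information; a two-level split like this captures only the site/cluster distinction.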
A shared-disk parallel cluster file system
Dissertation presented to obtain the degree of Doctor in Informatics at the Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa.

Today, clusters are the de facto cost-effective platform both for high performance computing (HPC) and for IT environments. HPC and IT are quite different environments, and their differences include, among others, their choices of file systems and storage: HPC favours parallel file systems geared towards maximum I/O bandwidth, which, however, are not fully POSIX-compliant and were devised to run on top of (fault-prone) partitioned storage; conversely, IT data centres favour both external disk arrays (to provide highly available storage) and POSIX-compliant file systems, either general-purpose or shared-disk cluster file systems (CFSs).
These specialised file systems perform very well in their target environments, provided that applications do not require features outside their design point, e.g., file locking on parallel file systems, or high-performance writes over cluster-wide shared files on CFSs. In brief, none of the above approaches solves the problem of providing high levels of reliability and performance to both worlds.
Our pCFS proposal is a contribution towards changing this situation: the rationale is to take advantage of the best of both, the reliability of cluster file systems and the high performance of parallel file systems. We do not claim to provide the absolute best of each, but we aim at full POSIX compliance, a rich feature set, and levels of reliability and performance good enough for broad usage, e.g., traditional as well as HPC applications, support of clustered DBMS engines that may run over regular files, and video streaming. pCFS' main ideas include:
· Cooperative caching, a technique that has been used in file systems for distributed disks but that, as far as we know, has never been used either in SAN-based cluster file systems or in parallel file systems. As a result, pCFS may use both infrastructures (LAN and SAN) to move data.
· Fine-grain locking, whereby processes running on distinct nodes may define non-overlapping byte-range regions in a file (instead of locking the whole file) and access them in parallel, reading and writing over those regions at the infrastructure's full speed (provided that no major metadata changes are required); a minimal POSIX-level sketch of this idea follows.
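As a minimal illustration of the byte-range idea, the hedged C sketch below uses plain POSIX fcntl record locks so that each writer locks and touches only its own region of a shared file. pCFS implements its locking inside the kernel, so this is only an API-level analogue; the file name and region size are illustrative assumptions.

```c
/* Sketch: non-overlapping byte-range locking over one shared file, a
 * POSIX-level analogue of pCFS's fine-grain locking. Each process
 * locks and writes only its own region, so writers on different nodes
 * need not serialise on a whole-file lock. */
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define REGION_SIZE 4096

static int write_region(const char *path, int region,
                        const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;                      /* exclusive write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = (off_t)region * REGION_SIZE; /* this region only     */
    fl.l_len = REGION_SIZE;

    if (fcntl(fd, F_SETLKW, &fl) < 0) {       /* block until granted  */
        close(fd);
        return -1;
    }

    /* Write strictly within the locked byte range. */
    ssize_t n = pwrite(fd, buf, len < REGION_SIZE ? len : REGION_SIZE,
                       fl.l_start);

    fl.l_type = F_UNLCK;                      /* release just this range */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return n < 0 ? -1 : 0;
}

int main(void)
{
    /* In a real run the region index would come from the process rank. */
    return write_region("shared.dat", 0, "hello", 5) ? 1 : 0;
}
```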
A prototype was built on top of GFS (a Red Hat shared-disk CFS): GFS' kernel code was slightly modified, and two kernel modules and a user-level daemon were added. In the prototype, fine-grain locking is fully implemented, and a cluster-wide coherent cache is maintained through the movement of data (page fragments) over the LAN.
Our benchmarks for non-overlapping writers over a single file shared among processes running on different nodes show that pCFS' bandwidth is two times greater than NFS' while being comparable to that of the Parallel Virtual File System (PVFS), with both requiring about 10 times more CPU. pCFS' bandwidth also surpasses GFS' (by 600 times for small record sizes, e.g., 4 KB, decreasing to 2 times for large record sizes, e.g., 4 MB), at about the same CPU usage.
ASCR/HEP Exascale Requirements Review Report
This draft report summarizes and details the findings, results, and
recommendations derived from the ASCR/HEP Exascale Requirements Review meeting
held in June 2015. The main conclusions are as follows. 1) Larger, more
capable computing and data facilities are needed to support HEP science goals
in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of
demand at the 2025 timescale is at least two orders of magnitude greater than
what is currently available, and in some cases more. 2) The growth rate of data
produced by simulations is overwhelming the current ability of both facilities
and researchers to store and analyze it. Additional resources and new
techniques for data analysis are urgently needed. 3) Data rates and volumes
from HEP experimental facilities are also straining the capacity to store and
analyze such large and complex datasets. Appropriately configured
leadership-class facilities can play a transformational role in enabling
scientific discovery from these datasets. 4) A close integration of HPC
simulation and data analysis will aid greatly in interpreting results from HEP
experiments. Such an integration will minimize data movement and facilitate
interdependent workflows. 5) Long-range planning between HEP and ASCR will be
required to meet HEP's research needs. To best use ASCR HPC resources, the
experimental HEP program needs: a) an established long-term plan for access to
ASCR computational and data resources; b) the ability to map workflows onto HPC
resources; c) the ability for ASCR facilities to accommodate workflows run by
collaborations that can have thousands of individual members; d) a transition
of codes to the next-generation HPC platforms that will be available at ASCR
facilities; and e) a workforce built up and trained to develop and use
simulations and analysis in support of HEP scientific research on
next-generation systems.
Proceedings of the Salford Postgraduate Annual Research Conference (SPARC) 2011
These proceedings bring together a selection of papers from the 2011 Salford Postgraduate Annual Research Conference (SPARC). They include papers from PhD students in the arts and social sciences, business, computing, science and engineering, education, environment, built environment, and health sciences. Contributions from Salford researchers are published here alongside papers from students at the Universities of Anglia Ruskin, Birmingham City, Chester, De Montfort, Exeter, Leeds, Liverpool, Liverpool John Moores, and Manchester.
Agent-based resource management for grid computing
A computational grid is a hardware and software infrastructure that provides
dependable, consistent, pervasive, and inexpensive access to high-end
computational capability. An ideal grid environment should provide access to the
available resources in a seamless manner. Resource management is an important
infrastructural component of a grid computing environment. The overall aim of
resource management is to efficiently schedule applications that need to utilise the
available resources in the grid environment. Achieving such goals within the
high-performance community relies on accurate performance prediction capabilities.
An existing toolkit, known as PACE (Performance Analysis and Characterisation
Environment), is used to provide quantitative data concerning the performance of
sophisticated applications running on high performance resources. In this thesis an
ASCI (Accelerated Strategic Computing Initiative) kernel application, Sweep3D,
is used to illustrate the PACE performance prediction capabilities. The validation
results show that reasonable accuracy can be obtained, cross-platform
comparisons can be easily undertaken, and the process benefits from a rapid
evaluation time. While extremely well-suited for managing a locally distributed
multi-computer, the PACE functions do not map well onto a wide-area
environment, where heterogeneity, multiple administrative domains, and communication irregularities dramatically complicate the job of resource
management. Scalability and adaptability are two key challenges that must be
addressed.
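To make the role of prediction concrete, here is a hedged C sketch of how a scheduler might use PACE-style predicted execution times to choose among resources. The resource descriptions and the prediction formula are illustrative assumptions, not PACE's actual model.

```c
/* Sketch: choosing a resource by predicted execution time, the role
 * PACE-style predictions play for a scheduler. The resource list and
 * the prediction formula are hypothetical stand-ins; a real PACE model
 * evaluates the application against each platform's parameters. */
#include <stdio.h>

struct resource {
    const char *name;
    double speed;         /* relative CPU speed factor    */
    double queue_delay_s; /* estimated current queue wait */
};

/* Hypothetical prediction: base work scaled by speed, plus queueing. */
static double predict_runtime_s(double base_work_s, const struct resource *r)
{
    return base_work_s / r->speed + r->queue_delay_s;
}

int main(void)
{
    struct resource pool[] = {
        { "cluster-a", 1.0, 30.0 },
        { "cluster-b", 2.5, 90.0 },
        { "cluster-c", 1.8, 10.0 },
    };
    const int n = sizeof pool / sizeof pool[0];
    double base_work_s = 600.0; /* predicted runtime at unit speed */

    /* Pick the resource with the smallest predicted completion time. */
    int best = 0;
    for (int i = 1; i < n; i++)
        if (predict_runtime_s(base_work_s, &pool[i]) <
            predict_runtime_s(base_work_s, &pool[best]))
            best = i;

    printf("schedule on %s (%.1f s predicted)\n",
           pool[best].name, predict_runtime_s(base_work_s, &pool[best]));
    return 0;
}
```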
In this thesis, an A4 (Agile Architecture and Autonomous Agents) methodology is
introduced for the development of large-scale distributed software systems with
highly dynamic behaviours. An agent is considered to be both a service provider
and a service requestor. Agents are organised into a hierarchy with service
advertisement and discovery capabilities. There are four main performance
metrics for an A4 system: service discovery speed, agent system efficiency,
workload balancing, and discovery success rate.
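A hedged C sketch of the discovery idea: each agent consults its own capability table and, failing that, escalates the request up the hierarchy. The data layout and escalation rule are illustrative assumptions rather than the thesis's actual design; the local tables play the role of the ACTs introduced later in this abstract.

```c
/* Sketch: hierarchical service discovery over local capability tables,
 * in the spirit of the A4 agent hierarchy. Structures and the upward
 * escalation rule are illustrative assumptions. */
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 8

struct act_entry { const char *service; const char *provider; };

struct agent {
    const char *name;
    struct agent *parent;              /* NULL at the hierarchy root */
    struct act_entry act[MAX_ENTRIES]; /* locally known capabilities */
    int n_entries;
};

/* Look up a service in this agent's table; if unknown, escalate the
 * request to the parent, mirroring upward discovery in the hierarchy. */
static const char *discover(struct agent *a, const char *service)
{
    for (; a; a = a->parent)
        for (int i = 0; i < a->n_entries; i++)
            if (strcmp(a->act[i].service, service) == 0)
                return a->act[i].provider;
    return NULL; /* discovery failed at the root */
}

int main(void)
{
    struct agent root = { "root", NULL,  {{ "sweep3d", "origin2000" }}, 1 };
    struct agent leaf = { "leaf", &root, {{ "fft",     "beowulf"    }}, 1 };

    const char *p = discover(&leaf, "sweep3d"); /* found via the root */
    printf("sweep3d -> %s\n", p ? p : "not found");
    return 0;
}
```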
Coupling the A4 methodology with the PACE functions results in an Agent-based
Resource Management System (ARMS), which is implemented for grid
computing. The PACE functions supply accurate performance information (e.g.,
execution time) as input to a local resource scheduler on the fly. At a meta-level,
agents advertise their service information and cooperate with each other to
discover available resources for grid-enabled applications. A Performance
Monitor and Advisor (PMA) is also developed in ARMS to optimise the
performance of the agent behaviours.
The PMA is capable of performance modelling and simulation of the agents in
ARMS and can be used to improve overall system performance. The PMA can
monitor agent behaviours in ARMS and reconfigure them with optimised
strategies, which include the use of ACTs (Agent Capability Tables), limited
service lifetime, limited scope for service advertisement and discovery, agent
mobility and service distribution, etc.
The main contribution of this work is that it provides a methodology and
prototype implementation of a grid Resource Management System (RMS). The
system includes a number of original features that cannot be found in existing
research solutions.
The development of object oriented Bayesian networks to evaluate the social, economic and environmental impacts of solar PV
Domestic and community low-carbon technologies are widely heralded as valuable means for delivering sustainability outcomes in the form of social, economic and environmental (SEE) policy objectives. To accelerate their diffusion, they have benefited from a significant number and variety of subsidies worldwide. Considerable aleatory and epistemic uncertainties exist, however, both with regard to their net energy contribution and to their SEE impacts. Furthermore, the socio-economic contexts themselves exhibit enormous variability, with commensurate uncertainties in their parameterisation. This represents a significant risk for policy makers and technology adopters.
This work describes an approach to these problems using Bayesian Network (BN) models, which are used to integrate extant knowledge from a variety of disciplines, quantify SEE impacts, and endogenise uncertainties. A large-scale Object-Oriented Bayesian Network has been developed to model the specific case of solar photovoltaics (PV) installed on UK domestic roofs. Three model components have been developed: a PV component that characterises the yield of UK systems, a building energy component that characterises the energy consumption of the dwellings and their occupants, and a third component that characterises the building stock in four English urban communities.
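For readers unfamiliar with the mechanics, the hedged C sketch below shows the kind of conditional-probability-table arithmetic a single discrete BN node performs, at toy scale. The variables (dwelling orientation and a PV yield band) and all probabilities are invented for illustration and are not values from the thesis's model.

```c
/* Sketch: discrete Bayesian-network node arithmetic at toy scale.
 * All variables and numbers are hypothetical. */
#include <stdio.h>

int main(void)
{
    /* Prior over dwelling orientation: {south, east/west, north}. */
    double p_orient[3] = { 0.40, 0.45, 0.15 };

    /* CPT: P(yield band | orientation), bands {low, high}. */
    double p_yield_given_orient[3][2] = {
        { 0.20, 0.80 },  /* south-facing roofs mostly yield well */
        { 0.50, 0.50 },
        { 0.85, 0.15 },
    };

    /* Marginalise: P(yield) = sum over o of P(yield | o) * P(o). */
    double p_yield[2] = { 0.0, 0.0 };
    for (int o = 0; o < 3; o++)
        for (int y = 0; y < 2; y++)
            p_yield[y] += p_yield_given_orient[o][y] * p_orient[o];

    printf("P(yield=low)  = %.3f\n", p_yield[0]);
    printf("P(yield=high) = %.3f\n", p_yield[1]);

    /* Entering evidence (e.g., an observed high yield) would apply
     * Bayes' rule: P(o | high) = P(high | o) P(o) / P(high). */
    return 0;
}
```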
Three representative SEE indicators, fuel affordability, carbon emission reduction, and discounted cash flow, are integrated and used to test the model's ability to yield meaningful outputs in response to varying inputs. The variability in the percentage of the three indicators is highly responsive to the dwellings' built form, age, and orientation, driven not only by building and solar physics but also by socio-economic factors. The model can accept observations or evidence in order to create scenarios that facilitate deliberative decision making.
The BN methodology contributes to the synthesis of new knowledge from extant knowledge located between disciplines. As well as yielding insights into the impacts of high PV penetration, an epistemic contribution has been made to transdisciplinary building energy modelling, one that can be replicated with a variety of low-carbon interventions.