109 research outputs found
A shared-disk parallel cluster file system
Dissertation submitted for the degree of Doctor in Informatics at the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.
Today, clusters are the de facto cost-effective platform both for high performance computing (HPC) and for IT environments. HPC and IT are quite different environments, and the differences include, among others, their choices of file systems and storage: HPC favours parallel file systems geared towards maximum I/O bandwidth, but which are not fully POSIX-compliant and were devised to run on top of (fault-prone) partitioned storage; conversely, IT data centres favour both external disk arrays (to provide highly available storage) and POSIX-compliant file systems (either general purpose or shared-disk cluster file systems, CFSs).
These specialised file systems do perform very well in their target environments provided that applications do not require certain lateral features that they lack, e.g., file locking on parallel file systems, or high performance writes over cluster-wide shared files on CFSs. In brief, none of the above approaches solves the problem of providing high levels of reliability and performance to both worlds.
Our pCFS proposal contributes to changing this situation: the rationale is to take advantage of the best of both, namely the reliability of cluster file systems and the high performance of parallel file systems. We don’t claim to provide the absolute best of each, but we aim at full POSIX compliance, a rich feature set, and levels of reliability and performance good enough for broad usage, e.g., traditional as well as HPC applications, support of clustered DBMS engines that may run over regular files, and video streaming. pCFS’ main ideas include:
· Cooperative caching, a technique that has been used in file systems for distributed disks but, as far as we know, was never used either in SAN based cluster file systems or in parallel file systems. As a result, pCFS may use all infrastructures (LAN and SAN) to move data.
· Fine-grain locking, whereby processes running across distinct nodes may define non-overlapping byte-range regions in a file (instead of locking the whole file) and access them in parallel, reading and writing over those regions at the infrastructure’s full speed (provided that no major metadata changes are required).
A prototype was built on top of GFS (a Red Hat shared-disk CFS): GFS’ kernel code was slightly modified, and two kernel modules and a user-level daemon were added. In the prototype, fine-grain locking is fully implemented and a cluster-wide coherent cache is maintained through data (page fragments) movement over the LAN.
Our benchmarks for non-overlapping writers over a single file shared among processes running on different nodes show that pCFS’ bandwidth is 2 times greater than NFS’ while being comparable to that of the Parallel Virtual File System (PVFS), both requiring about 10 times more CPU. pCFS’ bandwidth also surpasses GFS’ (600 times for small record sizes, e.g., 4 KB, decreasing down to 2 times for large record sizes, e.g., 4 MB), at about the same CPU usage.
Lusitania, Companhia de Seguros S.A; IBM Shared University Research (SUR) Programme
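The fine-grain locking idea in the abstract above can be illustrated on a single machine with POSIX advisory byte-range locks. The following is only a minimal sketch under that assumption: it uses Python's standard `fcntl.lockf` on a local temporary file, not pCFS or GFS, and the helper names `write_region` and `demo` are hypothetical. For simplicity the two "writers" run one after the other in one process.

```python
import fcntl
import os
import tempfile


def write_region(path, offset, data):
    """Lock only the bytes [offset, offset + len(data)) and write them.

    Hypothetical helper: a local POSIX advisory lock stands in for the
    cluster-wide byte-range lock described in the abstract.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # Exclusive lock on the byte range only, not on the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
        try:
            os.pwrite(fd, data, offset)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)
    finally:
        os.close(fd)


def demo():
    """Two writers update non-overlapping regions of one shared file."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * 8)
        path = f.name
    try:
        write_region(path, 0, b"AAAA")  # first writer: bytes 0..3
        write_region(path, 4, b"BBBB")  # second writer: bytes 4..7
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

Because the two regions do not overlap, neither lock would block the other if the writers ran concurrently, which is the property the abstract relies on for parallel writes to a shared file.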
Supporting distributed computation over wide area gigabit networks
The advent of high bandwidth fibre optic links that may be used over very large distances
has led to much research and development in the field of wide area gigabit networking. One
problem that needs to be addressed is how loosely coupled distributed systems may be built over
these links, allowing many computers worldwide to take part in complex calculations in order
to solve "Grand Challenge" problems. The research conducted as part of this PhD has looked
at the practicality of implementing a communication mechanism proposed by Craig Partridge
called Late-binding Remote Procedure Calls (LbRPC).
LbRPC is intended to export both code and data over the network to remote machines for
evaluation, as opposed to traditional RPC mechanisms that only send parameters to pre-existing
remote procedures. The ability to send code as well as data means that LbRPC requests can
overcome one of the biggest problems in Wide Area Distributed Computer Systems (WADCS):
the fixed latency due to the speed of light. As machines get faster, the fixed multi-millisecond
round trip delay equates to ever increasing numbers of CPU cycles. For a WADCS to be
efficient, programs should minimise the number of network transits they incur. By allowing the
application programmer to export arbitrary code to the remote machine, this may be achieved.
This research has looked at the feasibility of supporting secure exportation of arbitrary
code and data in heterogeneous, loosely coupled, distributed computing environments. It has
investigated techniques for making placement decisions for the code in cases where there are a
large number of widely dispersed remote servers that could be used. The latter has resulted in
the development of a novel prototype LbRPC using multicast IP for implicit placement and a
sequenced, multi-packet saturation multicast transport protocol. These prototypes show that
it is possible to export code and data to multiple remote hosts, thereby removing the need to
perform complex and error-prone explicit process placement decisions.
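The latency argument in the abstract above is simple arithmetic, sketched below. The figures used in the usage note (a 100 ms round trip, a 1000 MHz clock, 50 calls) are illustrative assumptions and do not come from the thesis; the function names are hypothetical.

```python
def cycles_lost_per_round_trip(rtt_ms, clock_mhz):
    """CPU cycles that elapse while waiting for one network round trip.

    rtt_ms milliseconds at clock_mhz million cycles per second gives
    rtt_ms * clock_mhz * 1000 cycles.
    """
    return rtt_ms * clock_mhz * 1000


def network_wait_ms(round_trips, rtt_ms):
    """Total time spent waiting on the network for a given number of round trips."""
    return round_trips * rtt_ms
```

With these assumed numbers, one round trip costs `cycles_lost_per_round_trip(100, 1000)`, i.e. 100,000,000 cycles. A computation that needs 50 conventional remote calls waits `network_wait_ms(50, 100)` = 5000 ms, whereas exporting the code once and receiving a single reply waits `network_wait_ms(1, 100)` = 100 ms. Doubling the clock rate doubles the cycles lost per round trip while the delay itself stays fixed.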
The ATLAS Data Acquisition and High Level Trigger system
This paper describes the data acquisition and high level trigger system of the ATLAS experiment at the Large Hadron Collider at CERN, as deployed during Run 1. Data flow as well as control, configuration and monitoring aspects are addressed. An overview of the functionality of the system and of its performance is presented and design choices are discussed.
Facultad de Ciencias Exacta
Literature review of the remote sensing of natural resources
A bibliography is presented concerning remote sensing techniques. Abstracts of recent periodicals are included along with author and keyword indexes.
Static and Dynamic Scheduling for Effective Use of Multicore Systems
Multicore systems have increasingly gained importance in high performance computers. Compared to the traditional microarchitectures, multicore architectures have a simpler design, higher performance-to-area ratio, and improved power efficiency. Although the multicore architecture has various advantages, traditional parallel programming techniques do not apply to the new architecture efficiently. This dissertation addresses how to determine optimized thread schedules to improve data reuse on shared-memory multicore systems and how to seek a scalable solution to designing parallel software on both shared-memory and distributed-memory multicore systems.
We propose an analytical cache model to predict the number of cache misses on the time-sharing L2 cache on a multicore processor. The model provides an insight into the impact of cache sharing and cache contention between threads. Inspired by the model, we build the framework of affinity based thread scheduling to determine optimized thread schedules to improve data reuse on all the levels in a complex memory hierarchy. The affinity based thread scheduling framework includes a model to estimate the cost of a thread schedule, which consists of three submodels: an affinity graph submodel, a memory hierarchy submodel, and a cost submodel. Based on the model, we design a hierarchical graph partitioning algorithm to determine near-optimal solutions. We have also extended the algorithm to support threads with data dependences. The algorithms are implemented and incorporated into a feedback directed optimization prototype system. The prototype system builds upon a binary instrumentation tool and can improve program performance greatly on shared-memory multicore architectures.
We also study the dynamic data-availability driven scheduling approach to designing new parallel software on distributed-memory multicore architectures. We have implemented a decentralized dynamic runtime system. The design of the runtime system is focused on the scalability metric. At any time only a small portion of a task graph exists in memory. We propose an algorithm to solve data dependences without process cooperation in a distributed manner. Our experimental results demonstrate the scalability and practicality of the approach for both shared-memory and distributed-memory multicore systems. Finally, we present a scalable nonblocking topology-aware multicast scheme for distributed DAG scheduling applications.
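The idea of affinity based thread scheduling in the abstract above can be illustrated with a small greedy heuristic that places threads sharing the most data in the same cache-sharing group. This is only a sketch under stated assumptions: it is not the dissertation's hierarchical graph partitioning algorithm, and the function name, the edge-weight dictionary, and the `group_size` parameter are all hypothetical.

```python
def group_by_affinity(num_threads, affinity, group_size):
    """Greedily group threads so that high-affinity pairs share a group.

    affinity maps a thread pair (i, j) to a weight, e.g. the amount of
    data the two threads share. group_size is the assumed number of
    cores that share one cache. Returns a list of sorted thread groups.
    """
    # Every thread starts in its own group.
    group_of = {t: {t} for t in range(num_threads)}

    # Visit edges from heaviest to lightest; ties are broken by pair order.
    edges = sorted(affinity.items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), _weight in edges:
        ga, gb = group_of[a], group_of[b]
        # Merge two groups only if the result still fits under one cache.
        if ga is not gb and len(ga) + len(gb) <= group_size:
            merged = ga | gb
            for t in merged:
                group_of[t] = merged

    # Collect each distinct group once, in order of its lowest thread id.
    seen, groups = set(), []
    for t in range(num_threads):
        if t not in seen:
            groups.append(sorted(group_of[t]))
            seen |= group_of[t]
    return groups
```

For example, with four threads, `group_size=2`, and weights `{(0, 1): 1, (0, 2): 10, (1, 3): 8, (2, 3): 2}`, the heuristic pairs threads 0 and 2, then threads 1 and 3, so the two heaviest sharing relationships each stay within one shared cache.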