Computing in the RAIN: a reliable array of independent nodes
The RAIN project is a research collaboration between Caltech and NASA-JPL on distributed computing and data-storage systems for future spaceborne missions. The goal of the project is to identify and develop key building blocks for reliable distributed systems built with inexpensive off-the-shelf components. The RAIN platform consists of a heterogeneous cluster of computing and/or storage nodes connected via multiple interfaces to networks configured in fault-tolerant topologies. The RAIN software components run in conjunction with operating system services and standard network protocols. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. The RAIN technology has been transferred to Rainfinity, a start-up company focusing on creating clustered solutions for improving the performance and availability of Internet data centers. In this paper, we describe the following contributions: 1) fault-tolerant interconnect topologies and communication protocols providing consistent error reporting of link failures, 2) fault management techniques based on group membership, and 3) data storage schemes based on computationally efficient error-control codes. We present several proof-of-concept applications: a highly available video server, a highly available Web server, and a distributed checkpointing system. We also describe a commercial product, Rainwall, built with the RAIN technology.
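As a rough illustration of the third contribution, data storage based on error-control codes, the sketch below uses a single XOR parity block spread across nodes. This is an assumption chosen for brevity: the abstract does not say which codes RAIN uses, and single parity survives only one lost block, whereas the abstract describes tolerance of multiple failures. The function names `xor_blocks`, `encode`, and `recover` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks plus one parity block (one block per node)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored, lost_index):
    """Rebuild the block at lost_index from all surviving blocks."""
    survivors = [b for i, b in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)


# Three data blocks and one parity block, as if stored on four nodes.
stored = encode([b"node", b"data", b"rain"])
# Simulate losing node 1 and rebuilding its block from the other three.
rebuilt = recover(stored, 1)
```

Because XOR is its own inverse, combining every surviving block reproduces the missing one, whether the lost block held data or parity.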
Performance evaluation of Fast Ethernet, ATM and Myrinet under PVM
Congestion in network switches can limit the communication traffic between Parallel Virtual Machine (PVM) nodes in a parallel computation. The research introduces a new benchmark to evaluate the performance of PVM in various networking environments. The benchmark is used to achieve a better understanding of performance limitations in parallel computing that are imposed by the choice of the network. The networks considered here are Fast Ethernet, Asynchronous Transfer Mode (ATM) OC-3c (155 Mb/s) and Myrinet. Together, they represent an interesting range of alternatives for parallel cluster computing. A characterization of network delays and throughput and a comparison of the expected costs of the three environments are developed to provide a basis for an informed decision on the networking methods and topology for a parallel database that is being considered for the FBI's National DNA Indexing System (NDIS) [17]. This network is used for communications among the nodes of the parallel machine; thus the security requirements defined for the FBI's Criminal Justice Information Services Division Wide Area Network (CJIS-WAN) [12] are not a concern.
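To make the delay and throughput characterization concrete, here is a minimal sketch of the common first-order cost model, in which transfer time is a fixed latency plus message size divided by line rate. It is not the benchmark described in the abstract. The 100 µs latency is a hypothetical placeholder, not a measured value, and only the nominal line rates of Fast Ethernet (100 Mb/s) and ATM OC-3c (155 Mb/s) are used.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bps):
    """Estimated one-way time: fixed latency plus serialization time."""
    return latency_s + (message_bytes * 8) / bandwidth_bps


def effective_throughput(message_bytes, latency_s, bandwidth_bps):
    """Achieved bits per second for one message of the given size."""
    return (message_bytes * 8) / transfer_time(message_bytes, latency_s, bandwidth_bps)


# Nominal line rates in bits per second.
NETWORKS = {"Fast Ethernet": 100e6, "ATM OC-3c": 155e6}
# Hypothetical per-message latency, used only to show the shape of the model.
ASSUMED_LATENCY_S = 100e-6

for name, rate in NETWORKS.items():
    for size in (64, 65536):
        mbps = effective_throughput(size, ASSUMED_LATENCY_S, rate) / 1e6
        print(f"{name}: {size} bytes -> {mbps:.1f} Mb/s effective")
```

Under this model small messages are dominated by latency and large ones by line rate, which is why a benchmark of this kind typically sweeps the message size.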
Cluster Computing in the Classroom: Topics, Guidelines, and Experiences
With the progress of research on cluster computing, more and more universities have begun to offer courses covering the subject. A wide variety of content can be taught in these courses, which makes the selection of appropriate course material difficult. The selection is complicated by the fact that some content in cluster computing is also covered by other courses such as operating systems, networking, or computer architecture. In addition, the background of students enrolled in cluster computing courses varies. These aspects of cluster computing make the development of good course material difficult. Combining our experiences in teaching cluster computing at several universities in the USA and Australia and conducting tutorials at many international conferences, we present prospective topics in cluster computing along with a wide variety of information sources (books, software, and materials on the web) from which instructors can choose. The course material described includes system architecture, parallel programming, algorithms, and applications. Instructors are advised to choose selected units in each of the topical areas and develop their own syllabus to meet course objectives. For example, a full course can be taught on system architecture for core computer science students. Alternatively, a course on parallel programming could contain a brief coverage of system architecture and then devote the majority of time to programming methods. Other combinations are also possible. We share our experiences in teaching cluster computing and the topics we have chosen depending on course objectives.
Intelligent multimedia communication for enhanced medical e-collaboration in back pain treatment
This is the post-print version of the Article. The official published version can be accessed from the link below - Copyright @ 2004 SAGE Publications
Remote, multimedia-based collaboration in back pain treatment is an option which only recently has come to the attention of clinicians and IT providers. The take-up of such applications will inevitably depend on their ability to produce an acceptable level of service over congested and unreliable public networks. However, although the problem of multimedia application-level performance is closely linked both to the user perspective of the experience and to the service provided by the underlying network, it is rarely studied from an integrated viewpoint. To alleviate this problem, we propose an intelligent mechanism that integrates user-related requirements with the more technical characterization of quality of service, obtaining a priority order of low-level quality of service parameters, which would ensure that user-centred quality of perception is maintained at an optimum level. We show how our framework is capable of suggesting appropriately tailored transmission protocols, by incorporating user requirements in the remote delivery of e-health solutions.
Design and Evaluation of Low-Latency Communication Middleware on High Performance Computing Systems
[Resumen] Interest in Java for parallel computing is motivated by its appealing features, such as its multithreading support, portability, ease of learning, high productivity, and the significant increase in its computational performance. Nevertheless, parallel Java applications generally lack efficient communication mechanisms; they often rely on socket-based protocols that cannot take full advantage of low-latency networks, which hinders the adoption of Java in High Performance Computing (HPC). This PhD Thesis presents the design, implementation, and evaluation of Java communication solutions that overcome this limitation. Accordingly, multiple low-level communication devices were developed for Message-Passing in Java (MPJ); they take full advantage of the underlying network hardware through remote direct memory access operations that provide low-latency communications. The Thesis also includes a fully functional Java message-passing library, FastMPJ, into which the communication devices were integrated. The experimental evaluation has shown that FastMPJ communication primitives are competitive with native libraries, significantly increasing the scalability of MPJ applications. In addition, this Thesis analyzes the potential of cloud computing for HPC, where the Infrastructure as a Service (IaaS) delivery model emerges as a viable alternative to traditional HPC systems. The performance evaluation of HPC-specific cloud resources from the leading provider, Amazon EC2, has revealed the significant impact that virtualization imposes on the network, which prevents moving communication-intensive applications to the cloud. The key lies in appropriate virtualization support, such as direct access to the network hardware, together with the performance optimization guidelines suggested in this Thesis.
[Resumo] Interest in Java for parallel computing is motivated by its appealing features, such as its multithreading support, portability, ease of learning, high productivity, and the significant increase in its computational performance. However, parallel Java applications generally lack efficient communication mechanisms; they tend to use socket-based protocols that cannot take full advantage of low-latency networks, which hinders the adoption of Java in High Performance Computing (HPC). This PhD Thesis presents the design, implementation, and evaluation of Java communication solutions that overcome this limitation. Accordingly, multiple low-level communication devices were developed for Message-Passing in Java (MPJ); they take full advantage of the underlying network hardware through remote direct memory access operations that provide low-latency communications. The Thesis also includes a fully functional Java message-passing library, FastMPJ, into which the communication devices were integrated. The experimental evaluation showed that FastMPJ communication primitives are competitive with native libraries, significantly increasing the scalability of MPJ applications. In addition, this Thesis analyzes the potential of cloud computing for HPC, where the Infrastructure as a Service (IaaS) delivery model emerges as a viable alternative to traditional HPC systems. The extensive performance evaluation of HPC-specific cloud resources from the leading provider, Amazon EC2, revealed the significant impact that virtualization imposes on the network, which prevents moving communication-intensive applications to the cloud. The key lies in appropriate virtualization support, such as direct access to the network hardware, together with the performance optimization guidelines suggested in this Thesis.
[Abstract] The use of Java for parallel computing is becoming more promising owing to
its appealing features, particularly its multithreading support, portability, easy-to-learn properties, high programming productivity and the noticeable improvement in its computational performance. However, parallel Java applications generally suffer from inefficient communication middleware, most of which use socket-based protocols that are unable to take full advantage of high-speed networks, hindering the adoption of Java in the High Performance Computing (HPC) area. This PhD Thesis presents the design, development and evaluation of scalable Java communication solutions that overcome these constraints. Hence, we have implemented several low-level message-passing devices that fully exploit the underlying network hardware while taking advantage of Remote Direct Memory Access (RDMA) operations to provide low-latency communications. Moreover, we have developed a production-quality Java message-passing middleware, FastMPJ, in which the devices have been integrated seamlessly, thus allowing the productive development of Message-Passing in Java (MPJ) applications. The performance evaluation has shown that FastMPJ communication primitives are competitive with native message-passing libraries, significantly improving the scalability of MPJ applications. Furthermore, this Thesis has analyzed the potential of cloud computing towards spreading the outreach of HPC, where Infrastructure as a Service (IaaS) offerings have emerged as a feasible alternative to traditional HPC systems. Several cloud resources from the leading IaaS provider, Amazon EC2, which specifically target HPC workloads, have been thoroughly assessed. The experimental results have shown the significant impact that virtualized environments still have on network performance, which hampers porting communication-intensive codes to the cloud. The key is the availability of the proper virtualization support, such as the direct access to the network hardware, along with the guidelines for performance optimization suggested in this Thesis.
Design of efficient Java communications for high performance computing
[Abstract] There is an increasing interest in adopting Java as the parallel programming language for the multi-core
era. Although Java offers important advantages, such as built-in multithreading and networking support,
productivity and portability, the lack of efficient communication middleware is an important drawback
for its uptake in High Performance Computing (HPC). This PhD Thesis presents the design, implementation
and evaluation of several solutions to improve this situation: (1) a high performance Java sockets
implementation (JFS, Java Fast Sockets) on high-speed networks (e.g., Myrinet, InfiniBand) and shared
memory (e.g., multi-core) machines; (2) a low-level messaging device, iodev, which efficiently overlaps
communication and computation; and (3) a more scalable Java message-passing library, Fast MPJ (F-MPJ).
Furthermore, new Java parallel benchmarks have been implemented and used for the performance evaluation
of the developed middleware. The final and main conclusion is that the use of Java for HPC is feasible
and even advisable when looking for productive development, provided that efficient communication
middleware is made available, such as the projects presented in this Thesis.
[Resumen] The doctoral thesis "Design of Efficient Java Communications for High Performance Computing" starts from the initial hypothesis that it is possible to develop Java applications for high performance computing, a field in which performance is crucial, provided that efficient communication middleware is available. Accordingly, several Java communication libraries have been designed, developed, and evaluated, from the sockets level up to message passing, achieving notable efficiency gains and confirming that the initial hypothesis is feasible.
- …