
    Improving Scalability and Usability of Parallel Runtime Environments for High Availability and High Performance Systems

    The number of processors embedded in high performance computing platforms is growing daily to solve larger and more complex problems. Hence, parallel runtime environments have to support and adapt to underlying platforms that require scalability and fault management in increasingly dynamic environments. This dissertation aims to analyze, understand and improve the state-of-the-art mechanisms for managing highly dynamic, large-scale applications. It demonstrates that new scalable and fault-tolerant topologies, combined with rerouting techniques, yield parallel runtime environments that can efficiently and reliably deliver sets of information to a large number of processes. Several important graph properties are provided to illustrate the theoretical capability of these topologies in terms of both scalability and fault tolerance: reasonable degree, regularity, low diameter, symmetry, low cost factor, low message traffic density, optimal connectivity, low fault-diameter and strong resilience. The dissertation builds a communication framework based on these topologies to support parallel runtime environments. The framework can handle multiple types of messages, e.g., unicast, multicast, broadcast and all-gather. Additionally, the communication framework has been formally verified to work in both normal and failure circumstances without creating any of the common problems such as broadcast storms, deadlocks and non-progress cycles.

    Performance-aware energy optimizations in networks for HPC

    Energy efficiency is an important challenge in the field of High Performance Computing (HPC). High energy requirements not only limit the potential to realize next-generation machines but are also an increasing part of the total cost of ownership of an HPC system. While HPC systems at large are becoming increasingly energy proportional in an effort to reduce energy costs, interconnect links stand out for their inefficiency. Commodity interconnect links remain "always-on", consuming full power even when no data is being transmitted. Although various techniques have been proposed towards energy-proportional interconnects, they are often too conservative or are not focused toward HPC. Aggressive techniques for interconnect energy savings are often not applied to HPC, in particular because they may incur excessive performance overheads. Any energy-saving technique will only be adopted in HPC if there is no significant impact on performance, which is still the primary design objective. This thesis explores interconnect energy proportionality from a performance perspective. First, a characterization of HPC applications is presented, making a case for the enormous potential for interconnect energy proportionality with HPC applications. Next, an HPC interconnect with on/off based links, modeled after the IEEE Energy Efficient Ethernet protocol, is evaluated. This evaluation, while presenting a relationship between performance impact and energy over HPC applications, also emphasizes the need for performance-focused designs in energy-efficient interconnects. Next, an adaptive mechanism, PerfBound, is presented that saves link energy subject to a bound on application performance overheads. Finally, this evaluation structure is applied to an intermediate link power state, in addition to the traditional on and off states. Results of this study over 15 production HPC applications show that, compared to current always-on HPC interconnects, link energy can be reduced by up to 70%, while application performance overhead is bounded to only 1%.
    Postprint (published version)

    New Foundation in the Sciences: Physics without sweeping infinities under the rug

    It is widely known at the frontiers of physics that the practice of "sweeping under the rug" has been the norm rather than the exception. In other words, the leading paradigms have a strong tendency to be hailed as the only game in town. For example, renormalization group theory was hailed as the cure for the infinity problem in QED. A quote attributed to Richard Feynman goes as follows: “What the three Nobel Prize winners did, in the words of Feynman, was to get rid of the infinities in the calculations. The infinities are still there, but now they can be skirted around . . . We have designed a method for sweeping them under the rug.” [1] Paul Dirac himself wrote in a similar vein: “Hence most physicists are very satisfied with the situation. They say: Quantum electrodynamics is a good theory, and we do not have to worry about it any more. I must say that I am very dissatisfied with the situation, because this so-called good theory does involve neglecting infinities which appear in its equations, neglecting them in an arbitrary way. This is just not sensible mathematics. Sensible mathematics involves neglecting a quantity when it turns out to be small - not neglecting it just because it is infinitely great and you do not want it!” [2] Similarly, dark matter and dark energy were elevated as a plausible way to solve the crisis in the prevalent Big Bang cosmology. That is why we chose the theme New Foundations in the Sciences: to emphasize the necessity of introducing a new set of approaches in the sciences, be it physics, cosmology, consciousness, etc.

    Discrete Event Simulations

    Considered by many authors as a technique for modelling stochastic, dynamic and discretely evolving systems, discrete event simulation (DES) has gained widespread acceptance among practitioners who want to represent and improve complex systems. Since DES is applied in widely different areas, this book reflects many different points of view about it: all authors describe how DES is understood and applied within their own context of work, providing an extensive understanding of what DES is. The name of the book itself reflects the plurality that these points of view represent. The book embraces a number of topics covering theory, methods and applications to a wide range of sectors and problem areas, categorised into five groups. Beyond this variety of viewpoints, one further feature of the book is worth noting: its richness in actual data and analysis based on actual data. While most academic areas lack application cases, roughly half of the chapters in this book deal with actual problems or are at least based on actual data. Thus, the editor firmly believes that this book will be of interest to both beginners and practitioners in the area of DES.

    The Second Conference on Lunar Bases and Space Activities of the 21st Century, volume 1

    These papers comprise a peer-reviewed selection of presentations by authors from NASA, LPI, industry, and academia at the Second Conference (April 1988) on Lunar Bases and Space Activities of the 21st Century, sponsored by the NASA Office of Exploration and the Lunar and Planetary Institute. These papers go into more technical depth than did those published from the first NASA-sponsored symposium on the topic, held in 1984. Session topics covered by this volume include (1) design and operation of transportation systems to, in orbit around, and on the Moon, (2) lunar base site selection, (3) design, architecture, construction, and operation of lunar bases and human habitats, and (4) lunar-based scientific research and experimentation in astronomy, exobiology, and lunar geology.

    The 1995 Goddard Conference on Space Applications of Artificial Intelligence and Emerging Information Technologies

    This publication comprises the papers presented at the 1995 Goddard Conference on Space Applications of Artificial Intelligence and Emerging Information Technologies held at the NASA/Goddard Space Flight Center, Greenbelt, Maryland, on May 9-11, 1995. The purpose of this annual conference is to provide a forum in which current research and development directed at space applications of artificial intelligence can be presented and discussed.

    Proceedings of the NASA Conference on Space Telerobotics, volume 2

    These proceedings contain papers presented at the NASA Conference on Space Telerobotics held in Pasadena, January 31 to February 2, 1989. The theme of the Conference was man-machine collaboration in space. The Conference provided a forum for researchers and engineers to exchange ideas on the research and development required for application of telerobotics technology to the space systems planned for the 1990s and beyond. The Conference: (1) provided a view of current NASA telerobotic research and development; (2) stimulated technical exchange on man-machine systems, manipulator control, machine sensing, machine intelligence, concurrent computation, and system architectures; and (3) identified important unsolved problems of current interest which can be dealt with by future research.

    First Annual Workshop on Space Operations Automation and Robotics (SOAR 87)

    Several topics related to automation and robotics technology are discussed. Automation of checkout, ground support, and logistics; automated software development; man-machine interfaces; neural networks; systems engineering and distributed/parallel processing architectures; and artificial intelligence/expert systems are among the topics covered.