44 research outputs found

    Performance Analysis Of Pde Based Parallel Algorithms On Different Computer Architectures

    Get PDF
    Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Bilişim Enstitüsü, 2009Thesis (M.Sc.) -- İstanbul Technical University, Institute of Informatics, 2009Son yıllarda dağıtık algoritmaların farklı platformlarda kullanılabilmesi platform ve uygulama bağımsız performans analizi uygulamaları ihtiyacını arttırmıştır. Farklı donanımları ve haberleşme metodlarını destekleyen uygulamalar kullanıcılara donanım ve yazılımdan bağımsız ortak bir zemin hazırladıkları için kolaylık sağlamaktadır. Kısmi fark denklemleri hesaplamalı bilim ve mühendisliğin bir çok alanında kullanılmaktadır (ısı, dalga yayılımı gibi). Bu denklemlerin sayısal çözümü yinelemeli yöntemler kullanılarak yapılmaktadır. Problemin boyutu ve hata değerine göre çözüme ulaşmak için gereken yineleme sayısı ve buna bağlı olarak süresi değişmektedir. Kısmi fark denklemelerinin tek işlemcili bilgisayarlardaki çözümü uzun sürdüğü ve yüksek boyutlarda hafızaları yetersiz kaldığı için paralelleştirilerek birden fazla bilgisayarın işlemcisi ve hafızası kullanılarak çözülmektedir. Tezimde eliptik kısmi fark denklemlerini Gauss-Seidel ve Successive Over-Relaxation (SOR) metodlarını kullanarak çözen paralel algoritmalar kullanılmıştır. Performans analizi ve eniyilemesi kabaca üç adımdan oluşmaktadır; ölçüm, sonuçların analizi, darboğazların tespit edilip yazılımda iyileştirme yapılması. Ölçüm aşamasında programın koşarken ürettiği performans bilgisi toplanır, toplanan bu veriler görselleştirme araçları ile anlaşılır hale getirilerek yorumlanır. Yorumlama aşamasında tespit edilen dar boğazlar belirlenir ve giderilme yöntemleri araştırılır. Gerekli iyileştirmeler yapılarak program yeniden analiz edilir. Bu aşamaların her birinde farklı uygulamalar kullanılabilir fakat tez çalışmamda uygulamaları tek çatı altında toplayan TAU kullanılmıştır. TAU (Tuning and Analysis Utilities) farklı donanımları ve işletim sistemlerini destekleyerek farklı paralelleştirme metodlarını analiz edebilmektedir. Açık kaynak kodlu olan TAU diğer açık kaynak kodlu uygulamalar ile uyumlu olup birçok seviyede bütünleşme sağlanmıştır. Bu tez çalışmasında, iki farklı platformda aynı uygulamanın performans analizi yapılarak platform farkının getirdiği farklılıklar incelenmektedir. Performans analizinde bir algoritmanın eniyilemesini yapmak için genel bir kural olmadığından her algoritma her platformda incelenerek gerekli değişiklikler yapılmalıdır. Bu bağlamda kullandığım PDE algoritmasının her iki sistemdeki analizi sonucu elde edilen bilgiler yorumlanmıştır.In last two decades, use of parallel algorithms on different architectures increased the need of architecture and application independent performance analysis tools. Tools that support different communication methods and hardware prepare a common ground regardless of equipments provided. Partial differential equations (PDE) are used in several applications (such as propagation of heat, wave) in computational science and engineering. These equations can be solved using iterative numerical methods. Problem size and error tolerance effects iteration count and computation time to solve equation. PDE computations take long time using single processor computers with sequential algorithms, and if data size gets bigger single processors memory may be insufficient. Thus, PDE?s are solved using parallel algorithms on multiple processors. In this thesis, elliptic partial differential equation is solved using Gauss-Seidel and Successive Over-Relaxation (SOR) methods parallel algorithms. Performance analysis and optimization basically has three steps; evaluation, analysis of gathered information, defining and optimizing bottlenecks. In evaluation, performance information is gathered while program runs, then observations are made on gathered information by using visualization tools. Bottlenecks are defined and optimization techniques are researched. Necessary improvements are made to analyze the program again. Different applications in each of these stages can be used but in this thesis TAU is used, which collects these applications under one roof. TAU (Tuning and Analysis Utilities) supports many hardware, operating systems and parallelization methods. TAU is an open source application and collaborates with other open source applications at different levels. In this thesis, differences based on performance analysis of an algorithm in different two architectures are investigated. In performance analysis and optimization there is no golden rule to speed up algorithm. Each algorithm must be analyzed on that specific architecture. In this context, the performance analysis of a PDE algorithm on two architectures has been interpreted.Yüksek LisansM.Sc

    Comparison of leading parallel NAS file systems on commodity hardware

    Full text link

    Hot-Spot Avoidance With Multi-Pathing Over Infiniband: An MPI Perspective

    Get PDF
    Large scale InfiniBand clusters are becoming increasingly popular, as reflected by the TOP 500 Supercomputer rankings. At the same time, fat tree has become a popular interconnection topology for these clusters, since it allows multiple paths to be available in between a pair of nodes. However, even with fat tree, hot-spots may occur in the network depending upon the route configuration between end nodes and communication pattern(s) in the application. To make matters worse, the deterministic routing nature of InfiniBand limits the application from effective use of multiple paths transparently and avoid the hot-spots in the network. Simulation based studies for switches and adapters to implement congestion control have been proposed in the literature. However, these studies have focused on providing congestion control for the communication path, and not on utilizing multiple paths in the network for hot-spot avoidance. In this paper, we design an MPI functionality, which provides hot-spot avoidance for different communications, without a priori knowledge of the pattern. We leverage LMC (LID Mask Count) mechanism of InfiniBand to create multiple paths in the network and present the design issues (scheduling policies, selecting number of paths, scalability aspects) of our design. We implement our design and evaluate it with Pallas collective communication and MPI applications. On an InfiniBand cluster with 48 processes, collective operations like MPI All-to-all Personalized and MPI Reduce Scatter show an improvement of 27% and 19% respectively. Our evaluation with MPI applications like NAS Parallel Benchmarks and PSTSWM on 64 processes shows significant improvement in execution time with this functionality

    Performance-aware energy optimizations in networks for HPC

    Get PDF
    Energy efficiency is an important challenge in the field of High Performance Computing (HPC). High energy requirements not only limit the potential to realize next-generation machines but are also an increasing part of the total cost of ownership of an HPC system. While at large HPC systems are becoming increasingly energy proportional in an effort to reduce energy costs, interconnect links stand out for their inefficiency. Commodity interconnect links remain ¿always-on¿, consuming full power even when no data is being transmitted. Although various techniques have been proposed towards energy- proportional interconnects, they are often too conservative or are not focused toward HPC. Aggressive techniques for interconnect energy savings are often not applied to HPC, in particular, because they may incur excessive performance overheads. Any energy-saving technique will only be adopted in HPC if there is no significant impact on performance, which is still the primary design objective. This thesis explores interconnect energy proportionality from a performance perspective. In this thesis, first a characterization of HPC applications is presented, making a case for the enormous potential for interconnect energy proportionality with HPC applications. Next, an HPC interconnect with on/off based links, modeled after the IEEE Energy Efficient Ethernet protocol, is evaluated. This evaluation while presenting a relationship between performance impact and energy over HPC applications also emphasizes the need for performance focused designs in energy efficient interconnects. Next, an adaptive mechanism, PerfBound, is presented that saves link energy subject to a bound on application performance overheads. Finally this evaluation structure is applied into an intermediate link power state, in addition to the traditional on and off states. Results of this study, over 15 production HPC applications show that, compared to current day always-on HPC interconnects, link energy can be reduced by unto 70%, while application performance overhead is bounded to only 1%.La eficiencia energética es un gran reto en el área de la Supercomputación (HPC), las grandes necesidades de energía no solo limitan el potencial de las computadoras de nueva generación, sino que también aumentan el coste de funcionamiento de estos sistemas. Mientras que los sistemas HPC tienden a ser cada vez más energéticamente proporcionales en un empeño por reducir costes, los enlaces de interconexión siguen siendo muy ineficientes. Los enlaces de interconexión comunes funcionan en modo "always-on", es decir, consumiendo energía incluso cuando no transmiten. Aunque se han propuesto algunas técnicas que ayuden a la proporcionalidad energética de los enlaces de interconexión, éstas han sido muy agresivas o poco enfocadas hacia su uso con sistemas HPC. Las técnicas de ahorro energético para los enlaces más agresivas no suelen ser utilizadas en HPC, particularmente porque degradan excesivamente el rendimiento. Cualquier técnica de ahorro energético solo será adoptada en sistemas HPC si no hay un impacto excesivo en el rendimiento, el cual es el principal objetivo de estos sistemas. En esta tesis, primeramente se presenta una nueva caracterización de aplicaciones HPC, remarcando el enorme potencial de la proporcionalidad en los enlaces de interconexión proporcionales para aplicaciones HPC. Seguidamente, se evaluará siguiendo el protocolo "IEEE Energy Efficient Ethernet" un link de interconexión on/off. Esta evaluación presentará una relación de impacto energético y rendimiento en aplicaciones HPC, enfatizando en la necesidad de usar un enlace de interconexión enfocados a la eficiencia. Se continuará con la presentación de un mecanismo adaptivo, PerfBound, que ahorra energía respetando unos límites máximos de impacto en el rendimiento. Finalmente, esta estructura es aplicada a un nuevo estado intermedio de funcionamiento adicional a los estados tradicionales on/off. Los resultados de este estudio, muestran que en más de 15 aplicaciones HPC la energía en los enlaces puede ser reducida en un 70% en comparación con enlaces "always-on", mientras que el impacto en el rendimiento es de tan solo un 1%.Postprint (published version

    A Framework for Cyber Vulnerability Assessments of InfiniBand Networks

    Get PDF
    InfiniBand is a popular Input/Output interconnect technology used in High Performance Computing clusters. It is employed in over a quarter of the world’s 500 fastest computer systems. Although it was created to provide extremely low network latency with a high Quality of Service, the cybersecurity aspects of InfiniBand have yet to be thoroughly investigated. The InfiniBand Architecture was designed as a data center technology, logically separated from the Internet, so defensive mechanisms such as packet encryption were not implemented. Cyber communities do not appear to have taken an interest in InfiniBand, but that is likely to change as attackers branch out from traditional computing devices. This thesis considers the security implications of InfiniBand features and constructs a framework for conducting Cyber Vulnerability Assessments. Several attack primitives are tested and analyzed. Finally, new cyber tools and security devices for InfiniBand are proposed, and changes to existing products are recommended
    corecore