50 research outputs found

    Iso-energy-efficiency: An approach to power-constrained parallel computation

    Future large-scale high-performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate and predict the energy-performance of data-intensive parallel applications with various execution patterns running on large-scale power-aware clusters. Our analytical model helps users explore the effects of machine- and application-dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g., processor count, CPU power/frequency, workload size and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model for various application contexts and in scalability decision-making.
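
    The abstract describes the iso-energy-efficiency model only at a high level, so the sketch below is not the paper's formulation. It is a minimal Python illustration, under assumed constants and functional forms, of how one might scan processor count and CPU frequency against a simple analytical runtime/energy model to see where energy efficiency is roughly preserved.

```python
# Toy illustration (not the paper's model): a rough analytical sketch of how
# energy efficiency might be explored as processor count and CPU frequency scale.
# All constants and functional forms below are hypothetical assumptions.

def runtime(p, f, work=1e12, comm_per_proc=0.05):
    """Estimated runtime: compute scales with 1/(p*f); communication grows with p."""
    compute = work / (p * f * 1e9)        # seconds of on-core work at f GHz
    comm = comm_per_proc * (p ** 0.5)     # assumed sqrt(p) communication growth
    return compute + comm

def energy(p, f, p_static=40.0, c_dyn=0.6):
    """Estimated system energy: per-processor static power plus a cubic dynamic term."""
    power_per_proc = p_static + c_dyn * f ** 3   # watts (assumed)
    return p * power_per_proc * runtime(p, f)

def energy_efficiency(p, f, work=1e12):
    """Useful operations per joule."""
    return work / energy(p, f)

if __name__ == "__main__":
    # Scan processor counts and frequencies to see where efficiency holds up.
    for p in (64, 256, 1024):
        for f in (1.5, 2.0, 2.5):          # GHz
            print(f"p={p:5d} f={f:.1f} GHz  EE={energy_efficiency(p, f):.3e} ops/J")
```

    The cubic frequency term and the sqrt(p) communication growth are placeholders; a real analysis would substitute measured, application-specific parameters.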

    Intelligent support of the operator of an onboard intelligent system using simulation modeling

    The task of providing intelligent support to the operator of an onboard intelligent system is discussed. Particular attention is given to building the system using simulation-modeling methods. The system operates in real time on a multiprocessor computing complex.

    Understanding communication patterns in HPCG

    Conjugate Gradient (CG) algorithms form a large part of many HPC applications; examples include bioinformatics and weather applications. These algorithms provide numerical solutions to complex linear systems. Understanding how distributed implementations of these algorithms use a network interconnect will allow system designers to gain deeper insight into their exacting requirements for existing and future applications. This short paper documents our initial investigation into the communication patterns present in the High Performance Conjugate Gradient (HPCG) benchmark. Through our analysis, we identify patterns and features which may warrant further investigation to improve the performance of CG algorithms and the applications which make extensive use of them. In this paper, we capture communication traces from runs of the HPCG benchmark at a variety of processor counts and then examine this data to identify potential performance bottlenecks. Initial results show that network throughput falls as more processes communicate with one another, owing to network contention.
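
    The paper's tracing toolchain is not described here, so the following is only a minimal sketch of the kind of post-processing one might apply to a communication trace. It assumes a hypothetical CSV dump (hpcg_trace.csv with columns rank, peer, bytes, seconds) and computes effective per-rank throughput, which is where contention-related drops would show up.

```python
# Minimal sketch (not the authors' tooling): aggregate a hypothetical MPI trace,
# dumped as CSV with columns rank, peer, bytes, seconds, to estimate effective
# per-rank throughput. The file name and column layout are assumptions.
import csv
from collections import defaultdict

def throughput_by_rank(trace_path="hpcg_trace.csv"):
    sent = defaultdict(int)      # bytes sent per rank
    busy = defaultdict(float)    # time spent communicating per rank
    with open(trace_path, newline="") as fh:
        for row in csv.DictReader(fh):
            rank = int(row["rank"])
            sent[rank] += int(row["bytes"])
            busy[rank] += float(row["seconds"])
    # Effective throughput (MB/s) per rank; network contention appears as a drop here.
    return {r: sent[r] / busy[r] / 1e6 for r in sent if busy[r] > 0}

if __name__ == "__main__":
    for rank, mbps in sorted(throughput_by_rank().items()):
        print(f"rank {rank:4d}: {mbps:8.1f} MB/s")
```

    Comparing these per-rank figures across runs at different process counts would surface the throughput fall-off the abstract reports.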

    Exploitation of Dynamic Communication Patterns through Static Analysis

    Abstract not provided.

    In-depth Analysis On Parallel Processing Patterns for High-Performance Dataframes

    The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more complexities to data engineering applications, which are now integrated into data processing pipelines to process terabytes of data. Typically, a significant amount of time is spent on data preprocessing in these pipelines, and hence improving its efficiency directly impacts overall pipeline performance. The community has recently embraced the concept of Dataframes as the de facto data structure for data representation and manipulation. However, the most widely used serial Dataframes today (R, pandas) experience performance limitations while working on even moderately large data sets. We believe that there is plenty of room for improvement by looking at this problem from a high-performance computing point of view. In a prior publication, we presented a set of parallel processing patterns for distributed dataframe operators and the reference runtime implementation, Cylon [1]. In this paper, we expand on the initial concept by introducing a cost model for evaluating these patterns. Furthermore, we evaluate the performance of Cylon on the ORNL Summit supercomputer.
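
    Cylon's published cost model is not reproduced in the abstract, so the sketch below is only an illustrative toy: it estimates the cost of a shuffle-based distributed dataframe operator as a local-compute term plus an all-to-all communication term, with all constants assumed for demonstration.

```python
# Illustrative sketch only (not Cylon's published cost model): a toy cost estimate
# for a shuffle-based distributed dataframe operator, split into local compute and
# all-to-all communication terms. All constants and functional forms are assumptions.

def shuffle_cost(rows, row_bytes, procs,
                 compute_rate=5e7,     # rows processed per second per worker (assumed)
                 bandwidth=1.25e9,     # bytes per second per link (assumed)
                 latency=5e-6):        # per-message latency in seconds (assumed)
    local_rows = rows / procs
    compute = local_rows / compute_rate
    # Each worker exchanges roughly (procs - 1)/procs of its partition with peers.
    comm_bytes = local_rows * row_bytes * (procs - 1) / procs
    communication = latency * (procs - 1) + comm_bytes / bandwidth
    return compute + communication

if __name__ == "__main__":
    # Compare the estimated cost of a shuffle-style operator at several worker counts.
    for p in (1, 8, 64, 512):
        t = shuffle_cost(rows=1e9, row_bytes=64, procs=p)
        print(f"{p:4d} workers: ~{t:.2f} s")
```

    In practice such a model would be calibrated against measured per-operator throughput and network parameters on the target system rather than the placeholder values used here.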