1,503 research outputs found

    HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

    Full text link
    High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show that hybrid environments are the natural path to getting the best of on-premise and cloud resources: steady (and sensitive) workloads can run on on-premise resources, while peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, ranging from how to extract the best performance from an unknown underlying platform to which services are essential to make its usage easier. Moreover, the discussion of the right pricing and contractual models to fit both small and large users is relevant for the sustainability of HPC clouds. This paper presents a survey and taxonomy of efforts in HPC cloud and a vision of what we believe lies ahead, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This is particularly relevant given the fast-growing wave of new HPC applications coming from big data and artificial intelligence.
    Comment: 29 pages, 5 figures. Published in ACM Computing Surveys (CSUR).

    Power efficient job scheduling by predicting the impact of processor manufacturing variability

    Get PDF
    Modern CPUs suffer from performance and power consumption variability due to the manufacturing process. As a result, systems that do not account for this variability suffer performance degradation and wasted power. To avoid such negative impact, users and system administrators must actively counteract manufacturing variability. In this work we show that parallel systems benefit from taking the consequences of manufacturing variability into account when making decisions at the job scheduler level. We also show that it is possible to predict the impact of this variability on specific applications by using variability-aware power prediction models. Based on these power models, we propose two job scheduling policies that consider the effects of manufacturing variability for each application and ensure that power consumption stays under a system-wide power budget. We evaluate our policies under different power budgets and traffic scenarios, consisting of both single- and multi-node parallel applications and utilizing up to 4096 cores in total. We demonstrate that they decrease job turnaround time by up to 31% compared to contemporary scheduling policies used on production clusters, while saving up to 5.5% energy. Postprint (author's final draft).
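
    The paper's two policies are not reproduced here; the sketch below, under simplified assumptions (a per-node manufacturing-variability factor and a nominal per-node power prediction, both hypothetical inputs), only illustrates the general idea of variability-aware job scheduling under a system-wide power budget.

```python
# Hypothetical sketch (not the paper's implementation): start queued jobs on the
# most power-efficient free nodes while the predicted draw stays under a budget.
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    power_factor: float   # measured manufacturing variability; 1.0 = nominal part

@dataclass
class Job:
    job_id: int
    nodes_needed: int
    nominal_power_per_node: float  # model-predicted power at nominal variability (W)

def predict_job_power(job: Job, nodes: list) -> float:
    """Scale the nominal per-node prediction by each node's variability factor."""
    return sum(job.nominal_power_per_node * n.power_factor for n in nodes)

def schedule(queue: list, free_nodes: list, budget_w: float, current_draw_w: float):
    """Greedily dispatch jobs, preferring nodes that draw less power for the
    same work, and never exceeding the system-wide power budget."""
    free = sorted(free_nodes, key=lambda n: n.power_factor)  # efficient nodes first
    started = []
    for job in queue:
        if job.nodes_needed > len(free):
            continue
        chosen = free[:job.nodes_needed]
        power = predict_job_power(job, chosen)
        if current_draw_w + power <= budget_w:
            started.append((job, chosen))
            current_draw_w += power
            free = free[job.nodes_needed:]
    return started

# Example: two jobs competing for three nodes under a 900 W budget.
nodes = [Node(0, 0.95), Node(1, 1.00), Node(2, 1.10)]
jobs = [Job(10, nodes_needed=2, nominal_power_per_node=300),
        Job(11, nodes_needed=1, nominal_power_per_node=250)]
print(schedule(jobs, nodes, budget_w=900, current_draw_w=0))
```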

    A methodology for full-system power modeling in heterogeneous data centers

    Get PDF
    The need for energy awareness in current data centers has encouraged the use of power modeling to estimate their power consumption. However, existing models present noticeable limitations that make them application-dependent, platform-dependent, inaccurate, or computationally complex. In this paper, we propose a platform- and application-agnostic methodology for full-system power modeling in heterogeneous data centers that overcomes those limitations. It derives a single model per platform that works with high accuracy for heterogeneous applications with different patterns of resource usage and energy consumption, by systematically selecting a minimum set of resource usage indicators and extracting complex relations among them that capture the impact of all the resources in the system on energy consumption. We demonstrate our methodology by generating power models for heterogeneous platforms with very different power consumption profiles. Our validation experiments with real Cloud applications show that such models provide high accuracy (around 5% average estimation error). This work is supported by the Spanish Ministry of Economy and Competitiveness under contract TIN2015-65316-P, by the Generalitat de Catalunya under contract 2014-SGR-1051, and by the European Commission under FP7-SMARTCITIES-2013 contract 608679 (RenewIT) and FP7-ICT-2013-10 contracts 610874 (ASCETiC) and 610456 (EuroServer). Peer reviewed. Postprint (author's final draft).
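
    The methodology itself is not spelled out in this abstract, so the following is only a minimal sketch of the general approach it describes: one model per platform, fitted on a small set of resource-usage indicators, with interaction terms standing in for the "complex relations" between resources. The indicator names and the synthetic data are assumptions for illustration, not the paper's setup.

```python
# Illustrative only: fit a single full-system power model from a minimal set of
# resource-usage indicators, using degree-2 polynomial features so cross-resource
# interactions (e.g. cpu_util * mem_bw) can be captured by one linear model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
n = 2000
# Hypothetical per-interval monitoring samples: utilization-like indicators in [0, 1].
cpu, mem_bw, disk, net = rng.uniform(0, 1, size=(4, n))
X = np.column_stack([cpu, mem_bw, disk, net])
# Synthetic "measured" wall power: idle power + linear terms + one interaction + noise.
power = (80 + 120 * cpu + 40 * mem_bw + 15 * disk + 10 * net
         + 30 * cpu * mem_bw + rng.normal(0, 3, n))

model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
mae = -cross_val_score(model, X, power, cv=5,
                       scoring="neg_mean_absolute_error").mean()
print(f"cross-validated error: {mae:.1f} W "
      f"({100 * mae / power.mean():.1f}% of mean power)")
model.fit(X, power)   # final per-platform model used for online power estimation
```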

    Power Bounded Computing on Current & Emerging HPC Systems

    Get PDF
    Power has become a critical constraint on the evolution of large-scale High Performance Computing (HPC) systems and commercial data centers. This constraint spans almost every level of computing technology, from IC chips all the way up to data centers, for physical, technical, and economic reasons. To cope with this reality, it is necessary to understand how available or permissible power impacts the design and performance of emerging computer systems. For this reason, we propose power bounded computing and corresponding technologies to optimize performance on HPC systems with limited power budgets. This dissertation has multiple research objectives, centered on understanding the interaction between performance, power bounds, and a hierarchical power management strategy. First, we develop heuristics and application-aware power allocation methods to improve application performance on a single node. Second, we develop algorithms to coordinate power across nodes and components based on application characteristics and the power budget on a cluster. Third, we investigate performance interference induced by hardware and power contention, and propose contention-aware job scheduling to maximize system throughput under given power budgets for node-sharing systems. Fourth, we extend the approach to GPU-accelerated systems and workloads and develop an online dynamic performance and power approach to meet both performance requirements and power efficiency. Power bounded computing improves performance scalability and power efficiency and decreases the operating costs of HPC systems and data centers. This dissertation opens up several new avenues of research in power bounded computing to address the power challenges in HPC systems. The proposed power and resource management techniques provide new directions and guidelines for green exascale computing and other computing systems.
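
    The dissertation's allocation algorithms are not reproduced here; the sketch below only illustrates the flavor of hierarchical power bounding, assuming a cluster-level budget split across nodes in proportion to a workload weight and clamped to per-node hardware limits (all names and numbers are hypothetical).

```python
# Illustrative sketch of hierarchical power bounding (not the dissertation's
# algorithms): divide a cluster budget across nodes proportionally to how
# power-sensitive each node's workload is, clamped to hardware min/max caps.
def allocate_node_budgets(cluster_budget_w, nodes):
    """nodes: list of dicts with 'min_w', 'max_w' caps and a workload 'weight'."""
    total_weight = sum(n["weight"] for n in nodes)
    budgets = []
    for n in nodes:
        share = cluster_budget_w * n["weight"] / total_weight
        budgets.append(min(max(share, n["min_w"]), n["max_w"]))
    # If clamping left budget unused, hand the slack to nodes still below their cap.
    slack = cluster_budget_w - sum(budgets)
    for i, n in enumerate(nodes):
        if slack <= 0:
            break
        give = min(n["max_w"] - budgets[i], slack)
        budgets[i] += give
        slack -= give
    return budgets

# Example: a 1000 W cluster budget over three nodes with different caps/weights.
nodes = [{"min_w": 100, "max_w": 400, "weight": 3},
         {"min_w": 100, "max_w": 300, "weight": 1},
         {"min_w": 100, "max_w": 350, "weight": 2}]
print(allocate_node_budgets(1000, nodes))  # ~[400.0, 266.7, 333.3]
```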

    Many-Task Computing and Blue Waters

    Full text link
    This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters system, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects for middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, MTC applications are by definition structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware.
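
    As a concrete picture of the task-graph structure the report describes (discrete tasks with explicit input/output dependencies, dispatched as soon as their inputs exist), here is a toy runner; it is not MTC middleware and deliberately ignores the dispatch-overhead and I/O concerns the report highlights.

```python
# Toy illustration of an MTC-style task graph: tasks are dispatched to a worker
# pool as soon as all of their declared input dependencies have produced results.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_task_graph(tasks, deps, max_workers=8):
    """tasks: {name: callable(**inputs)}; deps: {name: [names it depends on]}."""
    results, running = {}, {}
    remaining = set(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Dispatch every task whose dependencies have all completed.
            ready = [t for t in remaining
                     if all(d in results for d in deps.get(t, []))]
            for t in ready:
                inputs = {d: results[d] for d in deps.get(t, [])}
                running[pool.submit(tasks[t], **inputs)] = t
                remaining.discard(t)
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results

# Example: two short independent tasks feeding one aggregation task.
out = run_task_graph(
    tasks={"a": lambda: 1, "b": lambda: 2, "sum": lambda a, b: a + b},
    deps={"sum": ["a", "b"]},
)
print(out["sum"])  # 3
```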

    Exploring Scheduling for On-demand File Systems and Data Management within HPC Environments

    Get PDF

    ํด๋ผ์šฐ๋“œ ์ปดํ“จํŒ… ํ™˜๊ฒฝ๊ธฐ๋ฐ˜์—์„œ ์ˆ˜์น˜ ๋ชจ๋ธ๋ง๊ณผ ๋จธ์‹ ๋Ÿฌ๋‹์„ ํ†ตํ•œ ์ง€๊ตฌ๊ณผํ•™ ์ž๋ฃŒ์ƒ์„ฑ์— ๊ด€ํ•œ ์—ฐ๊ตฌ

    Get PDF
    Doctoral dissertation (Ph.D.) -- Seoul National University Graduate School, College of Natural Sciences, School of Earth and Environmental Sciences, August 2022. Advisor: ์กฐ์–‘๊ธฐ.
    To investigate changes and phenomena on Earth, many scientists use high-resolution model results based on numerical models or develop and utilize machine learning-based prediction models trained on observed data. As information technology advances, there is a need for a practical methodology for generating local and global high-resolution numerical modeling and machine learning-based earth science data. This study proposes data generation and processing using high-resolution numerical models of earth science and machine learning-based prediction models in a cloud environment. To verify the reproducibility and portability of a high-resolution numerical ocean model implementation on cloud computing, I simulated and analyzed the performance of a numerical ocean model at various resolutions in a model domain covering the Northwest Pacific Ocean, the East Sea, and the Yellow Sea. With the containerization method, it was possible to respond to changes in various infrastructure environments and to achieve computational reproducibility effectively. Data augmentation of subsurface temperature data was performed using generative models to prepare large datasets for training models that predict the vertical temperature distribution in the ocean; the observed data, which are relatively scarce compared to satellite datasets, were augmented with a generative model. In addition to observation data, HYCOM datasets were used for performance comparison, and the distribution of the augmented data was similar to that of the input data. An ensemble method combining stand-alone predictive models improved predictive performance compared to models based only on the existing observed data. Large amounts of computational resources were required for data synthesis, which was performed in a cloud-based graphics processing unit environment. High-resolution numerical ocean model simulation, predictive model development, and the data generation method can improve predictive capabilities in the field of ocean science. The cloud-based numerical modeling and generative models used in this study can be broadly applied to various fields of earth science.
    Contents:
    1. General Introduction
    2. Performance of numerical ocean modeling on cloud computing
       2.1. Introduction
       2.2. Cloud Computing
          2.2.1. Cloud computing overview
          2.2.2. Commercial cloud computing services
       2.3. Numerical model for performance analysis of commercial clouds
          2.3.1. High Performance Linpack Benchmark
          2.3.2. Benchmark Sustainable Memory Bandwidth and Memory Latency
          2.3.3. Numerical Ocean Model
          2.3.4. Deployment of Numerical Ocean Model and Benchmark Packages on Cloud Clusters
       2.4. Simulation results
          2.4.1. Benchmark simulation
          2.4.2. Ocean model simulation
       2.5. Analysis of ROMS performance on commercial clouds
          2.5.1. Performance of ROMS according to H/W resources
          2.5.2. Performance of ROMS according to grid size
       2.6. Summary
    3. Reproducibility of the numerical ocean model on cloud computing
       3.1. Introduction
       3.2. Containerization of the numerical ocean model
          3.2.1. Container virtualization
          3.2.2. Container-based architecture for HPC
          3.2.3. Container-based architecture for hybrid cloud
       3.3. Materials and Methods
          3.3.1. Comparison of traditional and container-based HPC cluster workflows
          3.3.2. Model domain and datasets for numerical simulation
          3.3.3. Building the container image and registration in the repository
          3.3.4. Configuring a numerical-model execution cluster
       3.4. Results and Discussion
          3.4.1. Reproducibility
          3.4.2. Portability and Performance
       3.5. Conclusions
    4. Generative models for the prediction of ocean temperature profiles
       4.1. Introduction
       4.2. Materials and Methods
          4.2.1. Model domain and datasets for predicting the subsurface temperature
          4.2.2. Model architecture for predicting the subsurface temperature
          4.2.3. Neural network generative models
          4.2.4. Prediction Models
          4.2.5. Accuracy
       4.3. Results and Discussion
          4.3.1. Data Generation
          4.3.2. Ensemble Prediction
          4.3.3. Limitations of this study and future works
       4.4. Conclusion
    5. Summary and conclusion
    6. References
    7. Abstract (in Korean)
    • โ€ฆ
    corecore