85 research outputs found
21st Century Simulation: Exploiting High Performance Computing and Data Analysis
This paper identifies, defines, and analyzes the limitations imposed on Modeling and Simulation by outmoded
paradigms in computer utilization and data analysis. The authors then discuss two emerging capabilities to
overcome these limitations: High Performance Parallel Computing and Advanced Data Analysis. First, parallel
computing, in supercomputers and Linux clusters, has proven effective by providing users an advantage in
computing power. This has been characterized as a ten-year lead over the use of single-processor computers.
Second, advanced data analysis techniques are both necessitated and enabled by this leap in computing power.
JFCOM's JESPP project is one of the few simulation initiatives to effectively embrace these concepts. The
challenges facing the defense analyst today have grown to include the need to consider operations among non-combatant
populations, to focus on impacts to civilian infrastructure, to differentiate combatants from non-combatants,
and to understand non-linear, asymmetric warfare. These requirements stretch both current
computational techniques and data analysis methodologies. In this paper, documented examples and potential
solutions will be advanced. The authors discuss the paths to successful implementation based on their experience.
Reviewed technologies include parallel computing, cluster computing, grid computing, data logging, OpsResearch,
database advances, data mining, evolutionary computing, genetic algorithms, and Monte Carlo sensitivity analyses.
The modeling and simulation community has significant potential to provide more opportunities for training and
analysis. Simulations must include increasingly sophisticated environments, better emulations of foes, and more
realistic civilian populations. Overcoming the implementation challenges will produce dramatically better insights,
for trainees and analysts. High Performance Parallel Computing and Advanced Data Analysis promise increased
understanding of future vulnerabilities to help avoid unneeded mission failures and unacceptable personnel losses.
The authors set forth road maps for rapid prototyping and adoption of advanced capabilities. They discuss the
beneficial impact of embracing these technologies, as well as risk mitigation required to ensure success
Evaluation of GPU Acceleration for WRF–SFIRE
WRF–SFIRE is an open source, atmospheric–wildfire model that couples the WRF model with the level set fire spread model to simulate wildfires in real time. This model has many applications and more scientific questions can be asked and answered if the model can be run faster. Nvidia has put a lot of effort into easing the barrier of entry for accelerating applications with their tools to be run on GPUs. Various physical simulations have been successfully ported to utilize GPUs and have benefited from the speed increase. In this research, we take a look at WRF-SFIRE and try to use the Nvida tools to accelerate portions of code. We were successful in offloading work to the GPU. However, the WRF-SFIRE codebase contains too many data dependencies, deeply nested function calls and I/O to effectively utilize the GPU’s resources. We look at specific examples and try to run them on a Titan V GPU. In the end, the compute intensive portions of WRF-SFIRE need to be rewritten to avoid data dependencies in order to leverage GPUs to improve the execution time
Single system image: A survey
Single system image is a computing paradigm where a number of distributed computing resources are aggregated and presented via an interface that maintains the illusion of interaction with a single system. This approach encompasses decades of research using a broad variety of techniques at varying levels of abstraction, from custom hardware and distributed hypervisors to specialized operating system kernels and user-level tools. Existing classification schemes for SSI technologies are reviewed, and an updated classification scheme is proposed. A survey of implementation techniques is provided along with relevant examples. Notable deployments are examined and insights gained from hands-on experience are summarized. Issues affecting the adoption of kernel-level SSI are identified and discussed in the context of technology adoption literature
Performance evaluation of Fast Ethernet, ATM and Myrinet under PVM
Congestion in network switches can limit the communication traffic between Parallel Virtual Machine (PVM) nodes in a parallel computation. The research introduces a new benchmark to evaluate the performance of PVM in various networking environments. The benchmark is used to achieve a better understanding of performance limitations in parallel computing that are imposed by the choice of the network. The networks considered here are Fast Ethernet, Asynchronous Transfer Mode (ATM) OC-3c (155Mb/s) and Myrinet. Together, they represent an interesting range of alternatives for parallel cluster computing. A characterization of network delays and throughput and a comparison of the expected costs of the three environments are developed to provide a basis for an informed decision on the networking methods and topology for a parallel database that is being considered for FBI\u27s National DNA Indexing System (NDIS)[17]. This network is used for communications among the nodes of the parallel machine; thus the security requirements defined for the FBI\u27s Criminal Justice Information Services Division Wide Area Network (CJIS-WAN) [12] are not a concern
Effective Computation Resilience in High Performance and Distributed Environments
The work described in this paper aims at effective computation resilience for complex simulations in high performance and distributed environments. Computation resilience is a complicated and delicate area; it deals with many types of simulation cores, many types of data on various input levels and also with many types of end-users, which have different requirements and expectations. Predictions about system and computation behaviors must be done based on deep knowledge about underlying infrastructures, and simulations' mathematical and realization backgrounds. Our conceptual framework is intended to allow independent collaborations between domain experts as end-users and providers of the computational power by taking on all of the deployment troubles arising within a given computing environment. The goal of our work is to provide a generalized approach for effective scalable usage of the computing power and to help domain-experts, so that they could concentrate more intensive on their domain solutions without the need of investing efforts in learning and adapting to the new IT backbone technologies
ASCR/HEP Exascale Requirements Review Report
This draft report summarizes and details the findings, results, and
recommendations derived from the ASCR/HEP Exascale Requirements Review meeting
held in June, 2015. The main conclusions are as follows. 1) Larger, more
capable computing and data facilities are needed to support HEP science goals
in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of
the demand at the 2025 timescale is at least two orders of magnitude -- and in
some cases greater -- than that available currently. 2) The growth rate of data
produced by simulations is overwhelming the current ability, of both facilities
and researchers, to store and analyze it. Additional resources and new
techniques for data analysis are urgently needed. 3) Data rates and volumes
from HEP experimental facilities are also straining the ability to store and
analyze large and complex data volumes. Appropriately configured
leadership-class facilities can play a transformational role in enabling
scientific discovery from these datasets. 4) A close integration of HPC
simulation and data analysis will aid greatly in interpreting results from HEP
experiments. Such an integration will minimize data movement and facilitate
interdependent workflows. 5) Long-range planning between HEP and ASCR will be
required to meet HEP's research needs. To best use ASCR HPC resources the
experimental HEP program needs a) an established long-term plan for access to
ASCR computational and data resources, b) an ability to map workflows onto HPC
resources, c) the ability for ASCR facilities to accommodate workflows run by
collaborations that can have thousands of individual members, d) to transition
codes to the next-generation HPC platforms that will be available at ASCR
facilities, e) to build up and train a workforce capable of developing and
using simulations and analysis to support HEP scientific research on
next-generation systems.Comment: 77 pages, 13 Figures; draft report, subject to further revisio
Recommended from our members
Heterogeneous Cloud Systems Based on Broadband Embedded Computing
Computing systems continue to evolve from homogeneous systems of commodity-based servers within a single data-center towards modern Cloud systems that consist of numerous data-center clusters virtualized at the infrastructure and application layers to provide scalable, cost-effective and elastic services to devices connected over the Internet. There is an emerging trend towards heterogeneous Cloud systems driven from growth in wired as well as wireless devices that incorporate the potential of millions, and soon billions, of embedded devices enabling new forms of computation and service delivery. Service providers such as broadband cable operators continue to contribute towards this expansion with growing Cloud system infrastructures combined with deployments of increasingly powerful embedded devices across broadband networks. Broadband networks enable access to service provider Cloud data-centers and the Internet from numerous devices. These include home computers, smart-phones, tablets, game-consoles, sensor-networks, and set-top box devices. With these trends in mind, I propose the concept of broadband embedded computing as the utilization of a broadband network of embedded devices for collective computation in conjunction with centralized Cloud infrastructures. I claim that this form of distributed computing results in a new class of heterogeneous Cloud systems, service delivery and application enablement. To support these claims, I present a collection of research contributions in adapting distributed software platforms that include MPI and MapReduce to support simultaneous application execution across centralized data-center blade servers and resource-constrained embedded devices. Leveraging these contributions, I develop two complete prototype system implementations to demonstrate an architecture for heterogeneous Cloud systems based on broadband embedded computing. Each system is validated by executing experiments with applications taken from bioinformatics and image processing as well as communication and computational benchmarks. This vision, however, is not without challenges. The questions on how to adapt standard distributed computing paradigms such as MPI and MapReduce for implementation on potentially resource-constrained embedded devices, and how to adapt cluster computing runtime environments to enable heterogeneous process execution across millions of devices remain open-ended. This dissertation presents methods to begin addressing these open-ended questions through the development and testing of both experimental broadband embedded computing systems and in-depth characterization of broadband network behavior. I present experimental results and comparative analysis that offer potential solutions for optimal scalability and performance for constructing broadband embedded computing systems. I also present a number of contributions enabling practical implementation of both heterogeneous Cloud systems and novel application services based on broadband embedded computing
CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED CFD CODE - GHOST
This research focuses on evaluating and enhancing the performance of an in-house, structured, 2D CFD code - GHOST, on modern commodity clusters. The basic philosophy of this work is to optimize the cache performance of the code by splitting up the grid into smaller blocks and carrying out the required calculations on these smaller blocks. This in turn leads to enhanced code performance on commodity clusters. Accordingly, this work presents a discussion along with a detailed description of two techniques: external and internal blocking, for data access optimization. These techniques have been tested on steady, unsteady, laminar, and turbulent test cases and the results are presented. The critical hardware parameters which influenced the code performance were identified. A detailed study investigating the effect of these parameters on the code performance was conducted and the results are presented. The modified version of the code was also ported to the current state-of-art architectures with successful results
Categorization And Visualization Of Parallel Programming Systems
Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2005Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2005Yükesek kazanımlı programlama olarak da bilinen paralel programlama, bir problemi daha hızlı çözmek için aynı anda birden çok işlemci kullanılmasına denir. Günümüzde, ağır işlemler içeren birçok problem paralel olarak uygulanmaya çalışılmaktadır, buna örnek olarak nehir sularının simüle edilmesi, fizik veya kimya problemleri, astrolojik simülasyonlar verilebilir. Bu tezin amacı, bilimsel hesaplama veya mühendislik amaçlı kullanılan yüksek kazanımlı yazılımları tartışmaktır. Paralel programlama sistemleri ile kastedilen kütüphaneler, diller, derleyiciler, derleyici yönlendiricileri veya bunun dışında kalan, programcının paralel algoritmasını ifade edebileceği yapılardır. Yükesek kazanımlı program tasarımı için programcının dikkat etmesi gereken iki önemli nokta vardır: problemi iyi kavrayıp uygun bir çözüm önermek, doğru sisteme karar verebilmek. Doğru karar verebilmek için kullanıcının sistemler hakkında oldukça iyi bilgiye sahip olması gerekir. Bazen, birden çok yazılım ve donanımı bir arada kullanmak da gerekebilir. Bu tezde var olan paralel programlama sistemleri tanımlanır ve sınıflandırılır, bunun için güncel bildiriler esas alınmıştır. Özellikle algoritmik taslaklar ve fonsiyonel paralel programlama üzerinde durulmuştur.Ayrica güncel bilgileri depolamak ve bir kaynak yaratmak için wiki temelli bir web kaynağı oluşturulmuştur. Sistemlerin grafik gösterimini sağlayıp daha anlaşılır bir sınıflandırma yapabilmek için yeni bir sözdizimi tasarlanıp dinamik ağ çizebilecek webdot aracı ile bir araya getirilerek sistemleri temsil edecek ağı çizecek araç geliştirilmiştir. Bu sözdiziminin öğrenilmesi ve kullanılması son derece kolaydır. Son olarak iki temel paralel programlama tipi, paylaşılan bellek ve mesajlaşma, iki farklı tipte algoritma kullanılarak karşılaştırılmıştır. Programlar OpenMP ve MPI ile gerçeklenmiştir, farklı paralel makinelerde koşturulup sonuçları karşılaştırılmıştır. Paralel makineler için Almanya nın Aachen Üniversitesi nin SMP ağı ve Ulakbim in dağıtık bellekli paralel makineleri kullanılmıştır.Parallel computing, also called high-performance computing, refers to solving problems faster by using multiple processors simultaneously. Nowadays, almost every computationally-intensive problem that one could imagine is tried to be implemented in parallel. This thesis is aimed at discussing high-performance software for scientific or engineering applications. The term parallel programming systems here means libraries, languages, compiler directives or other means through which a programmer can express a parallel algorithm. To design high performance programs, there are two keys for the programmer: to understand the problem and find a solution for parallelization, and to decide on the right system for the implementation, which requires a good knowledge about existing parallel programming systems. The programmer, after having understood the problem, has to choose between many systems, some of which are closely related, whereas others have big differences. This thesis describes and classifies existing parallel programming systems, thus bringing existing surveys up to date. It describes a wiki-based web portal for collecting information about most recent systems, which has been developed as part of the thesis. A special syntax and a visualization tool has been developed. This syntax and tool allow users to have their own categorization scheme. Fourth, it compares two major programming styles message passing and shared memory with two different algorithms in order show performance differences of these styles. Algorithms are implemented in OpenMP and MPI, performance of both programs are measured on the SMP Cluster of Aachen University, Germany and on the Beowulf Cluster of Ulakbim, Ankara.Yüksek LisansM.Sc
- …