    Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors

    This paper presents a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. Architectural diversity among processors, together with structural diversity among sparse matrices, leads to bottleneck diversity. This justifies an SpMV optimizer that is both matrix- and architecture-adaptive through runtime specialization. To this end, we present an approach that first identifies the performance bottlenecks of SpMV for a given sparse matrix on the target platform, either through profiling or by matrix property inspection, and then selects suitable optimizations to tackle those bottlenecks. Our optimization pool is based on the widely used Compressed Sparse Row (CSR) sparse matrix storage format and has low preprocessing overheads, making our overall approach practical even in cases where fast decision making and optimization setup are required. We evaluate our optimizer on three x86-based computing platforms and demonstrate that it is able to distinguish and appropriately optimize SpMV for the majority of matrices in a representative test suite, leading to significant speedups over the CSR and Inspector-Executor CSR SpMV kernels available in the latest release of the Intel MKL library. Comment: 10 pages, 7 figures, ICPP 201
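
For readers unfamiliar with the CSR format mentioned in this abstract, the following is a minimal sketch of a baseline CSR SpMV kernel in plain Python. It is not the paper's optimizer: the function name `csr_spmv` and the array names `values`, `col_idx`, and `row_ptr` are illustrative assumptions, and none of the optimizations the paper describes are applied.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries of A, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    (All names are illustrative, not taken from the paper.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Accumulate the dot product of row i with x, touching nonzeros only.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix:
# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]
y = csr_spmv(values, col_idx, row_ptr, x)
```

The indirect access `x[col_idx[k]]` and the variable row lengths are the usual reasons the bottleneck of such a kernel depends on both the matrix structure and the hardware.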

    Parallelization of an object-oriented FEM dynamics code: influence of the strategies on the Speedup

    This paper presents a C++ implementation of an explicit parallel finite element code dedicated to the simulation of impacts. We first present a brief overview of the kinematics and the explicit integration scheme, with details on some particular points. We then present the OpenMP parallelization toolkit used to parallelize our FEM code, and we focus on how the parallelization of the DynELA FEM code has been conducted for a shared memory system using OpenMP. Some examples are then presented to demonstrate the efficiency and accuracy of the proposed implementations in terms of the speedup of the code. Finally, an impact simulation application is presented and the results are compared with those obtained by the commercial Abaqus explicit FEM code

    System software for the finite element machine

    The Finite Element Machine is an experimental parallel computer developed at Langley Research Center to investigate the application of concurrent processing to structural engineering analysis. This report describes system-level software which has been developed to facilitate use of the machine by applications researchers. The overall software design is outlined, and several important parallel processing issues are discussed in detail, including processor management, communication, synchronization, and input/output. Based on experience using the system, the hardware architecture and software design are critiqued, and areas for further work are suggested

    Design, development and use of the finite element machine

    Some of the considerations that went into the design of the Finite Element Machine, a research asynchronous parallel computer, are described. The present status of the system is also discussed, along with some indication of the type of results that were obtained

    Structural Analysis of Test Flight Vehicles with Multifunctional Energy Storage

    Under the NASA Aeronautics Research Mission Directorate (ARMD) Convergent Aeronautical Solutions (CAS) project, NASA Glenn Research Center has been leading Multifunctional Structures for High Energy Lightweight Load-bearing Storage (M-SHELLS) research efforts. The technology of integrating load-carrying structures with electrical energy storage capacity has the potential to reduce the overall weight of future electric aircraft. The proposed project goals were to develop M-SHELLS in the form of honeycomb coupons and subcomponents, integrate them into the structure, and conduct low-risk flight tests onboard a remotely piloted small aircraft. Experimental M-SHELLS energy-storing coupons were fabricated and tested in the laboratory for their electrical and mechanical properties. In this paper, finite element model development and structural analyses of two small test aircraft candidates are presented. The finite element analysis of the initial two-spar wing is described for strain, deflection, and weight estimation. After a Tempest test aircraft was acquired, a load-deflection test of the wing was conducted. A finite element model of the Tempest was then developed based on the test aircraft dimensions and construction detail. The component weight analysis from the finite element model and the test measurements were correlated. Structural analysis results with multifunctional energy storage panels in the fuselage of the test vehicle are presented. Although the flight test was cancelled for programmatic reasons and because of time constraints, the structural analysis results indicate that the mid-fuselage floor composite panel could provide structural integrity with minimal weight penalty while supplying electrical energy. To explore potential future applications of the multifunctional structure, analyses of the NASA X-57 Maxwell electric aircraft and a NASA N+3 Technology Conventional Configuration (N3CC) fuselage are presented. Secondary aluminum structures in the fuselage sub-floor and cargo area were partially replaced with reinforced five-layer composite panels with an M-SHELLS honeycomb core. The N3CC fuselage weight reductions associated with each design, achieved without risking structural integrity, are described. The structural analysis and weight estimation with the application of composite M-SHELLS panels to the N3CC fuselage indicate a 3.2% reduction in the fuselage structural weight, prior to accounting for the additional weight of core material required to complete the energy storage functionality

    Adaptive Partitioning for Large-Scale Dynamic Graphs

    In recent years, large-scale graph processing has gained increasing attention, with the most recent systems placing particular emphasis on latency. One possible technique to improve runtime performance in a distributed graph processing system is to reduce network communication. The most notable way to achieve this goal is to partition the graph by minimizing the number of edges that connect vertices assigned to different machines, while keeping the load balanced. However, real-world graphs are highly dynamic, with vertices and edges being constantly added and removed. Carefully updating the partitioning of the graph to reflect these changes is necessary to avoid the introduction of an extensive number of cut edges, which would gradually worsen computation performance. In this paper we show that performance degradation in dynamic graph processing systems can be avoided by continuously adapting the graph partitions as the graph changes. We present a novel, highly scalable adaptive partitioning strategy, and show a number of refinements that make it work under the constraints of a large-scale distributed system. The partitioning strategy is based on iterative vertex migrations, relying only on local information. We have implemented the technique in a graph processing system, and we show through three real-world scenarios how adapting graph partitioning reduces execution time by over 50% when compared to commonly used hash-partitioning
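
The idea of iterative vertex migration driven by local information can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified, sequential greedy heuristic, and the function names `cut_edges` and `migrate_once`, the fixed `capacity` bound, and the toy graph are all assumptions made for illustration. Each vertex looks only at the partitions of its own neighbours and moves to the partition holding most of them, provided that partition has room.

```python
from collections import Counter


def cut_edges(adj, part):
    """Count undirected edges whose endpoints lie in different partitions."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])


def migrate_once(adj, part, capacity):
    """One sweep of greedy migrations; returns how many vertices moved.

    A vertex moves to the partition that holds strictly more of its
    neighbours than its current one, if that partition is below `capacity`.
    Only the vertex's own neighbourhood is inspected (local information).
    """
    sizes = Counter(part.values())
    moved = 0
    for u in sorted(adj):
        if not adj[u]:
            continue
        counts = Counter(part[v] for v in adj[u])
        # Most common neighbour partition; ties go to the lower partition id.
        best, best_count = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
        if (best != part[u]
                and best_count > counts.get(part[u], 0)
                and sizes[best] < capacity):
            sizes[part[u]] -= 1
            sizes[best] += 1
            part[u] = best
            moved += 1
    return moved


# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
part = {v: v % 2 for v in adj}          # hash partitioning into 2 parts
cuts_before = cut_edges(adj, part)

while migrate_once(adj, part, capacity=4) > 0:
    pass                                # repeat sweeps until nothing moves
cuts_after = cut_edges(adj, part)
```

On this toy input the hash partitioning cuts five of the seven edges, and the sweeps converge to one triangle per partition with a single cut edge. A distributed system would run such migrations concurrently and needs extra safeguards for balance and convergence, which this sketch does not model.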

    Modeling transport of charged species in pore networks: solution of the Nernst-Planck equations coupled with fluid flow and charge conservation equations

    A pore network modeling (PNM) framework for the simulation of transport of charged species, such as ions, in porous media is presented. It includes the Nernst-Planck (NP) equations for each charged species in the electrolytic solution, in addition to a charge conservation equation which relates the species concentrations to each other. Moreover, momentum and mass conservation equations are adopted, and their solution allows for the calculation of the advective contribution to the transport in the NP equations. The proposed framework is developed by first deriving the numerical model equations (NMEs) corresponding to the partial differential equations (PDEs) based on several different time and space discretization schemes, which are compared to assess solution accuracy. The derivation also considers various charge conservation scenarios, which also have pros and cons in terms of speed and accuracy. Ion transport problems in arbitrary pore networks were considered and solved using both PNM and finite element method (FEM) solvers. Comparisons showed an average deviation, in terms of ion concentration, between PNM and FEM below 5%, with the PNM simulations being over $10^{4}$ times faster than the FEM ones for a medium including about $10^{4}$ pores. The improved accuracy is achieved by utilizing more accurate discretization schemes for both the advective and migrative terms, adopted from the CFD literature. The NMEs were implemented within the open-source package OpenPNM based on the iterative Gummel algorithm with relaxation. This work presents a comprehensive approach to modeling charged species transport suitable for a wide range of applications from electrochemical devices to nanoparticle movement in the subsurface
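
For orientation, the continuum equations this abstract refers to can be written in their standard textbook form, shown below. The notation is an assumption, not taken from the paper: $c_i$, $D_i$, and $z_i$ are the concentration, diffusivity, and valence of species $i$, $\phi$ is the electric potential, $\mathbf{u}$ the fluid velocity, $F$ the Faraday constant, $R$ the gas constant, and $T$ the temperature. Electroneutrality is shown as one common form of the charge conservation closure; the abstract mentions several scenarios without specifying them.

```latex
\begin{align}
  % Species conservation for each charged species i
  \frac{\partial c_i}{\partial t} + \nabla \cdot \mathbf{N}_i &= 0, \\
  % Nernst-Planck flux: diffusion + migration + advection
  \mathbf{N}_i &= -D_i \nabla c_i
                  - \frac{z_i F D_i}{R T}\, c_i \nabla \phi
                  + c_i \mathbf{u}, \\
  % Electroneutrality (one possible charge conservation closure)
  \sum_i z_i c_i &= 0.
\end{align}
```

The three flux terms correspond to the diffusive, migrative, and advective contributions named in the abstract.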

    System Identification of Constructed Facilities: Challenges and Opportunities Across Hazards

    The motivation, success and prevalence of full-scale monitoring of constructed buildings vary considerably across the hazards of concern (earthquakes, strong winds, etc.), due in part to various fiscal and life safety motivators. Yet while the challenges of successful deployment and operation of large-scale monitoring initiatives are significant, they are perhaps dwarfed by the challenges of data management, interrogation and ultimately system identification. Practical constraints on everything from sensor density to the availability of measured input have driven the development of a wide array of system identification and damage detection techniques, which in many cases become hazard-specific. In this study, the authors share their experiences in full-scale monitoring of buildings across hazards and the associated challenges of system identification. The study concludes with a brief agenda for next-generation research in the area of system identification of constructed facilities