378,451 research outputs found

    Lanczos eigensolution method for high-performance computers

    The theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplications. These steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. The sequential computational time of 181.6 seconds for the panel problem executed on a CONVEX computer was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time for the transport problem, with 17,000 degrees of freedom, was 23 seconds on the Cray Y-MP using an average of 3.63 processors.
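    As a point of reference, the sketch below shows the basic Lanczos recurrence in Python, with the matrix-vector product marked as the hot spot the abstract singles out. It is a generic textbook version, not the paper's implementation: the paper's eigensolver works on a factored, shifted operator, so the `A @ v` line there becomes a forward/backward solve against the factorization.

```python
import numpy as np

def lanczos_eigvals(A, v0, m):
    """Plain m-step Lanczos; returns Ritz values approximating
    the extremal eigenvalues of a symmetric operator A."""
    V = np.zeros((A.shape[0], m + 1))
    alpha, beta = np.zeros(m), np.zeros(m + 1)
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = A @ V[:, j]              # matrix-vector multiply: the hot loop
        if j > 0:
            w -= beta[j] * V[:, j - 1]
        alpha[j] = w @ V[:, j]
        w -= alpha[j] * V[:, j]
        beta[j + 1] = np.linalg.norm(w)
        if beta[j + 1] < 1e-12:      # breakdown: an invariant subspace was found
            m = j + 1
            break
        V[:, j + 1] = w / beta[j + 1]
    # Eigenvalues of the small tridiagonal matrix T approximate those of A.
    T = np.diag(alpha[:m]) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
    return np.linalg.eigvalsh(T)
```

    Because the per-step cost is dominated by that one operator application plus a few vector updates, the storage-layout and loop-unrolling optimizations the abstract lists all target exactly those lines.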

    Advanced Architectures for Astrophysical Supercomputing

    Astronomers have come to rely on the increasing performance of computers to reduce, analyze, simulate and visualize their data. In this environment, faster computation can mean more science outcomes or the opening up of new parameter spaces for investigation. If we are to avoid major issues when implementing codes on advanced architectures, it is important that we have a solid understanding of our algorithms. A recent addition to the high-performance computing scene that highlights this point is the graphics processing unit (GPU). The hardware originally designed for speeding up graphics rendering in video games is now achieving speed-ups of O(100×) in general-purpose computation -- performance that cannot be ignored. We are using a generalized approach, based on the analysis of astronomy algorithms, to identify the optimal problem-types and techniques for taking advantage of both current GPU hardware and future developments in computing architectures. Comment: 4 pages, 1 figure; to appear in the proceedings of ADASS XIX, Oct 4-8 2009, Sapporo, Japan (ASP Conf. Series).
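    One common way to do the kind of algorithm analysis the abstract describes is a roofline-style estimate of arithmetic intensity. The toy Python calculation below (illustrative numbers, not taken from the paper) shows why element-wise array operations are bandwidth-bound while dense matrix multiplication is a natural GPU workload.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Flops per byte of memory traffic: the roofline-model quantity
    that separates bandwidth-bound from compute-bound kernels."""
    return flops / bytes_moved

n = 4096
# Element-wise vector add (float32): 1 flop per 12 bytes (2 reads + 1 write).
vec_add = arithmetic_intensity(n, 12 * n)
# Dense n x n matrix multiply: ~2*n^3 flops over ~12*n^2 bytes with ideal reuse.
matmul = arithmetic_intensity(2 * n**3, 12 * n**2)
print(f"vector add: {vec_add:.3f} flop/byte  (bandwidth-bound)")
print(f"matmul:     {matmul:.1f} flop/byte  (compute-bound, GPU-friendly)")
```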

    Preliminary Evaluation of MapReduce for High-Performance Climate Data Analysis

    MapReduce is an approach to high-performance analytics that may be useful for data-intensive problems in climate research. It offers an analysis paradigm that uses clusters of computers and combines distributed storage of large data sets with parallel computation. We are particularly interested in the potential of MapReduce to speed up basic operations common to a wide range of analyses. In order to evaluate this potential, we are prototyping a series of canonical MapReduce operations over a test suite of observational and climate simulation datasets. Our initial focus has been on averaging operations over arbitrary spatial and temporal extents within Modern-Era Retrospective Analysis for Research and Applications (MERRA) data. Preliminary results suggest this approach can improve efficiencies within data-intensive analytic workflows.
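    A canonical averaging operation of the kind being prototyped can be sketched as a map phase that emits (key, (value, count)) pairs and a reduce phase that merges them. The minimal pure-Python version below is illustrative only; the record layout is hypothetical rather than the actual MERRA file format.

```python
from collections import defaultdict

def map_phase(records, bbox):
    """Emit (month, (value, 1)) for records inside a lat/lon bounding box.
    records: iterable of (year, month, lat, lon, value) tuples."""
    lat0, lat1, lon0, lon1 = bbox
    for year, month, lat, lon, value in records:
        if lat0 <= lat <= lat1 and lon0 <= lon <= lon1:
            yield month, (value, 1)

def reduce_phase(pairs):
    """Merge partial (sum, count) pairs into a per-key mean."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, (s, c) in pairs:
        acc[key][0] += s
        acc[key][1] += c
    return {k: s / c for k, (s, c) in acc.items()}

# e.g. monthly means over the continental US:
# means = reduce_phase(map_phase(records, (25, 50, -125, -65)))
```

    Because the partial sums and counts are associative, the reduce step can be run hierarchically across a cluster, which is what makes the operation a good fit for MapReduce.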

    Grid enabled data analysis on handheld devices

    The requirement for information on portable, handheld devices demands the realization of increasingly complex applications for increasingly small and ubiquitous devices. This trend promotes the migration of technologies originally developed for desktop computers to handheld devices. With the onset of grid computing, users of handheld devices should be able to accomplish much more complex tasks by accessing the processing and storage resources of the grid. This paper describes the development, features, and performance aspects of a grid-enabled analysis environment designed for handheld devices. We also describe some differences in the technologies required to run these applications on desktop machines and handheld devices. In addition, we propose a prototype agent-based distributed architecture for carrying out high-speed analysis of physics data on handheld devices.
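    The submit-and-poll pattern below is one minimal way to sketch this offloading idea in Python: the device sends a small job description to a grid gateway and retrieves only the finished result. The `GATEWAY` endpoint and job fields are hypothetical, and the paper's actual environment is agent-based middleware rather than a plain REST service.

```python
import time
import requests

GATEWAY = "https://grid-gateway.example.org/api"   # hypothetical endpoint

def submit_and_wait(dataset_id, query, poll_s=5.0):
    """Offload an analysis job to the grid and poll for the (small) result,
    keeping the handheld's CPU, memory, and bandwidth footprint minimal."""
    job = requests.post(f"{GATEWAY}/jobs",
                        json={"dataset": dataset_id, "query": query},
                        timeout=10).json()
    while True:
        status = requests.get(f"{GATEWAY}/jobs/{job['id']}", timeout=10).json()
        if status["state"] in ("done", "failed"):
            return status
        time.sleep(poll_s)   # poll slowly to conserve battery and bandwidth
```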

    Energy Saving Potential of Idle Pacman Supercomputing Nodes

    To determine the energy saving potential of suspending idle supercomputing nodes without sacrificing efficiency, my research involved the setup of a compute-node power usage monitoring system. This system measures how much power each node draws at its different levels of operation using an automated Expect script. The script automates tasks with interactive command-line interfaces to perform the power measurement readings. Steps required for the power usage monitoring system include remotely logging into the Pacman Penguin compute cluster power distribution units (PDUs), feeding commands to the PDUs, and storing the returned data. Using a Python script, the data is then parsed into a more coherent format and written to a common file format for analysis. With this system, the Arctic Region Supercomputing Center (ARSC) will be able to determine how much energy is used during different levels of load intensity on the Pacman supercomputer and how much energy can be saved by suspending unnecessary nodes during periods of reduced activity.

    Power utilization by supercomputers is of major interest to those who design and purchase them. Since 2008, the leading source of worldwide supercomputer speed rankings has also included power consumption and power efficiency values. Because digital computers use electricity to perform computation, larger computers tend to use more energy and produce more heat. Pacman, an acronym for Pacific Area Climate Monitoring and Analysis Network, is a high-performance supercomputer designed for large compute- and memory-intensive jobs. Pacman is composed of the following general computational nodes:
    • 256 four-core compute nodes containing two dual-core 2.6 GHz AMD Opteron processors each
    • 20 twelve-core compute nodes containing two six-core 2.6 GHz AMD Opteron processors each
    • 88 sixteen-core compute nodes containing two eight-core 2.3 GHz AMD Opteron processors each
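    A minimal Python sketch of the pattern the abstract describes, using the `pexpect` library as a stand-in for the Expect script: log into a PDU, issue a status command, scrape the wattage, and write it to CSV. The hostnames, prompt patterns, and the `show power` command are hypothetical, since real PDU firmware varies.

```python
import csv
import re
import pexpect   # Python analogue of the Expect tool the abstract mentions

PDUS = ["pdu1.example.edu", "pdu2.example.edu"]   # hypothetical PDU hostnames

def read_pdu_power(host, user="admin"):
    """Log into a PDU over SSH and return its reported load in watts."""
    child = pexpect.spawn(f"ssh {user}@{host}", timeout=15)
    child.expect("assword:")
    child.sendline("********")        # fetch from a secrets store in practice
    child.expect(r"[>#]")             # hypothetical PDU shell prompt
    child.sendline("show power")      # hypothetical status command
    child.expect(r"[>#]")
    match = re.search(r"(\d+)\s*W", child.before.decode())
    child.sendline("exit")
    return int(match.group(1)) if match else None

# Parse the readings into a common file format (CSV) for later analysis.
with open("pdu_power.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["host", "watts"])
    for host in PDUS:
        writer.writerow([host, read_pdu_power(host)])
```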

    GraPE: fast and scalable Graph Processing and Embedding

    Graph representation learning methods have enabled a wide range of learning problems to be addressed for data that can be represented in graph form. Nevertheless, several real-world problems in economics, biology, medicine and other fields raise relevant scaling problems for existing methods and their software implementations, due to the size of real-world graphs characterized by millions of nodes and billions of edges. We present GraPE, a software resource for graph processing and random-walk-based embedding that can scale with large and high-degree graphs and significantly speed up computation. GraPE comprises specialized data structures, algorithms, and a fast parallel implementation that displays several orders of magnitude improvement in empirical space and time complexity compared to state-of-the-art software resources, with a corresponding boost in the performance of machine learning methods for edge and node label prediction and for the unsupervised analysis of graphs. GraPE is designed to run on laptop and desktop computers, as well as on high-performance computing clusters.
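    The corpus-generation step of random-walk-based embedding (DeepWalk/node2vec style) can be sketched in a few lines of Python, as below. GraPE's contribution is performing this step at scale through specialized data structures and parallelism; this illustrative version makes no such attempt and is not GraPE's implementation.

```python
import random
from collections import defaultdict

def random_walks(edges, walks_per_node=10, walk_len=20, seed=0):
    """Generate uniform random walks over an undirected graph -- the
    'corpus' that walk-based embedding methods feed to a skip-gram model."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    walks = []
    for node in adj:
        for _ in range(walks_per_node):
            walk = [node]
            for _ in range(walk_len - 1):
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks   # typically passed to a word2vec-style trainer

# e.g. random_walks([("a", "b"), ("b", "c"), ("c", "a")])
```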