587,387 research outputs found

    Lanczos eigensolution method for high-performance computers

    Get PDF
    The theory, computational analysis, and applications are presented of a Lanczos algorithm on high performance computers. The computationally intensive steps of the algorithm are identified as: the matrix factorization, the forward/backward equation solution, and the matrix vector multiples. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as: variable band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large scale structural analysis applications are described: the buckling of a composite blade stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time for the panel problem executed on a CONVEX computer of 181.6 seconds was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degs of freedom was on the the Cray-YMP using an average of 3.63 processors

    Open-architecture Implementation of Fragment Molecular Orbital Method for Peta-scale Computing

    Full text link
    We present our perspective and goals on highperformance computing for nanoscience in accordance with the global trend toward "peta-scale computing." After reviewing our results obtained through the grid-enabled version of the fragment molecular orbital method (FMO) on the grid testbed by the Japanese Grid Project, National Research Grid Initiative (NAREGI), we show that FMO is one of the best candidates for peta-scale applications by predicting its effective performance in peta-scale computers. Finally, we introduce our new project constructing a peta-scale application in an open-architecture implementation of FMO in order to realize both goals of highperformance in peta-scale computers and extendibility to multiphysics simulations.Comment: 6 pages, 9 figures, proceedings of the 2nd IEEE/ACM international workshop on high performance computing for nano-science and technology (HPCNano06

    Using A Nameserver to Enhance Control System Efficiency

    Full text link
    The Thomas Jefferson National Accelerator Facility (Jefferson Lab) control system uses a nameserver to reduce system response time and to minimize the impact of client name resolution on front-end computers. The control system is based on the Experimental Physics and Industrial Control System (EPICS), which uses name-based broadcasts to initiate data communication. By default, when EPICS process variables (PV) are requested by client applications, all front-end computers receive the broadcasts and perform name resolution processing against local channel name lists. The nameserver is used to offload the name resolution task to a single node. This processing, formerly done on all front-end computers, is now done only by the nameserver. In a control system with heavily loaded front-end computers and high peak client connection loads, a significant performance improvement is seen. This paper describes the name server in more detail, and discusses the strengths and weaknesses of making name resolution a centralized service.Comment: ICALEPCS 200

    Distributed OpenGL Rendering in Network Bandwidth Constrained Environments

    Get PDF
    Display walls made from multiple monitors are often used when very high resolution images are required. To utilise a display wall, rendering information must be sent to each computer that the monitors are connect to. The network is often the performance bottleneck for demanding applications, like high performance 3D animations. This paper introduces ClusterGL; a distribution library for OpenGL applications. ClusterGL reduces network traffic by using compression, frame differencing and multi-cast. Existing applications can use ClusterGL without recompilation. Benchmarks show that, for most applications, ClusterGL outperforms other systems that support unmodified OpenGL applications including Chromium and BroadcastGL. The difference is larger for more complex scene geometries and when there are more display machines. For example, when rendering OpenArena, ClusterGL outperforms Chromium by over 300% on the Symphony display wall at The University of Waikato, New Zealand. This display has 20 monitors supported by five computers connected by gigabit Ethernet, with a full resolution of over 35 megapixels. ClusterGL is freely available via Google Code

    Performance Portable High Performance Conjugate Gradients Benchmark

    Get PDF
    The High Performance Conjugate Gradient Benchmark (HPCG) is an international project to create a more appropriate benchmark test for the world\u27s most powerful computers. The current LINPACK benchmark, which is the standard for measuring the performance of the top 500 fastest computers in the world, is moving computers in a direction that is no longer beneficial to many important parallel applications. HPCG is designed to exercise computations and data access patterns more commonly found in applications. The reference version of HPCG exploits only some parallelism available on existing supercomputers and the main focus of this work was to create a performance portable version of HPCG that gives reasonable performance on hybrid architectures

    Enhanced Face Recognition Method Performance on Android vs Windows Platform

    Get PDF
    Android is becoming one of the most popular operating systems on smartphones, tablet computers and similar mobile devices. With the quick development in mobile device specifications, it is worthy to think about mobile devices as current or - at least - near future replacement of personal computers. This paper presents an enhanced face recognition method. The method is tested on two different platforms using Windows and Android operating systems. This is done to evaluate the method and to compare the platforms. The platforms are compared according to two factors: development simplicity and performance. The target is evaluating the possibility of replacing personal computers using Windows operating system by mobile devices using Android operating system. Face recognition has been chosen because of the relatively high computing cost of image processing and pattern recognition applications comparing with other applications. The experiment results show acceptable performance of the method on Android platform

    High-Performance Cloud Computing: A View of Scientific Applications

    Full text link
    Scientific computing often requires the availability of a massive number of computers for performing large scale experiments. Traditionally, these needs have been addressed by using high-performance computing solutions and installed facilities such as clusters and super computers, which are difficult to setup, maintain, and operate. Cloud computing provides scientists with a completely new model of utilizing the computing infrastructure. Compute resources, storage resources, as well as applications, can be dynamically provisioned (and integrated within the existing infrastructure) on a pay per use basis. These resources can be released when they are no more needed. Such services are often offered within the context of a Service Level Agreement (SLA), which ensure the desired Quality of Service (QoS). Aneka, an enterprise Cloud computing solution, harnesses the power of compute resources by relying on private and public Clouds and delivers to users the desired QoS. Its flexible and service based infrastructure supports multiple programming paradigms that make Aneka address a variety of different scenarios: from finance applications to computational science. As examples of scientific computing in the Cloud, we present a preliminary case study on using Aneka for the classification of gene expression data and the execution of fMRI brain imaging workflow.Comment: 13 pages, 9 figures, conference pape
    corecore