Lanczos eigensolution method for high-performance computers
The theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplies. These steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. The sequential computational time of 181.6 seconds for the panel problem executed on a CONVEX computer was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time for the transport problem, with 17,000 degrees of freedom, was 23 seconds on the Cray Y-MP using an average of 3.63 processors.
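The iteration the abstract describes can be sketched in a few lines. The following is a minimal, dense-matrix illustration of Lanczos tridiagonalization (not the paper's optimized implementation); the `A @ v` product is the matrix-vector multiply identified above as a computationally intensive step.

```python
import numpy as np

def lanczos(A, k, seed=0):
    """Minimal Lanczos tridiagonalization of a symmetric matrix A.

    Returns the k x k tridiagonal matrix T whose extreme eigenvalues
    approximate those of A. The dominant per-step cost is the
    matrix-vector product A @ v, which is why the paper focuses on
    optimizing that kernel (alongside factorization and solves).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    v_prev = np.zeros(n)
    alphas, betas = [], []
    beta = 0.0
    for _ in range(k):
        w = A @ v                          # matrix-vector multiply
        alpha = v @ w
        w = w - alpha * v - beta * v_prev  # three-term recurrence
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        betas.append(beta)
        if beta < 1e-12:                   # invariant subspace found
            break
        v_prev, v = v, w / beta
    m = len(alphas)
    T = np.diag(alphas) + np.diag(betas[:m - 1], 1) + np.diag(betas[:m - 1], -1)
    return T
```

In a production eigensolver the multiply would use the variable-band or sparse storage the paper describes rather than a dense `A`.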
Open-architecture Implementation of Fragment Molecular Orbital Method for Peta-scale Computing
We present our perspective and goals on high-performance computing for nanoscience in accordance with the global trend toward "peta-scale computing." After reviewing our results obtained through the grid-enabled version of the fragment molecular orbital method (FMO) on the grid testbed of the Japanese Grid Project, the National Research Grid Initiative (NAREGI), we show that FMO is one of the best candidates for peta-scale applications by predicting its effective performance on peta-scale computers. Finally, we introduce our new project to construct a peta-scale application as an open-architecture implementation of FMO, in order to realize both high performance on peta-scale computers and extendibility to multiphysics simulations.
Comment: 6 pages, 9 figures, proceedings of the 2nd IEEE/ACM international workshop on high performance computing for nano-science and technology (HPCNano06)
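The scalability claim rests on the structure of the FMO energy expansion: the standard two-body form (FMO2) assembles the total energy from independent fragment and fragment-pair calculations, each of which can run on a separate group of processors. A minimal sketch of the assembly step (illustrative only; the fragment and dimer energy values here are placeholders, not part of the paper):

```python
def fmo2_energy(monomer_E, dimer_E):
    """Two-body fragment molecular orbital (FMO2) energy expansion:

        E  ~  sum_I E_I  +  sum_{I<J} (E_IJ - E_I - E_J)

    monomer_E maps fragment -> energy; dimer_E maps (I, J) -> energy.
    Every monomer and dimer energy is an independent calculation,
    which is what makes FMO a natural fit for massively parallel
    (peta-scale) machines.
    """
    total = sum(monomer_E.values())
    for (I, J), e_ij in dimer_E.items():
        total += e_ij - monomer_E[I] - monomer_E[J]
    return total
```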
Using A Nameserver to Enhance Control System Efficiency
The Thomas Jefferson National Accelerator Facility (Jefferson Lab) control system uses a nameserver to reduce system response time and to minimize the impact of client name resolution on front-end computers. The control system is based on the Experimental Physics and Industrial Control System (EPICS), which uses name-based broadcasts to initiate data communication. By default, when EPICS process variables (PVs) are requested by client applications, all front-end computers receive the broadcasts and perform name resolution processing against local channel name lists. The nameserver is used to offload the name resolution task to a single node. This processing, formerly done on all front-end computers, is now done only by the nameserver. In a control system with heavily loaded front-end computers and high peak client connection loads, a significant performance improvement is seen. This paper describes the nameserver in more detail and discusses the strengths and weaknesses of making name resolution a centralized service.
Comment: ICALEPCS 200
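The core idea can be illustrated with a hypothetical registry sketch (this is not the actual EPICS or Jefferson Lab implementation, and the names used are invented): front-end computers register their PVs once, and clients then perform a single lookup against one node instead of broadcasting to every front-end.

```python
class NameServer:
    """Illustrative centralized PV name resolution, in the spirit of
    the nameserver described above. Front-end computers register
    their process variables; clients query this one node, so the
    other front-ends never see (or process) the request.
    """

    def __init__(self):
        self._table = {}  # PV name -> (host, port) of owning front-end

    def register(self, pv_name, host, port):
        """Called once per PV by the front-end that hosts it."""
        self._table[pv_name] = (host, port)

    def resolve(self, pv_name):
        """One dictionary lookup replaces a broadcast that every
        front-end computer would otherwise have to filter."""
        return self._table.get(pv_name)
```

The trade-off the paper weighs is visible even in the sketch: resolution cost drops from O(front-ends) broadcast processing to one lookup, but the registry becomes a single point of failure that must be kept consistent with the front-ends.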
Distributed OpenGL Rendering in Network Bandwidth Constrained Environments
Display walls made from multiple monitors are often used when very high resolution images are required. To utilise a display wall, rendering information must be sent to each computer that the monitors are connected to. The network is often the performance bottleneck for demanding applications, such as high-performance 3D animations. This paper introduces ClusterGL, a distribution library for OpenGL applications. ClusterGL reduces network traffic by using compression, frame differencing and multicast. Existing applications can use ClusterGL without recompilation. Benchmarks show that, for most applications, ClusterGL outperforms other systems that support unmodified OpenGL applications, including Chromium and BroadcastGL. The difference is larger for more complex scene geometries and when there are more display machines. For example, when rendering OpenArena, ClusterGL outperforms Chromium by over 300% on the Symphony display wall at The University of Waikato, New Zealand. This display has 20 monitors supported by five computers connected by gigabit Ethernet, with a full resolution of over 35 megapixels. ClusterGL is freely available via Google Code.
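Frame differencing combines naturally with compression: if successive frames are XORed, unchanged bytes become zero, and a generic compressor can then exploit the long zero runs. A minimal sketch of that idea (illustrative only, not ClusterGL's actual wire format, which operates on OpenGL command streams):

```python
import zlib

def encode_frame(prev, curr):
    """Frame differencing: XOR the new frame against the previous one
    so unchanged bytes become zero, then compress. Mostly-static
    frames shrink dramatically because the delta is mostly zeros."""
    delta = bytes(a ^ b for a, b in zip(prev, curr))
    return zlib.compress(delta)

def decode_frame(prev, payload):
    """Receiver side: decompress the delta and XOR it back onto the
    previously reconstructed frame to recover the current one."""
    delta = zlib.decompress(payload)
    return bytes(a ^ b for a, b in zip(prev, delta))
```

With multicast, one such payload serves all five display machines at once instead of being sent five times.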
Performance Portable High Performance Conjugate Gradients Benchmark
The High Performance Conjugate Gradient Benchmark (HPCG) is an international project to create a more appropriate benchmark test for the world's most powerful computers. The current LINPACK benchmark, which is the standard for measuring the performance of the 500 fastest computers in the world, is moving computers in a direction that is no longer beneficial to many important parallel applications. HPCG is designed to exercise computations and data access patterns more commonly found in applications. The reference version of HPCG exploits only some of the parallelism available on existing supercomputers, and the main focus of this work was to create a performance-portable version of HPCG that gives reasonable performance on hybrid architectures.
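The computational pattern HPCG measures is that of the conjugate gradient iteration itself: each step mixes a sparse matrix-vector product, dot products, and vector updates, which stress memory bandwidth rather than the dense floating-point throughput LINPACK rewards. A plain CG sketch (dense `A` here for brevity; HPCG uses a sparse multigrid-preconditioned variant):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Unpreconditioned conjugate gradient for symmetric positive-
    definite A x = b. Each iteration contains the kernels HPCG
    times: one matrix-vector product, two dot products, and
    several AXPY-style vector updates.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p                   # matrix-vector product (SpMV in HPCG)
        alpha = rs / (p @ Ap)        # dot product
        x += alpha * p               # AXPY update
        r -= alpha * Ap              # AXPY update
        rs_new = r @ r               # dot product
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p    # search-direction update
        rs = rs_new
    return x
```

Because these kernels are dominated by irregular memory access, a machine's HPCG score is typically a small fraction of its LINPACK score, which is precisely the gap the benchmark is meant to expose.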
Enhanced Face Recognition Method Performance on Android vs Windows Platform
Android is becoming one of the most popular operating systems on smartphones, tablet computers and similar mobile devices. With the rapid development of mobile device specifications, it is worth considering mobile devices as a current, or at least near-future, replacement for personal computers. This paper presents an enhanced face recognition method. The method is tested on two different platforms using the Windows and Android operating systems, in order to evaluate the method and to compare the platforms. The platforms are compared according to two factors: development simplicity and performance. The target is to evaluate the possibility of replacing personal computers running the Windows operating system with mobile devices running the Android operating system. Face recognition was chosen because of the relatively high computing cost of image processing and pattern recognition applications compared with other applications. The experimental results show acceptable performance of the method on the Android platform.
High-Performance Cloud Computing: A View of Scientific Applications
Scientific computing often requires the availability of a massive number of computers for performing large-scale experiments. Traditionally, these needs have been addressed by using high-performance computing solutions and installed facilities such as clusters and supercomputers, which are difficult to set up, maintain, and operate. Cloud computing provides scientists with a completely new model for utilizing the computing infrastructure. Compute resources, storage resources, and applications can be dynamically provisioned (and integrated within the existing infrastructure) on a pay-per-use basis, and released when they are no longer needed. Such services are often offered within the context of a Service Level Agreement (SLA), which ensures the desired Quality of Service (QoS). Aneka, an enterprise Cloud computing solution, harnesses the power of compute resources by relying on private and public Clouds and delivers the desired QoS to users. Its flexible and service-based infrastructure supports multiple programming paradigms that let Aneka address a variety of different scenarios, from finance applications to computational science. As examples of scientific computing in the Cloud, we present a preliminary case study on using Aneka for the classification of gene expression data and the execution of an fMRI brain imaging workflow.
Comment: 13 pages, 9 figures, conference paper