761 research outputs found

    Lattice QCD Production on Commodity Clusters at Fermilab

    Full text link
    We describe the construction and results to date of Fermilab's three Myrinet-networked lattice QCD production clusters (an 80-node dual Pentium III cluster, a 48-node dual Xeon cluster, and a 128-node dual Xeon cluster). We examine a number of aspects of performance of the MILC lattice QCD code running on these clusters.Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 6 pages, LaTeX, 8 eps figures. PSN TUIT00

    Implementation and Evaluation of Fast Packet Filters

    Get PDF
    パケットフィルタリング処理はあらゆる種類のネットワーク機器に必要な機能になってきている.ハイエンドのルータやファイアウォールであればハードウェアベースの実装も可能である.さもなくば柔軟かつ安価な実現のために汎用CPUを使ってソフトウェア的に実装されるが,その場合には処理の高速性に欠点がある.そこで本研究ではパケットフィルタプログラムにコード最適化手法,特に条件分岐を含むループのためのソフトウェア・パイプライン化手法を適用し,インテルIA-64 Itanium 2プロセッサ上での高速化を試みる.著者らはすでにパケットモニタ・ツールtcpdumpについて高速化の効果を確認している.本研究ではその手法を一部変更して適用し,商用Cコンパイラによって最適化した場合の4倍の高速化,ソフトウェア・パイプライン化を用いない最適化の2倍の高速化を達成した.今回開発した最も高速なフィルタプログラムはItanium 2プロセッサの上限性能で動作する.Packet filters are essential for most areas of recent network technologies. While high-end expensive routers and firewalls are implemented in hardwarebased, flexible and cost-effective ones are usually in software-based solutions using general-purpose CPUs but have less performance. In order to solve this performace problem, we apply code optimization techniques to packet filter implementations, in particular the software pipelining techniques for a loop with conditional branches, on Intel IA-64 Itanium 2 processor. The authors have studied the method of applying the techniques to the packet monitoring tool tcpdump and reported their high effects. Using the revised method, we can obtain a software-pipelined packet filter implemetation which is four times faster than a C compiler based one and two times faster than an optimized code without software pipelining. The fastest filter program developed in this research can execute at the maximum speed of Itanium 2 processo

    Two research contributions in 64-bit computing: Testing and Applications

    No full text
    Following the release of Windows 64-bit and Redhat Linux 64-bit operating systems (OS) in late April 2005, this is the one of the first 64-bit OS research project completed in a British university. The objective is to investigate (1) the increase/decrease in performance compared to 32-bit computing; (2) the techniques used to develop 64-bit applications; and (3) how 64-bit computing should be used in IT and research organizations to improve their work. This paper summarizes research discoveries for this investigation, including two major research contributions in (1) testing and (2) application development. The first contribution includes performance, stress, application, multiplatform, JDK and compatibility testing for AMD and Intel models. Comprehensive testing results reveal that 64-bit computing has a better performance in application performance, system performance and stress testing, but a worse performance in compatibility testing than the traditional 32-bit computing. A 64-bit dual-core processor has been tested and the results show that it performs better than a 64-bit single-core processor, but only in application that requires very high demands of CPU and memory consumption. The second contribution is .NET 1.1 64-bit implementations. Without additional troubleshooting, .NET 1.1 does not work on 64-bit Windows operating systems in stable ways. After stabilizing .NET environment, the next step is the application development, which is a dynamic repository with functions such as registration, download, login-logout, product submissions, database storage and statistical reports. The technology is based on Visual Studio .NET 2003, .NET 1.1 Framework with Service Pack 1, SQL Server 2000 with Service Pack 4 and IIS Server 6.0 on the Windows Server 2003 Enterprise x64 platform with Service Pack 1

    Impact of parameter variations on circuits and microarchitecture

    Get PDF
    Parameter variations, which are increasing along with advances in process technologies, affect both timing and power. Variability must be considered at both the circuit and microarchitectural design levels to keep pace with performance scaling and to keep power consumption within reasonable limits. This article presents an overview of the main sources of variability and surveys variation-tolerant circuit and microarchitectural approaches.Peer ReviewedPostprint (published version

    Optimistic Parallelization of Floating-Point Accumulation

    Get PDF
    Floating-point arithmetic is notoriously non-associative due to the limited precision representation which demands intermediate values be rounded to fit in the available precision. The resulting cyclic dependency in floating-point accumulation inhibits parallelization of the computation, including efficient use of pipelining. In practice, however, we observe that floating-point operations are "mostly" associative. This observation can be exploited to parallelize floating-point accumulation using a form of optimistic concurrency. In this scheme, we first compute an optimistic associative approximation to the sum and then relax the computation by iteratively propagating errors until the correct sum is obtained. We map this computation to a network of 16 statically-scheduled, pipelined, double-precision floating-point adders on the Virtex-4 LX160 (-12) device where each floating-point adder runs at 296 MHz and has a pipeline depth of 10. On this 16 PE design, we demonstrate an average speedup of 6× with randomly generated data and 3-7× with summations extracted from Conjugate Gradient benchmarks

    Survey of the Itanium architecture from a programmer's perspective

    Get PDF
    Journal ArticleThe Itanium family of processors represents Intel;s foray into the world of Explicitly Parallel Instruction Computing and 64-bit system design. This survey contains an introduction to the Itanium architecture and instruction set, as well as some of the available implementations. Taking a programmer's perspective, we have attempted to distill the relevant information from a variety of sources, including the Intel Itanium architecture documentation
    corecore