    CellSim: a validated modular heterogeneous multiprocessor simulator

    As the number of transistors on a chip continues to increase, power consumption has become the most important constraint in processor design. To keep increasing performance, computer architects have therefore turned to multiprocessors. Moreover, recent studies have shown that heterogeneous chip multiprocessors have greater potential than homogeneous ones. We have built a modular simulator for heterogeneous multiprocessors that can be configured to model IBM's Cell processor. The simulator has been validated against the real machine so that it can be used as a research tool.

    Design and implementation of interface units for high speed fiber optics local area networks and broadband integrated services digital networks

    The design and implementation of interface units for high-speed Fiber Optic Local Area Networks and Broadband Integrated Services Digital Networks are discussed. In recent years, a number of network adapters designed to support high-speed communications have emerged. The approach taken here to the design of a high-speed network interface unit was to implement packet-processing functions in hardware, using VLSI technology. The VLSI hardware implementation of a buffer management unit, which is required in such architectures, is described.
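
    The central data structure such an interface administers is a pool of packet buffers. As a rough software analogue, the C sketch below implements the FIFO slot-claim/slot-release discipline a buffer management unit enforces between the network side (producer) and the host side (consumer); all names, slot counts, and sizes are illustrative assumptions, not the VLSI design described in the paper.

        /* Software analogue of a hardware buffer management unit: a ring of
         * fixed-size packet slots. All sizes and names are assumptions. */
        #include <stdint.h>
        #include <stdbool.h>
        #include <string.h>

        #define SLOTS     8            /* number of packet slots (power of two) */
        #define SLOT_SIZE 256          /* bytes per slot                        */

        typedef struct {
            uint8_t  data[SLOTS][SLOT_SIZE];
            uint16_t len[SLOTS];
            unsigned head, tail;       /* head: next read, tail: next write     */
        } pkt_ring;                    /* zero-initialize before use            */

        /* Producer side: claim a slot and copy the packet in; false = dropped. */
        bool ring_put(pkt_ring *r, const uint8_t *pkt, uint16_t len) {
            if (r->tail - r->head == SLOTS || len > SLOT_SIZE)
                return false;          /* ring full, or packet too large        */
            unsigned i = r->tail % SLOTS;
            memcpy(r->data[i], pkt, len);
            r->len[i] = len;
            r->tail++;
            return true;
        }

        /* Consumer side: copy the oldest packet out and release its slot. */
        bool ring_get(pkt_ring *r, uint8_t *out, uint16_t *len) {
            if (r->head == r->tail)
                return false;          /* ring empty                            */
            unsigned i = r->head % SLOTS;
            memcpy(out, r->data[i], r->len[i]);
            *len = r->len[i];
            r->head++;
            return true;
        }

    A hardware unit implements the same head/tail bookkeeping in registers, with the modulo reducing to bit masking when the slot count is a power of two.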

    High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP

    This paper explores the pros and cons of reconfigurable computing, in the form of FPGAs, for high-performance, efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM’s Cell Broadband Engine (Cell BE), in the design and implementation of the widely used Smith-Waterman pairwise sequence alignment algorithm, with general-purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance-per-watt criterion and perform better than all other platforms on the performance-per-dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both criteria. In general, in order to outperform other technologies on the performance-per-dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up over general-purpose processors and one order of magnitude speed-up over domain-specific technologies such as GPUs.
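
    For context, Smith-Waterman itself is a dynamic-programming recurrence: each cell of a scoring matrix takes the best of a diagonal (match/mismatch) move, two gap moves, and zero, the floor that makes the alignment local. The C sketch below shows a minimal linear-gap version; the scoring constants are assumptions for illustration, whereas the implementations compared in the paper would typically use affine gap penalties and substitution matrices such as BLOSUM62.

        /* Minimal Smith-Waterman local alignment (linear gap penalty).
         * Scoring constants are illustrative assumptions. */
        #include <stdio.h>
        #include <string.h>

        #define MATCH     2
        #define MISMATCH -1
        #define GAP      -1
        #define MAXLEN  128            /* sketch-only limit on sequence length */

        static int max4(int a, int b, int c, int d) {
            int m = a;
            if (b > m) m = b;
            if (c > m) m = c;
            if (d > m) m = d;
            return m;
        }

        /* Returns the best local alignment score of sequences a and b. */
        int smith_waterman(const char *a, const char *b) {
            int la = (int)strlen(a), lb = (int)strlen(b);
            if (la >= MAXLEN || lb >= MAXLEN) return -1;
            int H[MAXLEN][MAXLEN] = {{0}};
            int best = 0;
            for (int i = 1; i <= la; i++) {
                for (int j = 1; j <= lb; j++) {
                    int s = (a[i-1] == b[j-1]) ? MATCH : MISMATCH;
                    H[i][j] = max4(0,                  /* local floor  */
                                   H[i-1][j-1] + s,    /* (mis)match   */
                                   H[i-1][j] + GAP,    /* gap in b     */
                                   H[i][j-1] + GAP);   /* gap in a     */
                    if (H[i][j] > best) best = H[i][j];
                }
            }
            return best;
        }

        int main(void) {
            printf("score = %d\n", smith_waterman("GGTTGACTA", "TGTTACGG"));
            return 0;
        }

    Cells on the same anti-diagonal of H are mutually independent, which is what makes the recurrence amenable to FPGA systolic arrays and the SIMD-style implementations compared in the paper.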

    On the design of architecture-aware algorithms for emerging applications

    This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics; for example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in these problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also identify several limitations of current system software and architectures and suggest directions for improving them. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced through collaborative efforts among researchers and practitioners from different domains. This dissertation contributes to those efforts by providing benchmarks and suggestions for improving system software and architectures.

    Implementing a Parallel Matrix Factorization Library on the Cell Broadband Engine


    Monte Carlo Simulations of Spin Glasses on Cell Broadband Engine

    Several large-scale computational scientific problems require high-end computing systems to be solved. In recent years, multi-core architectures have begun to deliver tens or hundreds of Gflops of peak computing performance on a single chip with high power efficiency, making available computational power previously found only in high-end multi-processor systems. The aim of this Ph.D. thesis is to study the suitability of multi-core processors for scientific programming, analyzing sustained performance and issues related to multi-core programming, data distribution, and synchronization, in order to define a set of guidelines for optimizing scientific applications on this class of architectures. As an example of a multi-core processor, we consider the Cell Broadband Engine (CBE), developed by Sony, IBM and Toshiba. The CBE is one of the most powerful multi-core CPUs currently available, integrating eight cores and delivering a peak performance of 200 Gflops in single precision and 100 Gflops in double precision. As a case study, we analyze the performance of the CBE for Monte Carlo simulations of the Edwards-Anderson spin glass model, a paradigm in theoretical and condensed matter physics used to describe complex systems characterized by phase transitions (such as the para-ferro transition in magnets) or to model “frustrated” dynamics. We describe several strategies for distributing the data set among on-chip and off-chip memories and propose analytic models to find the balance between computational and memory-access time as a function of both algorithmic and architectural parameters. We use the analytic models to set the parameters of the algorithm, such as the size of data structures and the scheduling of operations, so as to optimize the execution of Monte Carlo spin glass simulations on the CBE architecture.
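
    For context, the inner loop of such a simulation is a Metropolis single-spin update over a lattice with quenched random couplings. The C sketch below shows one sweep of a 2D Edwards-Anderson model with periodic boundaries; the lattice size, coupling layout, and RNG are illustrative assumptions, not the thesis's CBE implementation, which must additionally tile and stream this loop through the SPEs' local stores.

        /* One Metropolis sweep of a 2D Edwards-Anderson spin glass.
         * Sizes, storage layout, and RNG are illustrative assumptions. */
        #include <stdlib.h>
        #include <math.h>

        #define L 32                       /* linear lattice size */

        int    s[L][L];                    /* Ising spins, +1 or -1 (init elsewhere) */
        double Jr[L][L], Jd[L][L];         /* quenched couplings to right/down neighbor */

        void sweep(double beta) {          /* beta = inverse temperature */
            for (int i = 0; i < L; i++) {
                for (int j = 0; j < L; j++) {
                    int up = (i + L - 1) % L, dn = (i + 1) % L;
                    int lf = (j + L - 1) % L, rt = (j + 1) % L;
                    /* local field on spin (i,j) from its four neighbors */
                    double h = Jr[i][j]  * s[i][rt] + Jr[i][lf] * s[i][lf]
                             + Jd[i][j]  * s[dn][j] + Jd[up][j] * s[up][j];
                    double dE = 2.0 * s[i][j] * h; /* energy cost of a flip */
                    if (dE <= 0.0 || drand48() < exp(-beta * dE))
                        s[i][j] = -s[i][j];        /* accept the flip */
                }
            }
        }

    The data-distribution question the thesis studies shows up directly here: each update touches five spins and four couplings, so the ratio of memory traffic to arithmetic is what the analytic models balance against the CBE's on-chip and off-chip bandwidths.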

    TechNews digests: Jan - Nov 2005

    TechNews is a technology news and analysis service aimed at anyone in the education sector keen to stay informed about technology developments, trends and issues. TechNews focuses on emerging technologies and other technology news. The TechNews service published digests from September 2004 to May 2010; combined analysis pieces and news were published every two to three months.